## Basic Concept of the Chi-Square Test & Its Application Using R

- This test is used to investigate whether the distributions of categorical (nominal) variables differ from one another.
- A chi-square statistic compares the counts of categorical responses between two or more independent groups.
- The Chi-Square Test of Independence can only compare categorical variables. It cannot make comparisons between continuous variables (height, weight, etc.).
- The Chi-Square Test of Independence determines whether there is an association between categorical variables (i.e., whether the variables are independent or related). It is a nonparametric test.
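- The hypotheses of the Chi-Square Test are:

H0 = There is no relation between the two groups (the variables are independent)

H1 = There is a statistically significant difference between the two groups (the variables are related)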

- This test is used to test the relation between two different groupings, such as gender and type of food consumed. It can also test the distribution within a single group, such as the religion of students in a university (that is, whether an equal number of students of each religion are enrolled or whether the counts in the study sample are unequal).
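The single-group case described above (equal counts in every category) can be sketched as a goodness-of-fit test. The counts below are hypothetical and stand in for three religions labelled A, B and C; they are not taken from any dataset. When `chisq.test` receives a plain vector of counts, it tests against equal proportions by default.

```r
# Hypothetical counts of students in three categories (illustrative only)
counts <- c(A = 30, B = 25, C = 45)

# With a vector of counts, chisq.test() runs a goodness-of-fit test.
# By default every category is assumed equally likely under H0.
gof <- chisq.test(counts)

gof$expected   # expected count per category under H0
gof            # test statistic, degrees of freedom and p-value
```

A small p-value here would suggest that the categories are not equally represented in the sample.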

Examples

- Relationship between gender and colour (Red, Green, Blue, Yellow)
- Relationship between dumb and intelligent students and their study schedule (Morning, Evening or Night)
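As a rough sketch of the first example, a gender by colour table can be typed in directly and passed to `chisq.test`. All counts below are made up for illustration, and the object names `colour_tab` and `res` are arbitrary.

```r
# Hypothetical counts: rows are gender, columns are colour (illustrative only)
colour_tab <- matrix(c(20, 15, 30, 35,
                       25, 20, 40, 15),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(Gender = c("female", "male"),
                                     Colour = c("Red", "Green", "Blue", "Yellow")))

# Test of independence on a 2 x 4 table.
# Degrees of freedom are (rows - 1) * (columns - 1) = 3.
res <- chisq.test(colour_tab)

res$expected   # counts expected if gender and colour were independent
res            # test statistic, degrees of freedom and p-value
```

Comparing `colour_tab` with `res$expected` shows which cells drive the statistic.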

Chi Square Test in R

Let us use the Lungcapacity dataset as linked here.

> vyom <- read.table("/Users/vineetkaushik/Desktop/LungCapData.txt", header = TRUE) # read the data into a data frame

> head(vyom)

LungCap Age Height Smoke Gender Caesarean

1 6.475 6 62.1 no male no

2 10.125 18 74.7 yes female no

3 9.550 16 69.7 no female yes

4 11.125 14 71.0 no male no

5 4.800 5 56.9 no male no

6 6.225 11 58.7 no female no

> tab <- with(vyom, table(Gender, Smoke)) # make a contingency table of the two variables under study

> tab

Smoke

Gender no yes

female 314 44

male 334 33

> barplot(tab, beside = TRUE, legend.text = TRUE) # to visualize the data

> chisq.test(tab)

Pearson's Chi-squared test with Yates' continuity correction

data: tab

X-squared = 1.7443, df = 1, p-value = 0.1866

Since p > 0.05, we fail to reject the null hypothesis. The data provide no evidence of a relation between gender and smoking, which is consistent with the two variables being independent.
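To see where the reported statistic comes from, it can be reproduced by hand from the four counts in the contingency table. The sketch below re-enters those counts as a matrix, so it runs without the data file; the variable names `expected`, `x2`, `df` and `p_value` are arbitrary.

```r
# Counts copied from the Gender x Smoke contingency table printed above
tab <- matrix(c(314, 44,
                334, 33),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("female", "male"),
                              Smoke  = c("no", "yes")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Yates' continuity correction subtracts 0.5 from each |observed - expected|
x2 <- sum((abs(tab - expected) - 0.5)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

round(c(X_squared = x2, df = df, p_value = p_value), 4)
```

The rounded values should agree with the `chisq.test(tab)` output shown above. The correction is applied only to 2 x 2 tables and can be switched off with `chisq.test(tab, correct = FALSE)`.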
