## Non-Parametric Statistical Tests - The story behind & analysis in "R"

- When the data are not normally distributed, we use a non-parametric test. To judge the shape of the data you can plot it (for example, as a histogram) and check the graph for normality. If a plot is not practical, another way is to check the skewness and kurtosis.
- Normal data have no skewness: the distribution is centered and symmetrical. Kurtosis tells how much of the data sits in the tails versus the center.
- Non-parametric tests can be used for ordinal, nominal and interval data, and also when the data contain outliers.
- The average (mean) is not always a good indicator of the middle of the data, because it is easily pulled by outliers; the median is a better choice. That is why non-parametric tests look at medians (most of the tests below actually work on ranks) rather than means.
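A quick way to see the outlier point in R. The numbers below are made up for illustration only: replacing one value with an outlier drags the mean far away while the median does not move at all.

```r
# Hypothetical values, for illustration only
x     <- c(10, 11, 12, 13, 14)
x_out <- c(10, 11, 12, 13, 100)  # same data, but the last value is now an outlier

mean(x);     median(x)      # both are 12
mean(x_out); median(x_out)  # mean jumps to 29.2, median stays at 12
```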

Test for Normality (Skewness & Kurtosis):

To see whether the data are normal or not, we can use measures such as skewness and kurtosis.

- For normally distributed data, skewness is around 0 and kurtosis is around 3. (Some software reports excess kurtosis, i.e. kurtosis minus 3, which is around 0 for normal data; the "moments" package reports the plain value.) For details see: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
- To check skewness and kurtosis in R we can use the "moments" package: install it once, load it, and then apply its skewness() and kurtosis() functions to any numeric vector. Let us create a normal dataset and check it:
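> library(moments) # load the package (install it once with install.packages("moments"))

> n <- rnorm(110, 35, 3) # 110 random values from a normal distribution with mean 35 and sd 3

> skewness(n)

[1] -0.1621379 # close to zero (the exact value differs on every run, because the data are random)

> kurtosis(n)

[1] 3.153562 # close to 3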
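To show what these two numbers measure, here is a base-R sketch of the moment-based formulas: skewness is the third central moment divided by sd³, and kurtosis is the fourth central moment divided by variance². To my understanding these are the same population-moment formulas that moments::skewness() and moments::kurtosis() use, but treat that as an assumption and compare against the package on your own data. The helper names skew and kurt are hypothetical, defined only here.

```r
# Hypothetical helpers: moment-based skewness and kurtosis (population form)
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5   # third central moment / sd^3
}
kurt <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2     # fourth central moment / variance^2 (about 3 for normal data)
}

skew(c(1, 2, 3, 4, 5))   # 0: perfectly symmetric
kurt(c(1, 2, 3, 4, 5))   # 1.7: flat shape with light tails, well below 3
skew(c(1, 1, 1, 2, 10))  # positive: the value 10 creates a long right tail
```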

Non-Parametric Tests

- The Mann-Whitney U test is the non-parametric alternative to the independent two-sample t-test.
- The Wilcoxon signed rank test is the non-parametric alternative to the dependent-sample (paired) t-test, and also to the one-sample t-test.
- The Kruskal-Wallis test is the non-parametric alternative to one-way ANOVA.
- Friedman's test is the non-parametric alternative to one-way repeated-measures ANOVA.
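In R, all four tests ship with the base "stats" package. The sketch below only shows which function goes with which design; the built-in datasets (ToothGrowth, PlantGrowth) and the small before/after vectors and score matrix are stand-ins chosen so the calls run, not data from this post.

```r
# Mann-Whitney U: numeric outcome vs a two-level factor (independent groups).
# ToothGrowth has tied values, so R warns and uses the normal approximation.
mw <- wilcox.test(len ~ supp, data = ToothGrowth)

# Wilcoxon signed rank: two measurements on the same subjects (hypothetical values)
before <- c(140, 135, 150, 145, 160, 155)
after  <- c(132, 136, 141, 138, 150, 149)
sr <- wilcox.test(before, after, paired = TRUE)

# Kruskal-Wallis: numeric outcome vs a factor with three or more levels
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

# Friedman: rows are blocks (subjects), columns are the repeated conditions
# (hypothetical scores)
scores <- matrix(c(7, 9, 8,
                   6, 5, 7,
                   9, 7, 6,
                   8, 8, 9,
                   5, 6, 4),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(paste0("subject", 1:5), c("A", "B", "C")))
fr <- friedman.test(scores)

# p-value of each test
sapply(list(mw = mw, sr = sr, kw = kw, fr = fr), function(h) h$p.value)
```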

When to Use Non-Parametric Tests

- When there are outliers: an outlier strongly affects the mean but hardly affects the median, so we should use tests based on the median (or ranks), i.e. non-parametric tests.
- When the sample size is below the usual guidelines for a parametric test, so normality cannot be relied on; then a non-parametric test is often the only safe option.
- When you have ordinal (ranked) or nominal data.
- If the mean represents the center of your data, use a parametric test; if the median represents the center better, use a non-parametric test.
- A good discussion of when to use parametric versus non-parametric tests: http://blog.minitab.com/blog/adventures-in-statistics-2/choosing-between-a-nonparametric-test-and-a-parametric-test

When to Use Parametric Tests

- When the data are continuous and follow a normal distribution.
- When the data are on an interval or ratio scale.

Mann-Whitney U Test

- The purpose of the Mann-Whitney U test is to test whether the two groups we are working with differ significantly from each other, i.e. to test the null hypothesis that they do not.
- It is the non-parametric equivalent of the independent-samples t-test and makes no assumption about the population distribution (it need not be normal).
- It is used with two independent samples; the null hypothesis is that both were drawn from the same (identical) distribution.
- An advantage of this test is that the two samples need not have the same number of observations.
- The basic idea: pool the x observations of the first sample and the y observations of the second, arrange them in increasing order and rank them. The pattern of the ranks (whether one sample's values cluster at the low or the high end) gives information about the relationship between the two groups.
- If the null hypothesis is that the two samples are identically distributed, and the two samples turn out to lie far apart in this ordering, we reject the null hypothesis.
- For the manual way of performing the test, see this guide: https://www.analyticsvidhya.com/blog/2017/11/a-guide-to-conduct-analysis-using-non-parametric-tests/

- For the clearest walkthrough of the manual method, see this video: https://www.youtube.com/watch?v=BT1FKd1Qzjw
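The ranking idea above can be checked in a few lines of R. This is a sketch on two small hypothetical samples (not the LungCap data), chosen without ties so the numbers are easy to follow by hand: U for the first sample is its rank sum minus n1(n1 + 1)/2, which equals the number of pairs in which the first sample's value is larger. R's wilcox.test() reports this same quantity as W.

```r
# Hypothetical samples, chosen without ties
x <- c(1.1, 2.3, 3.8)        # sample 1, n1 = 3
y <- c(2.0, 4.5, 5.1, 6.0)   # sample 2, n2 = 4 (sizes may differ)

r  <- rank(c(x, y))                            # rank the pooled observations, smallest = 1
R1 <- sum(r[seq_along(x)])                     # rank sum of sample 1: 1 + 3 + 4 = 8
U1 <- R1 - length(x) * (length(x) + 1) / 2     # U for sample 1: 8 - 6 = 2

# Same number by counting: in how many (x, y) pairs is x larger?
sum(outer(x, y, ">"))        # 2

wilcox.test(x, y)            # R prints this U as "W = 2"
```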

Let us use the Lungcapacity dataset (LungCapData, attached).

Now we will use the R console to evaluate the data:

> vyom <- read.table("/Users/vineetkaushik/Desktop/LungCapData.txt", header = TRUE, sep = "\t") # read the tab-separated file into a data frame named vyom

> head(vyom) # show the first six rows

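| | LungCap | Age | Height | Smoke | Gender | Caesarean |
|---|---|---|---|---|---|---|
| 1 | 6.475 | 6 | 62.1 | no | male | no |
| 2 | 10.125 | 18 | 74.7 | yes | female | no |
| 3 | 9.550 | 16 | 69.7 | no | female | yes |
| 4 | 11.125 | 14 | 71.0 | no | male | no |
| 5 | 4.800 | 5 | 56.9 | no | male | no |
| 6 | 6.225 | 11 | 58.7 | no | female | no |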

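To run the Mann-Whitney U test in R we do the following:

> wilcox.test(LungCap ~ Smoke, data = vyom, mu = 0, alternative = "two.sided", conf.int = TRUE, conf.level = 0.95, paired = FALSE) # LungCap compared between the two levels of Smoke

Here mu = 0 is the hypothesized location shift between the two groups (not a mean), alternative = "two.sided" makes it a two-tailed test, and paired = FALSE (the default) declares the groups independent. conf.int and conf.level are optional; they add a confidence interval for the location shift.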

Here are two very clear videos that explain this process and are useful when reporting it in research papers:

https://www.youtube.com/watch?v=KroKhtCD9eE

https://www.youtube.com/watch?v=LuWjx0_-VW0

So the test examines the relationship between a numeric outcome variable (Y) that is continuous (such as a person's height, lung capacity in volume, the weight someone lifts, money spent, hours of study, or any value that can take decimals) and a categorical variable (X) with two levels (such as male/female, smoker/non-smoker, lightweight/heavy), when the groups are independent (different participants in each group, and no participant measured twice).
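Since the attached file is not reproduced here, the sketch below runs the same Y ~ X call on a small hypothetical data frame shaped like it (a numeric LungCap and a two-level Smoke factor; all values are invented and tie-free) and shows how to pull the p-value and the confidence interval out of the result instead of reading them off the printout.

```r
# Hypothetical data frame shaped like LungCapData (values invented for illustration)
lung <- data.frame(
  LungCap = c(7.9, 8.4, 6.6, 9.1, 7.2, 8.8,   # non-smokers
              6.1, 7.0, 5.8, 6.4, 6.9),       # smokers (group sizes need not match)
  Smoke   = factor(c(rep("no", 6), rep("yes", 5)))
)

res <- wilcox.test(LungCap ~ Smoke, data = lung,
                   alternative = "two.sided", conf.int = TRUE)
res$statistic  # W: number of (no, yes) pairs where the non-smoker value is larger
res$p.value    # compare with 0.05
res$conf.int   # confidence interval for the location shift (level "no" minus level "yes")
res$estimate   # point estimate of that shift
```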

Brainstorm:

So whenever a research problem involves a variable with two levels, I can relate it to another continuous variable with this test.

Also, to find whether one group is greater than the other, use a one-sided test (the method is discussed with the next test).

For example, the Smoke (yes/no) versus LungCap comparison above is exactly this kind of problem.

**Wilcoxon Signed Rank Test**

- It is the non-parametric equivalent of the dependent (paired) t-test, and also of the one-sample t-test.
- The pairs should be randomly sampled and independent of one another (the two measurements within a pair are, by design, dependent).
- It is appropriate for examining the median difference between observations from two populations that are paired, i.e. dependent on one another.
- It is mostly used when the data are not normal and we need to compare before and after for some observations (i.e. the group stays the same). It can also be used when one group is measured under two different conditions, such as the weight of a group under two different lighting conditions.
- Null hypothesis in the Wilcoxon test: the median difference between the paired observations is zero.
- Alternative hypothesis: the median difference between the paired observations is not zero.
- It can be used to assess the difference in students' marks between the 1st and 2nd semester, because the students remain the same (the semesters form the time frame, i.e. before and after).
- A two-tailed test checks whether there is any difference in marks between the 1st and 2nd semester; a lower- or upper-tailed test checks whether the marks decreased or increased.
- The paired measurements should be ordinal or continuous (interval/ratio).
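The mechanics behind the V statistic that R prints can be sketched by hand: take the paired differences, drop zeros, rank their absolute values, and add up the ranks of the positive differences. The six before/after readings below are hypothetical, chosen so that no absolute differences are tied.

```r
# Hypothetical paired readings (same six people before and after)
before <- c(140, 135, 150, 145, 160, 155)
after  <- c(132, 136, 141, 138, 150, 149)

d <- before - after           # paired differences: 8 -1 9 7 10 6
d <- d[d != 0]                # zero differences are dropped
r <- rank(abs(d))             # ranks of absolute differences: 4 1 5 3 6 2
V <- sum(r[d > 0])            # sum of ranks of the positive differences
V                             # 20

wilcox.test(before, after, paired = TRUE)  # R reports the same "V = 20"
```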

Wilcoxon Signed Rank Test in R

Let us use the dataset as linked here.

H₀: there is no difference in blood pressure before and after the medicine.

H₁: there is a difference in blood pressure before and after the medicine.

So we will use a two-sided Wilcoxon test for this:

> wilcox.test(Before,After,mu = 0,alternative = "two.sided",paired = T,conf.int = T)

Wilcoxon signed rank test with continuity correction

data: Before and After

V = 267, p-value = 0.0008655

alternative hypothesis: true location shift is not equal to 0

95 percent confidence interval:

3.499986 12.000043

sample estimates:

(pseudo)median

7.500019

- The p-value is 0.0008655, well below .05, so we reject the null hypothesis: there is a significant difference in blood pressure before and after the medicine.

One-sided test to find which condition is greater or smaller

- Case 1:
- H₀: the blood pressure before taking the medicine is greater than or equal to the blood pressure after taking it, i.e. µ₀ ≥ µ₁

H₁: the blood pressure before taking the medicine is less than after taking it, i.e. µ₀ < µ₁

Here µ₀ is the median blood pressure before the medicine (the baseline) and µ₁ the median after it. H₁ says that blood pressure increases after the medicine, so when After is tested against Before this is an upper-tailed test. In R:

> wilcox.test(After,Before,mu = 0,alternative = "greater",paired = T,conf.int = T)

data: After and Before

V = 33, p-value = 0.9996

alternative hypothesis: true location shift is greater than 0

95 percent confidence interval:

-11.49998 Inf

sample estimates:

(pseudo)median

-7.500008

So p > .05 and we fail to reject H₀: there is no evidence that blood pressure rose after the medicine, which is consistent with H₀ (blood pressure before the medicine is at least as high as after it).

- Case 2:

H₀: the blood pressure before taking the medicine is less than or equal to the blood pressure after taking it, i.e. µ₀ ≤ µ₁

H₁: the blood pressure before taking the medicine is greater than after taking it, i.e. µ₀ > µ₁

Now H₁ says that µ₁ is the smaller one (blood pressure falls after the medicine), so testing After against Before is a lower-tailed test:

> wilcox.test(After,Before,mu = 0,alternative = "less",paired = T,conf.int = T)

Wilcoxon signed rank test with continuity correction

data: After and Before

V = 33, p-value = 0.0004327

alternative hypothesis: true location shift is less than 0

95 percent confidence interval:

-Inf -4.000002

sample estimates:

(pseudo)median

-7.500008

- p = 0.0004327 < .05, so we reject the null hypothesis and accept H₁,

i.e. blood pressure before taking the medicine is higher than after taking it.
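The two cases differ only in the alternative argument. Here is a sketch on six hypothetical before/after readings (not the linked dataset) that reproduces the same pattern: a large p-value for "greater" and a small one for "less". With six pairs and no ties, R uses the exact signed-rank distribution, so the p-values are simple fractions of 64.

```r
# Hypothetical paired readings
before <- c(140, 135, 150, 145, 160, 155)
after  <- c(132, 136, 141, 138, 150, 149)

# Case 1, H1: blood pressure rises after the medicine -> upper tail
p_up   <- wilcox.test(after, before, paired = TRUE, alternative = "greater")$p.value
# Case 2, H1: blood pressure falls after the medicine -> lower tail
p_down <- wilcox.test(after, before, paired = TRUE, alternative = "less")$p.value

c(greater = p_up, less = p_down)   # 0.984375 (63/64) and 0.03125 (2/64)
```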

We can also see this with a box plot.
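A box plot of this kind can be drawn with base R's boxplot(). The readings below are hypothetical stand-ins for the Before and After columns; boxplot() also returns the statistics it drew, and row 3 of $stats holds the medians.

```r
# Hypothetical readings standing in for the Before and After columns
before <- c(140, 135, 150, 145, 160, 155)
after  <- c(132, 136, 141, 138, 150, 149)

b <- boxplot(list(Before = before, After = after),
             ylab = "Blood pressure",
             main = "Blood pressure before and after the medicine")
b$stats[3, ]   # medians of the two boxes: 147.5 and 139.5
```

Because the data are paired, boxplot(before - after) (one box of the differences) is another option; a box sitting mostly above 0 points the same way as the lower-tailed test.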

Conclusion:


- We can use hypothesis testing to test whether something (a medicine, exercise, smoking) has an impact on a group, comparing before and after, or has no impact (the null hypothesis case).
- We can also use hypothesis testing to test whether, after doing something (taking a medicine, exercising, smoking, etc.), the measured value increases or decreases.

Kruskal-Wallis H-Test

- It extends the Mann-Whitney U test to more than two groups.
- It is the non-parametric counterpart of one-way ANOVA.
- The test determines whether the medians of two or more groups differ (strictly, it tests whether the groups come from the same distribution; it reads as a comparison of medians when the group distributions have a similar shape).
- The dependent variable can be on an ordinal, interval or ratio scale.
- The samples in the study should be independent, i.e. the same subject cannot be measured in more than one group.
- The test tells whether three or more groups are statistically the same (H₀) or different (H₁) for a given continuous parameter such as height, length, number of successes or hours studied.
- With a post hoc test from another R package, we can also find which pairs of groups differ, e.g. which two groups have statistically different heights and which do not.
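The H statistic itself is a rank computation much like Mann-Whitney's. With no ties, H = 12 / (N(N + 1)) · Σ Rᵢ²/nᵢ − 3(N + 1), where Rᵢ is the rank sum and nᵢ the size of group i. The sketch below computes it by hand on three small hypothetical, tie-free groups and checks it against kruskal.test(), which applies a tie correction only when ties exist.

```r
# Hypothetical lengths for three independent groups, chosen without ties
len   <- c(2.1, 3.4, 1.9,  4.2, 5.0, 4.8,  6.3, 5.9, 7.1)
group <- factor(rep(c("g1", "g2", "g3"), each = 3))

r  <- rank(len)                   # rank all N observations together
N  <- length(len)
Ri <- tapply(r, group, sum)       # rank sum per group: 6, 15, 24
ni <- tapply(r, group, length)    # group sizes: 3, 3, 3
H  <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)
H                                 # 7.2

kruskal.test(len ~ group)         # same statistic, reported as chi-squared with df = 2
```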

Kruskal-Wallis H-Test in R:

There are certain prerequisites before running the test in R:

- First, the grouping variable that holds the categories must be stored in R as a factor.
- Second, homogeneity of variance (http://www.introspective-mode.org/assumption-homogeneity-variance-univariate/) should be checked. Data from different groups may have very different variances, or may not be comparable because their scales do not match, so a homogeneity-of-variance test should be applied and should come out non-significant (p > .05, i.e. we fail to reject its null hypothesis of equal variances). For help choosing a test, see http://www.cookbook-r.com/Statistical_analysis/Homogeneity_of_variance/
- After that, we run the Kruskal-Wallis test in R; sample code below:
- > kruskal.test(Length ~ Group, data = dataset) ## if p > 0.05, we fail to reject the null hypothesis: no evidence of a difference in the parameter (height, weight, etc.) among the groups. If p < 0.05, the groups differ in that parameter, but the test does not tell us which groups differ.
- To find where the difference lies after an ANOVA we used a post hoc test (Tukey HSD); here we use Dunn's test instead, from the package "FSA" (install.packages("FSA"), then library(FSA)):
- > dunnTest(Length ~ Group, data = dataset, method = "bonferroni") # gives a Bonferroni-adjusted p-value for every pair of groups, so we can say which two groups differ significantly (p < 0.05) and which show no difference (p > 0.05). It also reports a Z value: for a comparison listed as wallLizard - VivoparaousLizard, a positive Z means the first group in the comparison (wallLizard) tends to have the larger values.
- watch the video to get complete detail about Kruskal Wallis Test https://www.youtube.com/watch?v=Y1qeAFAV5yQ
- Friedman's test has a similar application for repeated-measures (blocked) designs.
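Putting the prerequisites and both calls together, here is a minimal sketch on a hypothetical dataset with the same Length and Group columns (values invented). For the variance check it uses base R's fligner.test(), a rank-based homogeneity-of-variance test and one of the options on the cookbook-r page linked above; bartlett.test() or car::leveneTest() are alternatives. The Dunn step runs only if FSA is installed.

```r
# Hypothetical data with the columns used above
dataset <- data.frame(
  Length = c(5.1, 4.8, 5.6, 5.3, 4.9,
             6.2, 6.8, 6.5, 7.0, 6.1,
             5.0, 5.4, 5.2, 4.7, 5.5),
  Group  = rep(c("A", "B", "C"), each = 5)
)

# Prerequisite 1: the grouping variable must be a factor
dataset$Group <- factor(dataset$Group)

# Prerequisite 2: homogeneity of variance; p > 0.05 means no evidence against equal variances
fligner.test(Length ~ Group, data = dataset)

# The Kruskal-Wallis test
kw <- kruskal.test(Length ~ Group, data = dataset)
kw

# Post hoc Dunn test, only when the overall test is significant and FSA is available
if (kw$p.value < 0.05 && requireNamespace("FSA", quietly = TRUE)) {
  print(FSA::dunnTest(Length ~ Group, data = dataset, method = "bonferroni"))
}
```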