## Statistics concepts of Correlation & Regression with some R Programming

**Converting quantitative data into ordinal**

When you have a lot of quantitative data, one way to analyse it is to convert it into categorical form by building a frequency table. For example, if you have the weights of 100 students in a class, it is good practice to present them categorically in a frequency table, as shown below.

| Weight (kg) | Frequency | Percentage |
|-------------|-----------|------------|
| 40–45       | 4         | 5          |
| 45–50       | 7         | 9          |
| 50–55       | 14        | 16         |
| ...         | ...       | ...        |

So a large set of quantitative data can be analyzed more easily, and with a better sense of understanding, by converting it into categorical data via a frequency table.
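In R, such a frequency table can be built with `cut()` and `table()`. Here is a minimal sketch using simulated weights, since the original data set isn't shown (with 100 students, the percentages simply equal the counts):

```r
# Bin 100 simulated student weights (kg) into 5-kg classes
# and build a frequency table with percentages.
set.seed(1)                                       # for reproducibility
weight <- sample(40:74, 100, replace = TRUE)      # hypothetical weights
bins   <- cut(weight, breaks = seq(40, 75, by = 5), right = FALSE)
freq   <- table(bins)                             # counts per class
pct    <- round(100 * prop.table(freq))           # percentages of the total
data.frame(freq, percentage = as.vector(pct))
```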

**Types of graphs for different types of data**

**For Nominal/Ordinal data:** it is generally advisable to use a pie chart or a bar graph for nominal and ordinal data.

**For Interval & Ratio data:** to describe data on these scales, show it in a histogram and check the skewness of the curve.

**Note:** the mean and median cannot be applied to nominal data, because nominal values cannot be arranged in ascending order. For example, Donkey, Monkey, Cat, Rabbit & Mouse can neither be ordered nor averaged.

**Note:** when there are outliers that can seriously distort the central tendency, the mean is not a good choice; in the presence of outliers we prefer the median.

**Range:** the range is a measure of variability in the data; the greater the range, the greater the variability. To describe the spread more informatively we can use a boxplot, which shows the variability of the data within each quartile.
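The range, quartiles, and boxplot are all one-liners in R. A quick sketch using the built-in `CO2` data set (the same one used later in this post):

```r
# Range, quartiles and boxplot of CO2 uptake (built-in CO2 dataset)
range(CO2$uptake)            # minimum and maximum
diff(range(CO2$uptake))      # the range as a single number
quantile(CO2$uptake)         # the quartiles a boxplot is drawn from
boxplot(CO2$uptake, main = "CO2 uptake", ylab = "uptake")
```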

**Z-score:** a z-score is a number that tells us how many standard deviations a value lies from the mean of the data.

To learn more, a great website that discusses the concept in detail is:

http://www.statisticshowto.com/probability-and-statistics/z-score/
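Computing z-scores in R is straightforward, either by hand from the definition or with the built-in `scale()` function. A minimal sketch, again on `CO2$uptake`:

```r
# z-score: how many standard deviations each value is from the mean
x <- CO2$uptake
z <- (x - mean(x)) / sd(x)   # manual z-scores from the definition
z2 <- as.vector(scale(x))    # scale() does the same in one call
head(round(z, 2))
```

By construction the z-scores have mean 0 and standard deviation 1.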

**Pearson's r correlation**

Pearson's correlation test measures the linear relationship between two variables. The value of r lies between −1 and +1; the closer |r| is to 1, the stronger the linear correlation (positive values mean the variables increase together, negative values mean one decreases as the other increases).

To read about this concept in detail, refer to the article linked below:

https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php

To know the basic steps of the calculation, one can refer to this link:

http://www.statisticshowto.com/how-to-compute-pearsons-correlation-coefficients

Another, more direct way to calculate Pearson's r is from the formula

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² )
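This moment formula for r can be checked directly in R against the built-in `cor()` function, using the same `CO2` data as later in this post:

```r
# Pearson's r computed from the moment formula, checked against cor()
x <- CO2$conc
y <- CO2$uptake
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
            sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
r_manual                      # identical to cor(x, y)
```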

**Regression Analysis**

Regression is the process of estimating the relationship between dependent & independent variables. It is:

- used in forecasting & prediction of the dependent variable
- used to identify the strength of the effect that the independent variables have on a dependent variable

The regression line is usually written in slope-intercept form, y = bx + a, where b is the slope and a is the intercept (for a simple linear fit, b = r · s_y / s_x and a = ȳ − b·x̄). Once a and b are calculated, y can be evaluated for any value of x.
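In R, `lm()` estimates a and b in one call; the slope/intercept formulas above give the same values. A sketch on the `CO2` data:

```r
# Fit the regression line y = b*x + a with lm(), then verify
# b = r * sd(y)/sd(x) and a = mean(y) - b*mean(x) by hand
fit <- lm(uptake ~ conc, data = CO2)
coef(fit)                 # intercept a and slope b

b <- cor(CO2$conc, CO2$uptake) * sd(CO2$uptake) / sd(CO2$conc)
a <- mean(CO2$uptake) - b * mean(CO2$conc)
c(a = a, b = b)           # same values as coef(fit)

predict(fit, newdata = data.frame(conc = 500))  # y for x = 500
```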

**Difference between Correlation & Regression**

Correlation describes the relationship between variables: it emphasises how strongly the dependent & independent variables move together, i.e. the strength of their linear relationship.

while

Regression is a way to evaluate the value of the dependent or independent variable using a mathematical equation, usually a line in slope-intercept form, so that once one variable is known the other can be estimated.

**Calculation of Correlation & Regression in R**

It's again pretty simple; here are the commands:

```r
> cor(CO2$conc, CO2$uptake)      # correlation between two variables, basic method
[1] 0.4851774

> cor.test(CO2$conc, CO2$uptake) # complete method: reports all details of the correlation

        Pearson's product-moment correlation

data:  CO2$conc and CO2$uptake
t = 5.0245, df = 82, p-value = 2.906e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3022189 0.6336595
sample estimates:
      cor
0.4851774
```

Correlation among several variables:

```r
> cor(data[, 5:9])  # just choose all the columns that need to be correlated
               Expense.Ratio Return.2006 Z.Values X3.Year.Return X5.Year.Return
Expense.Ratio     1.00000000  -0.1335501       NA     -0.1099471    -0.05741696
Return.2006      -0.13355013   1.0000000       NA      0.6975430     0.59339807
Z.Values                  NA          NA        1             NA             NA
X3.Year.Return   -0.10994715   0.6975430       NA      1.0000000     0.83728490
X5.Year.Return   -0.05741696   0.5933981       NA      0.8372849     1.00000000
```
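The NA entries appear because a column (here `Z.Values`) contains missing values, and by default `cor()` propagates them. The `use` argument controls this; a minimal sketch with toy data, since the original file isn't shown:

```r
# When a column contains NA, cor() returns NA unless told how to
# handle missing values (toy data standing in for the original file)
df <- data.frame(a = c(1, 2, 3, 4),
                 b = c(2, 4, 6, 8),
                 z = c(1, NA, 3, 4))
m_na <- cor(df)                               # NA wherever z is involved
m_pw <- cor(df, use = "pairwise.complete.obs") # drops NAs pair by pair
m_pw
```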

Normality test:

```r
> install.packages("nortest")  # provides the Anderson-Darling normality test
> library(nortest)
> ad.test(data[, 7])

        Anderson-Darling normality test

data:  data[, 7]
A = 1.2491, p-value = 0.002968
```

**Skewness / kurtosis**

```r
> install.packages("moments")  # install the moments package first
> library(moments)
> skewness(time)
[1] -0.01565162
> kurtosis(time)
[1] 2.301051
```
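For reference, these are just the usual moment definitions, which can be written out in base R without the package (note this kurtosis convention is not excess kurtosis: a normal distribution gives a value near 3, which is why the 2.30 above indicates a slightly flat curve):

```r
# Skewness and kurtosis by their moment formulas
# (same convention as the moments package: normal data -> ~0 and ~3)
skew <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5
kurt <- function(x) mean((x - mean(x))^4) / mean((x - mean(x))^2)^2

set.seed(42)
x <- rnorm(10000)                          # simulated normal data
c(skewness = skew(x), kurtosis = kurt(x))  # roughly 0 and 3
```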

