## Basics of Factor Analysis

**About Factor Analysis**

- Factor Analysis is often described as a data-reduction process in which a large number of variables is reduced to a smaller number of factors.
- It is a way of condensing large datasets into smaller ones that are easier to manage and understand.
- Factor Analysis is used to find structure in the relationships between variables, so it is also used as a structure-detection technique.
- Principal Factor Analysis, also called Common Factor Analysis, aims to identify the minimum number of factors that account for the correlations among a set of variables.
- Factor Analysis is a way of measuring the unobservable (brand success, love, anger, spirit, quality of life).
- Each factor accounts for a certain amount of the overall variance in the observed variables.
- Factors are always listed in order of the amount of variance they explain.
- A FACTOR is a set of observed variables (data that has been observed or recorded, also known as manifest variables) that show a similar response pattern.
- These observed variables are associated through a hidden (latent) variable.
- The eigenvalue is a measure of how much of the variance in the observed variables a factor explains.
- A factor with an eigenvalue >= 1 explains more variance than a single observed variable does (this is the usual criterion for keeping a factor).
- The factors that capture the most variance across two, three, four, or more variables are used in further analyses.
- The factors explaining the least variance are usually discarded.
- The relationship of each variable to the underlying factor is expressed by its factor loading. Loadings range from -1 to 1; the closer the absolute value is to 1, the stronger the relationship.
- To read about factor analysis in a little more detail via an example, visit http://www.theanalysisfactor.com/factor-analysis-1-introduction/
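The reduction described above can be sketched with scikit-learn's `FactorAnalysis` (assuming scikit-learn is available). The data here is synthetic: six observed variables are generated from two known latent factors, so we know in advance that a two-factor model should fit.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 300
factors = rng.normal(size=(n, 2))            # two hidden (latent) factors
loadings = np.array([[0.9, 0.0],
                     [0.8, 0.1],
                     [0.7, 0.2],
                     [0.1, 0.9],
                     [0.0, 0.8],
                     [0.2, 0.7]])
# Six observed (manifest) variables = factors times loadings, plus noise.
X = factors @ loadings.T + 0.3 * rng.normal(size=(n, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
fa.fit(X)
print(fa.components_.shape)   # (2, 6): loading of each variable on each factor
```

The fitted `components_` matrix plays the role of the factor loadings: each row is a factor, each column an observed variable.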

**Assumptions & Tips**

- There should be some degree of correlation among the variables. Bartlett's test of sphericity checks this: we usually want it to be significant, which indicates that the variables under study are significantly correlated.
- To conduct factor analysis the sample size should be at least around 100.
- The data should be continuous, on an interval or ratio scale.
- Each variable in the study should have about 10 respondents, so with 20 variables the sample size should be around 200. A sample of 300 is regarded as the gold standard, and the more the merrier.
- We should start with more variables than we expect to keep, because some will need to be removed during the analysis.
- Sometimes six out of ten variables fall into one factor while another factor gets just one or two variables. To avoid this imbalance we use a factor-rotation technique.
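Bartlett's test of sphericity mentioned above can be computed by hand from the correlation matrix. This is a minimal sketch using numpy and scipy: the test statistic is approximately chi-squared with p(p-1)/2 degrees of freedom under the null hypothesis that the variables are uncorrelated, and a small p-value is what we want before running factor analysis.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test of sphericity: H0 is that the correlation matrix is identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = chi2.sf(statistic, df)
    return statistic, p_value

# Synthetic data: five variables that share a common component, so they correlate.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
X = base + 0.5 * rng.normal(size=(200, 5))

stat, p = bartlett_sphericity(X)
print(p < 0.05)   # -> True: the test is significant, so the data is factorable
```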

**Eigenvalues** are the sum of the squared loadings for a particular factor (a column of the loading matrix). There is also the problem that many variables may fall under only one factor; to avoid this we use a rotation technique, namely orthogonal (perpendicular) rotation or oblique rotation.

**Rotation of the factors** - There are two types of rotation: **orthogonal rotation and oblique rotation**. We use orthogonal rotation when the factors are uncorrelated and located at 90 degrees to each other, while we use oblique rotation when the factors are correlated and not at 90 degrees.
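Varimax is the most common orthogonal rotation. The sketch below is a compact numpy implementation of the standard SVD-based varimax iteration (the loading matrix `A` is hypothetical); rotating with an orthogonal matrix redistributes the loadings across factors without changing the overall amount of variance explained.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a loading matrix so each variable loads mainly on one factor."""
    p, k = loadings.shape
    R = np.eye(k)          # accumulated rotation, stays orthogonal throughout
    var = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - (L @ np.diag((L ** 2).sum(axis=0))) / p)
        )
        R = u @ vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return loadings @ R

# Hypothetical unrotated loadings: four variables, two factors.
A = np.array([[0.9, 0.3], [0.8, 0.2], [0.3, 0.9], [0.2, 0.8]])
rotated = varimax(A)
```

Because the rotation matrix is orthogonal, the Frobenius norm of the loading matrix (and hence every variable's communality) is unchanged by the rotation.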

**Discussion on Factor Loading, Variables & Factors**

**Factor loadings** are the correlations of the variables with the factors. Usually, if a variable loads high on one factor, it loads low on the others. Let us see an example. Below is an ideal case where the loadings are evenly distributed, with each factor getting two variables. Each variable loads high on exactly one factor, so it is easy to identify which variable corresponds to which factor; there is no contradiction.
| Variable | F1 | F2 | F3 | F4 | Communality (R²) |
|----------|---------|---------|---------|---------|------------------|
| V1 | **.67** | .23 | .11 | .24 | .67² + .23² + .11² + .24² = .5715 |
| V2 | .10 | **.93** | .23 | .43 | |
| V3 | **.87** | .32 | .23 | .13 | |
| V4 | .10 | **.87** | .11 | .25 | |
| V5 | .21 | .02 | **.76** | .23 | |
| V6 | .13 | .14 | .21 | **.98** | |
| V7 | .15 | .35 | **.91** | .23 | |
| V8 | .42 | .12 | .43 | **.76** | |
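The two summary quantities discussed in these notes can both be read off the loading matrix: a variable's communality is its row sum of squared loadings, and a factor's eigenvalue is its column sum of squared loadings. A quick numpy check using the table's loadings:

```python
import numpy as np

# Loading matrix from the example table: rows = variables V1..V8, columns = F1..F4.
L = np.array([
    [0.67, 0.23, 0.11, 0.24],  # V1
    [0.10, 0.93, 0.23, 0.43],  # V2
    [0.87, 0.32, 0.23, 0.13],  # V3
    [0.10, 0.87, 0.11, 0.25],  # V4
    [0.21, 0.02, 0.76, 0.23],  # V5
    [0.13, 0.14, 0.21, 0.98],  # V6
    [0.15, 0.35, 0.91, 0.23],  # V7
    [0.42, 0.12, 0.43, 0.76],  # V8
])

communalities = (L ** 2).sum(axis=1)   # row sums of squared loadings
eigenvalues = (L ** 2).sum(axis=0)     # column sums of squared loadings

print(round(communalities[0], 4))      # V1's communality: 0.5715
low = [f"V{i + 1}" for i, h in enumerate(communalities) if h < 0.5]
print(low)                             # candidates for removal (none here)
```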

- Sometimes one variable shows a high correlation with two factors. This is known as the problem of **cross loading**.
- To overcome cross loading, look at the **communality** of that variable: if its communality is less than 0.5, remove the variable itself.
- **Communality** is the sum of the squared loadings for a particular variable, as calculated for V1 above. Communality is the proportion of variation in that variable explained by the factors. In the example above we got R² = .5715, indicating that about 57% of the variation in variable V1 is explained by the factor model (having four factors). If the percentage is less than 60%, the model does not explain that variable efficiently. So we compute the communality of every variable: the model explains variables with communality above 60% better than those below it. The variables with the lowest communality are usually removed from the study, unless they are theoretically important. For more insight visit http://sites.stat.psu.edu/~ajw13/stat505/fa06/17_factor/07_factor_commun.html

**Scree Plot**

- A scree plot graphs the eigenvalues against the factor numbers; the point where the curve levels off (the "elbow") suggests how many factors to retain.

**Exploratory Factor Analysis**

- **EFA** provides information about the optimal number of factors required to represent the data set. It helps in exploring the data.
- It is used to find the structure of large datasets and reduces the data to smaller summary sets.
- EFA is a good choice at the beginning, when one has no idea what common factors might exist.
- EFA can generate a large number of possible models of the data, so it supports exploration.

**Confirmatory Factor Analysis**

- If you already have an idea of how the model should look, then CFA is a great approach for testing hypotheses about the data structure.

**Latent variable and Manifest Variable**

- Suppose I want to see whether the service of an online store is great or not (a latent variable). I would look at several measurable factors: a. delivery time; b. the price of the products; c. the quality of the products; d. the ease of using the website. All of these numerically observable factors are manifest variables (i.e., the factors on which data is collected).
- A latent variable is one we cannot collect directly, like success, failure, sadness, or the favourability of a brand; these are all assessed through manifest variables.

**Important**

- Whenever we use factor analysis to reduce a large number of variables to a smaller number of factors, the amount of variance explained by the study also decreases. Suppose 100 variables are reduced to only 15 factors; then there is a chance that only 75 percent of the variance is explained. For the research to be successful, at least 60% of the variance should be explained by the factors.
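This 60% rule can be checked directly from the eigenvalues. A minimal sketch, using hypothetical eigenvalues: with standardized variables the total variance equals the number of variables, so each factor's share is its eigenvalue divided by that total.

```python
import numpy as np

# Hypothetical eigenvalues from a 6-variable correlation matrix
# (they sum to 6, the total variance of 6 standardized variables).
eigenvalues = np.array([2.8, 1.6, 0.8, 0.4, 0.3, 0.1])
n_variables = len(eigenvalues)

retained = eigenvalues[eigenvalues >= 1]   # keep factors with eigenvalue >= 1
explained = retained.sum() / n_variables   # proportion of variance explained
print(round(explained, 3))                 # -> 0.733, above the 60% threshold
```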