Introduction:
Many research studies aim to identify the independent variables that most influence a given dependent variable. This matters most when the number of independent variables is large, because models with many independent variables tend to overfit and to be less efficient. A researcher is therefore often interested in a short list of the most influential variables, from which meaningful conclusions about the dependent variable can be drawn.
The following sections briefly explain some techniques for identifying the most influential variable in the data.
Influential variable using partial correlation:
In a multiple regression study, the semi-partial correlation coefficient, together with the ordinary correlation coefficient, sheds light on variable importance. The squared semi-partial correlation is the unique proportion of variance in the outcome variable explained by the target predictor over and above the other predictors in the study. SPSS can report the partial correlation directly.
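To make the definition concrete, here is a minimal Python sketch (not the SPSS procedure). It assumes NumPy and synthetic data, and the helper names `r_squared` and `squared_semipartial` are hypothetical. It computes the squared semi-partial correlation of each predictor as the drop in R-square when that predictor is removed from the full model.

```python
import numpy as np

def r_squared(X, y):
    """R-square of a least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def squared_semipartial(X, y, j):
    """Variance in y explained by column j over and above the other columns."""
    others = np.delete(X, j, axis=1)
    return r_squared(X, y) - r_squared(others, y)

# Synthetic data, for illustration only: the first predictor has the
# strongest effect and the third has no effect at all.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)

sr2 = [squared_semipartial(X, y, j) for j in range(X.shape[1])]
for j, value in enumerate(sr2):
    print(f"predictor {j}: squared semi-partial correlation = {value:.4f}")
```

Ranking the predictors by these values gives one ordering of importance; when predictors are strongly correlated with each other, the unique shares can all be small even if the model as a whole fits well.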
Using regression coefficients for influential variable:
In simple or multiple linear regression, the size of the beta coefficient for each independent variable gives the size of that variable's influence on the predicted variable, and the sign of the coefficient (positive or negative) gives the direction of the effect. In regression with a single independent variable, the coefficient tells you how much the dependent variable is expected to change (increase if the coefficient is positive, decrease if it is negative) when that independent variable increases by one unit. In regression with multiple independent variables, the coefficient tells you how much the dependent variable is expected to change when that independent variable increases by one unit while all the other independent variables are held constant. An important point to keep in mind is the units of measurement: coefficient sizes can be compared across variables only if the variables are measured in uniform units, which is assumed here. When they are not, the coefficients are commonly compared after standardizing the variables.
Partial R-square Value:
The partial R-square value gives a good idea of how much of the variability in the dependent variable is accounted for by each independent variable. The larger a variable's partial R-square, the more influential that variable is in the current multiple regression study. Through stepwise regression, SAS reports the partial R-square value for each independent variable in the following form:
                Number   Partial   Model
Step  Label     Vars In  R-Square  R-Square  C(p)     F Value  Pr > F
1     height    1        0.4873    0.4873    470.186  475.23   <.0001
2     Flow      2        0.0908    0.5781    300.778  107.35   <.0001
3     Speed     3        0.0528    0.6309    203.072  71.23    <.0001
4     Pressure  4        0.0238    0.6546    160.218  34.18    <.0001
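In the output above, each partial R-square is the increase in the model R-square at that step (for example, 0.5781 - 0.4873 = 0.0908 at step 2). The following Python sketch reproduces that bookkeeping on synthetic data. It is not SAS's stepwise procedure: it assumes NumPy, uses hypothetical helper names (`r_squared`, `forward_partial_r2`) and made-up variables, omits the F-test entry and stay thresholds, and simply enters every variable in order of largest R-square gain.

```python
import numpy as np

def r_squared(X, y):
    """R-square of a least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def forward_partial_r2(X, y, names):
    """Enter variables one at a time by largest R-square gain.

    Returns a list of (name, partial R-square, model R-square) per step.
    """
    selected, remaining = [], list(range(X.shape[1]))
    model_r2, steps = 0.0, []
    while remaining:
        # Candidate that raises the model R-square the most at this step.
        best = max(remaining, key=lambda j: r_squared(X[:, selected + [j]], y))
        new_r2 = r_squared(X[:, selected + [best]], y)
        steps.append((names[best], new_r2 - model_r2, new_r2))
        model_r2 = new_r2
        selected.append(best)
        remaining.remove(best)
    return steps

# Synthetic data, for illustration only (not the data behind the SAS output).
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

steps = forward_partial_r2(X, y, ["x0", "x1", "x2"])
for step, (name, partial, model) in enumerate(steps, start=1):
    print(f"{step}  {name}  partial R-square={partial:.4f}  model R-square={model:.4f}")
```

Because each partial value is an increment, the partial R-squares sum to the final model R-square, and a variable's partial value depends on which variables entered before it.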
Other methods for influential variable:
Researchers also use several other techniques to determine the most influential variable, depending on the type of study and the data available. Some use principal component analysis: for example, the largest coefficient (in absolute value) in the first principal component hints at the most influential variable among the given set of independent variables.
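The principal component idea can be sketched in Python as follows. This is an illustrative sketch only, assuming NumPy and synthetic data; the variable names are hypothetical. It takes the eigenvector of the correlation matrix with the largest eigenvalue as the first principal component and compares the absolute sizes of its coefficients. Note that this calculation uses only the independent variables, so it describes which variable dominates their shared variation, not its effect on a dependent variable.

```python
import numpy as np

# Synthetic predictors, for illustration only: x0 and x1 follow a shared
# signal closely, while x2 is only weakly tied to it.
rng = np.random.default_rng(1)
n = 300
common = rng.normal(size=n)
X = np.column_stack([
    common + 0.1 * rng.normal(size=n),
    common + 0.5 * rng.normal(size=n),
    common + 2.0 * rng.normal(size=n),
])

# PCA on the correlation matrix, so the result does not depend on units.
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
first_pc = eigvecs[:, -1]                 # component with the largest eigenvalue

# The sign of an eigenvector is arbitrary, so compare absolute coefficients.
loadings = np.abs(first_pc)
explained = eigvals[-1] / eigvals.sum()
for j, value in enumerate(loadings):
    print(f"x{j}: |coefficient in first component| = {value:.3f}")
print(f"share of variance in first component: {explained:.3f}")
```

When two coefficients are nearly equal, as can happen with highly correlated variables, the ranking between them should not be over-interpreted.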
In other cases, graphical techniques such as added-variable plots (also called partial regression plots) can give an idea of the most influential variable. In all of the above cases, however, the crucial part is the researcher's knowledge of the data and variables, along with skill in interpretation. No single technique suits every scenario, so good judgment comes only with practice.