Monday, March 5, 2012

How to identify the most influential variable in the data


Introduction:
Many research studies aim to identify the variables that most influence a given dependent variable, and this becomes especially important when the number of independent variables is large. Models with a large number of independent variables are prone to overfitting and to reduced model efficiency. A researcher is therefore often interested in a short list of the most influential variables, from which meaningful conclusions about the dependent variable can be drawn.
In the following sections, we briefly explain some of the techniques used to identify the most influential variable in the data.
Influential variable using partial correlation:

In a multiple regression study, the semi-partial correlation coefficient, read together with the ordinary (zero-order) correlation coefficient, throws good light on variable importance. The squared semi-partial correlation is the unique proportion of variance in the outcome variable explained by the target predictor over and above the other predictors in the study. SPSS reports the partial and semi-partial (part) correlations directly as part of its linear regression output.
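For readers without SPSS, the squared semi-partial correlation can be sketched in Python as the drop in R-square when one predictor is removed from the full model. This is a minimal sketch on made-up data; the helper names `r_squared` and `squared_semipartial` and the variable names `x1` to `x3` are illustrative assumptions, not part of any package.

```python
import numpy as np

def r_squared(X, y):
    """R-square of an ordinary least squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def squared_semipartial(X, y):
    """Drop in R-square when each predictor in turn is removed from the full model."""
    full = r_squared(X, y)
    return np.array([full - r_squared(np.delete(X, j, axis=1), y)
                     for j in range(X.shape[1])])

# Made-up illustrative data: three predictors with different true effects.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(size=500)

sr2 = squared_semipartial(X, y)
for name, value in zip(["x1", "x2", "x3"], sr2):
    print(f"{name}: squared semi-partial correlation = {value:.3f}")
```

The predictor with the largest value explains the most variance that no other predictor in the model can account for.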
Using regression coefficients for influential variable:
In simple or multiple linear regression, the size of the coefficient for each independent variable gives the size of the influence that variable has on the predicted variable, and the sign of the coefficient (positive or negative) gives the direction of the effect. In regression with a single independent variable, the coefficient tells you how much the dependent variable is expected to change (increase if the coefficient is positive, decrease if it is negative) when that independent variable increases by one unit. In regression with multiple independent variables, the coefficient tells you how much the dependent variable is expected to change when that independent variable increases by one unit, keeping all the other independent variables constant. An important point to keep in mind is the units of measurement of the variables: raw coefficients can be compared with each other only when all the variables are measured in the same units. Otherwise, compare standardized (beta) coefficients, which are obtained after rescaling every variable to unit standard deviation.
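The point about units can be illustrated with a small Python sketch. It fits the same made-up data twice, once with a length recorded in metres and once in centimetres, and compares raw and standardized coefficients. The data, the variable names and the helper functions are assumptions made for the illustration only.

```python
import numpy as np

def ols_coefficients(X, y):
    """Slope coefficients of y on X by least squares (intercept fitted, then dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

def standardized_coefficients(X, y):
    """Coefficients after z-scoring every variable, so the units no longer matter."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    return ols_coefficients(Xz, yz)

# Made-up data: a length in metres and a weight in kilograms.
rng = np.random.default_rng(1)
length_m = rng.normal(2.0, 0.5, size=400)
weight_kg = rng.normal(70.0, 10.0, size=400)
y = 4.0 * length_m + 0.1 * weight_kg + rng.normal(size=400)

X_m = np.column_stack([length_m, weight_kg])
X_cm = np.column_stack([length_m * 100.0, weight_kg])  # same data, length in cm

raw_m = ols_coefficients(X_m, y)
raw_cm = ols_coefficients(X_cm, y)
beta_m = standardized_coefficients(X_m, y)
beta_cm = standardized_coefficients(X_cm, y)

print("raw coefficients, length in m :", raw_m)
print("raw coefficients, length in cm:", raw_cm)
print("standardized, length in m     :", beta_m)
print("standardized, length in cm    :", beta_cm)
```

The raw coefficient of the length changes by a factor of 100 with the change of units, so the ranking of the raw coefficients depends on the units chosen, while the standardized coefficients stay the same.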
Partial R-square value:
The partial R-square value gives a good idea of how much of the variability in the dependent variable is accounted for by each independent variable. The larger the partial R-square, the more important the variable is in the current multiple regression study. The SAS system, through stepwise regression, provides the partial R-square value for each independent variable in the following form, where each partial R-square is the increase in the model R-square at the step where the variable enters:
                     Number   Partial     Model
  Step  Label       Vars In  R-Square  R-Square     C(p)  F Value  Pr > F

     1  height            1    0.4873    0.4873  470.186   475.23  <.0001
     2  Flow              2    0.0908    0.5781  300.778   107.35  <.0001
     3  Speed             3    0.0528    0.6309  203.072    71.23  <.0001
     4  Pressure          4    0.0238    0.6546  160.218    34.18  <.0001
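
The idea behind the partial and model R-square columns can be sketched in Python with a simple forward selection. This is a simplified sketch on made-up data and not the SAS procedure itself: it adds, at each step, the variable that raises R-square the most and applies no F-test for entry or removal. The function and variable names are illustrative assumptions.

```python
import numpy as np

def r_squared(X, y):
    """R-square of an ordinary least squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def forward_partial_r2(X, y, names):
    """Greedy forward selection; returns (name, partial R2, model R2) per step."""
    remaining = list(range(X.shape[1]))
    chosen, steps, model_r2 = [], [], 0.0
    while remaining:
        # Try each unused variable and keep the one giving the highest model R-square.
        best = max(remaining, key=lambda j: r_squared(X[:, chosen + [j]], y))
        new_r2 = r_squared(X[:, chosen + [best]], y)
        steps.append((names[best], new_r2 - model_r2, new_r2))
        chosen.append(best)
        remaining.remove(best)
        model_r2 = new_r2
    return steps

# Made-up data: four predictors with decreasing true effects.
rng = np.random.default_rng(2)
X = rng.normal(size=(600, 4))
y = (2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + 0.25 * X[:, 3]
     + rng.normal(size=600))

steps = forward_partial_r2(X, y, ["x1", "x2", "x3", "x4"])
print("Step  Variable  Partial R2  Model R2")
for i, (name, partial, model) in enumerate(steps, start=1):
    print(f"{i:>4}  {name:<8}  {partial:>10.4f}  {model:>8.4f}")
```

As in the SAS listing, the partial R-square values add up to the model R-square of the final step.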
   

Other methods for influential variable:
Many researchers also use other techniques to determine the most influential variable, depending on the type of study and the data available. Some use principal component analysis to find the most important variable in the data. For example, the variable with the largest absolute coefficient (loading) in the first principal component is a hint at the most influential variable among the given set of independent variables.
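A minimal Python sketch of reading the first principal component is given here. It works on the correlation matrix of made-up independent variables; the data, the variable names and the helper `first_component_loadings` are assumptions for illustration. Note that the calculation uses only the independent variables, so it says which variable dominates their shared variation rather than which one best predicts the dependent variable.

```python
import numpy as np

def first_component_loadings(X):
    """Coefficients of the first principal component of the standardized columns of X."""
    corr = np.corrcoef(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigenvalues in ascending order
    first = eigenvectors[:, -1]                       # largest eigenvalue is last
    share = eigenvalues[-1] / eigenvalues.sum()       # variance explained by PC1
    return first, share

# Made-up data: three variables driven by one common factor plus noise.
rng = np.random.default_rng(3)
factor = rng.normal(size=1000)
X = np.column_stack([
    factor + 0.3 * rng.normal(size=1000),
    factor + 0.6 * rng.normal(size=1000),
    factor + 1.5 * rng.normal(size=1000),
])

first, share = first_component_loadings(X)
print(f"first component explains {share:.1%} of the variance")
for name, coefficient in zip(["x1", "x2", "x3"], first):
    # The sign of an eigenvector is arbitrary, so compare absolute values.
    print(f"{name}: |coefficient| = {abs(coefficient):.3f}")
```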
In other cases, one can use graphical techniques such as added variable plots (also called partial regression plots) to get an idea of the most influential variable. In all of the above cases, however, the crucial part is the researcher’s knowledge of the data and variables, and skill in interpretation. No single technique suits every scenario, so a good feel for the choice comes only with practice.
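The points of an added variable plot can be computed with a few lines of Python. For a chosen predictor, both the dependent variable and that predictor are regressed on all the other predictors, and the two sets of residuals are plotted against each other. The sketch below uses made-up data and hypothetical helper names, and it prints the slope instead of drawing the plot.

```python
import numpy as np

def residuals(A, target):
    """Residuals of a least squares fit of target on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return target - A @ coef

def added_variable_points(X, y, j):
    """Horizontal and vertical coordinates of the added variable plot for predictor j."""
    others = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    return residuals(others, X[:, j]), residuals(others, y)

# Made-up data: two correlated predictors.
rng = np.random.default_rng(4)
x1 = rng.normal(size=300)
x2 = 0.5 * x1 + rng.normal(size=300)
X = np.column_stack([x1, x2])
y = 1.5 * x1 - 2.0 * x2 + rng.normal(size=300)

# Coefficients of the full multiple regression (index 0 is the intercept).
full_coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)

for j, name in enumerate(["x1", "x2"]):
    ex, ey = added_variable_points(X, y, j)
    slope = (ex @ ey) / (ex @ ex)  # slope of the line through the plotted points
    print(f"{name}: plot slope = {slope:.4f}, full-model coefficient = {full_coef[j + 1]:.4f}")
```

The slope of the points equals the coefficient of that predictor in the full multiple regression, which is why the plot shows the effect of one variable after adjusting for the others.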


Wednesday, July 13, 2011

Frequently asked questions in common multivariate techniques

Friends have often asked me which questions come up most frequently on multivariate techniques at interviews, conferences, etc. In fact, such questions have no limited scope, but I have tried to collect a few here. These are my preferred questions.

Multivariate techniques:

Multiple regression analysis:

  • What is the difference between multiple regression and multivariate regression?
  • How to select independent variables into the model?
  • What are the measures of efficiency?
  • What is the specific PROC in SAS?
  • Assumptions underlying and the consequences of their violations.
  • Estimation techniques, advantages and disadvantages


Logistic regression analysis:

  • Difference between logistic and traditional regression
  • Assumptions if any?
  • Estimation method?
  • Efficiency measures?
  • Which domains have the major applications?
  • Odds ratio implementation
  • Tests of goodness of fit

Discriminant analysis:

  • What is the aim of discriminant analysis?
  • Methods of constructing discriminant functions
  • Fisher discriminant function
  • Issue of multicollinearity here
  • What is cluster discriminant?
  • Domain applications

MANOVA:

  • Tests of MANOVA
  • Structure of model in MANOVA
  • Assumptions of MANOVA
  • How to use and read in SAS environment
  • Difference between MANOVA and multiple regression
  • MANOVA statement in PROC GLM





Factor analysis:

  • What is the difference between PC (principal components) and PAF (principal axis factoring)?
  • What is a Simple or Clean Factor Structure?
  • Types of factor analysis
  • Applications in manufacturing
  • PROC PRINCOMP, how to improve the performance
  • Type of conclusions in FA

Multidimensional scaling:

  • Types of multidimensional scaling
  • How to decide on what dimensions respondents use when evaluating objects
  • How many dimensions they may use in a particular situation
  • Test for the relative importance of each dimension
  • How the objects are related perceptually

Correspondence analysis:

  • How to use it in market research?
  • How to read the parameters?
  • Any significant tests
  • How to apply in SAS

Conjoint analysis:

  • Advantages in market research
  • Types of conjoint analysis
  • Relation with regression and logistic regression
  • Latest developments
  • Steps in the design of the studies

Cluster analysis:

  • Why it comes under multivariate techniques
  • How to choose the variables for the clustering
  • What are the types of clustering?
  • Measures for efficiency of clustering
  • Reports based on clustering


Canonical correlation:

  • Why is it more important than the usual correlation?
  • What is the complexity involved here?
  • Application area


Structural equation modeling:

  • Why is it so significant?
  • Applications in SAS
  • How to interpret the results

Wednesday, June 22, 2011

Regulatory science: some concepts

Definition of regulatory science:

It is the science dealing with innovative methods and tools to assess the safety, quality and efficacy of FDA-regulated products.

Where the role of the regulatory professional lies:

  • It begins with the R&D phase
  • Moves into clinical trials analysis
  • It extends to pre-market approvals

Thursday, March 3, 2011

Steps in data analysis:

The following are the general steps in data analysis:

1. Requirement analysis
2. Formulation of hypothesis
3. Designing the survey
4. Data collection/data tabulation.
5. Perform the prescribed analysis on sample data.
6. Evaluate the results and carry out the analysis on the full data.
7. Tabulate results and conclusions.
8. Limitations and assumptions, if any.
We will see an explanation of them in the next post.
