|Regression partitions the total variation in y into what two components?
||SST = SSE + SSR
SS Total (total variation in y) = SS Regression + SS Error
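The partition can be checked numerically. The sketch below is only an illustration: the five data points and the helper name fit_simple_ols are made up for this example and are not from the course material.

```python
# Hypothetical data, chosen only to illustrate the partition.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_simple_ols(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
mean_y = sum(y) / len(y)

sst = sum((yi - mean_y) ** 2 for yi in y)              # total variation in y
ssr = sum((fi - mean_y) ** 2 for fi in y_hat)          # variation explained by the line
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # variation left in the residuals
r_squared = ssr / sst                                  # proportion of variance accounted for

print(sst, ssr + sse, r_squared)
```

The two sums of squares add up to SST only because b0 and b1 come from the least squares fit; an arbitrary line would not give this clean partition.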
|What is the interpretation of the slope (b1) coefficient?
||For a 1-unit increase in x, the predicted value of y changes by b1 units.
|Unstandardized regression coefficient
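|Why are standardized regression coefficients useful?
||Standardizing the regression coefficients (changing b to B) puts multiple variables on a common scale, so their coefficients can be compared even when the variables are measured in different units.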
|Standardized regression coefficient
|Coefficient of determination
||R2 = SSR/SST gives the proportion of variance in y accounted for by x.
Only pertains to linear association
|Why is the sum of the squared residuals, or errors, a minimum in OLS regression?
||Using a least squares procedure guarantees that b0 and b1 will produce estimates of y that minimize the sum of squared differences between the observed and predicted values of y.
|What is the standard error of estimate (SEE)?
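||SEE is the standard deviation of the residuals about the regression line; roughly the typical distance of a point from the regression line
Also called root mean square error (because it is the square root of MSE)
|How is SEE calculated?
||SEE = S y/x = sqrt(MSE) = sqrt(SSE / (n - p)), where p is the number of estimated parameters (2 in simple regression: b0 and b1)
When the correlation between x and y is 1 or -1, SEE = 0 (all points are on the line).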
|How does SEE differ from the standard error of the slope (regression coefficient)?
||SE(b) is the standard error of the slope (regression coefficient)
SE(b) measures the sampling variability (precision) of the estimated slope
SEE measures scatter about the regression line
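A small sketch of the two quantities side by side, for simple regression only. The data are hypothetical and the variable names are illustrative. It uses the standard simple-regression result SE(b1) = SEE / sqrt(sum of squared deviations of x).

```python
import math

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
b0 = mean_y - b1 * mean_x

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

mse = sse / (n - 2)            # two estimated parameters: b0 and b1
see = math.sqrt(mse)           # scatter of the points about the line
se_b1 = see / math.sqrt(sxx)   # sampling variability of the slope estimate

print(see, se_b1)
```

SEE is in the units of y, while SE(b1) is in units of y per unit of x and shrinks as the spread of x grows.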
|What is the impact of non-constant error variance on the MSE?
||Non-constant error variance should increase the MSE
MSE = SSE/df
|What is one impact of an inflated (large) MSE?
||Reduces predictive power
Reduces coefficient of determination (R2)
|What departures from OLS regression can be studied using residual plots?
||1. Non-linear regression function
2. Non-constant variance (heteroscedasticity)
3. Correlated error terms (not independent)
4. Distribution of error terms not normal
5. Omitted an important IV from the model
|Departures from OLS regression ACRONYM
|How is a residual (error term) calculated?
||observed value – expected value based on regression equation; y – y hat
|How is a standardized residual calculated?
||Z = ei / SEE
(SEE = square root MSE, MSE = SSE/df)
|What are some of the drawbacks of using ZRESID to identify “unusual” cases?
||ZRESID doesn’t account for unusual values of x; unusual cases “mask” their own effects by inflating SSE (and therefore MSE), which makes their standardized residuals look smaller
|What three dimensions are used to characterize atypical or unusual observations?
||Leverage, Discrepancy, and Influence
|Leverage
||represents how unusual the case is in terms of its x value (extreme in predictor set)
|Discrepancy
||how unusual a value of y is for a given value of x (conditioned on x)
|Influence
||how much of an impact each individual observation has on the global regression analysis (DFFITS) and on estimates of the regression coefficients (DFBETA)
|Conceptually, what is an externally studentized residual (SDRESID)?
||It calculates the residual for a point based on that point not being included in the MSE to determine how “unusual” this case is compared to the rest of the data set
Also known as jackknifed residual, studentized deleted residual
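The masking problem with ZRESID and the benefit of deleting the case can be sketched for simple regression. Everything here is illustrative: the data (with a deliberately odd last case) and the function name residual_diagnostics are assumptions. The deleted MSE uses the standard shortcut SSE(i) = SSE - e_i^2 / (1 - h_i) rather than an actual refit.

```python
import math

def residual_diagnostics(x, y):
    """Return a list of (leverage, ZRESID, SDRESID) tuples for simple regression."""
    n = len(x)
    p = 2  # estimated parameters: b0 and b1
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(ei ** 2 for ei in e)
    mse = sse / (n - p)

    out = []
    for xi, ei in zip(x, e):
        h = 1.0 / n + (xi - mean_x) ** 2 / sxx        # leverage: how extreme x is
        z = ei / math.sqrt(mse)                       # ZRESID: case included in MSE
        mse_del = (sse - ei ** 2 / (1.0 - h)) / (n - p - 1)  # MSE with the case deleted
        t = ei / math.sqrt(mse_del * (1.0 - h))       # externally studentized residual
        out.append((h, z, t))
    return out

# Hypothetical data: the last case has an extreme x and an unusually high y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 9.0]
diagnostics = residual_diagnostics(x, y)
for h, z, t in diagnostics:
    print(round(h, 3), round(z, 2), round(t, 2))
```

For the last case the ZRESID stays below 2 because that case inflates the MSE it is divided by, while the deleted version is far larger.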
|What is the Durbin-Watson statistic (D) used for?
||Used to check for serial correlation among the residuals
Plot residuals against time; there must be no relationship between the residuals and time
D = 2 means no serial correlation (range is 0 to 4)
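A minimal sketch of the Durbin-Watson D statistic, assuming the residuals are already listed in time order. The function name and the two residual sequences are made up for illustration.

```python
def durbin_watson(residuals):
    """D = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residual sequences, in time order.
positively_correlated = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # neighbours are alike
negatively_correlated = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # neighbours alternate

d_pos = durbin_watson(positively_correlated)
d_neg = durbin_watson(negatively_correlated)
print(d_pos, d_neg)
```

Values well below 2 point toward positive serial correlation and values well above 2 toward negative serial correlation.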
|Is Durbin-Watson useful for all types of designs having non-constant error variance?
||No, only those where collection is spread over time or if time of collection is a factor
|A plot of ZRESID against the IV is useful for studying which types of departures?
|What can a plot of the residuals against a variable not included in the regression equation tell us?
||If we omitted a key variable (model specification)
It acts as a covariate and tells us what part of the error is associated with that variable. Including it would therefore reduce MSE.
|How might one diagnose problems with non-constant error variance?
||Use residual plots
|Will the value of the standardized residual be large for all types of outlying observations?
||No, the standardized residual would NOT be large for leveraged outliers.
|Name a residual diagnostic that can be used to detect outlying x values.
|What is discrepancy and how is it measured?
||Discrepancy is how far y is from predicted value of y for a given value of x
It is measured by comparing ZRESID and Studentized deleted residual
|What are the two components of influence and what residual diagnostics are used to reflect those two components?
||Influence is how much the point moves the line.
DFFITS measures influence on y (whole regression equation)
DFBETA (x) measures influence on the slope (DFBETA for constant is less important)
|DFBETA
||measures influence on the slope (DFBETA for constant is less important); specific
|DFFITS
||measures influence on y (whole regression equation); global
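Both kinds of influence can be sketched by brute force: refit the line with each case left out and see how much changes. This computes the unstandardized versions (change in the case's own predicted value and change in the slope), not the scaled statistics a package reports. The data and function names are hypothetical.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return mean_y - b1 * mean_x, b1

def deletion_influence(x, y):
    """Return (change in own fitted value, change in slope) for each deleted case."""
    b0, b1 = fit_simple_ols(x, y)
    results = []
    for i in range(len(x)):
        x_del = x[:i] + x[i + 1:]
        y_del = y[:i] + y[i + 1:]
        b0_del, b1_del = fit_simple_ols(x_del, y_del)
        dffit = (b0 + b1 * x[i]) - (b0_del + b1_del * x[i])  # global: effect on y hat
        dfbeta_slope = b1 - b1_del                           # specific: effect on slope
        results.append((dffit, dfbeta_slope))
    return results

# Hypothetical data: the last case has an extreme x and an unusually high y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 9.0]
influence = deletion_influence(x, y)
for dffit, dfbeta in influence:
    print(round(dffit, 3), round(dfbeta, 3))
```

In this made-up data set the last case moves both its own predicted value and the slope more than any other case.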
|What measure tells us how much the group of independent variables together estimate y?
|What are the limitations of R2 when used to compare between different studies?
||It does not separate variables to determine the individual contribution of each variable, controlling for the others in the model
|What measure tells us about the contribution of a single IV to estimating y when other variables are included in the regression equation?
||Semi-partial correlation (must square to explain variance)
|How are these descriptive measures interpreted?
||Controlling for other variables, x1 accounts for n% of the variation in y.
|Why are the regression coefficients in a multiple regression equation called “partial”?
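||Because each coefficient reflects the effect of one IV with the other IVs in the model held constant (partialled out), so it accounts for only “part” of the variation accounted for by the full model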
|How is a regression coefficient in a multiple regression model interpreted?
||Controlling for other variables, for every one-unit increase in x1, there is an n-unit increase in y.
|What hypotheses are tested in the ANOVA summary table of a multiple regression model?
||H0: R2y.123…p = 0
H1: R2y.123…p > 0
H0: B1 = B2 = B3 = … = Bp = 0
H1: at least one beta is not equal to 0
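The F-statistic that tests these hypotheses can be sketched from the ANOVA sums of squares or, equivalently, from R2. Here k stands for the number of IVs; the function names and the numbers are illustrative only, and judging significance still requires an F table or a statistics package.

```python
def overall_f(ssr, sse, n, k):
    """F = MS Regression / MS Error for a model with k IVs and n cases."""
    ms_regression = ssr / k
    ms_error = sse / (n - k - 1)
    return ms_regression / ms_error

def overall_f_from_r2(r2, n, k):
    """Same F-statistic written in terms of the coefficient of determination."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Hypothetical values: SSR = 3.6, SSE = 2.4 (so R2 = 0.6), n = 5 cases, k = 1 IV.
f_from_ss = overall_f(3.6, 2.4, n=5, k=1)
f_from_r2 = overall_f_from_r2(0.6, n=5, k=1)
print(f_from_ss, f_from_r2)
```

Both forms give the same value, which is why the test can be stated either about R2 or about the set of betas.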
|If the F-statistic is significant, will all of the individual regression coefficients be significant?
||Not necessarily; each coefficient has its own significance test, so it depends on the beta (and its standard error) for each variable
|What test determines significance of individual regression coefficients?
|What are effects of collinearity on regression?
||1. Affects estimates of partial regression coefficients
2. Affects size of SE(b)
3. Makes interpretations more complex b/c estimate of effect depends upon variables included
4. When extreme, there is no unique solution to the regression problem.
|What is the term for extreme cases of inter-correlation among the IVs?
|What factors determine the size of the standard error of a regression coefficient?
||1. Specification issues – IV omitted, model doesn’t explain enough variance
2. Restricted range of x – not providing enough variation in x to show full range of y
3. Inter-correlation – high correlation > low tolerance > small denominator > big SE
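The three factors map onto the terms of the usual formula SE(b_j) = sqrt(MSE / (SS_xj * tolerance_j)), where tolerance_j = 1 - R2 of x_j regressed on the other IVs. The sketch below is only an illustration; the function names and input values are hypothetical.

```python
import math

def tolerance(r2_j):
    """Share of x_j's variance NOT shared with the other IVs."""
    return 1.0 - r2_j

def se_coefficient(mse, ss_xj, r2_j):
    """Standard error of b_j from model error, spread of x_j, and inter-correlation."""
    return math.sqrt(mse / (ss_xj * tolerance(r2_j)))

# Hypothetical inputs: MSE = 4, sum of squared deviations of x_j = 100.
se_uncorrelated = se_coefficient(mse=4.0, ss_xj=100.0, r2_j=0.0)
se_collinear = se_coefficient(mse=4.0, ss_xj=100.0, r2_j=0.75)
se_restricted_range = se_coefficient(mse=4.0, ss_xj=25.0, r2_j=0.0)
vif = 1.0 / tolerance(0.75)  # variance inflation factor

print(se_uncorrelated, se_collinear, se_restricted_range, vif)
```

A larger MSE (specification), a smaller spread in x_j (restricted range), or a lower tolerance (inter-correlation) each makes the standard error bigger.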
|Semi-partial correlation
||Its square is the increase in R2 when x1 is added to an equation containing x2, i.e. the percentage of variance in y uniquely accounted for by x1 because all other variables have been statistically controlled.
|How is the semi-partial correlation interpreted?
||Controlling for other variables, x1 accounts for n% of variation in y.
|Partial correlation
||Correlation between y and x1 when linear effects of other variables are removed from x1 and y.
|How is the partial correlation interpreted?
||When the variation associated with the other variables is removed from both x1 and y, x1 accounts for n% of the remaining variation in y (square the correlation to get the percentage).
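For the two-IV case, both correlations can be sketched directly from the three pairwise correlations using the standard formulas. The correlation values and function names below are hypothetical and are there only to show how the two measures differ.

```python
import math

def semi_partial(r_y1, r_y2, r_12):
    """Correlation of y with the part of x1 that is independent of x2."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1.0 - r_12 ** 2)

def partial(r_y1, r_y2, r_12):
    """Correlation of y and x1 after x2 is removed from both."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1.0 - r_y2 ** 2) * (1.0 - r_12 ** 2))

def r2_two_predictors(r_y1, r_y2, r_12):
    """R2 for y regressed on x1 and x2."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)

# Hypothetical pairwise correlations.
r_y1, r_y2, r_12 = 0.5, 0.4, 0.3

sr1 = semi_partial(r_y1, r_y2, r_12)
pr1 = partial(r_y1, r_y2, r_12)
r2_full = r2_two_predictors(r_y1, r_y2, r_12)

# Squared semi-partial: increase in R2 when x1 joins a model that already has x2.
print(sr1 ** 2, r2_full - r_y2 ** 2)
# Squared partial: same unique variance, as a share of what x2 left unexplained.
print(pr1 ** 2, sr1 ** 2 / (1.0 - r_y2 ** 2))
```

The squared partial correlation is never smaller than the squared semi-partial, because it divides the same unique variance by the smaller base of variance in y that x2 left unexplained.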