Correlation and Regression

Question Answer
Purpose of Correlation and Regression Make inferences based on sample data that come in pairs. Determine if there is a linear relationship b/w the two quantitative variables & describe it with an equation that can be used for predictions. Two dependent populations (quantitative data).
Correlation Correlation coefficient measures the strength of the linear relationship b/w two quantitative variables. Variables must be continuous/discrete. Use scatter plot. X & Y are linearly related if the scatter of points can be approximated by a straight line.
Correlation Coefficient r measures the strength of the linear relationship b/w the paired x & y values in a sample. Represents the linear correlation coefficient for a sample. Rho represents the linear correlation coefficient for a population.
Correlation r = Sxy / (Sx * Sy). Sxy: covariance of x & y. Sx: standard deviation of x. Sy: standard deviation of y.
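As a minimal sketch of how the three quantities on this card combine into r, the snippet below computes the sample covariance and the two sample standard deviations by hand. The function name `sample_correlation` and the data are made up for illustration.

```python
import math

def sample_correlation(x, y):
    """Return r = Sxy / (Sx * Sy) using n - 1 in every denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance of x and y
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    # Sample standard deviations of x and y
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Illustrative paired data (not from the source)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))
```

A value this far above 0 suggests a fairly strong positive linear relationship in this small sample.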
Interpreting the Linear Correlation Coefficient r Always between -1 & 1. If r is close to 0, little or no linear correlation b/w x & y. If r is close to -1 or 1, strong linear correlation. Negative value indicates negative or inverse relationship. Positive value indicates positive relationship. r measures strength & direction.
Factors That Affect the Size of r Nonlinear relationship: r only measures the degree of linear relationship, so if X and Y are nonlinearly related, r may be near 0 even though the two variables are strongly related. Restricted range: restricting the range of X or Y usually reduces r.
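Both effects on this card can be seen with small made-up data sets. The sketch below assumes a hypothetical helper `corr`. It first uses a perfect parabola, where r comes out as 0, and then compares r over a full range of x with r over a narrower slice of the same data.

```python
import math

def corr(x, y):
    """Pearson r computed from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Nonlinear relationship: y is fully determined by x, yet r is 0
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [v ** 2 for v in x_curve]
r_curve = corr(x_curve, y_curve)

# Restricted range: a linear trend plus a fixed alternating disturbance
x_full = list(range(1, 11))
y_full = [v + (1 if v % 2 == 1 else -1) for v in x_full]
r_full = corr(x_full, y_full)

# Keep only the pairs with 4 <= x <= 7
kept = [(a, b) for a, b in zip(x_full, y_full) if 4 <= a <= 7]
r_restricted = corr([a for a, _ in kept], [b for _, b in kept])

print(r_curve, round(r_full, 3), round(r_restricted, 3))
```

In this particular data set the restricted r is smaller than the full-range r. That is the typical pattern, not a guarantee for every sample.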
Factors That Affect the Size of r Extreme Scores: a single extreme score may produce evidence of correlation when none exists. Combining groups: there may be no correlation w/n either group, but combining them can give the illusion of a linear correlation. Can also change its direction.
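A small sketch of both effects, using invented data and a hypothetical helper `corr`: two groups that each have r = 0 show a strong r once pooled, and a single extreme point added to an uncorrelated group does the same.

```python
import math

def corr(x, y):
    """Pearson r computed from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Two groups, each with no linear correlation on its own
xa, ya = [1, 2, 3], [1, 2, 1]
xb, yb = [7, 8, 9], [7, 8, 7]
r_a = corr(xa, ya)
r_b = corr(xb, yb)

# Combining groups: pooling them produces a strong positive r
r_combined = corr(xa + xb, ya + yb)

# Extreme score: one far-away point added to group A
r_outlier = corr(xa + [10], ya + [10])

print(r_a, r_b, round(r_combined, 3), round(r_outlier, 3))
```

Plotting the points first (the scatter plot mentioned on the Correlation card) is the usual way to spot either situation.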
Correlation Testing hypotheses about rho. A single r can be tested to determine if the corresponding rho is different from a hypothesized value (usually 0). For H0: rho = 0, use a t test with df = n-2. CORRELATION DOES NOT PROVE CAUSATION. r measures how well the best-fitting straight line actually fits.
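For the common case H0: rho = 0, the standard test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The sketch below assumes that case only. The values of r and n are made up, the function name is hypothetical, and the critical value is the usual t-table entry for a two-tailed test at alpha = 0.05 with df = 3.

```python
import math

def t_statistic_for_r(r, n):
    """t statistic for H0: rho = 0, to be compared against t with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.7746  # illustrative sample correlation (made up)
n = 5       # number of (x, y) pairs
t = t_statistic_for_r(r, n)
df = n - 2

# Two-tailed critical value for alpha = 0.05 and df = 3, from a t table
t_critical = 3.182
reject_h0 = abs(t) > t_critical

print(round(t, 3), df, reject_h0)
```

With only five pairs, even a fairly large r does not reach significance here, which shows how much the test depends on n.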
Assumptions For each value of X there is a normally dist. subpop. of Y values. For each value of Y there is a normally dist. subpop. of X values. Joint dist. of X & Y is a normal dist. The variance of Y is the same at each value of X, and the variance of X is the same at each value of Y (homoscedasticity).
R-Squared Coefficient of determination. The proportion of the variation in y that is explained by the linear relationship b/w x & y. SSR/SST. Measures closeness of fit of the sample regression equation to the observed values of Y.
Regression Used to find the best-fitting straight line that relates the scores. Objective is to predict the value of one variable (the outcome) based on the value of another variable (the predictor). Use scatter plot. Best-fitting line minimizes the sum of the squared residuals, (y - yhat)^2 (actual minus predicted).
Regression SStotal (SST): variation in obs. values of response variable. SSregression (SSR): variation in obs. values of response variable explained by regression. SSerror (SSE): variation in obs. values of response variable not explained by regression. SST = SSR + SSE. df: SSR = 1, SSE = n-2, SST = n-1.
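The three sums of squares and R-squared can be checked numerically. The sketch below uses made-up data and hypothetical variable names. It fits the least-squares line, then computes SST, SSR, SSE, and R-squared = SSR/SST.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope (b) and intercept (a)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

# Total, explained, and unexplained variation in y
sst = sum((yi - mean_y) ** 2 for yi in y)
ssr = sum((yh - mean_y) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r_squared = ssr / sst
df_regression, df_error, df_total = 1, n - 2, n - 1

print(round(sst, 3), round(ssr, 3), round(sse, 3), round(r_squared, 3))
```

For simple linear regression, this R-squared equals the square of the correlation coefficient r computed from the same pairs.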
Least Squares Criterion The best fitting straight line is the one that minimizes the sum of the squared deviation b/w the actual y values & the predicted values. Minimize SSE.
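A quick way to see the criterion at work is to fit the line with the usual closed-form formulas (slope = Sxy / Sxx, intercept = ybar - slope * xbar) and confirm that nudging the slope or intercept only increases SSE. The data and the helper names `least_squares_line` and `sse` are assumptions made for illustration.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(x, y, intercept, slope):
    """Sum of squared deviations between actual and predicted y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
best = sse(x, y, a, b)

# SSE for eight nearby lines, each with a shifted intercept and/or slope
nudged = [sse(x, y, a + da, b + db)
          for da in (-0.1, 0, 0.1)
          for db in (-0.1, 0, 0.1)
          if (da, db) != (0, 0)]

# Predicted y for a new x inside the observed range
prediction = a + b * 4.5

print(round(a, 3), round(b, 3), round(best, 3), all(best < s for s in nudged))
```

The prediction is made inside the observed range of x on purpose, since a fitted line says little about values far outside the data.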
Beta The population slope of the regression line. The sample slope b is the estimate of beta.
How Can You Tell A Regression Question From A Correlation Question? Intent: Prediction=regression, Strength of relationship=correlation