## data analysis

useful graphs scatterplot
can get a sense for the nature of the relationship
what to look for in a graph
location where most of the data lies
spread variability of the data, how far apart or close together it is
shape symmetric, skewed, etc.
nature of relationship existent/ non-existent
strong/ weak
increasing/ decreasing
linear/ non-linear
outliers in scatterplots represent some unexplainable anomalies in data
could reveal possible systematic structure worthy of investigation
causal relationship relationship between two variables where one variable causes changes to another
explanatory variable explains or causes the change
on x-axis
response variable is changed
on y-axis
useful numbers correlation and regression
formula for the correlation coefficient $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
$x_i$ or $y_i$ the $i$-th value on the corresponding axis
$\bar{x}$ or $\bar{y}$ mean of the values on the corresponding axis
$s_x$ or $s_y$ standard deviation of the values on the corresponding axis
properties of r close to 1 = strong positive linear relationship
close to -1 = strong negative linear relationship
close to 0 = weak or non-existent linear relationship
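A minimal sketch of the correlation coefficient, computed directly from the formula as a sum of products of standardised values divided by n - 1. The function names and the data are made up for illustration, and the standard deviation is assumed to be the sample version (divisor n - 1).

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divisor n - 1), assumed to match s_x and s_y.
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # r = 1/(n-1) * sum of (standardised x) * (standardised y)
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data, invented for illustration only.
explanatory = [1, 2, 3, 4, 5]
increasing = [2, 4, 6, 8, 10]   # exactly linear, increasing
decreasing = [10, 8, 6, 4, 2]   # exactly linear, decreasing

r_pos = correlation(explanatory, increasing)
r_neg = correlation(explanatory, decreasing)
print(r_pos, r_neg)
```

With exactly linear data, r sits at the two extremes of its range: essentially 1 for the increasing response and essentially -1 for the decreasing one.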
cautions about the use of r only useful for describing linear relationships
sensitive to outliers
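Both cautions can be illustrated with a small sketch. The data are invented: one set follows an exact curve, and the other is an exactly linear set to which a single outlying point is added. The function uses an algebraically equivalent form of r in which the n - 1 factors cancel.

```python
from math import sqrt

def correlation(xs, ys):
    # Equivalent form of r: S_xy / sqrt(S_xx * S_yy); the (n - 1) terms cancel.
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Caution 1: a perfect but non-linear relationship (y = x squared).
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curved = correlation(curve_x, curve_y)

# Caution 2: exactly linear data, then the same data plus one outlier.
clean_x, clean_y = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
r_clean = correlation(clean_x, clean_y)
r_with_outlier = correlation(clean_x + [6], clean_y + [0])

print(r_curved, r_clean, r_with_outlier)
```

Here y is completely determined by x in the curved data, yet r is 0 because the relationship is not linear. In the second data set a single outlying point pulls r from 1 down to roughly 0.14.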
regression models general linear relationships between variables

focus negative = decrease

what regression modelling does describes behaviour of response variable (the variable of interest) in terms of a collection of predictors (related variables, i.e. explanatory variable(s))
a linear framework is used to look at the relationship between the response and the regressors
formula: $Y = \alpha + \beta x$
where $\alpha$ is the intercept and $\beta$ is the slope
ideal model for linear framework in terms of responses and regressors one unique response to one given regressor
real world model for linear framework in terms of responses and regressors must approximate
statistical model relates response to physical model predictions
allows for better predictions and quantification of uncertainty concerning the response
to make decisions
what does regression analysis do? finds the best relationship between responses and regressors for a particular class of models
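As a sketch of what "best" can mean for the class of straight-line models, the example below assumes the least-squares criterion (smallest sum of squared vertical distances), which is one common choice and is not stated in these notes. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    # Least-squares estimates of the intercept and slope of a straight line.
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical measurements: regressor values and observed responses.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(xs, ys)
print(round(intercept, 2), round(slope, 2))
```

For these made-up values the fitted intercept is about 0.09 and the fitted slope about 1.97, so each unit increase in the regressor goes with an increase of roughly 1.97 in the predicted response.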
experimenter controls predictors, why? may be important for making inferences about the effect of predictors on response
course assumption predictors are controlled in an experiment or at least accurately measured
define a good statistical model fit, predictive performance, parsimony, interpretability
qualitative description of model response = signal + noise
$Y = \alpha + \beta x + \varepsilon$
$\varepsilon$ = noise
define signal a small number of unknown parameters
variation in response explained in terms of predictors
it is the systematic part of the model
define noise residual variation unexplained in the systematic part of the model
can be described in terms of unknown parameters
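The split of a response into signal and noise can be sketched numerically. This example assumes a straight-line signal fitted by least squares and treats the residuals as the estimate of the noise. Both are assumptions made for illustration, and the data and names are hypothetical.

```python
def fit_line(xs, ys):
    # Least-squares intercept and slope (assumed fitting method).
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = fit_line(xs, ys)

# Signal: the systematic part, described by two parameters.
signal = [intercept + slope * x for x in xs]
# Noise: the residual variation left unexplained by the signal.
noise = [y - s for y, s in zip(ys, signal)]

# Share of the variation in the response explained by the signal.
y_bar = sum(ys) / len(ys)
total_variation = sum((y - y_bar) ** 2 for y in ys)
unexplained = sum(e ** 2 for e in noise)
explained_share = 1 - unexplained / total_variation
print(round(explained_share, 4))
```

Each response equals its signal plus its noise by construction. For these values the signal accounts for more than 99% of the variation, which is the low-noise situation in which a model fits well.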
what does a good statistical model do to possibly large and complex data reduces it to a small number of parameters
a model will fit well if the systematic part of the model describes much of the variation in the response (low noise)
large number of parameters may be required to do this
define parsimony: smaller number of parameters = greater reduction of data, more useful for making a decision
there is a cycle between what? tentative model formulation, estimation of parameters and model criticism
a good model will manage balance between goodness of fit and complexity
provide a useful reduction of the data
model response variable in terms of a single predictor yn = values of the response variable