For Life Sciences

a statistic is? a summary of data
the field of statistics is? the collecting, analysing and understanding of data measured with uncertainty
what is a categorical variable? one which is measured descriptively eg: hair colour or major at university
what is a quantitative variable? one which is measured numerically eg: time it takes to get home from work
graphical summary of one categorical variable? bar graph
graphical summary of one quantitative variable? histogram or boxplot
how to graphically summarise relationship between two categorical variables clustered bar chart or jittered scatterplot
how to graphically summarise relationship between two quantitative variables scatterplot
how to graphically summarise relationship between one categorical and one quantitative variable comparative boxplots or comparative histograms
what to look for in a graph location, spread, shape, unusual observations
define 'location' graphically where most of the data lies
define 'spread' graphically variability of the data, how far apart or close together it is
define 'shape' graphically symmetric, skewed etc
how to numerically summarise one categorical variable table of frequencies or percentages
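A table of frequencies and percentages can be sketched in a few lines of Python. This is only an illustration: the hair-colour responses are made up, and `counts` and `percentages` are names chosen for this example.

```python
from collections import Counter

# Hypothetical responses for one categorical variable (hair colour).
hair = ["brown", "black", "brown", "blonde", "brown", "black"]

counts = Counter(hair)        # frequency of each category
total = sum(counts.values())  # number of observations
percentages = {category: 100 * count / total for category, count in counts.items()}

# Print one row per category, most common first.
for category, count in counts.most_common():
    print(f"{category:<8}{count:>3}{percentages[category]:>8.1f}%")
```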
how to numerically summarise one quantitative variable location: mean or median;
spread: standard deviation or inter quartile range
formula for mean? xbar = (1/n) times summation of xi;
preferable for approximately normal data
formula for median? M = middle value of the ordered data (n odd), or (sum of the two middle values)/2 (n even);
less affected by outliers therefore used for outlier ridden data
formula for standard deviation? s = square root of [(1/(n-1)) times summation of ((xi-xbar) squared)];
preferable for approximately normal data
formula for inter quartile range? Q3 – Q1= IQR;
less affected by outliers therefore used for outlier ridden data
which numbers are needed to create a five number summary? minimum, Q1, median (sometimes mean included), Q3, maximum
an outlier is? a value more than 1.5 x IQR lower than Q1;
or more than 1.5 x IQR higher than Q3
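The numerical summaries and the 1.5 x IQR rule can be tried out with Python's standard `statistics` module. This is a minimal sketch on made-up commute times. Quartile conventions differ between textbooks and software, so the `"inclusive"` method used here is an assumption and may not match hand calculations from a particular course.

```python
import statistics

# Hypothetical sample: commute times in minutes (made up for illustration).
times = [12, 15, 17, 18, 20, 21, 22, 24, 25, 60]

mean = statistics.mean(times)      # xbar = (1/n) * sum of xi
median = statistics.median(times)  # average of the two middle values here (n is even)
sd = statistics.stdev(times)       # sample standard deviation (divides by n - 1)

# Quartiles: one of several conventions; other methods give slightly different values.
q1, q2, q3 = statistics.quantiles(times, n=4, method="inclusive")
iqr = q3 - q1

# 1.5 x IQR rule for flagging outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in times if x < lower_fence or x > upper_fence]

five_number_summary = (min(times), q1, median, q3, max(times))
print(five_number_summary, outliers)
```

In this sample the single large value pulls the mean above the median, which is why the median and IQR are preferred for outlier-ridden data.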
define linear transformation transformation of a variable from x to xnew
examples of linear transformation use change of units;
standardising under the normal assumption to find 'z' scores
formula for linear transformation? xnew=a+bx
formula for new mean once linear transformation has occurred? xbarnew=a+bxbar
formula for new median once linear transformation has occurred? Mnew=a+bM
formula for new standard deviation once linear transformation has occurred? snew=|b|s (spread cannot be negative, so the size of b is used)
formula for IQR once linear transformation has occurred? IQRnew=|b|IQR
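The effect of a linear transformation on location and spread can be checked numerically. The sketch below uses a change of units (Celsius to Fahrenheit, so a = 32 and b = 1.8) on made-up temperatures. The variable names are chosen for this example only.

```python
import math
import statistics

# Hypothetical body temperatures in degrees Celsius.
celsius = [36.5, 37.0, 37.2, 38.1, 39.0]

# Change of units: xnew = a + b*x with a = 32 and b = 1.8.
a, b = 32.0, 1.8
fahrenheit = [a + b * x for x in celsius]

# Measures of location are shifted by a and scaled by b.
mean_matches = math.isclose(statistics.mean(fahrenheit), a + b * statistics.mean(celsius))
median_matches = math.isclose(statistics.median(fahrenheit), a + b * statistics.median(celsius))

# Measures of spread ignore a and are scaled by the size of b.
sd_matches = math.isclose(statistics.stdev(fahrenheit), abs(b) * statistics.stdev(celsius))

print(mean_matches, median_matches, sd_matches)
```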
explain density curves area under the curve in any range of values is the proportion of all observations that fall within that range for a quantitative variable;
like a smoothed-out histogram; describes probabilistic behaviour
total area under density curve equals? 1
explain the normality assumption normal curve can be used if a histogram looks like a normal curve;
the assumption is then termed 'reasonable';
the histogram must tail off towards 0 at both ends
how does a normal quantile plot confirm the normality assumption? if in a straight line, or close to it, then normal and assumption is reasonable
define the 68-95-99.7 rule 68% of results will be within 1 standard deviation of the mean;
95% of results will be within 2 standard deviations of the mean;
99.7% of data will be within 3 standard deviations of the mean
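The three percentages in the rule are rounded areas under the normal curve. As a quick check, they can be recomputed with `statistics.NormalDist` from the Python standard library. The helper name `within` is made up for this sketch.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
Z = NormalDist(mu=0, sigma=1)

def within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {100 * within(k):.1f}%")
```

Because every normal curve is a shifted and rescaled version of the standard one, the same areas apply whatever the mean and standard deviation are.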
symbol for mean of a density curve? μ (mu)
symbol for standard deviation of density curve? σ (sigma)
normal distribution short hand X = random variable;
N = normal distribution;
first number in brackets = mean;
second number in brackets = standard deviation
explain the standard normal variable Z is a normal variable with mean 0 and standard deviation 1; example of set out: P(Z<z) = p;
p corresponds to the area under the curve of the corresponding region;
the tabulated area will always be to the left of z
use of the standard normal distribution table to find P: look up z along the row and column headings of the table and read P from the body;
to find z: locate P in the body of the table and read z from the headings;
table ordered from smallest to largest
reverse use of the standard normal distribution table eg of how set out: P(Z<c)= n, where n is known and c is to be found;
c = the z value that has area n to its left
X ~? N(μ,σ)
formula and use of standardising transformation Z= (X-μ)/σ;
used when the distribution is N(μ,σ) rather than N(0,1), so X must be converted before the standard normal table can be used
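A worked example of standardising and of both directions of table lookup is sketched below, with `statistics.NormalDist` standing in for the printed table. The height distribution (mean 170, standard deviation 10) is made up for illustration.

```python
from statistics import NormalDist

# Hypothetical population: heights X ~ N(mean 170, standard deviation 10), in cm.
mu, sigma = 170.0, 10.0
standard_normal = NormalDist(mu=0, sigma=1)

# Forward use: find P(X < 185) by standardising, then looking up the left-hand area.
x = 185.0
z = (x - mu) / sigma        # standardising transformation
p = standard_normal.cdf(z)  # P(Z < z), the area to the left of z

# Reverse use: find c such that P(Z < c) = 0.975, then convert back to the X scale.
c = standard_normal.inv_cdf(0.975)
x_c = mu + sigma * c

print(f"z = {z}, P(X < {x}) = {p:.4f}")
print(f"c = {c:.2f}, matching height = {x_c:.1f}")
```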
relationships between variables best explored through? why? scatterplot;
can get a sense for the nature of the relationship
how to define the nature of relationship? existent/ non-existent;
strong/ weak;
increasing/ decreasing;
linear/ non-linear
outliers in scatterplots? represent some unexplainable anomalies in data;
could reveal possible systematic structure worthy of investigation
define causal relationship relationship between two variables where one variable causes changes to another
define the explanatory variable explains or causes the change;
written on x-axis
define the response variable that which changes;
written on y-axis
useful numbers for two quantitative variables? correlation or regression
formula for the correlation coefficient? r = (1/(n-1)) times summation of ((xi-xbar)/sx)((yi-ybar)/sy)
define xi or yi the individual observed values of the variable on the corresponding axis
define xbar and ybar mean of the values of the corresponding variable
define sx and sy standard deviation of the values of the corresponding variable
state the properties of r is the correlation coefficient;
numerically expresses relationships;
if close to 1 = strong positive linear relationship;
if close to -1 = strong negative linear relationship;
close to 0 = weak or non-existent linear relationship
state the cautions about the use of r only useful for describing linear relationships;
sensitive to outliers
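The correlation coefficient can be computed directly from its formula. The sketch below does this on made-up data and includes a curved example to illustrate the caution that r only describes linear relationships. The function name `correlation` is chosen for this example.

```python
import statistics

def correlation(xs, ys):
    """r = (1/(n-1)) * sum of standardised x times standardised y."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Perfect increasing and decreasing straight lines.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 6, 8, 10])
r_down = correlation(xs, [10, 8, 6, 4, 2])

# A strong but curved relationship (y = x squared) that r fails to detect.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)

print(round(r_up, 3), round(r_down, 3), round(r_curve, 3))
```

The curved data have an exact relationship between x and y, yet r is essentially 0, so a scatterplot should always be checked alongside the number.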
what is least squares regression used for? to explain how a response variable is related to explanatory variable;
slope positive = response increases as explanatory variable increases;
slope negative = response decreases as explanatory variable increases
mathematical representation of regression b1=r(sy/sx);
b0=ybar-b1xbar;
yhat=b0+b1x
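facts about b1 b1 = slope of the fitted line; has the same sign as the correlation coefficient r; equals r only when sx = sy (eg: both variables standardised)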
how to determine the strength of a regression rsquared = (syhat squared)/(sy squared), the variance of the fitted values divided by the variance of y;
r-squared is the % variation in y explained by linear regression
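The slope, intercept and r-squared can be computed from the summary statistics alone. The following is a minimal sketch on made-up data, using the standard least squares results b1 = r(sy/sx) and b0 = ybar - b1(xbar). The helper `correlation` and all variable names are chosen for this example.

```python
import statistics

def correlation(xs, ys):
    """Correlation coefficient r from standardised values."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: explanatory variable xs, response variable ys.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

xbar, ybar = statistics.mean(xs), statistics.mean(ys)
sx, sy = statistics.stdev(xs), statistics.stdev(ys)
r = correlation(xs, ys)

b1 = r * sy / sx       # slope
b0 = ybar - b1 * xbar  # intercept: the line passes through (xbar, ybar)
fitted = [b0 + b1 * x for x in xs]  # yhat for each observed x

# Proportion of the variation in y explained by the line.
r_squared = statistics.variance(fitted) / statistics.variance(ys)

print(f"yhat = {b0:.2f} + {b1:.2f}x, r = {r:.3f}, r-squared = {r_squared:.3f}")
```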
state the basic regression assumptions y=b0+b1x+error;
errors have mean 0;
error corresponds to random scatter about line;
this is checked by residual plots
formula for residuals? residual = y - yhat (observed value minus fitted value)
residual plot is a scatter plot of? residuals(y axis) against explanatory variable(x axis)
interpreting residual plots focus on pattern;
there should be no pattern;
if there is a pattern then the linear assumption is incorrect
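To see what a pattern in the residuals looks like, the sketch below fits a straight line to deliberately curved, made-up data and lists the residuals instead of plotting them. The function `fit_line` is written for this example. It computes the slope as Sxy/Sxx, which is algebraically the same as r(sy/sx).

```python
def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1 for the line yhat = b0 + b1*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical data that follow a curve (y = x squared), not a straight line.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # y - yhat

# In a residual plot these would go on the y axis, against xs on the x axis.
for x, e in zip(xs, residuals):
    print(x, e)
```

Here the residuals are positive at both ends and negative in the middle. This U-shaped pattern signals that a straight line is the wrong model, even though the residuals still sum to 0.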
what to do if any residuals stand out? they are either an influential point and to be left alone;
or they are an outlier and to be removed if affecting results too much
how to attach special cause to an outlier analyse if recording error;
refit line;
if remove then justify why (down weight influence)
translated residuals (removing the outlier) should have what effect? spread pattern
any points at 0 on a residual plot are? observations that lie exactly on the fitted line (y = yhat)
if parabola presents after outlier removal? linear assumption not appropriate
if spread doesn't vary far from 0? there is no pattern
when to remove outlier if influences results
when will outlier not influence results? when its x value is close to the mean of x;
it will then have little influence on the gradient and intercept of fitted line
what are lurking variables? variables that can influence results which have not been taken into account
to account for lurking variables you? analyse the covariance
state the strategy for using data in research? identify question to be answered;
identify population studied;
identify variables: which one is explanatory (IV) and which is response (DV)
define anecdotal data haphazard collection of data;
unreliable for drawing conclusions
define available data use of data that has come from another source;
possibly obtained for a reason other than the one you intend to use it for
define collect your own data use of a census, a survey, or observations from an experiment
define census use of whole population to obtain data
define sample use of a randomised selection of the population to represent the whole;
smaller and easier to do than a census
explain observational study no variables are manipulated or influenced;
data obtained from population as it is
explain experiment variables are influenced or manipulated so that responses can be noted and recorded;
usually a control group utilised;
control group = does not undergo treatment, act as a comparison group
explain causation a response that is the result of another variable eg: moon's movements CAUSE the tides
common response in terms of variables means? a lurking variable causes changes in both the explanatory and response variables;
the two variables are associated with one another without one causing the other
causation in terms of variables means? explanatory variable causes response variable;
response variable and explanatory variable are associated
confounding in terms of variables means? two or more explanatory variables are present and associated with one another;
all explanatory variables could have caused response variable by themselves or together;
explanatory variables called confounded causes
why an experiment? allows demonstration of causation;
intervention can be used to determine whether or not effect is present
state the principles of experiment design subjects, treatment, factor, levels, response variable
definition of subjects in terms of experiment design things upon which experiment is done;
eg: people, animals, chemicals etc
definition of treatment in terms of experiment design circumstances which are applied to subjects;
eg: given medication
definition of factor in terms of experiment design explanatory variables that are varied across treatments;
eg: type of tablet (medication or placebo)
definition of levels in terms of experiment design the specific values a factor can take; each treatment is one combination of factor levels;
eg: dosage of medication, or number of doses per day
definition of response variable in terms of principle of experiment design the variable which will answer the question;
variable of most interest that is measured on subject after treatment
explain a principles of experiment summarisation table one factor along the rows and another along the columns;
levels of each factor as the row and column headings;
rest of table = number allocated to that particular treatment group
state the three principles all experiments must follow compare two or more treatments where one is the control;
random assignment of subjects to treatments;
repeat the experiment on numerous subjects (for reduction of chance variation in the results)
how to randomise allocate all subjects a random number;
order subjects in accordance to those random numbers (smallest to largest, or largest to smallest);
form treatments by selecting subjects in a systematic pattern applied to the random numbers representing subjects
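The three randomisation steps can be sketched in Python as follows. This is only one way to do it: the function name `randomise`, the subject labels and the seed are made up for illustration, and the systematic pattern used here is simply dealing the ordered subjects out to the groups in turn.

```python
import random

def randomise(subjects, n_groups, seed=None):
    """Randomly split subjects into n_groups treatment groups."""
    rng = random.Random(seed)  # a seed makes the allocation reproducible

    # Step 1: allocate every subject a random number.
    numbered = [(rng.random(), subject) for subject in subjects]

    # Step 2: order subjects by their random numbers, smallest to largest.
    numbered.sort(key=lambda pair: pair[0])
    ordered = [subject for _, subject in numbered]

    # Step 3: systematic pattern, here 1st to group 0, 2nd to group 1, and so on.
    return [ordered[g::n_groups] for g in range(n_groups)]

# Hypothetical subjects labelled S1 to S8, split into treatment and control.
subjects = [f"S{i}" for i in range(1, 9)]
treatment, control = randomise(subjects, n_groups=2, seed=1)
print("treatment:", treatment)
print("control:  ", control)
```

Because the group a subject lands in depends only on its random number, the experimenter cannot steer particular subjects towards a particular treatment.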
define control group: different from all other treatments as it only pretends to apply explanatory variable;
is the group that the results are compared against
explain random comparative experiment subjects randomly allocated one of several treatments;
responses compared across treatment groups
explain matched pairs design break subjects with similar properties into pairs;
one of two treatments applied to one of each pair;
can produce more precise results;
used in before and after, and twin studies
explain random block design block = group of subjects known before experiment to be similar in some way that would affect response;
randomised assignment of treatments to subjects within block;
matched pairs is special case of this
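A randomised block design repeats the random allocation separately inside each block. The sketch below assumes the block of each subject is already known. The function `block_randomise`, the subject labels, the age-group blocks and the treatment names are all hypothetical.

```python
import random

def block_randomise(block_of, treatments, seed=None):
    """Randomly assign treatments to subjects separately within each block.

    block_of maps each subject to the block it belongs to.
    Returns a dict mapping each subject to its treatment.
    """
    rng = random.Random(seed)

    # Group subjects by block.
    blocks = {}
    for subject, block in block_of.items():
        blocks.setdefault(block, []).append(subject)

    # Within each block, shuffle the subjects and deal out treatments in turn.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects blocked by age group, since age may affect the response.
block_of = {"S1": "young", "S2": "young", "S3": "young", "S4": "young",
            "S5": "old", "S6": "old", "S7": "old", "S8": "old"}
assignment = block_randomise(block_of, ["medication", "placebo"], seed=3)
print(assignment)
```

With blocks of size two and two treatments, this reduces to a matched pairs design: one member of each pair receives each treatment.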
experimental caution: appropriate control the only thing that varies across treatments should be the factor(s)
experimental caution: beware of bias administrator of experiment can present bias towards certain treatment to certain subjects;
double blind accounts for this: neither subject nor administrator know which treatment applied
experimental caution: repetition on all subjects all steps of the experiment are performed for all subjects in all treatments
experimental caution: realistic experiment experiment needs to duplicate real-world conditions