Formula Quiz 1

Question Answer
individual an object described by a set of data
variable a characteristic of an individual
quantitative variable a variable that takes on a numerical value that can be measured
quantitative data values of quantitative variables
categorical/qualitative variable a variable that places an individual into a category
distribution indicates what values a variable takes on and the frequency at which it takes these values
graphs of qual. pie chart, bar chart
bar chart qual.
pie chart qual.
graphs of quan. dotplot, histogram, stemplot
dotplot quan.
histogram quan.
stemplot quan.
outlier an individual observation that falls outside the overall pattern of the graph
relative frequency histogram has the same shape as a histogram with the exception that the vertical axis measures relative frequencies instead of frequencies
key features of a histogram the center (mean, median), the spread (range), the shape
shapes of a graph symmetric, skewed left, skewed right
measures of center sample mean, mode, median
sample mean arithmetic average or arithmetic mean
mode element or elements that occur most often
median "the middle number"/average of two middle numbers
median position formula (n+1)/2 (n=number of numbers in the data set)
mean=median when… distribution is perfectly symmetric
when it is skewed right the mean is dragged to the right
when it is skewed left the mean is dragged to the left
measures of spread range, iqr, five number summary, the variance and the sample standard deviation
range largest #- smallest #
iqr IQR= Q3-Q1
five number summary min. q1 med q3 max
variance sum of (xi - xbar) squared, divided by n-1
standard deviation (equation) square root of the variance: sqrt( sum of (xi - xbar)^2 / (n-1) )
standard deviation (definition) the st. dev. is a single number that measures how far the data are spread out from the mean
xi-xbar a deviation of xi from the mean
the sum of all the deviations from the mean always equals 0
st. dev. is …. to outliers nonresistant (is affected by)
n-1 is.. degrees of freedom
a datapoint is an outlier if… it lies more than one and a half IQRs (1.5 x IQR) below q1 or above q3
boxplot is a graph which displays the five number summary of a set of data
modified boxplot a graph that displays the five number summary of a data set (tests for outliers)
side-by-side boxplots can be used to compare the distributions of two data sets
within one standard deviation of the mean about 68% of the data will fall (for an approximately normal distribution)
within two sample standard deviations of the mean about 95% of the data will fall
within three standard deviations of the mean about 99.7% of the data will fall
z-score measures how far a data point lies from the mean (using standard deviations as the unit)
equation for z-score (x-xbar)/s
sample mean of the z-scores is 0
the sample st. dev. of the z-scores is 1
cumulative frequency is the number of observations less than or equal to a given number
cumulative relative frequency cumulative frequency divided by the total number of observations
empirical distribution function is a graph of the cumulative relative frequency vs. the raw data in the sample
a density curve a curve that always lies on or above the horizontal axis and has an area of exactly 1 underneath
median of a density curve is the point that divides the area under the curve in half
mean of a density curve the point at which the curve would balance if it were made of a solid material
the standard normal distribution is..(mean/st. dev.) a normal distribution with mean 0 and standard deviation 1
conversion formula is used to convert normal distribution values to standard normal distribution values
conversion formula (actual form) z= (x-mu)/sigma
what does a z-score measure the number of standard deviations between an observation x and the mean mu of the data set
normal quantile plot graphs raw data (horizontal) versus their expected z-scores (vertical axis)
a data set is approximately normal when its quantile plot is approximately linear
independent variable x is the explanatory variable
dependent variable y is the response variable
directions of scatterplots positive association, negative association or neither
scatterplots are analyzed according to: direction, form, strength of relationship, and outliers
correlation coefficient measures the direction and strength of the linear relationship between two quantitative variables
formula for r r= (1/(n-1)) times the sum of [ ((xi-xbar)/sx) times ((yi-ybar)/sy) ]
the correlation coefficient r is always a number between -1 and 1
if r is positive then x and y have a positive association
if r=1 then x and y have a perfect positive correlation
if r is negative then x and y have a negative association
if r=-1 then x and y have a perfect negative correlation
least squares regression line is the equation.. of the line that makes the sum of the squares of the residuals as small as possible
equation for the LSQR yhat=bnaught+b1x
bnaught is.. y intercept
b1 is… the slope of the line
equation for b1 r(sy/sx)
equation for bnaught ybar-b1xbar
ybar the mean of the y coordinates
x bar is the mean of the x coordinates
the difference between y and yhat is called an error or a residual
residual is the observed value of y minus the predicted value of y (y-yhat)
the point (xbar, ybar)… is a point on every least squares regression line
rsquared is called the coefficient of determination
rsquared measures the proportion of the variation in y that is explained by y's linear association with x
a residual plot graphs.. the residuals on the vertical axis and either the explanatory, response or predicted response values on the horizontal
residuals from a LSQR always have a mean of 0
the horizontal axis of a residual plot corresponds to the regression line
an observation is influential if removing it would markedly change the position of the regression line
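
Worked example (a sketch, not part of the original card set): the regression cards above can be checked numerically. The code below computes r, b1 = r(sy/sx) and bnaught = ybar - b1*xbar from the card formulas, then the residuals y - yhat. The function names and the five data points are made up for illustration only.

```python
from statistics import mean, stdev


def least_squares_line(xs, ys):
    """Return (r, b0, b1) for yhat = b0 + b1*x, using the flashcard formulas."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    # stdev() is the sample standard deviation (it divides by n-1)
    sx, sy = stdev(xs), stdev(ys)
    # r = 1/(n-1) * sum of the products of the x and y z-scores
    r = sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * sy / sx          # slope
    b0 = ybar - b1 * xbar     # y intercept ("bnaught")
    return r, b0, b1


def residuals(xs, ys, b0, b1):
    """Observed y minus predicted y (y - yhat) for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data, chosen only to illustrate the formulas
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r, b0, b1 = least_squares_line(xs, ys)
res = residuals(xs, ys, b0, b1)

print("r =", r, " r squared =", r * r)
print("yhat =", b0, "+", b1, "* x")
print("mean of residuals =", mean(res))
print("yhat at xbar =", b0 + b1 * mean(xs), " ybar =", mean(ys))
```

Up to floating-point rounding, the mean of the residuals should come out as 0, and the predicted value at xbar should equal ybar. Those are the two cards "residuals from a LSQR always have a mean of 0" and "the point (xbar, ybar) is a point on every regression line".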