Statistics for the Behavioral Sciences 9th Edition Chap. 11

Question Answer
The dependent variable is measured two or more times for each individual in a single sample. The same group of subjects is used in all of the treatment conditions. Repeated-Measures Design or Within-Subject Design
Each individual in one sample is matched with an individual in the other sample. The matching is done so that the two individuals are equivalent with respect to a specific variable the researcher would like to control. Matched-subjects design
For a research study comparing two treatment conditions, what characteristic differentiates a repeated-measures t statistic? For a repeated-measures design, the same group of individuals is tested in both of the treatments. An independent-measures design uses a separate group for each treatment.
Describe the data used to compute the sample mean and the sample variance for the repeated-measures t statistic. The two scores obtained for each individual are used to compute a difference score. The sample of difference scores is used to compute the mean and variance.
In words and symbols, what is the null hypothesis for a repeated-measures t test? H0: μD = 0
The null hypothesis states that, for the general population, the mean difference between the two conditions is zero.
What assumptions must be satisfied for repeated-measures t tests to be valid? The observations within a treatment are independent. The population distribution of D scores is assumed to be normal.
Describe some situations for which a repeated-measures design is well suited. When a particular type of subject is not readily available for a study. When fewer subjects are needed. Studies where time is a factor.
How is a matched-subjects design similar to a repeated-measures design? How do they differ? They are similar in that the role of individual differences in the experiment is reduced. They differ in that there are two samples in a matched-subjects design and only one in a repeated-measures study.
The data from a research study consist of 10 scores in each of two different treatment conditions. How many individual subjects would be needed to produce the data:
a. for an independent-measures design?
20 subjects
The data from a research study consist of 10 scores in each of two different treatment conditions. How many individual subjects would be needed to produce the data:
b. for a repeated-measures design?
10 subjects
The data from a research study consist of 10 scores in each of two different treatment conditions. How many individual subjects would be needed to produce the data:
c. for a matched-subjects design?
20 subjects
The difference between the first and second measurements for each subject in a repeated-measures t test. D scores
D = X2 - X1
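The cards above describe the repeated-measures t statistic in words. A minimal sketch of the arithmetic follows, assuming the usual form t = mean(D) / sqrt(variance(D) / n) with df = n - 1. The function name `repeated_measures_t` and the scores are made up for illustration and do not come from the textbook.

```python
import math
import statistics

def repeated_measures_t(x1, x2):
    """Return (mean_D, var_D, t, df) for paired scores, testing H0: mu_D = 0."""
    d = [b - a for a, b in zip(x1, x2)]   # D = X2 - X1 for each subject
    n = len(d)
    mean_d = statistics.mean(d)
    var_d = statistics.variance(d)        # sample variance (divides by n - 1)
    se = math.sqrt(var_d / n)             # estimated standard error of mean_D
    t = mean_d / se
    return mean_d, var_d, t, n - 1

# Hypothetical scores for 5 subjects measured in both treatments.
first = [10, 12, 9, 14, 11]
second = [12, 15, 10, 15, 14]
mean_d, var_d, t, df = repeated_measures_t(first, second)
print(mean_d, var_d, round(t, 3), df)
```

Only 5 subjects are needed for the 10 scores here, which matches the "10 scores per condition needs 10 subjects" card for a repeated-measures design.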

Stats-Chapter 2

Question Answer
MEAN Sum of the data entries divided by the number of entries. (Average)
MEDIAN The middle data when the data set is sorted in ascending or descending order (Middle)
MODE The data entry that occurs with the greatest freq. (Most)
OUTLIER Data entry that is "far" removed from the other entries in a data set.
SYMMETRIC DISTRIBUTION When a vertical line is drawn through the middle of the distribution, the two halves are approx. "mirror images".
UNIFORM DISTRIBUTION All entries in the distribution have equal freq.
SKEWED LEFT DISTRIBUTION Freq. dist. with a "tail" that extends to the left.
SKEWED RIGHT DISTRIBUTION Freq. dist. with a "tail" that extends to the right.
FREQ. DIST. Table that shows classes or intervals of data entries with the count of the number of entries in each class.
CUMULATIVE FREQ. Sum of the freq. of that class and all previous classes ("running total")
STEM & LEAF PLOT Displays quantitative data.
STEM Data entry's left most digit(s)
LEAF Data entry's right most digit
SCATTER PLOT GRAPH Uses ordered pairs of quantitative variables on a coordinate plane
PIE CHART Circle graph that shows the relationship of parts to a whole.
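As a quick illustration of the MEAN, MEDIAN, MODE, STEM, and LEAF cards, here is a small sketch using Python's standard library. The data values and the helper name `stem_and_leaf` are invented for the example, and it assumes two-digit entries so that the tens digit is the stem and the ones digit is the leaf.

```python
import statistics
from collections import defaultdict

data = [12, 15, 21, 21, 24, 30, 33, 35, 47]  # hypothetical data entries

mean = statistics.mean(data)      # sum of entries / number of entries
median = statistics.median(data)  # middle entry of the sorted data
mode = statistics.mode(data)      # entry with the greatest frequency

def stem_and_leaf(values):
    """Map each stem (left-most digit) to its sorted leaves (right-most digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```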

Fundamentals of Statistics III Sullivan Chapter 1

Question Answer
statistics The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions.
data Facts or propositions used to draw a conclusion or make a decision. The list of observed values for a variable.
anecdotal the information being conveyed is based on casual observation, not scientific research
population the entire group of individuals to be studied
individual a person or object that is a member of the population being studied
sample a subset of the population that is being studied
statistic a numerical summary of a sample
descriptive statistics Consist of organizing and summarizing data. Describe data through numerical summaries, tables, and graphs.
inferential statistics Uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result. One goal is to estimate parameters.
parameter a numerical summary of a population
the process of statistics 1. Identify the research objective
2. Collect the data needed to answer the question(s) posed in step 1
3. Describe the data
4. Perform inference
convenience samples Samples obtained through convenience rather than systematically, e.g. Internet or phone-in polls. Not based on randomness. Not considered reliable.
variables the characteristics of the individuals within the population
qualitative (categorical) variables allow for classification of individuals based on some attribute or characteristic
quantitative variables Provide numerical measures of individuals. Math operations such as addition and subtraction can be performed on the values of a quantitative variable and will provide meaningful results.
approach a way to look at and organize a problem so that it can be solved.
discrete variable A quantitative variable that has either a finite number of possible values or a countable number of possible values. The values result from counting.
continuous variable A quantitative variable that has an infinite number of possible values that are not countable, but are instead measured.
Qualitative data Observations corresponding to a qualitative variable.
Quantitative data Observations corresponding to a quantitative variable.
Discrete data Observations corresponding to a discrete variable
Continuous data Observations corresponding to a continuous variable.
nominal level of measurement The values of a variable name, label, or categorize. The naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order.
ordinal level of measurement The variable has the properties of the nominal level of measurement and the naming scheme allows for the values of the variable to be arranged in a ranked or specific order.
interval level of measurement The variable has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. Zero does not mean the absence of the quantity. Addition and subtraction can be performed on values of the variable.
ratio level of measurement the variable has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. Zero means the absence of quantity. Multiplication and division can be performed on values of the variable.
validity Represents how close to the true value of a measurement a measurement is. A variable is valid if it measures what it is supposed to measure.
reliability The ability of different measurements of the same individual to yield the same results.
Four levels of measurement of a variable 1. nominal
2. ordinal
3. interval
4. ratio
observational study Measure the value of response variable w/out trying to influence the value of the response or explanatory variables. Researcher observes behavior of individuals in the study w/out trying to influence outcome. Association may be claimed but not causation.
designed experiment An experiment where the researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, then records the value of the response variable for each group.
explanatory variable a variable that explains or causes changes in the response variable
response variable a variable that measures an outcome or result of a study (variable whose changes are to be studied)
confounding Occurs when the effects of two or more explanatory variables are not separated, so any change in the response variable may be due to a variable that was not accounted for in the study.
lurking variable An explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. Lurking variables are typically related to explanatory variables considered in the study.
three categories of observational studies 1. cross-sectional studies
2. case-control studies
3. cohort studies
cross-sectional studies Observational studies that collect information about individuals at a specific point in time or over a very short period of time.
case-control studies Retrospective studies that require individuals to look back in time or require the researcher to examine existing records. Individuals that have a certain characteristic are matched with those that do not.
cohort studies Group of individuals participates in study (the cohort). Cohort observed over time. Characteristics of individuals are recorded. Some individuals exposed to certain factors; others are not. At study end, value of response variable is recorded for individuals.
census a list of all individuals in a population along with certain characteristics of each individual
random sampling the process of using chance to select individuals from a population to be included in the sample
simple random sampling every possible sample of size n from a population of size N has an equally likely chance of occurring
frame lists all the individuals in a population
sampling without replacement once an individual is selected, that individual is removed from the population and cannot be chosen again
sampling with replacement a selected individual is placed back in the population and could be chosen again
seed provides an initial point for a random-number generator to start creating random numbers
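The frame, with/without replacement, and seed cards can be tried out with Python's `random` module. This is only a sketch: the frame of 20 labelled individuals and the seed value 42 are arbitrary choices for illustration.

```python
import random

# Hypothetical frame listing all N = 20 individuals in the population.
frame = [f"person_{i}" for i in range(1, 21)]

rng = random.Random(42)  # the seed fixes the generator's starting point

# Sampling without replacement: no individual can be chosen twice.
without_replacement = rng.sample(frame, 5)

# Sampling with replacement: a selected individual may be chosen again.
with_replacement = rng.choices(frame, k=5)

print(without_replacement)
print(with_replacement)
```

Re-running with the same seed reproduces the same samples, which is the practical reason for recording the seed.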
stratified sample Obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous in some way.
systematic sample Obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
steps in systematic sampling 1. Approximate population size, N
2. Determine sample size, n
3. Find N/n, round down to the nearest integer – this is k.
4. Randomly select a number between 1 and k. This is p.
5. The sample will be the following individuals: p, p+k, p+2k, …, p+(n-1)k
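The five steps above can be sketched in a few lines. The function name `systematic_sample` and the values N = 1000 and n = 50 are assumptions for the example; individuals are numbered 1 to N.

```python
import random

def systematic_sample(N, n, seed=None):
    """Return the positions (1-based) of a systematic sample of size n."""
    rng = random.Random(seed)
    k = N // n                            # step 3: N/n rounded down
    p = rng.randint(1, k)                 # step 4: random start between 1 and k
    return [p + i * k for i in range(n)]  # step 5: p, p+k, ..., p+(n-1)k

sample = systematic_sample(N=1000, n=50, seed=1)
print(sample[:5], "...", sample[-1])
```

Because k is rounded down, the last selected position p + (n-1)k never exceeds N.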
cluster sample Obtained by selecting all individuals within a randomly selected collection or group of individuals
self-selected convenience sample Individuals themselves decide to participate in a survey. Also known as voluntary response samples.
multistage sampling the use of a combination of sampling techniques
bias the results of the sample are not representative of the population
three sources of bias in sampling 1. Sampling bias
2. Nonresponse bias
3. Response bias
sampling bias the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another
undercoverage the proportion of one segment of the population is lower in a sample than it is in the population
nonresponse bias individuals selected to be in the sample who do not respond to the survey have different opinions from those who do
methods to decrease nonresponse bias 1. callbacks
2. rewards and incentives
response bias the answers on a survey do not reflect the true feelings of the respondent
sources of response bias 1. interviewer error
2. misrepresented answers
3. wording of questions
4. ordering of questions or words
5. type of question (open or closed)
6. data entry error
open question a question for which the respondent is free to choose his or her response
closed question a question for which the respondent must choose from a list of predetermined responses
nonsampling errors Errors that result from undercoverage, nonresponse bias, response bias, or data entry error. May be present in a complete census of the population.
sampling error Error that results from using a sample to estimate information about a population. Occurs because a sample gives incomplete information about a population.

Frequency Distributions

Question Answer
What is a raw score A data point that has not been transformed or analyzed
What is a frequency distribution describes the pattern of a set of numbers by displaying a count or proportion for each possible value of a variable
What is a frequency table a visual depiction of data that shows how often each value occurred, that is, how many scores were at each value. Values are listed in one column and frequencies are listed in a second column
What is a histogram It looks like a bar graph but is typically used to depict scale data, with the values of the variable on the x-axis and the frequencies on the y-axis
What is a frequency polygon a line graph with the x-axis representing values or midpoints of intervals and the y-axis representing frequencies. A dot is placed at the frequency for each value and the dots are connected
What is a grouped frequency table a visual depiction of data that reports the frequencies within a given interval rather than the frequencies for a specific value
What is a normal distribution a specific frequency distribution that is a bell shaped symmetric uni-modal curve
What does positively skewed mean the distribution tail extends to the right in a positive direction
What is the floor effect a situation in which a constraint prevents a variable from taking on values below a certain point
What is the ceiling effect a situation in which a constraint prevents a variable from taking on values above a given number
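A frequency table and a grouped frequency table can both be built with `collections.Counter`. The sketch below uses made-up raw scores and an interval width of 10; the helper name `grouped_frequency` is hypothetical.

```python
from collections import Counter

scores = [3, 7, 7, 12, 15, 15, 15, 18, 22, 24]  # hypothetical raw scores

# Frequency table: how many scores occurred at each value.
freq_table = Counter(scores)

def grouped_frequency(values, width):
    """Count scores per interval of the given width, e.g. 0-9, 10-19, ..."""
    groups = Counter((v // width) * width for v in values)
    return {(lo, lo + width - 1): groups[lo] for lo in sorted(groups)}

grouped = grouped_frequency(scores, width=10)

for value in sorted(freq_table):
    print(value, freq_table[value])
for (lo, hi), count in grouped.items():
    print(f"{lo}-{hi}", count)
```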

intro to stats and research design

Question Answer
What is a variable any observation of a physical, attitudinal or behavioral characteristic that can take on values
What is a discrete observation can take on only specific values (e.g., whole numbers); no other values can exist between these numbers
What is a continuous observation can take on a full range of values (e.g., numbers out to several decimal places); an infinite number of potential values exists
What is a nominal variable a variable used for observations that have categories or names as their values
What is an ordinal variable a variable used for observations that have rankings such as 1st, 2nd, 3rd…
What is an interval variable a variable used for observations that have numbers as their values and the distance or interval between pairs of consecutive numbers is assumed to be equal
What is a ratio variable a variable that meets criteria for an interval variable but also has a meaningful zero point
What is a scale variable A variable that meets the criteria for an interval or a ratio variable
What is a level a discrete value or condition that a variable can take on
What is an independent variable a variable that we either manipulate or observe to determine its effects on the dependent variable
What is a good variable reliable and valid
what is a dependent variable the outcome variable that we hypothesize to be related to or caused by changes in the independent variable
What is a confounding variable any variable that systematically varies with the independent variable so that we cannot logically determine which variable is at work; also called a confound
What is reliability Refers to the consistency of a measure
What is validity the extent to which a test actually measures what it was intended to measure
What is hypothesis testing the process of drawing conclusions about whether a particular relation between variables is supported by the evidence
What is operational definition specifies the operations or procedures used to measure or manipulate a variable
What is correlation an association between two or more variables
What is an experiment a study in which participants are randomly assigned to a condition or level of one or more independent variables which par
What is a random assignment every participant has an equal chance of being assigned to any group or experimental condition in the study
What is a between-groups research design participants experience one, and only one, level of the independent variable
What is a within-groups research design the different levels of the independent variable are experienced by all participants in the study; also called a repeated-measures research design

For Life Sciences

Question Answer
a statistic is? a summary of data
a field of statistics is? the collecting, analysing and understanding of data measured with uncertainty
what is a categorical variable? one which is measured descriptively eg: hair colour or major at university
what is a quantitative variable? one which is measured numerically eg: time it takes to get home from work
graphical summary of one categorical variable? bar graph
graphical summary of one quantitative variable? histogram or boxplot
how to graphically summarise relationship between two categorical variables clustered bar chart or jittered scatterplot
how to graphically summarise relationship between two quantitative variables scatterplot
how to graphically summarise relationship between one categorical and one quantitative variable comparative boxplots or comparative histograms
what to look for in a graph location, spread, shape, unusual observations
define 'location' graphically where most of the data lies
define 'spread' graphically variability of the data, how far apart or close together it is
define 'shape' graphically symmetric, skewed etc
how to numerically summarise one categorical variable table of frequencies or percentages
how to numerically summarise one quantitative variable location: mean or median;
spread: standard deviation or inter quartile range
formula for mean? $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$;
preferable for approximately normal data
formula for Median? M = the middle value of the sorted data (odd N), or the average of the two middle values (even N);
less affected by outliers therefore used for outlier ridden data
formula for standard deviation? $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$;
preferable for approximately normal data
formula for inter quartile range? Q3 – Q1= IQR;
less affected by outliers therefore used for outlier ridden data
which numbers are needed to create a five number summary? minimum, Q1, median (sometimes mean included), Q3, maximum
an outlier is? more than 1.5 x IQR lower than Q1;
more than 1.5 x IQR higher than Q3
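The five number summary and the 1.5 x IQR outlier rule can be sketched as below. Textbooks and software compute quartiles in slightly different ways; this example assumes the default ("exclusive") method of `statistics.quantiles`, which may differ from the method used in the course. The data and function names are invented for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return min(values), q1, med, q3, max(values)

def iqr_outliers(values):
    """Flag values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # hypothetical data with one extreme value
summary = five_number_summary(data)
outliers = iqr_outliers(data)
print(summary, outliers)
```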
define linear transformation transformation of a variable from x to xnew
examples of linear transformation use change of units;
use of normal assumption therefore to find 'z' scores
formula for linear transformation? $x_{new} = a + bx$
formula for new mean once linear transformation has occurred? $\bar{x}_{new} = a + b\bar{x}$
formula for new median once linear transformation has occurred? $M_{new} = a + bM$
formula for new standard deviation once linear transformation has occurred? $s_{new} = |b|\,s$
formula for IQR once linear transformation has occurred? $\mathrm{IQR}_{new} = |b|\,\mathrm{IQR}$
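A change of units is a handy check on the linear transformation cards. The sketch below converts hypothetical Celsius readings to Fahrenheit (a = 32, b = 1.8) and compares the transformed summaries with the formulas: measures of location pick up both a and b, while measures of spread are only scaled by the size of b.

```python
import statistics

def linear_transform(xs, a, b):
    """Apply x_new = a + b * x to every value."""
    return [a + b * x for x in xs]

celsius = [10.0, 15.0, 20.0, 25.0, 30.0]  # hypothetical readings
a, b = 32.0, 1.8
fahrenheit = linear_transform(celsius, a, b)

mean_new = statistics.mean(fahrenheit)
median_new = statistics.median(fahrenheit)
sd_new = statistics.stdev(fahrenheit)

print(mean_new, a + b * statistics.mean(celsius))      # location: a + b * mean
print(median_new, a + b * statistics.median(celsius))  # location: a + b * median
print(sd_new, abs(b) * statistics.stdev(celsius))      # spread: |b| * s
```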
explain density curves area under the curve in any range of values is the proportion of all observations that fall within that range for a quantitative variable;
like a smoothed out histogram describes probabilistic behaviour
total area under density curve equals? 1
explain the normality assumption normal curve can be used if a histogram looks like a normal curve;
termed 'reasonable';
must start at 0 and end at 0
how does a normal quantile plot confirm the normality assumption? if in a straight line, or close to it, then normal and assumption is reasonable
define the 68-95-99.7 rule 68% of results will be within 1 standard deviation of the mean;
95% of results will be within 2 standard deviations of the mean;
99.7% of data will be within 3 standard deviations of the mean
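The 68-95-99.7 rule can be checked numerically with `statistics.NormalDist` from the standard library. This is just a sketch; the helper name `within` is made up.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```

The rule's percentages are rounded versions of these areas.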
symbol for mean of a density curve? $\mu$
symbol for standard deviation of density curve? $\sigma$
normal distribution short hand X = random variable;
N = normal distribution;
first number in brackets = mean;
second number in brackets = standard deviation
explain the standard normal variable example of set out: P(Z < n);
corresponds to the area under the standard normal curve for that region;
the tabled area is always the area to the left of the value n
use of the standard normal distribution table to find P: Z found along x and y axis of table;
to find Z: P found in results of table;
table ordered from smallest to largest
reverse use of the standard normal distribution table eg of how set out: P(Z<c)= n;
c = right of Z
X ~ $N(\mu, \sigma)$
formula and use of standardising transformation $Z = (X - \mu)/\sigma$;
used when distribution is not N(0,1) and so it needs to be altered
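The table lookups and the standardising transformation can be mimicked in code, with `NormalDist.cdf` playing the role of "find P from Z" and `NormalDist.inv_cdf` the reverse use "find Z from P". The mean of 100 and standard deviation of 15 are hypothetical values chosen for the example.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical X ~ N(100, 15)
x = 130.0

z = (x - mu) / sigma            # standardise: Z = (X - mu) / sigma
p_left = NormalDist().cdf(z)    # table lookup: area to the left of z

# Same probability computed directly on the original scale.
p_direct = NormalDist(mu, sigma).cdf(x)

# Reverse use of the table: find c such that P(Z < c) = 0.975.
c = NormalDist().inv_cdf(0.975)

print(z, round(p_left, 4), round(c, 2))
```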
relationships between variables best explored through? why? scatterplot;
can get a sense for the nature of the relationship
how to define the nature of relationship? existent/ non-existent;
strong/ weak;
increasing/ decreasing;
linear/ non-linear
outliers in scatterplots? represent some unexplainable anomalies in data;
could reveal possible systematic structure worthy of investigation
define causal relationship relationship between two variables where one variable causes changes to another
define the explanatory variable explains or causes the change;
written on x-axis
define the response variable that which changes;
written on y-axis
useful numbers for two quantitative variables? correlation or regression
formula for the correlation coefficient? $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
define xi or yi axis values of corresponding letter
define xbar and ybar mean of axis values of corresponding letter
define sx and sy standard deviation of axis values of corresponding letter
state the properties of r is the correlation coefficient;
numerically expresses relationships;
if close to 1 = strong positive linear relationship;
if close to -1 = strong negative linear relationship;
close to 0 = weak or non-existent linear relationship
state the cautions about the use of r only useful for describing linear relationships;
sensitive to outliers
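The correlation coefficient formula translates almost term by term into code. A minimal sketch follows, with invented data and a hypothetical function name `correlation`.

```python
import statistics

def correlation(xs, ys):
    """r = 1/(n-1) * sum of ((xi - xbar)/sx) * ((yi - ybar)/sy)."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
    total = sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return total / (n - 1)

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
r = correlation(xs, ys)
print(round(r, 4))
```

A perfectly increasing linear relationship gives r = 1, and reversing the direction of one variable flips the sign of r.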
what is least squares regression used for? to explain how a response variable is related to explanatory variable;
focus positive = increase;
focus negative = decrease
mathematical representation of regression $b_1 = r\,(s_y/s_x)$;
facts about b1 $b_1$ = slope of the fitted line; it has the same sign as the correlation coefficient r and equals r only when $s_x = s_y$
how to determine the strength of a regression $r^2 = s_{\hat{y}}^2 / s_y^2$;
r-squared is the % variation in y explained by linear regression
state the basic regression assumptions $y = b_0 + b_1 x + \text{error}$;
error corresponds to random scatter about line;
this is checked by residual plots
formula for residuals? residual = $y - \hat{y}$
residual plot is a scatter plot of? residuals(y axis) against explanatory variable(x axis)
interpreting residual plots focus on pattern;
there should be no pattern;
if there is a pattern then the linear assumption is incorrect
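The regression cards can be tied together in one short sketch: slope from r, sy and sx, then fitted values, residuals and r-squared. The intercept formula b0 = ybar - b1 * xbar is the standard least squares result and is not stated on the cards; the data are invented.

```python
import statistics

xs = [1, 2, 3, 4, 5]  # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]  # hypothetical response values
n = len(xs)

xbar, ybar = statistics.mean(xs), statistics.mean(ys)
sx, sy = statistics.stdev(xs), statistics.stdev(ys)
r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

b1 = r * sy / sx       # slope
b0 = ybar - b1 * xbar  # intercept (standard result, assumed here)

fitted = [b0 + b1 * x for x in xs]                     # y-hat values
residuals = [y - yhat for y, yhat in zip(ys, fitted)]  # y - y-hat

# Proportion of the variation in y explained by the line.
r_squared = statistics.variance(fitted) / statistics.variance(ys)

print(round(b0, 3), round(b1, 3), round(r_squared, 3))
print([round(e, 3) for e in residuals])
```

Plotting `residuals` against `xs` gives the residual plot described above; least squares residuals always sum to zero, so the thing to look for is a pattern, not the average.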
what to do if any residuals stand out? they are either an influential point and to be left alone;
or they are an outlier and to be removed if affecting results too much
how to attach special cause to an outlier analyse if recording error;
refit line;
if remove then justify why (down weight influence)
translated residuals (removing the outlier) should have what effect? spread pattern
any 0 intercepting points on a residual plot are? 1 standard deviation from mean
if parabola presents after outlier removal? x-hat assumption not appropriate
if spread doesn't vary far from 0? there is no pattern
when to remove outlier if influences results
when will outlier not influence results? when close to mean;
– will have little influence on the gradient and intercept of fitted line
what are lurking variables? variables that can influence results which have not been taken into account
to account for lurking variables you? analyse the covariance
state the strategy for using data in research? identify question to be answered;
identify population studied;
locate variables: which one is IV and DV, explanatory and response;
obtain data which answers question
define anecdotal data haphazard collection of data;
unreliable for drawing conclusions
define available data use of data that has come from another source
possibly obtained for a reason other than the one you intend to use it for
define collect your own data use of a census, a survey, or observations from an experiment
define census use of whole population to obtain data
define sample use of a randomised selection of the population to represent the whole;
smaller and easier to do than a census
explain observational study no variables are manipulated or influenced;
data obtained from population as it is
explain experiment variables are influenced or manipulated so that responses can be noted and recorded;
usually a control group utilised
control group = does not undergo treatment, act as a comparison group
explain causation a response that is the result of another variable eg: moon's movements CAUSE the tides
common response in terms of variables means? explanatory variable causes the response variables;
response variables are associated to one another
causation in terms of variables means? explanatory variable causes response variable;
response variable and explanatory variable are associated
confounding in terms of variables means? two or more explanatory variables are present and associated to one another;
all explanatory variables could have caused response variable by themselves or together;
explanatory variables called confounded causes
why an experiment? allows demonstration of causation;
intervention can be used to determine whether or not effect is present
state the principles of experiment design subjects, treatment, factor, levels, response variable
definition of subjects in terms of experiment design things upon which experiment is done;
eg: people, animals, chemicals etc
definition of treatment in terms of experiment design circumstances which applied to subjects;
eg: given medication
definition of factor in terms of experiment design variables that are apparent within different treatments;
eg: given medication or placebo
definition of levels in terms of experiment design formation of treatments determined by which combination of factors used;
eg: dosage of medication/how many doses per day vs dosage of placebo/ how many placebo taken
definition of response variable in terms of principle of experiment design the variable which will answer the question
variable of most interest that is measured on subject after treatment
explain a principles of experiment summarisation table Factors on x and y axis;
levels in first columns and rows;
rest of table = number allocated to that particular treatment group
state the three principles all experiments must follow compare two or more treatments where one is the control;
random assignment of subjects to treatments;
repeat the experiment on numerous subjects (for reduction of confounding variables)
how to randomise allocate all subjects a random number;
order subjects in accordance to those random numbers (smallest to largest, or largest to smallest);
form treatments by selecting subjects in a systematic pattern applied to the random numbers representing subjects
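The randomisation recipe above can be sketched as follows. The "systematic pattern" used here is dealing the ordered subjects out to the treatments in turn, which is one possible choice and not necessarily the one intended by the course; the subject labels, treatment names and seed are invented.

```python
import random

def randomise(subjects, treatments, seed=None):
    """Assign subjects to treatments by sorting on random numbers."""
    rng = random.Random(seed)
    keyed = [(rng.random(), s) for s in subjects]  # a random number per subject
    ordered = [s for _, s in sorted(keyed)]        # order by the random numbers
    groups = {t: [] for t in treatments}
    for i, s in enumerate(ordered):                # deal out in a fixed pattern
        groups[treatments[i % len(treatments)]].append(s)
    return groups

subjects = [f"S{i}" for i in range(1, 13)]  # 12 hypothetical subjects
groups = randomise(subjects, ["control", "drug"], seed=7)
print(groups)
```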
define control group: different from all other treatments as it only pretends to apply explanatory variable;
is the group that the results are compared against
explain random comparative experiment subjects randomly allocated one of several treatments;
responses compared across treatment groups
explain matched pairs design break subjects with similar properties into pairs;
one of two treatments applied to one of each pair;
can produce more precise results;
used in before and after, and twin studies
explain random block design block = group of subjects known before experiment to be similar in some way that would affect response;
randomised assignment of treatments to subjects within block;
matched pairs is special case of this
experimental caution: appropriate control only variant across treatment(s) is/are factor(s)
experimental caution: beware of bias administrator of experiment can present bias towards certain treatment to certain subjects
double blind accounts for this: neither subject nor administrator know which treatment applied
experimental caution: repetition of entire subjects all steps for experiment are performed for all subjects in all treatments
experimental caution: realistic experiment experiment needs to duplicate real-world conditions

2nd Quiz Study Guide

Question Answer
Traditionalism 1) Social Science is not a hard Science
2) Humans are too complex for quantification
3) Historical, anecdotal, journalistic approach
Behavioralism (aka Basic Research) 1) There are regularities to permit generalizations
2) Explicit, Replicable, neutral methods
3) Priority: hypothesis testing to build theories
Goal: highly predictive interlocking theories
Applied Research (Post-Behavioralism or Policy Analysis) Accepted the merits of explicit, rigorous, replicable scientific methods
Changed the goal from building theory to addressing practical/applied/policy questions
And acknowledged the role of values in setting research priorities
Classic Model of the Scientific Process 1) Theory
2) Deduce Hypothesis from theory
3) Design Study and operationalize concepts
4) Conduct the Study (collect the data)
5) Analyze data to accept/reject hypothesis
6) Support, modify, or reject initial theory
Model of Applied Research Begin with specific, practical issue
Devise Testable research question
– Design study and operationalize concepts
– Conduct the study (collect the data)
– Analyze data to accept/reject hypothesis
Use results to inform decision-making
Hypothesis A testable statement of the relationship between two or more variables
Theory A set of logically related propositions intended to explain a range of phenomena
Main Structure of Research Reports Intro (Problem Area; Issues)
Literature Review
Discussion and Conclusion
The Strong Lit Review Primary (not secondary) sources
Nonelectronic searches
Contact leading researchers
Add unpublished/forthcoming research
Diagram/model key relationships
Use elements of meta-analysis
Meta-Analysis Steps (1) Clear Statement of Hypothesis
(2) Explicit and Replicable Lit Searches
(3) Set Variables for Coding Studies
(4) Analyze predictors of the results – Certain factors associated with certain outcomes?
Good Individual Questions Short as possible
Shared, simple vocab
Unbiased Language/premises
Unambiguous Answers
Confined to one issue
Exhaustive/Exclusive Categories
Good Format and Overall Flow Brief Smooth Intro
Easy Non-threatening start
Early closed-ended questions
Move from general to specific
Delay sensitive issues until later
Demographics last
Fair Framing
Short transitions
Consistent series answer format
Census vs Sample Use Census if feasible, affordable and not often; but samples usually more practical
Random vs Nonprobability Use random samples unless desperate
Nonprobability Sampling Convenience
Random sampling includes Simple
Systematic (every nth)
Stratified (proportionate or non proportionate)
Simple Random Sampling Each sample chosen independently and randomly from the sampling frame
Systematic Selecting every nth item from a list (from a random point)
Stratified Draw random samples within groups if easier or to over sample a group intentionally.
Proportionate or Disproportionate
Response Rate Determinants Costs – Est. Lengths / Time / Complexity
Benefits – Enjoyable / Important/ Satisfaction
Evaluating a Sample Size Overall precision (CI) needed
Depth of Subgroup analysis
As well as the research budget
95% Confidence Interval – Sample 100 +/- 10%
95% Confidence Interval – Sample 600 +/- 4%
95% Confidence Interval – Sample 1100 +/- 3%
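The three margins quoted above are consistent with the usual worst-case formula for a proportion, margin = z * sqrt(p(1 - p) / n) with p = 0.5 and z = 1.96. That formula is an assumption here, since the study guide only lists the results.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion (worst case p = 0.5)."""
    return z * math.sqrt(p * (1 - p) / n)

margins = {n: margin_of_error(n) for n in (100, 600, 1100)}
for n, m in margins.items():
    print(n, f"+/- {m * 100:.1f}%")
```

The precision gained per extra respondent shrinks quickly, which is why the research budget enters the sample-size decision.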
Nominal Categories by names only (region, religion, sex)
Ordinal Categories can be ordered on a single dimension (agree/disagree; highest degree earned; young, middle, old)
Interval Increments are consistent but no absolute zero (Fahrenheit, year of birth)
Ratio Absolute Quantities (amount of dollars, inches, siblings, years, pounds) ask yourself…can it be TWICE AS MUCH?
Principles of Data Analysis (1) Good Data are a prerequisite
(2) All Statistics are reductionist
(3) Context dictates interpretation
(4) Avoid Exaggerating small gaps (Bill hates this!)
(5) Correlation DOES NOT equal Causation
(6) Start with Univariate Analysis
Univariate Nominal Variables Mode = Plurality but not always a majority
Percentages = usually round %
Univariate Nominal Variables – Interpretation Pitfalls Misleading Pictograms
Confusing absolute and relative %
Misinterpreting nominal modes as if they were midpoints/averages
Misleading/simplified composites from nominal and other modes
n Univariate Sample size
N Univariate population size
Measures of Central Tendency (1) Mean
(2) Median
(3) Trimmed Means
Mean Sum divided by # of cases; very sensitive to extreme values. x with line on top is sample mean; mu which looks like a u is for population mean
Median 50th Percentile; half of the cases below; half above; insensitive to extreme high and low values
Trimmed Means Discard a percent of the highest/lowest values, top and bottom five percent…used in Olympic scoring
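A small sketch of how the three measures of central tendency react to one extreme value. The scores are invented for the example and trimmed_mean is a hypothetical helper, not a standard library function.

```python
import statistics

def trimmed_mean(values, proportion):
    """Drop the given proportion of cases from each end, then average the rest."""
    ordered = sorted(values)
    cut = int(len(ordered) * proportion)
    kept = ordered[cut:len(ordered) - cut] if cut else ordered
    return statistics.mean(kept)

# Hypothetical scores with one extreme value (94)
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 94]

print(statistics.mean(scores))       # pulled upward by the extreme value
print(statistics.median(scores))     # unaffected by the extreme value
print(trimmed_mean(scores, 0.10))    # top and bottom 10% discarded
```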
Measures of Dispersion (1) Range
(2) Standard Deviation
(3) Interquartile Range
Range Highest to lowest value; crude measure of dispersion
Standard Deviation (Equation) Square root of (the sum of the squared differences of each case from the mean, divided by the number of cases)
Standard Deviation Shows the range of the middle 68% of cases in a normal curve, otherwise it only tells relative dispersion
IQR 25th to 75th percentiles; range of the middle 50% of all cases; easy to explain.
Smaller IQR/SD Scores Tight cluster of cases
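The three measures of dispersion can be computed as below. This is a sketch on made-up scores; population_sd follows the card's wording (divide by the number of cases), and the quartiles come from Python's statistics.quantiles, whose default method is only one of several conventions for locating the 25th and 75th percentiles.

```python
import math
import statistics

def population_sd(values):
    """Square root of the mean squared difference from the mean."""
    mean = sum(values) / len(values)
    squared_diffs = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_diffs) / len(values))

# Hypothetical scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(scores) - min(scores)          # highest minus lowest
sd = population_sd(scores)                       # standard deviation
q1, q2, q3 = statistics.quantiles(scores, n=4)   # 25th, 50th, 75th percentiles
iqr = q3 - q1                                    # spread of the middle 50%

print(value_range, sd, iqr)
```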
Measure of Shape Skewness
Skewness Asymmetrical distribution skewed positively if a few high scores pull the mean above median; reverse (mean below the median) reflects a negative skew.
The Normal Curve The Bell Shaped Curve
Central Limit Theorem
+/- 1 Std Dev 68.3% of all cases
+/- 2 Std Dev 95.4% of all cases
+/- 3 Std Dev 99.7% of all cases
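The three percentages above can be checked from the normal curve itself. A minimal sketch, assuming a perfectly normal distribution; share_within is a name made up for the example.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```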
Descriptive Statistics Data of the whole relevant population – treat results as real.
Inferential Statistics Used with sample because results are estimates. Keeps us from jumping to conclusions and treating sample estimate as more precise than they really are.
Population based statistics are… Descriptive Only
Sample based statistics are… Inferential and descriptive
Formula for 95% CI around a proportion… (Sqr Root of [P multiplied by (1 minus P), divided by Sample Size]) multiplied by 1.96
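As a worked check of the proportion formula, the sketch below recomputes the margins listed earlier for samples of 100, 600 and 1100. It assumes P = 0.5, the value that gives the widest interval; proportion_margin is a hypothetical helper name.

```python
import math

def proportion_margin(p, n, z=1.96):
    """Half-width of the 95% confidence interval around a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 600, 1100):
    margin = proportion_margin(0.5, n)
    print(n, f"+/- {margin * 100:.1f}%")
```

A more lopsided split, such as P = 0.9, gives a smaller margin for the same sample size.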
Confidence Intervals for Means Formula Std Dev of Sample divided by the sqr root of sample size, then multiplied by 1.96
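The same idea for a mean, as a sketch on invented data. It uses the sample standard deviation and the 1.96 multiplier from the card; with a sample this small a t value would normally replace 1.96, so treat the numbers as an illustration of the formula only.

```python
import math
import statistics

def mean_confidence_interval(values, z=1.96):
    """95% CI around a sample mean: mean +/- z * (std dev / sqrt(n))."""
    mean = statistics.mean(values)
    margin = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean - margin, mean + margin

# Hypothetical sample of 10 measurements
sample = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
low, high = mean_confidence_interval(sample)
print(round(low, 2), round(high, 2))
```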
When to use T-Test Comparing means of two groups…
(1) using sample data (derived from random sampling)
(2) using experimental data (derived from random assignment)
T-Test Steps (1) State the Null Hypothesis
(2) State Research Hypothesis
(3) State Decision Rule (Probability Level)
(4) Assume Equal Variance – Unless F-Test is significant
(5) Reject or fail to reject the null
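The steps above can be sketched for two small groups. The data are invented, the equal-variance (pooled) form is assumed as in step 4, and the critical value 2.306 is the two-tailed .05 cutoff for 8 degrees of freedom taken from a t table.

```python
import math
import statistics

def pooled_t(group1, group2):
    """t statistic for two independent groups, assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical scores for a treatment group and a control group
treatment = [5, 7, 8, 6, 9]
control = [3, 4, 6, 5, 2]

t, df = pooled_t(treatment, control)
critical = 2.306                      # two-tailed .05 cutoff for df = 8
reject_null = abs(t) > critical       # decision rule from step 3
print(t, df, reject_null)
```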
Easiest Null for T-Tests There is no difference in the mean (dependent variable) of (group 1) and (group 2)
T-Test Interpretation (1) Prevents 'jumping to conclusions' when differences in two means may just be random variation
(2) Statistical significance is not the same as substantive significance
(3) Easy to get stat. sig. with large samples, hard with small samples
T-Test and Population studies without randomized data No need for T-Test, because it is inferential.
Difference in steps between Chi Square and T-Test T-Test adds the F-test step.
Similarities between Chi Square and T-Test (1) Stat. Sig. does NOT necessarily mean it is important or consequential.
(2) If NOT stat. sig. remember we never prove the null we just fail to reject the null.
(3) A small sample may not be Stat Sig, but could be Stat Sig in a larger sample
Three Elements of Causal Inference (1) X & Y covary
(2) X precedes Y
(3) Rule out the Z's
Post Hoc Fallacy Fallacy of concluding that since change in Y followed X, it was caused by X.
Antecedent Variables Before X (Z->X->Y)
Intervening Variables Between X and Y (X->Z->Y)
Campbell and Stanley's Notation System O = Observations (measures) of Y
Left to Right = Chronological Order
Each Row = One Group of Subjects
Single Group posttest only X O
Single Group pretest-posttest (before and after design) O X O
Static Group Design X O
O
History External event during period
Maturation Subjects change over time
Practice Familiarity with the measure
Instrumentation A changed measure
Regression to the mean If subjects are chosen due to extreme scores, they tend to regress to the mean on posttest
Selection Groups different from start
Intragroup history unique group event
Mortality groups differ in attrition
What to do with Attrition… (1) Omit pretest scores of lost subjects;
(2) Omit all data of lost 'types' from all groups
(3) Match by statistical weighting
(4) Analyze by "intention to treat" (i.e. include dropouts)
Between Group Reactivity (1) Spillover (My buddy is sick and I know if I give him a lime he will get better)
(2) Compensatory rivalry (controls try harder)
(3) Resentful demoralization (Controls try less…I never get picked so I will just suck)
Placebo Effect Subject expectancy to get better and psychologically they do. (Reactivity)
Novelty Effect X works because it's new. Innovation effect. Short term effect. (Reactivity)
Guinea Pig Effect Subjects act differently because they feel that they are under surveillance. Evaluation Apprehension – I know I am under
Demand Effect Subjects think they know what the authority wants of them. The real pills are handed out with more conviction; a double-blind design is required to limit it.
Social desirability Reflexivity – Political Correctness, Societal pressures/inhibitions, I am supposed to act a certain way.
Hawthorne Effect Electric Plant Light Dimming Example. Refers to reactivity in general.
Heisenberg Effect Act of measuring something changes what you're measuring
Two Elements of a true experiment (1) Random Assignment of subjects to groups
(2) Random Assignment of Treatments to groups
Source of power of experiments Comparability of the groups – the only real difference is one gets X, the other doesn't. Otherwise the two groups are identical.
Classic Experimental Design R) O X O
R) O O
Posttest Only Experiment R) X O
R) O
Factorial Design R) O Xa Xb O
R) O Xa O
R) O Xb O
R) O O
Complex X Many ingredients in X
Multiple Ys Studies often measure the impact of X on several Ys.
Compensatory Rivalry Controls try harder
Resentful demoralization Controls try less
Spillover effects/diffusion Some X spills over to controls
Strategies to minimize reactivity (1) deceit
(2) obscure / mislead
(3) use placebo
(4) double blind
(5) time (hope they forget the study)
Placebo A dummy treatment given to the controls to 'hold constant' the impact of their expectations. Common in medical studies; not always possible.
Natural Experiment Both subjects and X were randomly assigned without a researcher's intervention; term is also sometimes used less strictly to refer to a close natural approximation even if lacking in randomization
Big Four Categories of Validity (1) Measurement Validity
(2) Internal Validity
(3) Statistical Conclusion Validity
(4) External Validity
External Validity Generalizability; the essential yet unavoidably subjective judgment about the extent to which it is reasonable to generalize/extrapolate the findings of one study to other places, subjects, times, etc.
How to strengthen external validity (1) Test subjects representative of the subjects you want to generalize to
(2) replications in varied settings
(3) Consistent results in varied tests
Limitations of Experiments (1) Unethical or illegal to withhold X
(2) Unethical or illegal to risk trying X
(3) Unaffordable to finance in field
(4) Infeasible to enforce X vs no X
(5) Impractical to field test outside a lab
Quasi-Experimental Designs Commonly means any clever design lacking randomized control groups
Causal-Comparative Designs Studies that seek to infer causality using comparison groups without randomly assigned subjects
Primary threat of Internal validity when no randomization Selection
NEC Nonequivalent Comparison Group Design
Nonequivalent Comparison Group Designs O X O
O O
Retrospective match / Ex post facto design Creating a comparison group later by finding and matching subjects similar to those who previously got exposed to X.
Time Series Designs X may be short term or enduring. Top internal validity threat is history. Trend line makes it superior to O X O.
Simple Interrupted Time Series O O O O O O X O O O O O O
Reiterative Time Series O O X O O X O O X O O
Comparison Time Series O O O O O X O O O O O
Multiple Time Series O O X O O O O O O
Panel Repeated data tracking same people; valuable but expensive, can produce reactivity
Cross-sectional data Time series with new random samples from same population. Shows net change but masks the rest.
Deceptive Time Series Charts Using a truncated base plus narrow or wide axis.
Retrospective pretests Proxy pretests – recollections used for pretest measure.
Danger of time series inferences from a single survey Cannot infer age = time. Bill used the Navy Officer surveys of high-ranking and low-ranking officers, inferring that low-ranking officers will think like high-ranking officers when they get there.
Correlational Designs Typically using a single survey to try to "statistically control" for alternative explanations, often using multiple regression. Issues with selection.
Aggregate Data Units of analysis are groups, such as precincts, cities, states.
Ecological Fallacy Drawing individual-level inferences from aggregate-level correlations.
Check list of Empirical Studies (1) Theory Building or Applied Research
(2) Causal or Descriptive
(3) Exact Hypothesis
(4) Independent Variable(s)
(5) Dependent Variable(s)
When Something is NOT Statistically Significant Do not bring it up. Consider the dispersion between the groups.
T-Test Analysis Analysis is black and white, it is or it isn't stat. sig. If you hit .05, you have a slight relationship. State just that, a slight relationship.
Grouping Ratios Becomes Ordinal
Central Tendency Mean, Median, Trimmed Mean
Extreme Lopsided Distribution does what to Confidence Intervals? Becomes Smaller
At what level is .012 statistically significant? It is Stat. Sig at .05, but NOT at .001 or .01.
True or False – Standard Deviation is a measure of Central Tendency? False
What is the biggest threat to NEC design? Selection
What is the biggest threat to Time Series Designs? History
What does comparing results to good existing records establish? Concurrent Validity
What are two elements of dispersion? IQR and SD
Two Types of Empirical Validity (1) Concurrent Validity
(2) Predictive Validity
Concurrent Validity Testing a measure against existing data believed accurate. (Empirical)
Predictive Validity Testing a measure designed to predict future outcomes by the actual success of its forecasts. (Empirical)
Subjective Validity (1) Face Validity
(2) Content Validity
Face Validity Operationalizing the usual usage of a word in a reasonable way.
Content Validity Operationalizing the full scope of the entire intended concept and not just a part of it.
Multiple Measures (Triangulation) Assessment using a variety of indicators (not just one)
Unobtrusive Measures No survey – Measuring actual behavior – not just self-reported behavior.
Validity Accuracy
Reliability Consistency
According to SPSS, Scale Measurements are… Interval and Ratio
Content Analysis Steps (1) Define exact scope of the study (dates, sources, search strategy); (2) Operationalize variables to code; (3) Refine coding system & test reliability; (4) Code the content under study; (5) Analyze Patterns
Is Content Analysis Descriptive or Causal? By itself it's descriptive. If part of a study it can be Causal.
Intercoder Reliability Test Where independent coders, at least 2, evaluate a characteristic of a message or artifact and reach the same conclusion. Must have at least an 80% agreement rate.
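One simple way to check the 80% threshold is plain percent agreement between two coders, sketched below. The codes are invented and percent_agreement is a hypothetical helper; other reliability measures exist, and this sketch does not adjust for agreement that would happen by chance.

```python
def percent_agreement(coder_a, coder_b):
    """Share of items on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

# Hypothetical codes assigned to the same 10 news stories
coder_a = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "pos", "neu", "pos"]
coder_b = ["pos", "neg", "neg", "pos", "neu", "pos", "neg", "neg", "neu", "pos"]

agreement = percent_agreement(coder_a, coder_b)
print(agreement, agreement >= 0.80)
```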
What to worry about in analyzing patterns in Content Analysis… Caution in drawing inferences.
Types of Operationalize Variables to Code (1) Specific Word Count; (2) Sources Quoted; (3) Topics; (4) Overt Visual Image; (5) Voice Inflections; (6) Subtle Themes; (7) Global Code
Uses in Content Analysis History, Public Relations, National Intelligence, Lobbying, Detective Work, Mass Communication, Linguistics
Content Analysis Systematic analysis of patterns in communications
When to use inferential Stats? Randomized – ALWAYS! Population – Use if group can be used as a sample.
Qualitative Research More exploratory, small purposive "samples", open-ended semi-structured interviews, more time per subject, narrative format, note researcher's impact.
Quantitative Research More defined, specific hypothesis testing, large random samples, close-ended instruments, less time per subject, data-based reports, distant/unacknowledged.
Matching Qualitative and Quantitative Start with Qualitative research to define the issues/vocabulary, to help generate/refine research questions, test a draft questionnaire. Then conduct quantitative study. Use qualitative to explore puzzles found.
Purpose of Focus Groups In-depth probing of views (pre-existing); Reactions to new stimuli (new responses); Group brainstorming (new idea generation);
Focus Groups Format Recruit relevant participants; 10-12 people, 1.5 to 2 hours long, audio/video taped, semi-structured format w/ open ended agenda questions, neutral moderator.
The right number of Focus Group meetings Depends on resources, how much is at stake, but at least more than one!
Bivariate Regression One X, Correlation Coefficient = r, Coefficient of Determination = r², Y = a + bX
Multiple Regression Two or more Xs, Multiple Correlation Coefficient = Multiple R, Multiple Coefficient of Determination = Multiple R², Y = a + b1X1 + b2X2 … + b#X#
Y = a + bX a=intercept; b=slope
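A minimal sketch of bivariate regression by least squares, showing where a, b, r and r squared come from. The five data points are invented, and bivariate_regression is a hypothetical helper name.

```python
import math

def bivariate_regression(xs, ys):
    """Least-squares intercept (a), slope (b), r and r squared for one X and one Y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                      # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x            # intercept: predicted Y when X is 0
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    return a, b, r, r ** 2

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r, r_squared = bivariate_regression(xs, ys)
print(f"Y = {a:.1f} + {b:.1f}X, r = {r:.3f}, r squared = {r_squared:.2f}")
```

Here r squared is smaller than r, which matches the card describing r as the somewhat inflated of the two.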
Multiple Correlation Coefficient Multiple R
Multiple Coefficient of Determination Multiple R (squared)
Unstandardized Coefficients in Multiple Regression Equations Symbol: b; Unstandardized partial regression coefficient/slope; slope change measured in original units;
How to interpret Unstandardized Coefficients in Multiple Regression Equations If b is -3, subtract 3 years for every pack of cigarettes.
Standardized Coefficients in Multiple Regression Equations Symbol: β (Greek beta); Beta or beta weight or standardized partial regression coefficient/slope; in units standardized as Z-scores (Std. Dev. Units) to allow comparisons.
How to interpret Standardized Coefficients in Multiple Regression Equations Use for ranking variables: The larger the beta (ignoring its sign), the more powerful the X.
Multicollinearity Overlap of variables
Dummy Variable A dichotomous variable coded 0/1 so that a category can be entered into a regression; the category coded 0 is left out of the calculation and serves as the reference group.
r Correlation Coefficient
Correlation Coefficient (r) Summarizes the strength of the linear relationship between two scale variables. Perfect Positive Correlation 1.0 (Left up to Right); Perfect Negative Correlation -1.0 (Left down to Right). 0 = No correlation.
r(squared) Coefficient of Determination
Coefficient of Determination (r²) Indicates strength of relationship but has no negative sign. Yields lower but more intuitive score.
Role of Correlation Coefficient and Coefficient of Determination Both summarize (in slightly different ways) the strength of the relationship between two scale variables. Neither is inferential.
Feature of Correlation Coefficient Shows strength and direction, though somewhat inflated.
Feature of Coefficient of Determination Shows strength and proportion of variation explained, but lacks direction sign.
Homoscedasticity Even variation around the slope (Homo is straight)
Heteroscedasticity Uneven Variation on the slope (Hetero is balled up)
Bivariate Analysis of Outliers Could be bad data, but may also be real cases that offer lessons on how to do something very well or very badly.
Standard Error of the Estimate (SEE) Draws lines on either side of the regression line showing the band within which about 68% of cases fall.
Is Standard Error of the Estimate Inferential? Not just no, but hell no!
Aggregate Data Units of analysis are collectivities (i.e. counties, states, countries)
Ecological Fallacy Drawing individual-level inference from a pattern in aggregate data.

Exam 1 Kellogg

Question Answer
Sample individuals selected to represent the population
Population all possible individuals which a study may apply to.
parameter numerical value that describes a characteristic of a population
Statistic Numerical value that describes a characteristic of a sample
Variable Characteristic that changes within the same individual or between different individuals
Sampling Error Numeric difference that exists between the statistic and the parameter
Nominal No real quantitative value: Numbers simply replace name
Ratio Has an absolute zero
Ordinal variable has ordered categories that are not equidistant and does not have a point of absolute zero: Order is not equal distance
Interval Variable has ordered numeric categories that are equal but does not have a point of absolute zero
Experimental method cause and effect
Correlational (Observational) method naturally occurring relationship between two or more variables
Dependent Variable measurable/observable for group differences to assess the effect of the independent variable
Independent variable variable manipulated (or controlled) by experimenter.
Operational definition defining a variable by manner in which the variable is used or measured.
Inferential statistics procedure used to generalize characteristics of sample to population
What two graphs can be used for Ordinal Bar or Pie graph
What two graphs can be used for Nominal Bar or Pie graph
What two graphs can be used for Interval Freq Poly or Histogram
What two graphs can be used for Ratio Freq Poly or Histogram
Constant Does NOT change from one individual to the next
Theory Set of ideas used to explain the functioning of, and make predictions about, a relationship or set of relationships
Hypothesis Specific, testable prediction about the relationship between two or more variables
Experimental control using random assignment and holding extraneous variables constant
Experimental group receives the treatment level of the independent variable, get some sort of active treatment
Control Group Does NOT receive the treatment of the independent variable: Gets no treatment or placebo
Confounding variable Uncontrolled variable that can systematically vary with the independent variable, masking or enhancing the true effect of the independent variable.
Mesokurtic normal distribution
Leptokurtic Tall and Thin
Platykurtic flat with little elevation
Skew The direction in which the tail extends
Kurtosis How peaked or flat the distribution is compared with the normal curve
Give examples of Nominal Number given in place of name: Male/Female
Give examples of Ordinal Order Matters; numbers mean different things: Ranks/Place finishes
Give example of Intervals Has the properties of both nominal and ordinal scales, but the distances between numbers are equal (2 to 3 is the same distance as 3 to 4)
Give example of Ratio Has all the characteristics of nominal, ordinal and interval scales, plus an absolute zero: Kelvin scale. Zero means Zero

Test 3

Question Answer
uniform distribution values spread evenly over the range of possibilities
Standard normal distribution normal probability distribution with mean=0 and stand. deviation=1
normal cdf probability
inv.norm. z-score
z score formula z = (x - mean) / stand. dev.
Standard error of the mean stand. dev. / square root of n
Sample of Values z = (sample mean - mean) / (stand. dev. / square root of n)
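A short sketch of the three formulas above: the z score for a single value, the standard error of the mean, and the z score for a sample mean. The population values (mean 100, stand. dev. 15) and the function names are assumptions made up for the example.

```python
import math

def z_score(x, mean, sd):
    """How many standard deviations a single value lies from the mean."""
    return (x - mean) / sd

def standard_error(sd, n):
    """Standard deviation of the distribution of sample means."""
    return sd / math.sqrt(n)

def z_for_sample_mean(sample_mean, mean, sd, n):
    """z score for a sample mean, using the standard error in place of the std dev."""
    return (sample_mean - mean) / standard_error(sd, n)

# Hypothetical population: mean 100, standard deviation 15
print(z_score(130, 100, 15))                 # single value
print(standard_error(15, 25))                # sample of 25
print(z_for_sample_mean(106, 100, 15, 25))   # mean of that sample
```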
Point Estimate single value(or point) used to approximate a population parameter
Confidence Interval range of values used to estimate the true value of a population parameter
Confidence level probability 1 - α
critical value number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur (z
Margin of error denoted by E,maximum likely difference
Round off rule-CI 3 digits
Round off rule-Sample size larger whole number, can't have half a person
sample size for population mean n = [(zα/2 × stand. dev.) / E] squared
Degrees of Freedom number of sample values that can vary after certain restrictions have been imposed on all data values
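A sketch of the sample size calculation, assuming the usual textbook form n = (z × stand. dev. / E) squared, where E is the margin of error, and applying the round off rule for sample size (always go up to the next whole number). The inputs are invented for the example.

```python
import math

def sample_size_for_mean(sd, margin_of_error, z=1.96):
    """Smallest whole-number n that keeps the margin of error at or below E."""
    raw = (z * sd / margin_of_error) ** 2
    return math.ceil(raw)   # round up: can't have part of a person

# Hypothetical: stand. dev. = 15, desired margin of error E = 3, 95% confidence
print(sample_size_for_mean(15, 3))
```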
Degrees of freedom formula n-1
Zinter population stand. dev. is known and n>30
Tinter population stand. dev. NOT known and n>30

stats test 3

Question Answer
what is the mean of z score distribution 0
what is the standard deviation of z score distribution 1
under what conditions will our distribution be normal? as the sample size increases, the shape of the distribution becomes more like the normal curve
calculate z-scores
what information affects standard error? sample size, variability
the larger the sample size, the smaller the standard deviation of the distribution of means-standard error
6 steps of hypothesis testing
1.identify the populations, distribution&assumptions then choose the appropriate hypothesis test
2.state the null&research hypotheses in words&symbolic notation
3.determine the characteristics of the comparison distribution
4.determine the critical values or cutoffs that indicate the points beyond which we will reject the null hypothesis
5.calculate the test statistic
6.decide whether to reject or fail to reject the null hypothesis
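The six steps can be walked through in code for the simplest case, a z test of one sample mean against a known population. Everything here is an assumption for illustration: the population (mean 100, stand. dev. 15), the sample (n = 36, mean 106), the two-tailed .05 cutoff, and the function name.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, n, pop_mean, pop_sd, alpha=0.05):
    # Steps 1-2: population is assumed normal with known mean and std dev;
    # null hypothesis: the sample comes from a population with mean pop_mean.
    # Step 3: the comparison distribution is the distribution of means.
    standard_error = pop_sd / math.sqrt(n)
    # Step 4: two-tailed critical value (cutoff) for the chosen alpha.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: calculate the test statistic.
    z = (sample_mean - pop_mean) / standard_error
    # Step 6: reject or fail to reject the null hypothesis.
    reject = abs(z) > critical
    return z, critical, reject

z, critical, reject = z_test(sample_mean=106, n=36, pop_mean=100, pop_sd=15)
print(round(z, 2), round(critical, 2), reject)
```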