Question | Answer |
---|---|
The dependent variable is measured two or more times for each individual in a single sample. The same group of subjects is used in all of the treatment conditions. | Repeated-Measures Design or Within-Subject Design |
Each individual in one sample is matched with an individual in the other sample. The matching is done so that the two individuals are equivalent with respect to a specific variable the researcher would like to control. | Matched-subjects design |
For a research study comparing two treatment conditions, what characteristic differentiates a repeated-measures t statistic? | For a repeated-measures design, the same group of individuals is tested in both of the treatments. An independent-measures design uses a separate group for each treatment. |
Describe the data used to compute the sample mean and the sample variance for the repeated-measures t statistic. | The two scores obtained for each individual are used to compute a difference score. The sample of difference scores is used to compute the mean and variance. |
In words and symbols, what is the null hypothesis for a repeated-measures t test? | H0: μD = 0. The null hypothesis states that, for the general population, the average difference between the two conditions is zero. |
What assumptions must be satisfied for repeated-measures t tests to be valid? | The observations within a treatment are independent. The population distribution of D scores is assumed to be normal. |
Describe some situations for which a repeated-measures design is well suited. | When a particular type of subject is not readily available for a study. When fewer subjects are needed. Studies where time is a factor. |
How is a matched-subjects design similar to a repeated-measures design? How do they differ? | They are similar in that the role of individual differences in the experiment is reduced. They differ in that there are two samples in a matched-subjects design and only one in a repeated-measures study. |
The data from a research study consist of 10 scores in each of two different treatment conditions. How many individual subjects would be needed to produce the data for an independent-measures design? | 20 subjects |
The data from a research study consist of 10 scores in each of two different treatment conditions. How many individual subjects would be needed to produce the data for a repeated-measures design? | 10 subjects |
The data from a research study consist of 10 scores in each of two different treatment conditions. How many individual subjects would be needed to produce the data for a matched-subjects design? | 20 subjects |
The difference between the first and second measurements for each subject in a repeated-measures t test. | D scores: D = X2 − X1 |
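The difference-score computation on the cards above can be sketched directly in Python. This is a minimal illustration with made-up scores, not data from any specific study:

```python
# Repeated-measures t statistic from difference scores D = X2 - X1.
# The pre/post values below are made up for illustration.
from statistics import mean, stdev
from math import sqrt

pre  = [12, 15, 11, 14, 13, 16, 12, 15]   # X1: first measurement per subject
post = [14, 18, 13, 15, 16, 18, 13, 17]   # X2: second measurement per subject

# One difference score per subject
d = [x2 - x1 for x1, x2 in zip(pre, post)]

n = len(d)
m_d = mean(d)          # sample mean of the differences
s_d = stdev(d)         # sample standard deviation of the differences
se = s_d / sqrt(n)     # estimated standard error of the mean difference
t = m_d / se           # t statistic testing H0: mu_D = 0, with df = n - 1

print(n, m_d, round(t, 3))
```

Note that only the differences enter the computation; the original X1 and X2 columns are never used after `d` is formed, which is exactly what distinguishes this test from an independent-measures t test.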
Stats-Chapter 2
Question | Answer |
---|---|
MEAN | Sum of the data entries divided by the number of entries. (Average) |
MEDIAN | The middle data when the data set is sorted in ascending or descending order (Middle) |
MODE | The data entry that occurs with the greatest freq. (Most) |
OUTLIER | Data entry that is "far" removed from the other entries in a data set. |
SYMMETRIC DISTRIBUTION | When a vertical line is drawn through the middle of the distribution, the two halves are approximate "mirror images". |
UNIFORM DISTRIBUTION | All entries in the distribution have equal freq. |
SKEWED LEFT DISTRIBUTION | Freq. dist. with a "tail" that extends to the left. |
SKEWED RIGHT DISTRIBUTION | Freq. dist. with a "tail" that extends to the right. |
FREQ. DIST. | Table that shows classes or intervals of data entries with the count of the number of entries in each class. |
CUMULATIVE FREQ. | Sum of the freq. of that class and all previous classes (a "running total"). |
STEM & LEAF PLOT | Displays quantitative data. |
STEM | Data entry's left most digit(s) |
LEAF | Data entry's right most digit |
SCATTER PLOT GRAPH | Uses ordered pairs of quantitative variables on a coordinate plane |
PIE CHART | Circle graph that shows relationships of parts to a whole graph of data. |
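The three measures of center defined on the cards above can be checked with Python's `statistics` module. The data values are made up for illustration:

```python
# Mean (Average), median (Middle), and mode (Most) on a small data set.
from statistics import mean, median, mode

data = [3, 7, 7, 2, 9, 7, 4]

print(mean(data))    # sum of the data entries divided by the number of entries
print(median(data))  # middle entry of the sorted data: [2, 3, 4, 7, 7, 7, 9]
print(mode(data))    # the entry occurring with the greatest frequency
```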
Fundamentals of Statistics III Sullivan Chapter 1
Question | Answer |
---|---|
statistics | The science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions. In addition, statistics is about providing a measure of confidence in any conclusions. |
data | Facts or propositions used to draw a conclusion or make a decision. The list of observed values for a variable. |
anecdotal | the information being conveyed is based on casual observation, not scientific research |
population | the entire group of individuals to be studied |
individual | a person or object that is a member of the population being studied |
sample | a subset of the population that is being studied |
statistic | a numerical summary of a sample |
descriptive statistics | Consist of organizing and summarizing data. Describe data through numerical summaries, tables, and graphs. |
inferential statistics | Uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result. One goal is to estimate parameters. |
parameter | a numerical summary of a population |
the process of statistics | 1. Identify the research objective 2. Collect the data needed to answer the question(s) posed in step 1 3. Describe the data 4. Perform inference |
convenience samples | Samples obtained through convenience rather than systematically, i.e. Internet or phone-in polls. Not based on randomness. Not considered reliable. |
variables | the characteristics of the individuals within the population |
qualitative (categorical) variables | allow for classification of individuals based on some attribute or characteristic |
quantitative variables | Provide numerical measures of individuals. Math operations such as addition and subtraction can be performed on the values of a quantitative variable and will provide meaningful results. |
approach | a way to look at and organize a problem so that it can be solved. |
discrete variable | A quantitative variable that has either a finite number of possible values or a countable number of possible values. The values result from counting. |
continuous variable | A quantitative variable that has an infinite number of possible values that are not countable, but are instead measured. |
Qualitative data | Observations corresponding to a qualitative variable. |
Quantitative data | Observations corresponding to a quantitative variable. |
Discrete data | Observations corresponding to a discrete variable |
Continuous data | Observations corresponding to a continuous variable. |
nominal level of measurement | The values of a variable name, label, or categorize. The naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order. |
ordinal level of measurement | The variable has the properties of the nominal level of measurement and the naming scheme allows for the values of the variable to be arranged in a ranked or specific order. |
interval level of measurement | The variable has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. Zero does not mean the absence of the quantity. Addition and subtraction can be performed on values of the variable. |
ratio level of measurement | the variable has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. Zero means the absence of quantity. Multiplication and division can be performed on values of the variable. |
validity | Represents how close to the true value of a measurement a measurement is. A variable is valid if it measures what it is supposed to measure. |
reliability | The ability of different measurements of the same individual to yield the same results. |
Four levels of measurement of a variable | 1. nominal 2. ordinal 3. interval 4. ratio |
observational study | Measure the value of response variable w/out trying to influence the value of the response or explanatory variables. Researcher observes behavior of individuals in the study w/out trying to influence outcome. Association may be claimed but not causation. |
designed experiment | An experiment where the researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, then records the value of the response variable for each group. |
explanatory variable | a variable that explains or causes changes in the response variable |
response variable | a variable that measures an outcome or result of a study (variable whose changes are to be studied) |
confounding | Occurs when the effects of two or more explanatory variables are not separated, so any change in the response variable may be due to a variable that was not accounted for in the study. |
lurking variable | An explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. Lurking variables are typically related to explanatory variables considered in the study. |
three categories of observational studies | 1. cross-sectional studies 2. case-control studies 3. cohort studies |
cross-sectional studies | Observational studies that collect information about individuals at a specific point in time or over a very short period of time. |
case-control studies | Retrospective studies that require individuals to look back in time or require the researcher to examine existing records. Individuals that have a certain characteristic are matched with those that do not. |
cohort studies | A group of individuals (the cohort) participates in the study and is observed over time. Characteristics of individuals are recorded. Some individuals are exposed to certain factors; others are not. At the study's end, the value of the response variable is recorded for each individual. |
census | a list of all individuals in a population along with certain characteristics of each individual |
random sampling | the process of using chance to select individuals from a population to be included in the sample |
simple random sampling | every possible sample of size n from a population of size N has an equally likely chance of occurring |
frame | lists all the individuals in a population |
sample without replacement | once an individual is selected, he is removed from the population and cannot be chosen again |
sampling with replacement | a selected individual is placed back in the population and could be chosen again |
seed | provides an initial point for a random-number generator to start creating random numbers |
stratified sample | Obtained by separating the population into non-overlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous in some way. |
systematic sample | Obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k. |
steps in systematic sampling | 1. Approximate population size, N 2. Determine sample size, n 3. Find N/n, round down to the nearest integer – this is k. 4. Randomly select a number between 1 and k. This is p. 5. The sample will be the following individuals: p, p+k, p+2k, … p+(n-1)k |
cluster sample | Obtained by selecting all individuals within a randomly selected collection or group of individuals |
self-selected convenience sample | Individuals themselves decide to participate in a survey. Also known as voluntary response samples. |
multistage sampling | the use of a combination of sampling techniques |
bias | the results of the sample are not representative of the population |
three sources of bias in sampling | 1. Sampling bias 2. Nonresponse bias 3. Response bias |
sampling bias | the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another |
undercoverage | the proportion of one segment of the population is lower in a sample than it is in the population |
nonresponse bias | individuals selected to be in the sample who do not respond to the survey have different opinions from those who do |
methods to decrease nonresponse bias | 1. callbacks 2. rewards and incentives |
response bias | the answers on a survey do not reflect the true feelings of the respondent |
sources of response bias | 1. interviewer error 2. misrepresented answers 3. wording of questions 4. ordering of questions or words 5. type of question (open or closed) 6. data entry error |
open question | a question for which the respondent is free to choose his or her response |
closed question | a question for which the respondent must choose from a list of predetermined responses |
nonsampling errors | Errors that result from undercoverage, nonresponse bias, response bias, or data entry error. May be present in a complete census of the population. |
sampling error | Error that results from using a sample to estimate information about a population. Occurs because a sample gives incomplete information about a population. |
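The systematic-sampling steps listed earlier in this section (find k = N/n, pick a random start p between 1 and k, then take every kth individual) can be sketched as follows. The population size, sample size, and seed are made up for illustration:

```python
# Systematic sampling: every kth individual from a random starting point.
import random

random.seed(42)   # the seed gives the random-number generator its starting point

N = 1000                     # 1. approximate population size
n = 20                       # 2. desired sample size
k = N // n                   # 3. N/n rounded down to the nearest integer
p = random.randint(1, k)     # 4. random starting point between 1 and k
sample = [p + i * k for i in range(n)]   # 5. p, p+k, p+2k, ..., p+(n-1)k

print(k, p, sample[:3])
```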
Frequency Distributions
Question | Answer |
---|---|
What is a raw score | A data point that has not been transformed or analyzed |
What is a frequency distribution | the pattern of a set of numbers, shown by displaying a count or proportion for each possible value of a variable |
What is a frequency table | a visual depiction of data that shows how often each value occurred, that is, how many scores were at each value. Values are listed in one column and frequencies in a second column |
What is a histogram | It looks like a bar graph but is typically used to depict scale data, with values on the x-axis and frequencies on the y-axis |
What is a frequency polygon | a line graph with the x-axis representing values or midpoints of intervals and the y-axis representing frequencies. A dot is placed at each frequency and the dots are connected |
What is a grouped frequency table | a visual depiction of data that reports the frequencies within a given interval rather than the frequencies for a specific value |
What is a normal distribution | a specific frequency distribution that is a bell-shaped, symmetric, unimodal curve |
What does positively skewed mean | the distribution tail extends to the right in a positive direction |
What is the floor effect | a situation in which a constraint prevents a variable from taking on values below a certain point |
What is the ceiling effect | a situation in which a constraint prevents a variable from taking on values above a given number |
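A frequency table as described above (each value paired with how many scores fell at that value) can be built with `collections.Counter`. The raw scores below are made up for illustration:

```python
# Frequency table: values in one column, frequencies in the other.
from collections import Counter

scores = [85, 92, 85, 70, 92, 85, 60, 70]   # raw scores, untransformed
freq = Counter(scores)                       # count per distinct value

for value, count in sorted(freq.items()):
    print(value, count)
```

A grouped frequency table is the same idea with the key replaced by an interval, e.g. `Counter((s // 10) * 10 for s in scores)` to bin by tens.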
intro to stats and research design
Question | Answer |
---|---|
What is a variable | any observation of a physical, attitudinal or behavioral characteristic that can take on values |
What is a discrete observation | can take on only specific values (e.g., whole numbers); no other values can exist between these numbers |
What is a continuous observation | can take on a full range of values (numbers out to several decimal places); an infinite number of potential values exists |
What is a nominal variable | a variable used for observations that have categories or names as their values |
What is an ordinal variable | a variable used for observations that have rankings such as 1st, 2nd, 3rd… |
What is an interval variable | a variable used for observations that have numbers as their values and the distance or interval between pairs of consecutive numbers is assumed to be equal |
What is a ratio variable | a variable that meets criteria for an interval variable but also has a meaningful zero point |
What is a scale variable | A variable that meets the criteria for an interval or a ratio variable |
What is a level | a discrete value or condition that a variable can take on |
What is an independent variable | a variable that we either manipulate or observe to determine its effects on the dependent variable |
What is a good variable | reliable and complete |
what is a dependent variable | the outcome variable that we hypothesize to be related to or caused by changes in the independent variable |
What is a confounding variable | any variable that systematically varies with the independent variable so that we cannot logically determine which variable is at work; also called a confound |
What is reliability | Refers to the consistency of a measure |
What is validity | the extent to which a test actually measures what was intended to measure |
What is hypothesis testing | the process of drawing conclusions about whether a particular relation between variables is supported by the evidence |
What is operational definition | specifies the operations or procedures used to measure or manipulate a variable |
What is correlation | an association between two or more variables |
What is an experiment | a study in which participants are randomly assigned to a condition or level of one or more independent variables |
What is a random assignment | every participant has an equal chance of being assigned to any group or experimental condition in the study |
What is a between-groups research design | participants experience one, and only one, level of the independent variable |
What is a within-groups research design | the different levels of the independent variable are experienced by all participants in the study; also called a repeated-measures research design |
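Random assignment, as defined on the cards above, can be sketched in Python: shuffle the participants so every ordering is equally likely, then deal them into conditions. The participant names and conditions are made up for illustration:

```python
# Random assignment to the levels of an independent variable
# (a between-groups design: each participant gets exactly one level).
import random

participants = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
levels = ["treatment", "control"]   # levels of the independent variable

random.shuffle(participants)        # every participant has an equal chance of any slot
groups = {level: participants[i::len(levels)] for i, level in enumerate(levels)}

for level, members in groups.items():
    print(level, members)
```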
For Life Sciences
Question | Answer |
---|---|
a statistic is? | a summary of data |
a field of statistics is? | the collecting, analysing and understanding of data measured with uncertainty |
what is a categorical variable? | one which is measured descriptively eg: hair colour or major at university |
what is a quantitative variable? | one which is measured numerically, e.g. the time it takes to get home from work |
graphical summary of one categorical variable? | bar graph |
graphical summary of one quantitative variable? | histogram or boxplot |
how to graphically summarise relationship between two categorical variables | clustered bar chart or jittered scatterplot |
how to graphically summarise relationship between two quantitative variables | scatterplot |
how to graphically summarise relationship between one categorical and one quantitative variable | comparative boxplots or comparative histograms |
what to look for in a graph | location, spread, shape, unusual observations |
define 'location' graphically | where most of the data lies |
define 'spread' graphically | variability of the data, how far apart or close together it is |
define 'shape' graphically | symmetric, skewed, etc. |
how to numerically summarise one categorical variable | table of frequencies or percentages |
how to numerically summarise one quantitative variable | location: mean or median; spread: standard deviation or inter quartile range |
formula for mean? | xbar = (1/n) × Σ xi; preferable for approximately normal data |
formula for Median? | M = the middle value of the sorted data (odd n), or the average of the two middle values (even n); less affected by outliers, therefore used for outlier-ridden data |
formula for standard deviation? | s = √[ (1/(n−1)) × Σ (xi − xbar)² ]; preferable for approximately normal data |
formula for inter quartile range? | Q3 – Q1= IQR; less affected by outliers therefore used for outlier ridden data |
which numbers are needed to create a five number summary? | minimum, Q1, median (sometimes mean included), Q3, maximum |
an outlier is? | more than 1.5 x IQR lower than Q1; more than 1.5 x IQR higher than Q3 |
define linear transformation | transformation of a variable from x to xnew |
examples of linear transformation use | change of units; use of normal assumption therefore to find 'z' scores |
formula for linear transformation? | xnew=a+bx |
formula for new mean once linear transformation has occurred? | xbarnew=a+bxbar |
formula for new median once linear transformation has occurred? | Mnew=a+bM |
formula for new standard deviation once linear transformation has occurred? | snew = abs(b) × s (spread scales by the absolute value of b) |
formula for IQR once linear transformation has occurred? | IQRnew = abs(b) × IQR |
explain density curves | area under the curve in any range of values is the proportion of all observations that fall within that range for a quantitative variable; like a smoothed out histogram describes probabilistic behaviour |
total area under density curve equals? | 1 |
explain the normality assumption | a normal curve can be used if a histogram looks like a normal curve; termed 'reasonable'; the curve must approach 0 at both ends |
how does a normal quantile plot confirm the normality assumption? | if in a straight line, or close to it, then normal and assumption is reasonable |
define the 68-95-99.7 rule | 68% of results will be within 1 standard deviation of the mean; 95% of results will be within 2 standard deviations of the mean; 99.7% of data will be within 3 standard deviations of the mean |
symbol for mean of a density curve? | μ |
symbol for standard deviation of density curve? | σ |
normal distribution short hand | X = random variable; N = normal distribution; first number in brackets = mean; second number in brackets = standard deviation |
explain the standard normal variable | written as P(Z < z); corresponds to the area under the curve of the matching region, which is always to the left of z |
use of the standard normal distribution table | to find P: Z found along x and y axis of table; to find Z: P found in results of table; table ordered from smallest to largest |
reverse use of the standard normal distribution table | e.g. set out as: P(Z < c) = p; given the probability p, c is found from the body of the table |
X ~ ? | N(μ, σ) |
formula and use of standardising transformation | Z = (X − μ)/σ; used when the distribution is not N(0,1) and so it needs to be standardised |
relationships between variables best explored through? why? | scatterplot; can get a sense for the nature of the relationship |
how to define the nature of relationship? | existent/ non-existent; strong/ weak; increasing/ decreasing; linear/ non-linear |
outliers in scatterplots? | represent some unexplainable anomalies in data; could reveal possible systematic structure worthy of investigation |
define causal relationship | relationship between two variables where one variable causes changes in another |
define the explanatory variable | explains or causes the change; written on x-axis |
define the response variable | that which changes; written on y-axis |
useful numbers for two quantitative variables? | correlation or regression |
formula for the correlation coefficient? | r = (1/(n−1)) × Σ [ (xi − xbar)/sx ] × [ (yi − ybar)/sy ] |
define xi or yi | axis values of corresponding letter |
define xbar and ybar | mean of axis values of corresponding letter |
define sx and sy | standard deviation of axis values of the corresponding letter |
state the properties of r | r is the correlation coefficient; numerically expresses relationships; close to 1 = strong positive linear relationship; close to -1 = strong negative linear relationship; close to 0 = weak or non-existent linear relationship |
state the cautions about the use of r | only useful for describing linear relationships; sensitive to outliers |
what is least squares regression used for? | to explain how a response variable is related to an explanatory variable; positive slope = increase; negative slope = decrease |
mathematical representation of regression | b1 = r(sy/sx); b0 = ybar − b1·xbar; yhat = b0 + b1·x |
facts about b1 | b1 is the slope of the fitted line; b1 = r(sy/sx), so it has the same sign as r (b1 equals r only when sx = sy) |
how to determine the strength of a regression | r-squared = (syhat/sy)²; r-squared is the % variation in y explained by the linear regression |
state the basic regression assumptions | y = b0 + b1·x + error; the errors average 0 and correspond to random scatter about the line; this is checked by residual plots |
formula for residual plots? | y – y-hat |
residual plot is a scatter plot of? | residuals(y axis) against explanatory variable(x axis) |
interpreting residual plots | focus on pattern; there should be no pattern; if there is a pattern then the linear assumption is incorrect |
what to do if any residuals stand out? | they are either an influential point and to be left alone; or they are an outlier and to be removed if affecting results too much |
how to attach special cause to an outlier | analyse if recording error; refit line; if remove then justify why (down weight influence) |
translated residuals (removing the outlier) should have what effect? | spread pattern |
any 0 intercepting points on a residual plot are? | observations that fall exactly on the fitted line (observed = predicted) |
if parabola presents after outlier removal? | the linear assumption is not appropriate |
if spread doesn't vary far from 0? | there is no pattern |
when to remove outlier | if influences results |
when will outlier not influence results? | when close to mean; – will have little influence on the gradient and intercept of fitted line |
what are lurking variables? | variables that can influence results which have not been taken into account |
to account for lurking variables you? | analyse the covariance |
state the strategy for using data in research? | identify question to be answered; identify population studied; locate variables: which one is IV and DV, explanatory and response; obtain data which answers question |
define anecdotal data | haphazard collection of data; unreliable for drawing conclusions |
define available data | use of data that has come from another source possibly obtained for a reason other than the one you intend to use it for |
define collect your own data | use of a census, a survey, or observations from an experiment |
define census | use of whole population to obtain data |
define sample | use of a randomised selection of the population to represent the whole; smaller and easier to do than a census |
explain observational study | no variables are manipulated or influenced; data obtained from population as it is |
explain experiment | variables are influenced or manipulated so that responses can be noted and recorded; usually a control group is utilised (control group = does not undergo treatment, acts as a comparison group) |
explain causation | a response that is the result of another variable eg: moon's movements CAUSE the tides |
common response in terms of variables means? | a lurking variable causes changes in both observed variables, so the response variables are associated with one another |
causation in terms of variables means? | explanatory variable causes response variable; response variable and explanatory variable are associated |
confounding in terms of variables means? | two or more explanatory variables are present and associated to one another; all explanatory variables could have caused response variable by themselves or together; explanatory variables called confounded causes |
why an experiment? | allows demonstration of causation; intervention can be used to determine whether or not effect is present |
state the principles of experiment design | subjects, treatment, factor, levels, response variable |
definition of subjects in terms of experiment design | things upon which experiment is done; eg: people, animals, chemicals etc |
definition of treatment in terms of experiment design | circumstances which applied to subjects; eg: given medication |
definition of factor in terms of experiment design | variables that are apparent within different treatments; eg: given medication or placebo |
definition of levels in terms of experiment design | formation of treatments determined by which combination of factors used; eg: dosage of medication/how many doses per day vs dosage of placebo/ how many placebo taken |
definition of response variable in terms of principle of experiment design | the variable which will answer the question variable of most interest that is measured on subject after treatment |
explain a principles of experiment summarisation table | Factors on x and y axis; levels in first columns and rows; rest of table = number allocated to that particular treatment group |
state the three principles all experiments must follow | compare two or more treatments where one is the control; random assignment of subjects to treatments; repeat the experiment on numerous subjects (for reduction of confounding variables) |
how to randomise | allocate all subjects a random number; order subjects in accordance to those random numbers (smallest to largest, or largest to smallest); form treatments by selecting subjects in a systematic pattern applied to the random numbers representing subjects |
define control group: | different from all other treatments as it only pretends to apply explanatory variable; is the group that the results are compared against |
explain random comparative experiment | subjects randomly allocated one of several treatments; responses compared across treatment groups |
explain matched pairs design | break subjects with similar properties into pairs; one of two treatments applied to one of each pair; can produce more precise results; used in before and after, and twin studies |
explain random block design | block = group of subjects known before experiment to be similar in some way that would affect response; randomised assignment of treatments to subjects within block; matched pairs is special case of this |
experimental caution: appropriate control | only variant across treatment(s) is/are factor(s) |
experimental caution: beware of bias | administrator of experiment can present bias towards certain treatment to certain subjects; double blind accounts for this: neither subject nor administrator knows which treatment is applied |
experimental caution: repetition of entire subjects | all steps for experiment are performed for all subjects in all treatments |
experimental caution: realistic experiment | experiment needs to duplicate real-world conditions |
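Several of the formula cards in this section (mean, standard deviation, the correlation coefficient r, the least-squares slope b1 = r(sy/sx) and intercept b0 = ybar − b1·xbar, and residuals y − yhat) can be checked numerically. The data values below are made up for illustration:

```python
# Correlation and least-squares regression from the card formulas.
from statistics import mean, stdev

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
xbar, ybar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)

# r = 1/(n-1) * sum of ((xi - xbar)/sx) * ((yi - ybar)/sy)
r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y)) / (n - 1)

b1 = r * (sy / sx)                       # slope
b0 = ybar - b1 * xbar                    # intercept
fitted = [b0 + b1 * xi for xi in x]      # yhat values
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # y - yhat

print(round(r, 4), round(b1, 3), round(b0, 3), round(r * r, 3))
print([round(e, 2) for e in residuals])
```

The residuals always sum to zero for a least-squares fit, which is a quick sanity check before plotting them against the explanatory variable.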
2nd Quiz Study Guide
Question | Answer |
---|---|
Traditionalism | 1) Social Science is not a hard Science 2) Humans are too complex for quantification 3) Historical, anecdotal, journalistic approach |
Behavioralism (aka Basic Research) | 1) There are regularities to permit generalizations 2) Explicit, replicable, neutral methods 3) Priority: hypothesis testing to build theories. Goal: highly predictive interlocking theories |
Applied Research (Post-Behaviorialism or Policy Analysis) | Accepted the merits of explicit, rigorous, replicable scientific methods Changed the goal from building theory to addressing practical/applied/policy questions And acknowledged the role of values in setting research priorities |
Classic Model of the Scientific Process | 1) Theory 2) Deduce Hypothesis from theory 3) Design Study and operationalize concepts 4) Conduct the Study (collect the data) 5) Analyze data to accept/reject hypothesis 6) Support, modify, or reject initial theory |
Model of Applied Research | Begin with specific, practical issue – Devise testable research question – Design study and operationalize concepts – Conduct the study (collect the data) – Analyze data to accept/reject hypothesis – Use results to inform decision-making |
Hypothesis | A testable statement of the relationship between two or more variables |
Theory | A set of logically related propositions intended to explain a range of phenomena |
Main Structure of Research Reports | Intro (Problem Area; Issues) Literature Review Methodology Findings Discussion and Conclusion |
The Strong Lit Review | Primary (not secondary) sources Nonelectronic searches Contact leading researchers Add unpublished/forthcoming research Diagram/model key relationships Use elements of meta-analysis |
Meta-Analysis Steps | (1) Clear Statement of Hypothesis (2) Explicit and Replicable Lit Searches (3) Set Variables for Coding Studies (4) Analyze predictors of the results – Certain factors associated with certain outcomes? |
Good Individual Questions | Short as possible Shared, simple vocab Unbiased Language/premises Unambiguous Answers Confined to one issue Exhaustive/Exclusive Categories |
Good Format and Overall Flow | Brief Smooth Intro Easy Non-threatening start Early closed-ended questions Move from general to specific Delay sensitive issues until later Demographics last Fair Framing Short transitions Consistent series answer format |
Census vs Sample | Use Census if feasible, affordable and not often; but samples usually more practical |
Random vs Nonprobability | Use random samples unless desperate |
Nonprobability Sampling | Convenience Purposive Snowball |
Random sampling includes | Simple, Systematic (every nth), Stratified (proportionate or disproportionate) |
Simple Random Sampling | Each sample chosen independently and randomly from the sampling frame |
Systematic | Selecting every nth item from a list (from a random point) |
Stratified | Draw random samples within groups if easier or to over sample a group intentionally. Proportionate or Disproportionate |
Response Rate Determinants | Costs – Est. Lengths / Time / Complexity Benefits – Enjoyable / Important/ Satisfaction |
Evaluating a Sample Size | Overall precision (CI) needed Depth of Subgroup analysis As well as the research budget |
95% Confidence Interval – Sample 100 | +/- 10% |
95% Confidence Interval – Sample 600 | +/- 4% |
95% Confidence Interval – Sample 1100 | +/- 3% |
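The three margin-of-error cards above follow from the standard 95% CI formula for a proportion, 1.96 · √(p(1−p)/n), evaluated at the worst case p = 0.5. A quick illustrative sketch:

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case p = 0.5 reproduces the rule-of-thumb table above:
for n in (100, 600, 1100):
    print(n, round(margin_of_error(0.5, n) * 100, 1))
```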
Nominal | Categories by names only (region, religion, sex) |
Ordinal | Categories can be ordered on a single dimension (agree/disagree; highest degree earned; young, middle, old) |
Interval | Increments are consistent but no absolute zero (Fahrenheit, year of birth) |
Ratio | Absolute Quantities (amount of dollars, inches, siblings, years, pounds) ask yourself…can it be TWICE AS MUCH? |
Principles of Data Analysis | (1) Good Data are a prerequisite (2) All Statistics are reductionist (3) Context dictates interpretation (4) Avoid Exaggerating small gaps (Bill hates this!) (5) Correlation DOES NOT equal Causation (6) Start with Univariate Analysis |
Univariate Nominal Variables | Mode = Plurality but not always a majority Percentages = usually round % |
Univariate Nominal Variables – Interpretation Pitfalls | Misleading pictograms Confusing absolute and relative % Misinterpreting nominal modes as if they were midpoints/averages Misleading/oversimplified composites from nominal and other modes |
n | Univariate Sample size |
N | Univariate population size |
Measures of Central Tendency | (1) Mean (2) Median (3) Trimmed Means |
Mean | Sum divided by # of cases; very sensitive to extreme values. x with line on top is sample mean; mu which looks like a u is for population mean |
Median | 50th Percentile; half of the cases below; half above; totally insensitive higher and lower values |
Trimmed Means | Discard a percent of the highest/lowest values, top and bottom five percent…used in Olympic scoring |
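A minimal sketch of the trimmed mean described above (the `trim` fraction and the example scores are made up for illustration):

```python
def trimmed_mean(values, trim=0.05):
    """Mean after discarding the top and bottom `trim` fraction of cases."""
    xs = sorted(values)
    k = int(len(xs) * trim)        # how many cases to drop at each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
# the plain mean is 14.5; trimming 10% off each end drops 1 and 100
```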
Measures of Dispersion | (1) Range (2) Standard Deviation (3) Interquartile Range |
Range | Highest to lowest value; crude measure of dispersion |
Standard Deviation (Equation) | Square root of the sum of the squared difference of each case from the mean divided by the number of cases |
Standard Deviation | Shows the range of the middle 68% of cases in a normal curve, otherwise it only tells relative dispersion |
IQR | 25th to 75th percentiles; range of the middle 50% of all cases; easy to explain. |
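The SD and IQR cards can be sketched in Python. This version uses the population SD (dividing by N, per the equation card) and a simple index rule for the quartiles; textbooks differ on quartile interpolation, so treat the IQR helper as illustrative only:

```python
import math

def std_dev(values):
    """Population standard deviation: square root of the mean squared deviation."""
    m = sum(values) / len(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def iqr(values):
    """Interquartile range: spread of the middle 50% of cases."""
    xs = sorted(values)
    q1 = xs[int(0.25 * (len(xs) - 1))]   # simple index rule, no interpolation
    q3 = xs[int(0.75 * (len(xs) - 1))]
    return q3 - q1
```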
Smaller IQR/SD Scores | Tight cluster of cases |
Measure of Shape | Skewness |
Skewness | Asymmetrical distribution skewed positively if a few high scores pull the mean above median; reverse (mean below the median) reflects a negative skew. |
The Normal Curve | The Bell Shaped Curve Central Limit Theorem |
+/- 1 Std Dev | 68.3% of all cases |
+/- 2 Std Dev | 95.4% of all cases |
+/- 3 Std Dev | 99.7% of all cases |
Descriptive Statistics | Data of the whole relevant population – treat results as real. |
Inferential Statistics | Used with samples because results are estimates. Keeps us from jumping to conclusions and treating sample estimates as more precise than they really are. |
Population based statistics are… | Descriptive Only |
Sample based statistics are… | Inferential and descriptive |
Formula for 95% CI around a proportion… | Square root of [P multiplied by (1 minus P) divided by sample size], multiplied by 1.96 |
Confidence Intervals for Means Formula | Std Dev of Sample divided by the sqr root of sample size, then multiplied by 1.96 |
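The mean-CI card can be sketched directly. This uses the card's large-sample 1.96 multiplier (small samples would use a t value instead), and the sample data are invented:

```python
import math
import statistics

def mean_ci_95(sample):
    """95% CI around a sample mean: x̄ ± 1.96 · (s / √n)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error
    return (m - 1.96 * se, m + 1.96 * se)

lo, hi = mean_ci_95([8, 10, 12] * 20)   # 60 made-up scores centered on 10
```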
When to use T-Test | Comparing means of two groups… (1) using sample data (derived from random sampling) (2) using experimental data (derived from random assignment) |
T-Test Steps | (1) State the Null Hypothesis (2) State Research Hypothesis (3) State Decision Rule (Probability Level) (4) Assume Equal Variance – Unless F-Test is significant (5) Reject or fail to reject the null |
Easiest Null for T-Tests | There is no difference in the mean (dependent variable) of (group 1) and (group 2) |
T-Test Interpretation | (1) Prevents 'jumping to conclusions' when differences in two means may just be random variation (2) Statistical significance is not the same as substantive significance (3) Easy to get stat. sig. with large samples, hard with small samples |
T-Test and Population studies without randomized data | No need for T-Test, because it is inferential. |
Difference in steps between Chi Square and T-Test | T-Test adds the F-test step. |
Similarities between Chi Square and T-Test | (1) Stat. Sig. does NOT necessarily mean it is important or consequential. (2) If NOT stat. sig. remember we never prove the null we just fail to reject the null. (3) A small sample may not be Stat Sig, but could be Stat Sig in a larger sample |
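The equal-variance (pooled) t statistic from the steps above can be sketched in pure Python; the two groups here are invented for illustration:

```python
import math
import statistics

def two_sample_t(a, b):
    """Independent-samples t statistic assuming equal variances."""
    na, nb = len(a), len(b)
    # pooled variance weights each group's variance by its degrees of freedom
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1/na + 1/nb))
    return t, na + nb - 2              # t statistic and degrees of freedom

t, df = two_sample_t([1, 2, 3], [4, 5, 6])   # group 2 mean is higher, so t < 0
```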
Three Elements of Causal Inference | (1) X & Y covary (2) X precedes Y (3) Rule out the Z's |
Post Hoc Fallacy | Fallacy of concluding that since change in Y followed X, it was caused by X. |
Antecedent Variables | Before X (Z->X->Y) |
Intervening Variables | Between X and Y (X->Z->Y) |
Campbell and Stanley's Notation System | O = Observations (measures) of Y Left to Right = Chronological Order Each Row = One Group of Subjects |
Single Group posttest only | X O |
Single Group pretest-posttest (before and after design) | O X O |
Static Group Design | X O —– O |
History | External event during period |
Maturation | Subjects change over time |
Practice | Familiarity with the measure |
Instrumentation | A changed measure |
Regression to the mean | If subjects are chosen due to extreme scores, they tend to regress to the mean on posttest |
Selection | Groups different from start |
Intragroup history | unique group event |
Mortality | groups differ in attrition |
What to do with Attrition… | (1) Omit pretest scores of lost subjects; (2) Omit all data of lost 'types' from all groups (3) Match by statistical weighting (4) Analyze by "intention to treat" (i.e. include dropouts) |
Between Group Reactivity | (1) Spillover (My buddy is sick and I know if I give him a lime he will get better) (2) Compensatory rivalry (controls try harder) (3) Resentful demoralization (Controls try less…I never get picked so I will just suck) |
Placebo Effect | Subject expectancy to get better and psychologically they do. (Reactivity) |
Novelty Effect | X works because it's new. Innovation effect; short-term effect. (Reactivity) |
Guinea Pig Effect | Subjects act differently because they feel they are under surveillance. Evaluation apprehension – "I know I am being watched." |
Demand Effect | Subjects think they know what the authority wants of them. E.g., the real pills are handed out with more conviction; requires a double-blind design to limit. |
Social desirability | Reflexivity – Political Correctness, Societal pressures/inhibitions, I am supposed to act a certain way. |
Hawthorne Effect | Electric Plant Light Dimming Example. Refers to reactivity in general. |
Heisenberg Effect | Act of measuring something changes what you're measuring |
Two Elements of a true experiment | (1) Random Assignment of subjects to groups (2) Random Assignment of Treatments to groups |
Source of power of experiments | Comparability of the groups – the only real difference is one gets X, the other doesn't. Otherwise the two groups are identical. |
Classic Experimental Design | R) O X O R) O O |
Posttest Only Experiment | R) X O R) O |
Factorial Design | R) O Xa Xb O R) O Xa O R) O Xb O R) O O |
Complex X | Many ingredients in X |
Multiple Ys | Studies often measure the impact of X on several Ys. |
Compensatory Rivalry | Controls try harder |
Resentful demoralization | Controls try less |
Spillover effects/diffusion | Some X spills over to controls |
Strategies to minimize reactivity | (1) deceit (2) obscure / mislead (3) use placebo (4) double blind (5) time (hope they forget the study) |
Placebo | A dummy treatment given to the controls to 'hold constant' the impact of their expectations. Common in medical studies; not always possible. |
Natural Experiment | Both subjects and X were randomly assigned without a researcher's intervention; term is also sometimes used less strictly to refer to a close natural approximation even if lacking in randomization |
Big Four Categories of Validity | (1) Measurement Validity (2) Internal Validity (3) Statistical Conclusion Validity (4) External Validity |
External Validity | Generalizability; the essential yet unavoidably subjective judgment about the extent to which it is reasonable to generalize/extrapolate the findings of one study to other places, subjects, times, etc. |
How to strengthen external validity | (1) Test subjects representative of the subjects you want to generalize to (2) replications in varied settings (3) Consistent results in varied tests |
Limitations of Experiments | (1) Unethical or illegal to withhold X (2) Unethical or illegal to risk trying X (3) Unaffordable to finance in field (4) Infeasible to enforce X vs no X (5) Impractical to field test outside a lab |
Quasi-Experimental Designs | Commonly means any clever design lacking randomized control groups |
Causal-Comparative Designs | Studies that seek to infer causality using comparison groups without randomly assigned subjects |
Primary threat of Internal validity when no randomization | Selection |
NEC | Nonequivalent Comparison Group Design |
Nonequivalent Comparison Group Designs | O X O —– O O |
Retrospective match / Ex post facto design | Creating a comparison group later by finding and matching subjects similar to those who previously got exposed to X. |
Time Series Designs | X may be short term or enduring. Top internal validity threat is history. Trend line makes it superior to O X O. |
Simple Interrupted Time Series | O O O O O O X O O O O O O |
Reiterative Time Series | O O X O O X O O X O O |
Comparison Time Series | O O O O O X O O O O O ——————— O O O O O O O O O O |
Multiple Time Series | O O X O O O O O O ——————— O O O O X O O O O ——————— O O O O O O X O O ———————- O O O O O O O O |
Panel | Repeated data tracking same people; valuable but expensive, can produce reactivity |
Cross-sectional data | Time series built from new random samples of the same population each wave (repeated cross-sections). Shows net change but masks the rest. |
Deceptive Time Series Charts | Using a truncated base plus narrow or wide axis. |
Retrospective pretests | Proxy pretests – recollections used for pretest measure. |
Danger of time series inferences from a single survey | Cannot infer age = time. Bill used the Navy officer surveys of high-ranking and low-ranking officers, inferring that low-ranking officers will think like high-ranking officers when they get there. |
Correlational Designs | Typically using a single survey to try to "statistically control" for alternative explanations, often using multiple regression. Issues with selection. |
Aggregate Data | Units of analysis are groups, such as precincts, cities, states. |
Ecological Fallacy | Drawing individual-level inferences from aggregate-level correlations. |
Check list of Empirical Studies | (1) Theory Building or Applied Research (2) Causal or Descriptive (3) Exact Hypothesis (4) Independent Variable(s) (5) Dependent Variable(s) |
When Something is NOT Statistically Significant | Do not bring it up. Consider the dispersion between the groups. |
T-Test Analysis | Analysis is black and white, it is or it isn't stat. sig. If you hit .05, you have a slight relationship. State just that, a slight relationship. |
Grouping Ratios | Becomes Ordinal |
Central Tendency | Mean, Median, Trimmed Mean |
Extreme Lopsided Distribution does what to Confidence Intervals? | Becomes Smaller |
At what level is .012 statistically significant? | It is Stat. Sig at .05, but NOT at .001 or .01. |
True or False – Standard Deviation is a measure of Central Tendency? | False |
What is the biggest threat to NEC design? | Selection |
What is the biggest threat to Time Series Designs? | History |
What is comparing results to good existing records called? | Concurrent Validity |
What are two elements of dispersion? | IQR and SD |
Two Types of Empirical Validity | (1) Concurrent Validity (2) Predictive Validity |
Concurrent Validity | Testing a measure against existing data believed accurate. (Empirical) |
Predictive Validity | Testing a measure designed to predict future outcomes by the actual success of its forecasts. (Empirical) |
Subjective Validity | (1) Face Validity (2) Content Validity |
Face Validity | Operationalizing the usual usage of a word in a reasonable way. |
Content Validity | Operationalizing the full scope of the entire intended concept and not just a part of it. |
Multiple Measures (Triangulation) | Assessment using a variety of indicators (not just one) |
Unobtrusive Measures | No survey – Measuring actual behavior – not just self-reported behavior. |
Validity | Accuracy |
Reliability | Consistency |
According to SPSS, Scale measurements are… | Interval and Ratio |
Content Analysis Steps | (1) Define exact scope of the study (dates, sources, search strategy); (2) Operationalize variables to code; (3) Refine coding system & test reliability; (4) Code the content under study; (5) Analyze Patterns |
Is Content Analysis Descriptive or Causal? | By itself it's descriptive. If part of a study it can be Causal. |
Intercoder Reliability Test | Independent coders (at least 2) evaluate a characteristic of a message or artifact and reach the same conclusion. Must have at least an 80% agreement rate. |
What to worry about in analyzing patterns in Content Analysis… | Caution in drawing inferences. |
Types of Operationalized Variables to Code | (1) Specific Word Count; (2) Sources Quoted; (3) Topics; (4) Overt Visual Image; (5) Voice Inflections; (6) Subtle Themes; (7) Global Code |
Uses in Content Analysis | History, Public Relations, National Intelligence, Lobbying, Detective Work, Mass Communication, Linguistics |
Content Analysis | Systematic analysis of patterns in communications |
When to use inferential Stats? | Randomized – ALWAYS! Population – Use if group can be used as a sample. |
Qualitative Research | More exploratory, small purposive "samples", open-ended semi-structured interviews, more time per subject, narrative format, note researchers impact. |
Quantitative Research | More defined, specific hypothesis testing, large random samples, close-ended instruments, less time per subject, data-based reports, distant/unacknowledged. |
Matching Qualitative and Quantitative | Start with Qualitative research to define the issues/vocabulary, to help generate/refine research questions, test a draft questionnaire. Then conduct quantitative study. Use qualitative to explore puzzles found. |
Purpose of Focus Groups | In-depth probing of views (pre-existing); Reactions to new stimuli (new responses); Group brainstorming (new idea generation); |
Focus Groups Format | Recruit relevant participants; 10-12 people, 1.5 to 2 hours long, audio/video taped, semi-structured format w/ open ended agenda questions, neutral moderator. |
The right number of Focus Group meetings | Depends on resources, how much is at stake, but at least more than one! |
Bivariate Regression | One X, Correlation Coefficient = r, Coefficient of Determination = r2, Y=a + bX |
Multiple Regression | Two or more Xs, Multiple Correlation Coefficient = Multiple R, Multiple Coefficient of Determination = Multiple R2, Y=a+b1X1 + b2X2…b#X# |
Y = a + bX | a=intercept; b=slope |
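The intercept a and slope b can be computed with the classic least-squares formulas (toy data, for illustration only):

```python
def fit_line(xs, ys):
    """Least-squares fit for Y = a + bX; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # slope: covariation of X and Y over variation of X
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys)) /
         sum((x - mx) ** 2 for x in xs))
    a = my - b * mx                    # the line passes through the point of means
    return a, b

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])   # data lie exactly on Y = 1 + 2X
```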
Multiple Correlation Coefficient | Multiple R |
Multiple Coefficient of Determination | Multiple R² |
Unstandardized Coefficients in Multiple Regression Equations | Symbol: b; unstandardized partial regression coefficient/slope; slope change measured in original units |
How to interpret Unstandardized Coefficients in Multiple Regression Equations | If b is -3, subtract 3 years for every pack of cigarettes. |
Standardized Coefficients in Multiple Regression Equations | Symbol: B (Greek Beta); Beta or beta weight or standardized partial regression coefficient/slope; in units standardized as Z-scores (Std. Dev. Units) to allow comparisons. |
How to interpret Standardized Coefficients in Multiple Regression Equations | Use for ranking variables: the higher the beta, the more powerful the X. |
Multicollinearity | Overlap of variables |
Dummy Variable | A variable recoded as a 0/1 dichotomy so a nominal category can enter a regression; the omitted category serves as the baseline and is not separately calculated. |
r | Correlation Coefficient |
Correlation Coefficient (r) | Summarizes the strengths of the linear relationship between two scale variables. Perfect Positive Correlation 1.0 (Left up to Right); Perfect Negative Correlation -1.0 (Left down to Right). 0 = No correlation. |
r(squared) | Coefficient of Determination |
Coefficient of Determination (r2) | Indicates strength of relationship but has no negative sign. Yields lower but more intuitive score. |
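Both r and r² can be sketched from their definitions (the paired scores below are made up):

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for two scale variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                           sum((y - my) ** 2 for y in ys))

r = correlation([1, 2, 3], [6, 4, 2])   # perfect negative relationship
r2 = r ** 2                             # r² drops the direction sign
```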
Role of Correlation Coefficient and Coefficient of Determination | Both summarize (in slightly different ways) the strength of the relationship between two scale variables. Neither is inferential. |
Feature of Correlation Coefficient | Shows strength and direction, though somewhat inflated. |
Feature of Coefficient of Determination | Shows strength and proportion of variation explained, but lacks direction sign. |
Homoscedasticity | Even variation around the slope (Homo is straight) |
Heteroscedasticity | Uneven Variation on the slope (Hetero is balled up) |
Bivariate Analysis of Outliers | Could be bad data, but may provide lesson learned data for how to do it right or very bad. |
Standard Error of the Estimate (SEE) | The typical distance of cases from the regression line; bands of ±1 SEE around the line contain roughly 68% of cases. |
Is Standard Error of the Estimate Inferential? | Not just no, but hell no! |
Aggregate Data | Units of analysis are collectivities (i.e. counties, states, countries) |
Ecological Fallacy | Drawing individual-level inference from a pattern in aggregate data. |
Exam 1 Kellogg
Question | Answer |
---|---|
Sample | individuals selected to represent the population |
Population | all possible individuals which a study may apply to. |
parameter | Numerical value that describes a characteristic of a population |
Statistic | Numerical value that describes a characteristic of a sample |
Variable | Characteristic that changes within the same individual or between different individuals |
Sampling Error | Numeric difference that exists between the statistic and the parameter |
Nominal | No real quantitative value: Numbers simply replace name |
Ratio | Has an absolute zero |
Ordinal | variable has ordered categories that are not equidistant and does not have a point of absolute zero: Order is not equal distance |
Interval | Variable has ordered numeric categories that are equal but does not have a point of absolute zero |
Experimental method | cause and effect |
Correlational (Observational) method | naturally occurring relationship between two or more variables |
Dependent Variable | The variable measured/observed for group differences to assess the effect of the independent variable |
Independent variable | variable manipulated (or controlled) by experimenter. |
Operational definition | defining a variable by manner in which the variable is used or measured. |
Inferential statistics | procedure used to generalize characteristics of sample to population |
What two graphs can be used for Ordinal | Bar or Pie graph |
What two graphs can be used for Nominal | Bar or Pie graph |
What two graphs can be used for Interval | Freq Poly or Histogram |
What two graphs can be used for Ratio | Freq Poly or Histogram |
Constant | Does NOT change from one individual to the next |
Theory | Set of ideas used to explain the functioning of, and make predictions about, a relationship or set of relationships |
Hypothesis | Specific, testable prediction about the relationship between two or more variables |
Experimental control | using random assignment and holding extraneous variables constant |
Experimental group | receives the treatment level of the independent variable, get some sort of active treatment |
Control Group | Does NOT receive the treatment of the independent variable: Gets no treatment or placebo |
Confounding variable | Uncontrolled variable that can systematically vary with the independent variable, masking or enhancing the true effect of the independent variable. |
Mesokurtic | normal distribution |
Leptokurtic | Tall and Thin |
Platykurtic | flat with little elevation |
Skew | The direction in which the tail extends |
Kurtosis | The peakedness or flatness of the distribution's shape |
Give examples of Nominal | Number given in place of name: Male/Female |
Give examples of Ordinal | Order Matters; numbers mean different things: Ranks/Place finishes |
Give example of Intervals | Has both nominal and ordinal properties, but the intervals are equal (2 to 3 is the same distance as 3 to 4): Fahrenheit |
Give example of Ratio | Has all the characteristics of nominal, ordinal, and interval scales, plus an absolute zero: Kelvin scale. Zero means zero |
Test 3
Question | Answer |
---|---|
6.2 | |
uniform distribution | values spread evenly over the range of possibilities |
Standard normal distribution | normal probability distribution with mean=0 and stand. deviation=1 |
normal cdf | probability |
inv.norm. | z-score |
6.3 | |
z score formula | (x − mean) / stand. dev. |
Standard error of the mean (σx̄) | Population standard deviation divided by the square root of n: σ / √n |
z score for a sample mean | (x̄ − μ) / (σ / √n) |
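A sketch of the sample-mean z score; the numbers are hypothetical (an IQ-style scale with μ = 100, σ = 15):

```python
import math

def z_for_sample_mean(xbar, mu, sigma, n):
    """z = (x̄ − μ) / (σ / √n)."""
    return (xbar - mu) / (sigma / math.sqrt(n))

# a sample of 36 with mean 105, from a population with μ = 100, σ = 15
z = z_for_sample_mean(105, 100, 15, 36)
```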
7.2 | |
Point Estimate | single value(or point) used to approximate a population parameter |
Confidence Interval | range of values used to estimate the true value of a population parameter |
Confidence level | probability 1-a |
critical value | Number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur (z α/2) |
Margin of error | denoted by E,maximum likely difference |
Round off rule-CI | 3 significant digits |
Round off rule-Sample size | larger whole number, can't have half a person |
sample size for population mean | n = [(z α/2 · σ) / E]² |
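The sample-size formula rounds up to the next whole subject, per the round-off card above; σ and E below are invented values:

```python
import math

def required_n(sigma, e, z=1.96):
    """n = (z·σ / E)², rounded UP to a whole number of subjects."""
    return math.ceil((z * sigma / e) ** 2)

n = required_n(sigma=15, e=3)   # estimate a mean to within ±3 points at 95%
```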
Degrees of Freedom | Number of sample values that can vary after certain restrictions have been imposed on all data values |
Degrees of freedom formula | n-1 |
Zinterval | σ (population stand. dev.) is known and n>30 |
Tinterval | σ NOT known and n>30 |
stats test 3
Question | Answer |
---|---|
what is the mean of z score distribution | 0 |
what is the standard deviation of z score distribution | 1 |
under what conditions will our distribution be normal? | as the sample size increases, the shape of the distribution becomes more like the normal curve |
calculate z-scores | z = (x − mean) / stand. dev. |
what information affects standard error? | Sample size and variability: the larger the sample size, the smaller the standard deviation of the distribution of means (the standard error) |
6 steps of hypothesis testing | 1. Identify the populations, distribution & assumptions, then choose the appropriate hypothesis test 2. State the null & research hypotheses in words & symbolic notation 3. Determine the characteristics of the comparison distribution 4. Determine the critical values or cutoffs that indicate the points beyond which we will reject the null hypothesis 5. Calculate the test statistic 6. Decide whether to reject or fail to reject the null hypothesis |
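The six steps can be walked through with a one-sample z test; μ, σ, x̄ and n below are all made-up illustration numbers:

```python
import math

# Steps 1-2: one-sample z test; H0: μ = 100, H1: μ ≠ 100
mu, sigma = 100, 15                # step 3: comparison distribution
cutoff = 1.96                      # step 4: two-tailed critical value, α = .05
xbar, n = 106, 36                  # hypothetical sample results
z = (xbar - mu) / (sigma / math.sqrt(n))   # step 5: test statistic
reject_null = abs(z) > cutoff              # step 6: reject or fail to reject
```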