Question | Answer |
---|---|

Compare a 'population' to a 'sample'. | Population = set of all measurements of interest. Sample = subset of the population. |

Compare a 'parameter' to a 'statistic'. | Parameter = a number describing the population. Statistic = a number computed from the sample data. |

What is the problem with populations/parameters? Or, why do we use samples/statistics more frequently? | Populations and parameters are usually very difficult to measure in full. Sample statistics give us an estimate of the larger group's information. |

Define 'mean'. | The sum of the observations divided by the # of observations. (average) |

Define 'median'. | The value in the middle of the data set when the values are ordered from lowest to highest. When there are two middle values, they are averaged. (middle) |

Define 'mode'. | The value that occurs with the greatest frequency. |

Define and calculate a 'percentile'. | Def: the pth percentile is a value such that p percent of all observations are less than or equal to it. i = (p/100)n, where n is the number of values and i gives the position of the percentile in the ordered list of data. Note: the 50th percentile is also the median. |

Calculate the 'quartiles'. | Q1 :: i=(25/100)n , Q2 :: i=(50/100)n , Q3 :: i=(75/100)n |

Calculate 'range'. | largest value – smallest value = range |

Calculate 'Interquartile Range (IQR)'. | IQR = Q3 – Q1 , Q1 :: i=(25/100)n , Q3 :: i=(75/100)n |

Define the 'variance' and calculate sample variance. | A measure of variability around the mean. Sample variance (denoted s^2) = (sum of all squared deviations)/(n – 1), where each deviation is (x_i – mean). |

Define and calculate the 'standard deviation'. | The standard deviation is the positive square root of the variance. |

Calculate the 'coefficient of variation'. | ((standard deviation / mean) x 100)% |

Define and calculate the 'z-score'. | Also known as the standardized value: the number of standard deviations a value lies from the mean. z = (x_i – mean)/(sample standard deviation) |

Define 'Chebyshev's Theorem'. | At least (1 – 1/z^2) of the data values must be within z standard deviations of the mean, where z is any value greater than 1. |

Define 'empirical rule'. | *only used when symmetrical, bell-curve distribution* 68% of data is within 1 standard deviation, 95% is within 2 sd, and almost all is within 3 sd. |

Explain how to detect an outlier. | An outlier has a z-score of +3 or more, or –3 or less (it is 3 or more standard deviations away from the mean). |

Combinations | — |

Permutations | |

Draw a Tree Diagram. | … |

Combinations nCr | |

Define 'Intersection'. | The points belonging to both A and B. |

Define 'mutually exclusive'. | A and B have no points in common. |
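
The formulas in the cards above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration, and the rounding rule inside `percentile` (round i up when it is not a whole number, average positions i and i + 1 when it is) is an assumption based on a common textbook convention; the cards only give i = (p/100)n.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)            # sum of observations / number of observations
median = statistics.median(data)        # middle value of the ordered data
mode = statistics.mode(data)            # value with the greatest frequency
sample_var = statistics.variance(data)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(data)      # positive square root of the variance
coef_var = sample_sd / mean * 100       # coefficient of variation, in percent
value_range = max(data) - min(data)     # largest value - smallest value


def percentile(values, p):
    """Percentile by location, i = (p/100) * n, for 0 < p < 100.

    Assumed convention: if i is not a whole number, round it up and take
    that position; if it is whole, average positions i and i + 1.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    i = p / 100 * len(ordered)
    if i != int(i):
        return ordered[math.ceil(i) - 1]   # positions are 1-based
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2


q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1                           # Q3 - Q1

# z-score of each value: deviations from the mean in standard-deviation units.
z_scores = [(x - mean) / sample_sd for x in data]


def chebyshev_bound(z):
    """Minimum proportion of data within z standard deviations (z > 1)."""
    return 1 - 1 / z ** 2


print(mean, median, mode, sample_var, round(sample_sd, 3), iqr)
```

None of the z-scores in this sample reaches 3 in absolute value, so the outlier rule on the card flags nothing here.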

## CH 5, 6, 7, & 8

Question | Answer |
---|---|

Raw score | An original, untransformed observation or measurement. |

Z-score | A standardized score with a sign that indicates direction from the mean (+ above µ and – below µ), and a numerical value equal to the distance from the mean measured in standard deviations. |

Z-score transformation | A transformation that changes raw scores (X values) into z-scores. |

Standard score | A score that has been transformed into a standard form. |

Standardized distribution | An entire distribution that has been transformed to create predetermined values for µ and σ |

z = | (X – µ) / σ |

zσ = | X – µ = deviation score |

X = | µ + zσ |

Probability | Probability is defined as a proportion, a specific part out of the whole set of possibilities. |

Proportion | A part of the whole usually expressed as a fraction |

Random sample | A sample obtained using a process that gives every individual an equal chance of being selected, with the probability of selection staying constant over a series of selections |

Sampling with replacement | A sampling technique that returns the current selection to the population before the next selection is made. A required part of random sampling. |

Independent events | Two events are independent if the occurrence of either one has no effect on the probability that the other will occur. |

Normal distribution | A symmetrical, bell-shaped distribution with proportions corresponding to those listed in the unit normal table. |

Unit normal table | A table listing proportions corresponding to each Z-score location in a normal distribution. |

Percentile | A score that is identified by the percentage of the distribution that falls below a specific score. |

Percentile rank | The percentage of a distribution that falls below a specific score. |

Binomial distribution | the distribution of probabilities, for each possible outcome, for a series of observations of a dichotomous variable. |

p(A) = | Number of ways event A can occur / Total number of possible outcomes |

z = | (X – pn) / √(npq) |

µ = | pn |

σ = | √(npq) |

Distribution of sample means | The set of sample means from all the possible random samples for a specific sample size (n) from a specific population |

Sampling distribution | a distribution of statistics (as opposed to a distribution of scores). The distribution of sample means is an example of a sampling distribution. |

Expected value of M | The mean of the distribution of sample means. The average of the M values. |

Standard error of M | The standard deviation of the distribution of sample means. The standard distance between a sample mean and the population mean. |

The central limit theorem | A mathematical theorem that specifies the characteristics of the distribution of sample means. |

σ_M = | σ/√n or √(σ²/n) |

z = | (M – µ) / σ_M |

Hypothesis testing | A statistical procedure that uses data from a sample to test a hypothesis about a population |

Null hypothesis, H0 | The null hypothesis states that there is no effect, no difference, or no relationship. |

Alternative hypothesis, H1 | The alternative hypothesis states that there is an effect, there is a difference, or there is a relationship. |

Type I error | A type I error is rejecting a true null hypothesis. You have concluded that a treatment does have an effect when actually it does not. |

Type II error | A type II error is failing to reject a false null hypothesis. The test fails to detect a real treatment effect. |

Alpha (α) | Alpha is a probability value that defines the very unlikely outcomes if the null hypothesis is true. Alpha is also the probability of committing a Type I error. |

Level of significance | The level of significance is the alpha level, which measures the probability of a Type I error. |

Critical region | The critical region consists of outcomes that are very unlikely to be obtained if the null hypothesis is true. The term "very unlikely" is defined by alpha (α). |

Test statistic | A statistic that summarizes the sample data in a hypothesis test. The test statistic is used to determine whether or not the data are in the critical region. |

Beta (β) | Beta is the probability of a Type II error. |

Directional (one-tailed) test | A directional test is a hypothesis test that includes a directional prediction in the statement of the hypotheses and places the critical region entirely in one tail of the distribution. |

Effect size | A measure of the size of the treatment effect that is separate from the statistical significance of the effect. |

Power | The probability that the hypothesis test will reject the null hypothesis when there actually is a treatment effect |

p(Type I error) | α |

p(Type II error) | β |

Cohen’s d | Mean difference / Standard deviation = (M – µ) / σ |
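
The formulas in this table chain together into a single z-test for a sample mean. The sketch below works through one, using `statistics.NormalDist` in place of a printed unit normal table. All numbers (µ, σ, n, M, α, and the binomial example) are hypothetical values chosen for illustration, and the two-tailed setup is an assumption.

```python
import math
from statistics import NormalDist

# Hypothetical values, chosen only for illustration.
mu, sigma = 100, 15    # population mean and standard deviation under H0
n, M = 25, 106         # sample size and sample mean
alpha = 0.05           # level of significance, two-tailed test assumed

unit_normal = NormalDist()                      # mean 0, standard deviation 1

sigma_M = sigma / math.sqrt(n)                  # standard error of M
z = (M - mu) / sigma_M                          # test statistic
z_crit = unit_normal.inv_cdf(1 - alpha / 2)     # boundary of the critical region
p_value = 2 * (1 - unit_normal.cdf(abs(z)))     # two-tailed probability of a z this extreme
reject_h0 = abs(z) > z_crit                     # is z in the critical region?
cohens_d = (M - mu) / sigma                     # effect size, separate from significance

# Normal approximation to the binomial: mu = pn, sigma = sqrt(npq).
p, trials, X = 0.5, 100, 60                     # hypothetical coin-flip example
q = 1 - p
binom_mu = p * trials
binom_sigma = math.sqrt(trials * p * q)
binom_z = (X - binom_mu) / binom_sigma

print(f"sigma_M={sigma_M}, z={z}, reject H0: {reject_h0}, d={cohens_d:.2f}")
```

Note that the decision (reject or fail to reject) depends on n through the standard error, while Cohen's d does not, which is why the cards treat effect size as separate from statistical significance.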

## Vocabulary

Question | Answer |
---|---|

Statistics | the study of how to collect, organize, analyze, and interpret numerical information from data |

Individuals | people or objects included in the study |

Variable | the characteristic of the individual to be measured or observed |

Quantitative variable | has a value or numerical measurement for which operations such as addition or averaging make sense |

Qualitative variable | describes an individual by placing the individual into a category or group such as male or female |

Population data | variable is from every individual of interest |

Sample data | variable is from only some of the individuals of interest |

Descriptive statistics | involves methods of organizing, picturing, and summarizing information from samples or populations |

Inferential statistics | involves methods of using information from a sample to draw conclusions regarding the population |

Nominal | We can put the data into categories |

Ordinal | We can order the data from worst to best |

Interval | We can order the data and also take the differences between the data values. Does not have a true zero |

Ratio | We can order the data, take differences, and also find the ratio between data values. Does have a true zero |

Census | measurements or observations from the entire population are used |

Sample | measurements or observations from a representative part of the population should be used |

Observational study | observations and measurements of individuals are conducted in a way that doesn't change the response or the variable being measured |

Experiment | a treatment is deliberately imposed on the individuals in order to observe a possible change in the response or variable being measured |

Control group | group that receives a dummy treatment disguised as the real treatment |

Confounding variables | variables that might be an underlying cause of a change in response in the experiment group |

Randomization | used to assign individuals to the treatment groups |

Replication | reduces the possibility that the differences in pain relief for the two groups occurred by chance alone. |


## Stats vocabulary

Question | Answer |
---|---|

Observational study | a study based on data with no manipulation used |

retrospective study | subjects are selected and then their previous conditions or behaviors are determined. Not based on random sampling. Focuses on estimating differences between groups or associations between variables |

prospective study | subjects are followed to observe future outcomes. No treatments are applied. Not an experiment. Focuses on estimating differences among groups during the study |

experiment | manipulates factor levels to create treatments and randomly assigns subjects to these treatment levels. Compares the responses of subject groups across treatment levels. |

random assignment | an experiment must assign experimental units to treatment groups at random. |

factor | a variable whose levels are controlled by the experimenter. |

response | a variable whose values are compared across different treatments. |

experimental units | individuals on whom an experiment is performed. Can be called subjects or participants |

level | the specific values that the experimenter chooses for a factor |

treatment | the process, intervention, or other controlled circumstance applied to randomly assigned experimental units. The explanatory variable |

Principles of experimental design | Control, Randomize, Replicate, Block |

Control | make conditions as similar as possible for all treatment groups. |

Randomize | equalize the effects of unknown or uncontrollable sources of variation |

Replicate | over as many subjects as possible. Results from a single subject are just anecdotes. |

Block | group similar experimental units together so that the only difference between the control group and the experimental group is the one thing we are testing. |

statistically significant | when an observed difference is too large for us to believe that it is likely to have occurred naturally. |

control group | the experimental units assigned to a baseline treatment level. |

Blinding | don't let the patient know if they are in the control group or the experimental group. |

single-blind, double-blind | there are two main classes of individuals who can affect the outcome of an experiment: those who could influence the results (subjects, technicians) and those who evaluate the results (judges, physicians) |

placebo effect | people think they feel differently just because they know they are being tested |

2 ways to replicate | use several subjects, or replicate the entire experiment on another group |

extraneous factors | factors that are not being experimented with but may be influencing the outcome. Eliminate this by blocking. |

confounded variables | factors whose effects on the outcome cannot be distinguished from one another. |

placebo | a treatment known to have no effect. |

match | reduces unwanted variation |

2 types of designs | completely randomized design, and randomized block design |

completely randomized design | all experimental units have an equal chance of receiving any treatment |

randomized block design | the randomization occurs only within blocks |
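
The two designs named in the last cards differ only in where the randomization happens. A minimal sketch, assuming two treatments and eight made-up subjects split into two hypothetical blocks by age; the function names are illustrative, not from any library:

```python
import random


def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn,
    so every unit has an equal chance of receiving any treatment."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {
        unit: treatments[i % len(treatments)]
        for i, unit in enumerate(shuffled)
    }


def randomized_block(blocks, treatments, rng):
    """Randomize separately inside each block of similar units."""
    assignment = {}
    for block_units in blocks.values():
        assignment.update(completely_randomized(block_units, treatments, rng))
    return assignment


rng = random.Random(0)                       # fixed seed so the sketch is repeatable
treatments = ["placebo", "drug"]
subjects = [f"s{i}" for i in range(1, 9)]    # hypothetical experimental units

crd = completely_randomized(subjects, treatments, rng)

# Hypothetical blocking variable: age group.
blocks = {"younger": subjects[:4], "older": subjects[4:]}
rbd = randomized_block(blocks, treatments, rng)

for unit in subjects:
    print(unit, crd[unit], rbd[unit])
```

In the blocked version each age group is guaranteed an even split between placebo and drug, which the completely randomized version only achieves on average.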

## Stats basics, chapters 1, 2

Question | Answer |
---|---|

Statistics Paradigm | Population -> Sample -> Statistics |

Parameter | measurement of a population |

Statistic | Measurement of a sample |

Error formula | 1/√N |

Variable | something with more than one value (ex age, weight, grade) |

Explanatory Variable | Explains why the study's being done/how groups are broken up |

Mean | Average |

Confidence Interval | is set by the error (the sample percent plus and minus the error). The confidence level is usually 95% |

3 Key Components of Statistical Studies | Design (logistics), Description (data), Inference (not in descriptive studies, the outcome; assumed/projected) |

Area under a bell/normal curve | 1 |

Categorical Variable | gender, race, etc |

Numerical/Quantitative Variable | Discrete = integers and continuous = fractions |

Integers | numbers without fractions/decimals |
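
The "Error formula" and "Confidence Interval" cards combine into a quick calculation. The sketch below assumes the 1/√N rule is being used as the rough 95% margin of error for a sample proportion, which is the usual context for that shortcut; the survey numbers are hypothetical.

```python
import math


def margin_of_error(n):
    """Rough 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def confidence_interval(sample_proportion, n):
    """Sample proportion plus and minus the margin of error."""
    error = margin_of_error(n)
    return sample_proportion - error, sample_proportion + error


# Hypothetical survey: 60% of 400 respondents answer "yes".
low, high = confidence_interval(0.60, 400)
print(f"margin of error: {margin_of_error(400):.3f}")
print(f"approximate 95% interval: {low:.3f} to {high:.3f}")
```

Quadrupling the sample size halves the margin of error, since the error shrinks with the square root of N.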

## Parameter vs Statistic & Sample vs Population

Question | Answer |
---|---|

population | is the entire group of objects you want to study. ex: scores, people, measurements |

sample | is a smaller subset chosen from the population and a representative of the population. |

parameter | is a number which describes a property of the entire population (a statistic is a number which describes a property of only a sample). |

random sample | every object in the population is equally likely to be picked for the sample. ex: pick name out of hats. |

systematic sample | every Kth object is chosen for the sample. ex: think assembly line pick every 10th computer off the line. |

convenience/volunteer/self-selected | are non scientific approaches that will not lead to a representative sample. ex: online surveys, phone polling, restaurant surveys |

cluster sample | is the method that picks groups randomly from the population instead of picking individual objects. Every object in the randomly selected groups forms our sample. |

stratified sample | is the method in which we divide the entire population into meaningful groups (ex: Republicans and Democrats, male and female), then randomly sample from within each group. |

1st: randomly pick groups from the population. 2nd: the sample is every object from the chosen groups. | cluster sample |

1st: subdivide the population into named groups. 2nd: randomly select objects from each group. | stratified sample |

quantitative data | is numeric data that you can count or measure. ex: ages, weight. Refers to a data type, not a level of measurement. |

categorical (qualitative) data | is NOT numeric but instead you break them into categories by labels. ex: eye color, letter grades NOT how many people. |

ratio level | 0 means none. Cannot have negative numbers. ex: age, length, weight, measurements of amounts |

interval level | 0 does not mean none. Can be negative. ex: temperature |

ordinal level | categories have a built-in order; reordering would be confusing. ex: letter grades A, B, C, D, ordered smallest to largest |

nominal level | categories can be put in any order and not be confusing. can not be arranged in an ordering scheme. ex: eye color, names, labels, categories. |

discrete data | data you can count. "number of" |

continuous data | data you can measure ex: height, length, age |
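
The four sampling methods defined in this table can be compared side by side. This is a minimal sketch using Python's `random` module; the population of 100 numbered objects, the cluster size of 20, and the odd/even strata are all made up for illustration.

```python
import random

rng = random.Random(1)                  # fixed seed so the sketch is repeatable
population = list(range(1, 101))        # hypothetical population of 100 numbered objects

# Random sample: every object is equally likely to be picked.
simple_random = rng.sample(population, 10)

# Systematic sample: random starting point, then every kth object.
k = 10
start = rng.randrange(k)
systematic = population[start::k]

# Cluster sample. 1st: randomly pick whole groups. 2nd: keep every object in them.
clusters = [population[i:i + 20] for i in range(0, len(population), 20)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [obj for cluster in chosen_clusters for obj in cluster]

# Stratified sample. 1st: subdivide into named groups. 2nd: randomly select from each.
strata = {
    "odd": [obj for obj in population if obj % 2 == 1],
    "even": [obj for obj in population if obj % 2 == 0],
}
stratified = {name: rng.sample(members, 5) for name, members in strata.items()}

print(sorted(simple_random))
print(systematic)
print(len(cluster_sample), {name: sorted(s) for name, s in stratified.items()})
```

The contrast between the last two is the one the cards stress: cluster sampling randomizes which groups are used and keeps everyone in them, while stratified sampling uses every group and randomizes who is taken from each.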

## Stats Quiz 5/29

Question | Answer |
---|---|

Cross-sectional study | data collected at a fixed point in time |

Retrospective (case controlled) study | data collected about the past (records, interviews, etc) |

Prospective study | data collected in the future from groups sharing common factors |

Randomization | Randomly select subjects for different groups |

Replication | Results can be repeated on more than one subject to reach the same result |

Blinding | Subjects do not know which group they are in (placebo) |

Double blinding | Researcher doesn't know which group the subject is in |

Placebo effect | An untreated subject reports an improvement in symptoms |

Confounding | occurs in an experiment when you are unable to distinguish the effects of different factors |

Completely Randomized Experimental Design | Assign subjects to different treatment groups through random selection |

Randomized Block Design | Form blocks of subjects with similar characteristics Randomly assign subjects within the blocks |

Rigorously Controlled Design | Carefully assign subjects to different treatments so that the subjects given a particular treatment are similar in the ways that are important to the researcher. |

Matched Pairs Design | Compare exactly two treatment groups with subjects that are matched to have similar characteristics |

Sampling error | Results do not match results in whole population |

Nonsampling error | Occurs when data is incorrectly collected or analyzed |

Center | representative value for “the middle” of the data set |

Variation | measure of the amount that the data values vary |

Distribution | shape of the spread of the data |

Outliers | Sample values that lie far away from the majority of other values |

Frequency Distribution | also known as a frequency table, it shows how the data set is partitioned over various categories given by listing the categories and the number of data values in each category |

Lower-class limits | Smallest numbers that can belong to different classes |

Upper-class limits | Largest numbers that can belong to different classes |

Class boundaries | Centers in the gaps between upper and lower class limits of successive classes |
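
The last four cards (frequency distribution, class limits, class boundaries) fit together in one small table. A minimal sketch, assuming integer-valued data and equal-width classes; the scores and the lower limits are hypothetical.

```python
# Hypothetical integer scores, chosen only for illustration.
scores = [52, 58, 61, 67, 67, 73, 75, 78, 84, 88, 91, 95]

lower_limits = [50, 60, 70, 80, 90]                 # smallest value allowed in each class
class_width = lower_limits[1] - lower_limits[0]
upper_limits = [low + class_width - 1 for low in lower_limits]   # largest value in each class

# Frequency of each class: how many scores fall between its limits.
frequencies = [
    sum(low <= score <= high for score in scores)
    for low, high in zip(lower_limits, upper_limits)
]

# Class boundaries sit in the middle of the gap between one class's upper
# limit and the next class's lower limit. The two end boundaries extend the
# same half-gap outward.
half_gap = (lower_limits[1] - upper_limits[0]) / 2
class_boundaries = [lower_limits[0] - half_gap] + [
    high + half_gap for high in upper_limits
]

for low, high, freq in zip(lower_limits, upper_limits, frequencies):
    print(f"{low}-{high}: {freq}")
print("boundaries:", class_boundaries)
```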

## Chapter 1.3-1.5

Term | Definition |
---|---|

Parameter | Measure of the whole population describing a characteristic |

Statistic | Measure of a sample describing some characteristic (not the whole population) |

Quantitative data | Data expressed by numbers |

Categorical data | Data that consists of names or labels that are not expressed in numbers |

Discrete data | Values are finite or countable |

Continuous data | Infinitely many possible values |

Nominal level of measurement | characterized by data that consists of names or labels; not ranked |

Ordinal level of measurement | data can be ordered but differences do not make sense |

Interval level of measurement | Difference between data is quantitative but there is no natural starting point |

Ratio level of measurement | data can be ordered, differences make sense, and there is a natural starting point |

Voluntary response sample | Respondents decide themselves whether to be included |

Problems with voluntary response sample | Strong opinions pervade, and inherent bias exists |

Correlation | When two events are somehow connected |

Causation | When one event causes another event |

Reporter bias | when respondents aim to please the researcher |

Small samples | not always indicative of the whole population, even if properly collected |

Loaded question | When strong wording skews responses |

Order of questions | the structure and ordering of questions can contribute to responses |

Non-response | when a person either refuses to respond to a survey question or is unavailable |

Missing data | Data values are missing for many factors |

Self-interest study | Researcher desires a certain conclusion and skews study methods in favor of that conclusion |

Observational study | measure specific characteristics but don't attempt to modify the subjects |

Experimental study | Apply a treatment and proceed to observe its effects |

Simple random sample | a sample of n subjects chosen in such a way that every possible group of n subjects has an equal chance of being chosen |

Random sample | members of the population chosen in such a way that every individual is equally likely to be chosen |

Probability sample | select members from the population in such a way that each member is chosen with a pre-selected probability |

Systematic sampling | select some starting point and select every kth person |

Convenience sampling | sampling from a group convenient to the researcher |

## Ellis Third Exam- 4/3/12

Question | Answer |
---|---|

A Type I error is the result of | Incorrectly rejecting the NULL hypothesis |

A research article reports the result of a test for dependent means as t(38) = 3.11, p < .01 | the result is significant |

When conducting a test for independent means a typical research hypothesis might be | The mean of population 1 is greater than the mean of population 2 |

A researcher tests whether a new teaching method is more effective than the old one. What is the RESEARCH hypothesis? | the new teaching method is more effective than the old teaching method (the statement that there is no difference in effectiveness is the null hypothesis) |

A research strategy in which each person is tested more than once is known as: | any of the above |

A one tailed test is especially associated with: | the research hypothesis |

Which of the following is the most likely way for results of a test for dependent means to be presented in a research article for a study with 25 participants? | (24)<significant |

A researcher wants to know if a new type of exercise improves people's health. Would this be a one-tailed or two-tailed test and why? | one-tailed because the study is only interested in whether the exercise increased health |

In a chi-square test, the variables are: | categorical (nominal) |

Which of the following is the best way to reduce the variances in the distributions of means when conducting a test for independent means? | increase the size of the samples |

In which situation below would you use a test for dependent means? | To compare the level of reading comprehension of students at the beginning of a speed-reading class to their level of reading comprehension at the end of the class |

Before running statistical analyses, researchers should check their data for | all of the above |

A result is considered statistically significant when a sample value is so extreme that: | the null hypothesis is rejected |

Once a researcher has an idea for a research question, the next step is to | develop a specific research plan to address the question |

A researcher tests whether there is any difference between how fast people work in the morning versus how fast they work in the evening. What is the NULL hypothesis? | There is no difference in the speed at which people work. |

In the discussion section of a research article, one should | all of the above |

Before embarking on a new study, experienced researchers plan what statistical method (s) they will use when the study is complete. Why is it important to carry out this step? | all of the above |

What are the generally accepted cutoff points in hypothesis testing in psychology? | .01 and .05 |

In what section of a research article should the authors describe each analysis in a systematic fashion? | Methods |

Which of the following is true about distributions? | For any given sample size there are between two and-1 appropriate distributions. |

An analysis of variance differs from a test for independent means in that an analysis of variance | can be used to compare three or more groups, while a test for independent means cannot be used to compare more than two groups |

what is a hypothesis | a prediction about the results of the research study |

Another name for a research hypothesis is the | alternative hypothesis |

The set of frequencies obtained in an actual frequency distribution are the | observed frequencies |

A chi-square test of significance is essentially concerned with | the distinction between expected and observed frequencies |

A researcher takes a sample and wants to compare the results to the population from which it is drawn. The independent variable is gender and the dependent variable is a yes or no response to whether they favor abortion. Which test would the researcher use? | a difference between means test |

What is the research hypothesis | the exercise will reduce the rate of heart attacks |

What is the NULL hypothesis? | the exercise will increase rate of heart attacks |

A researcher claims 62% of voters favor gun control | H0: p = 0.62, H1: p ≠ 0.62 |

How do you set up a hypothesis testing problem | you set it up to test the opposite of what you predict will happen. |

Other names for the test for dependent means include all of the following EXCEPT | test for match pairs |


The main idea of a chi square test is that you | compare population means to see if they vary from each other more than by chance. |

When conducting a test for independent means | you reject the null hypothesis if the score is more extreme than the cutoff score |

If you know the samples variance but not the populations variance | you can look up populations variance on the table. |
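
Several cards above describe the chi-square test as a comparison of observed and expected frequencies. The standard statistic for that comparison is the sum of (O – E)²/E over the categories. The sketch below is a minimal illustration under stated assumptions: the counts are hypothetical, the setting is a one-variable goodness-of-fit test with df = categories – 1, and the cutoff 3.841 is the .05 table value for df = 1.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 50 people choose between two options, and the null
# hypothesis expects an even 25/25 split.
observed = [30, 20]
expected = [25, 25]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1           # categories - 1 for a goodness-of-fit test
CUTOFF_05_DF1 = 3.841            # chi-square table value for df = 1 at the .05 level
significant = chi2 > CUTOFF_05_DF1

print(f"chi-square = {chi2}, df = {df}, significant at .05: {significant}")
```

With these counts the statistic falls short of the cutoff, so the observed split is not significantly different from the expected one.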

## Statistics for the Behavioral Sciences 9th Edition Chap. 9

Question | Answer |
---|---|

(s_M) Used as an estimate of the real standard error (σ_M) when the value of σ is unknown. Computed from the sample variance or sample standard deviation, it provides an estimate of the standard distance between a sample mean M and the population mean µ. | estimated standard error |

Used to test hypotheses about an unknown population mean, µ, when the value of σ is unknown. | t statistic |

Describes the number of scores in a sample that are independent and free to vary. (n – 1) | degrees of freedom |

The complete set of t values computed for every possible random sample for a specific sample size (n) or a specific degrees of freedom (df). Approximates the shape of a normal distribution. | t distribution |

Under what circumstances is a t statistic used instead of a z-score for a hypothesis test? | A t statistic is used instead of a z-score when the population standard deviation and variance are not known. |

A sample of n=9 scores has SS = 288. Compute the variance for the sample. | 36 |

A sample of n=9 scores has SS = 288. Compute the estimated standard error for the sample mean. | 2 |

True or False. In general a distribution of t statistics is flatter and more spread out than the standard normal distribution. | True – As sample size and df increase, the variability in the t distribution decreases, and more closely resembles a normal distribution. |

A researcher reports a t statistic with df = 20. How many individuals participated in the study. | n = 21 |

For df=15, find the value(s) of t associated with the top 5% of the distribution. | +1.753 |

For df=15, find the value(s) of t associated with the middle 95% of the distribution. | +-2.131 |

For df=15, find the value(s) of t associated with the middle 99% of the distribution. | +-2.947 |

Population µ = 40. Treatment sample: n = 4, M = 44, variance s² = 16. Is this sample sufficient to conclude that the treatment has a significant effect? | No – Fail to reject H0, treatment does not have a significant effect |

Population µ = 40. Treatment sample: n = 4, M = 44, variance s² = 16. If all factors remained constant and sample size increased to n = 16, is the sample sufficient to conclude a significant effect? | Yes, reject H0. Treatment has a significant effect. |

an interval or range of values, centered around a sample statistic. | confidence interval |

If all other factors are held constant, an 80% confidence interval is wider than a 90% confidence interval. (True or False?) | False – Greater confidence requires wider interval. |

If all other factors are held constant, a confidence interval computed from a sample of n=25 is wider than a confidence interval computed from a sample of n = 100. | True. The smaller sample produces a wider interval. |
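
The numeric answers in this table can be reproduced in a few lines. The sketch below recomputes the SS = 288 cards and the two treatment-sample cards. It assumes a two-tailed test at α = .05, which the cards do not state; the critical value for df = 15 is the one quoted on the card above, and the df = 3 value (3.182) comes from a standard t table.

```python
import math


def estimated_standard_error(variance, n):
    """s_M = sqrt(s^2 / n), used when sigma is unknown."""
    return math.sqrt(variance / n)


def t_statistic(M, mu, variance, n):
    """t = (M - mu) / s_M, with df = n - 1."""
    return (M - mu) / estimated_standard_error(variance, n)


# Cards: a sample of n = 9 scores has SS = 288.
variance = 288 / (9 - 1)                        # sample variance = SS / (n - 1)
s_M = estimated_standard_error(variance, 9)

# Two-tailed critical t values at alpha = .05 (assumed), keyed by df.
T_CRIT_05 = {3: 3.182, 15: 2.131}

# Cards: mu = 40, M = 44, s^2 = 16, first with n = 4 and then with n = 16.
t_small = t_statistic(44, 40, 16, 4)            # df = 3
t_large = t_statistic(44, 40, 16, 16)           # df = 15
reject_small = abs(t_small) > T_CRIT_05[3]
reject_large = abs(t_large) > T_CRIT_05[15]

# 95% confidence interval for mu from the n = 16 sample: M +/- t * s_M.
half_width = T_CRIT_05[15] * estimated_standard_error(16, 16)
interval = (44 - half_width, 44 + half_width)

print(variance, s_M)
print(t_small, reject_small, t_large, reject_large)
print(interval)
```

The same 4-point mean difference fails to reach significance with n = 4 and reaches it with n = 16, because the larger sample shrinks the estimated standard error from 2 to 1. The same shrinkage is why the n = 25 interval in the last card is wider than the n = 100 one.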