ECN225-EXAM1 FURMAN ROE

Compare a 'population' to a 'sample'. Population = the set of all measurements of interest. Sample = a subset of the population.
Compare a 'parameter' to a 'statistic'. Parameter = a number computed from the whole population. Statistic = a number computed from the sample data.
What is the problem with populations/parameters? Or, why do we use samples/statistics more frequently? Populations and parameters are very difficult to gather. A statistic from a representative sample gives us an estimate of the larger group's parameter.
Define 'mean'. The sum of the observations divided by the # of observations. (average)
Define 'median'. The value in the middle of the data set when the values are ordered lowest to highest. When there is an even number of values, the two middle values are averaged. (middle)
Define 'mode'. The value that occurs with the greatest frequency.
Define and calculate a 'percentile'. Def – The pth percentile is a value where p percent of all observations are less than or equal to this value. i = (p/100)n, where n is the number of values and 'i' is the position in the ordered list of data. note: the 50th percentile is also the median.
Calculate the 'quartiles'. Q1 :: i=(25/100)n , Q2 :: i=(50/100)n , Q3 :: i=(75/100)n
Calculate 'range'. largest value – smallest value = range
Calculate 'Interquartile Range (IQR)'. IQR = Q3 – Q1 , Q1 :: i=(25/100)n , Q3 :: i=(75/100)n
Define the 'variance' and calculate sample variance. A measure of variability around the mean. Sample variance (denoted s^2) = (sum of all squared deviations)/(n – 1), where each "deviation" is (x_i – mean).
Define and calculate the 'standard deviation'. The standard deviation is the positive square root of the variance.
Calculate the 'coefficient of variation'. ((standard deviation / mean) x 100)%
Define and calculate the 'z-score'. aka 'the standardized value'. The number of standard deviations the value is away from the mean. z_i = (x_i – mean)/(sample standard deviation)
Define 'Chebyshev's Theorem'. At least (1 – 1/z^2) of the data values must be within z standard deviations of the mean, where z is any value greater than 1.
Define 'empirical rule'. *only used when symmetrical, bell-curve distribution* 68% of data is within 1 standard deviation, 95% is within 2 sd, and almost all is within 3 sd.
Explain how to detect an outlier. A common rule: treat a value as an outlier if its z-score is less than –3 or greater than +3 (it is 3 or more standard deviations away from the mean in either direction).
Combinations
Permutations
Draw a Tree Diagram.
Combinations nCr
Define 'Intersection'. The points belonging to A and B.
Define 'mutually exclusive'. A and B have no sample points in common, so they cannot both occur.
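The formulas in this section can be checked with a short Python sketch. The data set is hypothetical, and the percentile helper assumes one common textbook convention that the notes do not spell out: if i = (p/100)n is not a whole number, round up and take that position; if it is a whole number, average the values in positions i and i + 1.

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 10, 12]

mean = statistics.mean(data)                 # sum of observations / number of observations
median = statistics.median(data)             # averages the two middle values when n is even
mode = statistics.mode(data)                 # value with the greatest frequency
sample_variance = statistics.variance(data)  # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(data)           # positive square root of the sample variance
cv_percent = sample_sd / mean * 100          # coefficient of variation, in percent
z_scores = [(x - mean) / sample_sd for x in data]


def percentile(values, p):
    """pth percentile using i = (p/100) * n (rounding convention is an assumption)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    i = (p / 100) * len(ordered)
    if i != int(i):
        # Not a whole number: round up and take that 1-based position.
        return ordered[math.ceil(i) - 1]
    # Whole number: average the values in 1-based positions i and i + 1.
    i = int(i)
    return (ordered[i - 1] + ordered[i]) / 2


def chebyshev_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z ** 2


q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1                                # IQR = Q3 - Q1
data_range = max(data) - min(data)           # largest value - smallest value

# Counting rules: choose 2 of 5 items without and with regard to order.
n_combinations = math.comb(5, 2)
n_permutations = math.perm(5, 2)

print(mean, median, mode, sample_variance, sample_sd, cv_percent)
print(q1, q3, iqr, data_range, chebyshev_bound(2))
```

Other software may use a different percentile convention, so quartiles from a calculator or spreadsheet can differ slightly from these.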

CH 5, 6, 7, & 8

Raw score An original, untransformed observation or measurement.
Z-score A standardized score with a sign that indicates direction from the mean (+ above µ and – below µ), and a numerical value equal to the distance from the mean measured in standard deviations.
Z-score transformation A transformation that changes raw scores (X values) into z-scores.
Standard score A score that has been transformed into a standard form.
Standardized distribution An entire distribution that has been transformed to create predetermined values for µ and σ
z = (X – µ) / σ
zσ = X – µ = deviation score
X = µ + zσ
Probability Probability is defined as a proportion, a specific part out of the whole set of possibilities.
Proportion A part of the whole usually expressed as a fraction
Random sample A sample obtained using a process that gives every individual an equal chance of being selected, with the probabilities staying constant over a series of selections
Sampling with replacement A sampling technique that returns the current selection to the population before the next selection is made. A required part of random sampling.
Independent events Two events are independent if the occurrence of either one has no effect on the probability that the other will occur.
Normal distribution A symmetrical, bell-shaped distribution with proportions corresponding to those listed in the unit normal table.
Unit normal table A table listing proportions corresponding to each Z-score location in a normal distribution.
Percentile A score that is identified by the percentage of the distribution that falls below a specific score.
Percentile rank The percentage of a distribution that falls below a specific score.
Binomial distribution the distribution of probabilities, for each possible outcome, for a series of observations of a dichotomous variable.
p(A) = Number of ways event A can occur / Total number of possible outcomes
z = (X – pn) / √(npq)
µ = pn
σ = √(npq)
Distribution of sample means The set of sample means from all the possible random samples for a specific sample size (n) from a specific population
Sampling distribution a distribution of statistics (as opposed to a distribution of scores). The distribution of sample means is an example of a sampling distribution.
Expected value of M The mean of the distribution of sample means. The average of the M values.
Standard error of M The standard deviation of the distribution of sample means. The standard distance between a sample mean and the population mean.
The central limit theorem A mathematical theorem that specifies the characteristics of the distribution of sample means.
σ_M = σ/√n or √(σ²/n)
z = (M – µ) / σ_M
Hypothesis testing A statistical procedure that uses data from a sample to test a hypothesis about a population
Null hypothesis, Ho The null hypothesis states that there is no effect, no difference, or no relationship.
Alternative hypothesis, H1 The alternative hypothesis states that there is an effect, there is a difference, or there is a relationship.
Type I error A type I error is rejecting a true null hypothesis. You have concluded that a treatment does have an effect when actually it does not.
Type II error A type II error is failing to reject a false null hypothesis. The test fails to detect a real treatment effect.
Alpha (α) Alpha is a probability value that defines the very unlikely outcomes if the null hypothesis is true. Alpha also is the probability of committing a Type I error.
Level of significance The level of significance is the alpha level, which measures the probability of a Type I error.
Critical region The critical region consists of outcomes that are very unlikely to be obtained if the null hypothesis is true. The term very unlikely is defined by alpha (α).
Test statistic A statistic that summarizes the sample data in a hypothesis test. The test statistic is used to determine whether or not the data are in the critical region.
Beta (β) Beta is the probability of a Type II error.
Directional (one-tailed) test A directional test is a hypothesis test that includes a directional prediction in the statement of the hypotheses and places the critical region entirely in one tail of the distribution.
Effect size A measure of the size of the treatment effect that is separate from the statistical significance of the effect.
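As a rough sketch of how the standard error, the z statistic, alpha, and the critical region fit together, the following Python example runs a two-tailed z test for a sample mean when σ is known. The numbers (M = 105, µ = 100, σ = 15, n = 36) and the function name are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu, sigma, n, alpha=0.05):
    """Two-tailed z test for a sample mean, assuming sigma is known."""
    standard_error = sigma / math.sqrt(n)            # sigma_M = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error          # z = (M - mu) / sigma_M
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # boundary of the critical region
    reject_null = abs(z) > critical                  # is the test statistic in the critical region?
    return z, critical, reject_null


# Hypothetical example values.
z, critical, reject_null = z_test(sample_mean=105, mu=100, sigma=15, n=36)
print(z, critical, reject_null)
```

Here `NormalDist().inv_cdf` plays the role of the unit normal table: it returns the z value that cuts off the requested proportion of the distribution.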

Power The probability that the hypothesis test will reject the null hypothesis when there actually is a treatment effect
p(Type I Error) = α
p(Type II Error) = β
Cohen’s d Mean difference / Standard Deviation = (M – µ) / σ
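Cohen's d and power can be illustrated with a small Python sketch. It assumes an upper-tailed z test with known σ, and the example numbers are hypothetical; power is computed as the probability that the sample mean lands in the critical region when the true mean is the treated value.

```python
import math
from statistics import NormalDist


def cohens_d(sample_mean, mu, sigma):
    """Effect size: mean difference divided by the standard deviation."""
    return (sample_mean - mu) / sigma


def power_upper_tailed(mu_null, mu_true, sigma, n, alpha=0.05):
    """Power of an upper-tailed z test when the true population mean is mu_true."""
    standard_error = sigma / math.sqrt(n)
    # Smallest sample mean that falls in the critical region under the null hypothesis.
    cutoff = mu_null + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Probability of exceeding that cutoff when the true mean is mu_true.
    return 1 - NormalDist(mu_true, standard_error).cdf(cutoff)


# Hypothetical example values.
d = cohens_d(sample_mean=105, mu=100, sigma=15)
power = power_upper_tailed(mu_null=100, mu_true=105, sigma=15, n=36)
beta = 1 - power   # probability of a Type II error
print(d, power, beta)
```

When the true mean equals the null mean, the function returns alpha, which matches the definition of the Type I error rate.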

Vocabulary

Statistics the study of how to collect, organize, analyze, and interpret numerical information from data
Individuals people or objects included in the study
Variable the characteristic of the individual to be measured or observed
Quantitative variable has a value or numerical measurement for which operations such as addition or averaging make sense
Qualitative variable describes an individual by placing the individual into a category or group such as male or female
Population data variable is from every individual of interest
Sample data variable is from only some of the individuals of interest
Descriptive statistics involves methods of organizing, picturing, and summarizing information from samples or populations
Inferential statistics involves methods of using information from a sample to draw conclusions regarding the population
Nominal We can put the data into categories
Ordinal We can order the data from worst to best
Interval We can order the data and also take the differences between the data values. Has no true zero (zero does not mean "none")
Ratio We can order the data, take differences, and also find the ratio between data values. Has a true zero
Census measurements or observations from the entire population are used
Sample measurements or observations from a representative part of the population should be used
Observational study observations and measurements of individuals are conducted in a way that doesn't change the response or the variable being measured
Experiment a treatment is deliberately imposed on the individuals in order to observe a possible change in the response or variable being measured
Control group group that receives a dummy treatment disguised as the real treatment
Confounding variables variables that might be an underlying cause of a change in response in the experiment group
Randomization used to assign individuals to the treatment groups
Replication reduces the possibility that the differences in pain relief for the two groups occurred by chance alone.

Stats vocabulary

Observational study a study based on data with no manipulation used
retrospective study subjects are selected and then their previous conditions or behaviors are determined. Not based on random sampling. Focus on estimating differences between groups or associations between variables
prospective study subjects are followed to observe future outcomes. No treatments are applied. Not an experiment. Focus on estimating differences among groups during the study
experiment manipulates factor levels to create treatments, randomly assigns subjects to these treatment levels. Compares the responses of subject groups across treatment levels.
random assignment an experiment must assign experimental units to treatment groups at random.
factor a variable whose levels are controlled by the experimenter.
response a variable whose values are compared across different treatments.
experimental units individuals on whom an experiment is performed. Can be called subjects or participants
level the specific values that the experimenter chooses for a factor
treatment the process, intervention, or other controlled circumstance applied to randomly assigned experimental units. the Explanatory variable
Principles of experimental design Control, Randomize, Replicate, Block
Control make conditions as similar as possible for all treatment groups.
Randomize equalize the effects of unknown or uncontrollable sources of variation
Replicate over as many subjects as possible. Results from a single subject are just anecdotes.
Block group similar experimental units together and randomize within each group, so that the only systematic difference between the control group and the experimental group is the 1 thing we are testing.
statistically significant when an observed difference is too large for us to believe that it is likely to have occurred by chance.
control group the experimental units assigned to a baseline treatment level.
Blinding don't let the patient know if they are in the control group or the experimental group.
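single-blind, double blind there are two main classes of individuals who can affect the outcome of an experiment: those who could influence the results (subjects, technicians) and those who evaluate the results (judges, physicians). Single-blind: everyone in one of these classes is blinded. Double-blind: everyone in both classes is blinded.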
placebo effect people think they feel differently just because they know they are being tested
2 ways to replicate use several subjects, or replicate the entire experiment on another group
extraneous factors factors that are not being experimented with but may be influencing the outcome. Eliminate this by blocking.
confounded variables factors that can't be distinguished between which one is affecting the outcome.
placebo a treatment known to have no effect.
match reduces unwanted variation
2 types of designs completely randomized design, and randomized block design
completely randomized design all experimental units have an equal chance of receiving any treatment
randomized block design the randomization occurs only within blocks
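The two designs above differ only in where the randomization happens, which a short Python sketch can make concrete. The subject labels, block names, and treatment names are hypothetical; the functions deal shuffled units out to treatments in turn so that group sizes stay balanced.

```python
import random


def completely_randomized(units, treatments, seed=None):
    """Shuffle all units together, then deal them out to the treatments."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}


def randomized_block(blocks, treatments, seed=None):
    """Randomize separately inside each block (blocks: label -> list of units)."""
    rng = random.Random(seed)
    assignment = {}
    for units in blocks.values():
        shuffled = list(units)
        rng.shuffle(shuffled)
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical experimental units and treatments.
treatments = ["drug", "placebo"]
blocks = {
    "men": ["m1", "m2", "m3", "m4"],
    "women": ["w1", "w2", "w3", "w4"],
}
units = blocks["men"] + blocks["women"]

crd = completely_randomized(units, treatments, seed=1)
rbd = randomized_block(blocks, treatments, seed=1)
print(crd)
print(rbd)
```

In the completely randomized version one treatment group could end up mostly men by chance; in the blocked version each treatment always receives two men and two women.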

Stats basics, chapters 1, 2

Statistics Paradigm Population -> Sample -> Statistics
Parameter measurement of a population
Statistic Measurement of a sample
Error formula (approximate margin of error at 95% confidence) 1/√N, where N is the sample size
Variable something with more than one value (ex age, weight, grade)
Explanatory Variable Explains why the study's being done/how groups are broken up
Mean Average
Confidence Interval is set by the error (the sample percent plus and minus the margin of error). The confidence level is usually 95%
3 Key Components of Statistical Studies Design (logistics), Description (data), Inference (not in descriptive studies, the outcome; assumed/projected)
Area under a bell/normal curve 1
Categorical Variable gender, race, etc
Numerical/Quantitative Variable Discrete = integers and continuous = fractions
Integers numbers without fractions/decimals

Parameter vs Statistic & Sample vs Population

population is the entire group of objects you want to study.
ex: scores, people, measurements
sample is a smaller subset chosen from the population and a representative of the population.
parameter is a number which describes a property of the whole population. A statistic is a number which describes a property of only a sample.
random sample every object in the population is equally likely to be picked for the sample.
ex: pick name out of hats.
systematic sample every Kth object is chosen for the sample.
ex: think assembly line pick every 10th computer off the line.
convenience/volunteer/self-selected are non scientific approaches that will not lead to a representative sample.
ex: online surveys, phone polling, restaurant surveys
cluster sample is the method that picks groups randomly from the population instead of pick one object.
Every object in randomly selected groups forms our sample.
stratified sample is the method when we divide the entire population into meaningful groups.
Ex: republican and democrats, male or female
randomly sample to fill each group
1st: randomly pick groups from population.
2nd: sample is every object from the groups
cluster sample
1st: subdivide population with named groups.
2nd: randomly select objects from each group.
stratified sample
quantitative data is numeric data that you can count or measure.
ex: ages, weight
refers to data type not a level of measurement
categorical (qualitative) data is NOT numeric but instead you break them into categories by labels.
ex: eye color, letter grades
NOT how many people.
ratio level means 0=None
Can not have negative numbers
ex: age, length, weight, measurement of amounts
interval level 0 not equal to NONE. can be negative.
ex: temperature
ordinal level categories have a built in order. reordering would be confusing.
ex: letter grades a,b,c,d
smallest to largest
nominal level categories can be put in any order and not be confusing. can not be arranged in an ordering scheme.
ex: eye color, names, labels, categories.
discrete data data you can count. "number of"
continuous data data you can measure
ex: height, length, age
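The sampling methods in this section can be sketched in Python as follows. The population of 100 numbered objects, the group names, and the function names are hypothetical and only meant to show how each method picks its sample.

```python
import random


def simple_random_sample(population, n, rng):
    """Every object is equally likely to be picked."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick a random starting point, then every kth object."""
    start = rng.randrange(k)
    return population[start::k]


def cluster_sample(groups, n_groups, rng):
    """1st: randomly pick groups. 2nd: the sample is every object in those groups."""
    chosen = rng.sample(list(groups), n_groups)
    return [obj for name in chosen for obj in groups[name]]


def stratified_sample(groups, n_per_group, rng):
    """1st: subdivide the population into groups. 2nd: randomly select from each group."""
    return [obj for members in groups.values() for obj in rng.sample(members, n_per_group)]


rng = random.Random(0)                      # fixed seed so the sketch is repeatable
population = list(range(1, 101))            # hypothetical objects numbered 1..100
groups = {                                  # hypothetical named groups of 25 objects each
    "group_A": population[0:25],
    "group_B": population[25:50],
    "group_C": population[50:75],
    "group_D": population[75:100],
}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
cluster = cluster_sample(groups, 2, rng)
stratified = stratified_sample(groups, 5, rng)
print(srs, systematic, cluster, stratified, sep="\n")
```

Note the contrast: the cluster sample contains all 25 objects from each of 2 groups, while the stratified sample contains 5 objects from every one of the 4 groups.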

Stats Quiz 5/29

Cross-sectional study data collected at a fixed point in time
Retrospective (case controlled) study data collected about the past (records, interviews, etc)
Prospective study data collected in the future from groups sharing common factors
Randomization Randomly select subjects for different groups
Replication Results can be repeated on more than one subject to reach the same result
Blinding Subjects do not know which group they are in (placebo)
Double blinding Researcher doesn't know which group the subject is in
Placebo effect An untreated subject reports an improvement in symptoms
Confounding occurs in an experiment when you are unable to distinguish the effects of different factors
Completely Randomized Experimental Design Assign subjects to different treatment groups through random selection
Randomized Block Design Form blocks of subjects with similar characteristics
Randomly assign subjects within the blocks
Rigorously Controlled Design Carefully assign subjects to different treatments so that the subjects given each treatment are similar in the ways that are important to the experiment.
Matched Pairs Design Compare exactly two treatment groups with subjects that are matched to have similar characteristics
Sampling error Sample results do not match the results in the whole population because of chance sample fluctuations
Nonsampling error Occurs when data is incorrectly collected or analyzed
Center representative value for “the middle” of the data set
Variation measure of the amount that the data values vary
Distribution shape of the spread of the data
Outliers Sample values that lie far away from the majority of other values
Frequency Distribution also known as a frequency table, it shows how the data set is partitioned over various categories given by listing the categories and the number of data values in each category
Lower-class limits Smallest numbers that can belong to different classes
Upper-class limits Largest numbers that can belong to different classes
Class boundaries Centers in the gaps between upper and lower class limits of successive classes

Chapter 1.3-1.5

Parameter Measure of the whole population describing a characteristic
Statistic Measure of a sample describing some characteristic (not the whole population)
Quantitative data Data expressed by numbers
Categorical data Data that consists of names or labels that are not expressed in numbers
Discrete data Values are finite or countable
Continuous data Infinitely many possible values
Nominal level of measurement characterized by data that consists of names or labels; not ranked
Ordinal level of measurement data can be ordered but differences do not make sense
Interval level of measurement Difference between data is quantitative but there is no natural starting point
Ratio level of measurement data can be ordered, differences make sense, and there is a natural starting point
Voluntary response sample Respondents decide themselves whether to be included
Problems with voluntary response sample Strong opinions pervade, and inherent bias exists
Correlation When two events are somehow connected
Causation When one event causes another event
Reporter bias when respondents aim to please the researcher
Small samples not always indicative of the whole population, even if properly collected
Loaded question When strong wording skews responses
Order of questions the order and structure of the questions can contribute to responses
Non-response when a person either refuses to respond to a survey question or is unavailable
Missing data Data values are missing for many factors
Self-interest study Researcher desires a certain conclusion and skews study methods in favor of that conclusion
Observational study measure specific characteristics but don't attempt to modify the subjects
Experimental study Apply a treatment and proceed to observe its effects
Simple random sample a sample of n subjects chosen in such a way that every possible group of n subjects has an equal chance of being chosen
Random sample members of the population chosen in such a way that every individual is equally likely to be chosen
Probability sample select members from the population in such a way that each member is chosen with a pre-selected probability
Systematic sampling select some starting point and select every kth person
Convenience sampling sampling from a group convenient to the researcher

Ellis Third Exam- 4/3/12

A type 1 error is the result of Incorrectly rejecting the NULL hypothesis
A research article reports the results of a test for dependent means as t(38) = 3.11, p < .01. the result is significant
When conducting a test for independent means a typical research hypothesis might be The mean of population 1 is greater than the mean of population 2
A researcher tests whether a new teaching method is more effective than the old one. What is the RESEARCH hypothesis? the new teaching method is more effective than the old teaching method
A research strategy in which each person is tested more than once is known as: any of the above
A one tailed test is especially associated with: the research hypothesis
Which of the following is the most likely way for results of a test for dependent means to be presented in a research article for a study with 25 participants? (24)<significant
A researcher wants to know if a new type of exercise improves people's health. Would this be a one-tailed or two-tailed test and why? one-tailed because the study is only interested in whether the exercise improved health
In a chi-square test, the variables are: categorical (nominal)
Which of the following is the best way to reduce the variances in the distributions of means when conducting a test for independent means? increase the size of the samples
In which situation below would you use a test for dependent means? To compare the level of reading comprehension of students at the beginning of a speed-reading class to their level of reading comprehension at the end of the class
Before running statistical analyses, researchers should check their data for all of the above
A result is considered statistically significant when a sample value is so extreme that: the null hypothesis is rejected
Once a researcher has an idea for a research question, the next step is to develop a specific research plan to address the question
A researcher tests whether there is any difference between how fast people work in the morning versus how fast they work in the evening. What is the NULL hypothesis? There is no difference in the speed at which people work.
In the discussion section of a research article, one should all of the above
Before embarking on a new study, experienced researchers plan what statistical method (s) they will use when the study is complete. Why is it important to carry out this step? all of the above
What are the generally accepted cutoff points in hypothesis testing in psychology? .01 and .05
In what section of a research article should the authors describe each analysis in a systematic fashion? Methods
Which of the following is true about distributions? For any given sample size there are between two and-1 appropriate distributions.
An analysis of variance differs from a test for independent means in that an analysis of variance can be used to compare three or more groups, while a test for independent means cannot be used to compare more than two groups
what is a hypothesis a prediction about the results of the research study
Another name for a research hypothesis is the alternative hypothesis
the set of frequencies obtained in actual frequency distribution are the observed frequencies
a chi square test of significance is essentially concerned with the difference between expected and observed frequencies
A researcher takes a sample and wants to compare the results to the population from which it is drawn. The independent variable is gender and the dependent variable is a yes or no response to whether they favor abortion. Which test would the researcher use? a difference between means test
What is the research hypothesis the exercise will reduce the rate of heart attacks
What is the NULL hypothesis? the exercise will not reduce the rate of heart attacks
A researcher claims 62% of voters favor gun control H0: p = 0.62
H1: p ≠ 0.62
How do you set up a hypothesis testing problem you set it up to test the opposite of what you predict will happen.
Other names for the test for dependent means include all of the following EXCEPT test for match pairs
The main idea of a chi square test is that you compare population means to see if they vary from each other more than by chance.
When conducting a test for independent means you reject the null hypothesis if the score is more extreme than the cutoff score
If you know the samples variance but not the populations variance you can look up populations variance on the table.
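The comparison of observed and expected frequencies in a chi-square test can be sketched in Python. The frequencies below are hypothetical, and the critical value 5.991 is the usual table value for df = 2 at the .05 level rather than something computed here.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical frequencies for three categories.
observed = [30, 50, 20]
expected = [25, 50, 25]

chi_square = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

# Table value for df = 2 and alpha = .05 (taken from a chi-square table).
CRITICAL_VALUE_DF2_05 = 5.991
reject_null = chi_square > CRITICAL_VALUE_DF2_05
print(chi_square, degrees_of_freedom, reject_null)
```

A small statistic means the observed frequencies are close to the expected ones, so the null hypothesis is not rejected.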

Statistics for the Behavioral Sciences 9th Edition Chap. 9

(s_M) used as an estimate of the real standard error (σ_M) when the value of σ is unknown. Computed from the sample variance or sample standard deviation; provides an estimate of the standard distance between a sample mean M and the population mean µ. estimated standard error
used to test hypotheses about an unknown population mean, µ, when the value of σ is unknown. t statistic
describes the number of scores in a sample that are independent and free to vary. (n -1) degrees of freedom
the complete set of t values computed for every possible random sample for a specific sample size (n) or a specific degrees of freedom (df). Approximates the shape of a normal distribution. t distribution
Under what circumstances is a t statistic used instead of a z-score for a hypothesis test? A t statistic is used instead of a z-score when the population standard deviation and variance are not known.
A sample of n=9 scores has SS = 288. Compute the variance for the sample. 36
A sample of n=9 scores has SS = 288. Compute the estimated standard error for the sample mean. 2
True or False. In general a distribution of t statistics is flatter and more spread out than the standard normal distribution. True – As sample size and df increase, the variability in the t distribution decreases, and more closely resembles a normal distribution.
A researcher reports a t statistic with df = 20. How many individuals participated in the study. n = 21
For df=15, find the value(s) of t associated with the top 5% of the distribution. +1.753
For df=15, find the value(s) of t associated with the middle 95% of the distribution. +-2.131
For df=15, find the value(s) of t associated with the middle 99% of the distribution. +-2.947
Sample = n=4, µ=40 Treatment sample = M=44, variance s²=16. Is this sample sufficient to conclude that the treatment has a significant effect? No – t = 2.00 with df = 3. Fail to reject H0, treatment does not have a significant effect
Sample = n=4, µ=40 Treatment sample = M=44, variance s²=16. If all factors remained constant and sample size increased to n = 16, is sample sufficient to conclude that the treatment has a significant effect? Yes – t = 4.00 with df = 15. Reject H0. Treatment has significant effect.
an interval or range of values, centered around a sample statistic. confidence interval
If all other factors are held constant, an 80% confidence interval is wider than a 90% confidence interval. (True or False?) False – Greater confidence requires wider interval.
If all other factors are held constant, a confidence interval computed from a sample of n=25 is wider than a confidence interval computed from a sample of n = 100. True. The smaller sample produces a wider interval.
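The worked problems in this chapter can be reproduced with a short Python sketch. The function names are hypothetical, a two-tailed test at α = .05 is assumed, and the critical values are copied from a t table (2.131 for df = 15 appears above; 3.182 for df = 3 is the standard table value) because the Python standard library has no t distribution.

```python
import math

# Two-tailed critical t values for alpha = .05, copied from a t table.
T_CRITICAL_05 = {3: 3.182, 15: 2.131}


def variance_from_ss(ss, n):
    """Sample variance: s^2 = SS / (n - 1)."""
    return ss / (n - 1)


def one_sample_t(sample_mean, mu, sample_variance, n):
    """Return (t, estimated standard error, degrees of freedom)."""
    estimated_se = math.sqrt(sample_variance / n)   # s_M = sqrt(s^2 / n)
    t = (sample_mean - mu) / estimated_se           # t = (M - mu) / s_M
    return t, estimated_se, n - 1


def confidence_interval(sample_mean, estimated_se, t_critical):
    """Interval centered on the sample mean: M +/- t * s_M."""
    margin = t_critical * estimated_se
    return sample_mean - margin, sample_mean + margin


# n = 9, SS = 288: variance and estimated standard error.
s2 = variance_from_ss(288, 9)
se_n9 = math.sqrt(s2 / 9)

# mu = 40, M = 44, s^2 = 16, with n = 4 and then n = 16.
t4, se4, df4 = one_sample_t(44, 40, 16, 4)
t16, se16, df16 = one_sample_t(44, 40, 16, 16)
reject_n4 = abs(t4) > T_CRITICAL_05[df4]
reject_n16 = abs(t16) > T_CRITICAL_05[df16]

# 95% confidence intervals: the smaller sample gives the wider interval.
ci4 = confidence_interval(44, se4, T_CRITICAL_05[df4])
ci16 = confidence_interval(44, se16, T_CRITICAL_05[df16])
print(s2, se_n9, t4, reject_n4, t16, reject_n16, ci4, ci16)
```

The same mean difference of 4 points is not significant with n = 4 but is significant with n = 16, because the larger sample shrinks the estimated standard error and lowers the critical value.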