Chapter 15: Statistical Evaluation of Data
Study Aids and Important Terms and Definitions
Many of you have been inquiring about how to better prepare for the quizzes.
One way is to use the online resources that accompany the textbook.
The book website is located at:
Select the chapter you want to view.
Any item with a lock symbol next to it can only be viewed by instructors.
Other, non-locked items can be viewed by students. For each chapter there is a glossary, flash cards that you can set to show either a word or its definition first, a crossword puzzle, and a practice quiz.
Chapter 15 vocabulary words and terms you should know the definition of include:
Notice that this chapter has a very large number of vocabulary terms - statistics is like that!
descriptive statistics
inferential statistics
statistic
parameter
frequency distribution
histogram
polygon
bar graph
central tendency
mean
median
mode
bimodal distribution
multimodal distribution
variability
standard deviation
variance
degrees of freedom, df
line graph
correlation
scatter plot
Pearson correlation
Spearman correlation
regression
regression equation
slope constant
Y-intercept
multiple regression
multiple-regression equation
sampling error
hypothesis test
null hypothesis
standard error
test statistic
alpha level or level of significance
significant result or statistically significant result
Type I error
Type II error
effect size
Cohen's d
percentage of variance accounted for
confidence interval
independent-measures t test
repeated-measures t test
single-factor analysis of variance or one-way ANOVA
error variance
post hoc tests or post tests
two-factor analysis of variance or two-way ANOVA
parametric test
nonparametric test
chi-square test for independence
split-half reliability
Spearman-Brown formula
Kuder-Richardson formula 20, or K-R 20
Cronbach's alpha
The role of statistics in the research process (p.432)
You knew it was coming, didn't you?
It's just not possible to talk about research methods without some discussion of statistics!
The aim of statistics is to:
- summarize the data to see what happened
- summarize the data so it can be easily communicated to others
- determine what conclusions are justified by the results (significance testing)
We can simplify these purposes to two: description and analysis.
Descriptive statistics are used to describe samples and summarize the results of studies.
Inferential statistics are used to make generalizations about populations based on results obtained from samples
Read the previous sentence again...
It is a good summary of the purpose of statistical testing.
The researcher conducts a study with a sample, and statistical analysis of the results tells the researcher (in part) if the findings can be generalized from a sample to a population
Plan your statistics earlier rather than later
This advice applies to all research - including your research proposal paper
Even though the results section of a study is near the end, the researcher plans the statistical tests that will be used well in advance.
The researcher has to make sure that the method of the study is compatible with the planned data analysis, for instance, the number of groups, the number of variables, and the number of participants needed to conduct the study.
The research design and the statistics used to summarize and analyze the results depend on the research question the investigator has in mind.
Statistics terminology ---- what is a statistic anyway? (p. 433)
A statistic is a summary value that describes a sample, for instance, the mean age of a sample, the median income of a sample, or the mean test score of a sample.
A parameter is a score or value that describes a population, while a statistic is a value or score that describes a sample selected from that population.
For instance, we may have a sample with an average IQ score of 101 and conclude that, with respect to IQ, the sample closely corresponds to the parameter of an IQ of 100 in the population. On the other hand, the sample average would differ from the parameter for college students, where the average IQ is higher than 100.
And if our sample were composed of 65% women and 35% men, it would not closely match the parameter for sex ratio in the US population of 50.8% women to 49.2% men.
Descriptive Statistics (p. 434)
As was mentioned above, descriptive statistics help researchers organize scores or values into tables and graphs that make a set of scores easy to see, and allow the researcher to compute summary values, such as means or medians, that describe the entire sample.
Frequency distributions
A frequency distribution, in the form of a table or graph, is used to organize and illustrate the scores for a sample by showing the frequency of each score (or range of scores) in the entire sample or population.
For example, the image below is a simple frequency distribution chart of the ages of people in the US population.
(image of chart showing the % of the population in 5 different age groups)
Here is the same frequency distribution of the ages of people in the US population shown in the form of a graph, in this case a bar graph.
(image of bar graph showing the % of the population in 5 different age groups)
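To see how a frequency distribution is built, here is a minimal sketch in Python; the ages and the five age ranges are made up for illustration and are not the data from the chart above:

```python
from collections import Counter

# Hypothetical sample of ages, to be grouped into five illustrative ranges
ages = [4, 12, 19, 23, 35, 41, 47, 52, 58, 63, 67, 71, 8, 15, 30]

def age_group(age):
    """Assign an age to one of five made-up ranges."""
    if age < 18:   return "0-17"
    elif age < 35: return "18-34"
    elif age < 50: return "35-49"
    elif age < 65: return "50-64"
    else:          return "65+"

freq = Counter(age_group(a) for a in ages)  # frequency of each range
n = len(ages)
for group in ["0-17", "18-34", "35-49", "50-64", "65+"]:
    count = freq[group]
    print(f"{group:>6}: {count:2d}  ({100 * count / n:.0f}%)")
```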
Measures of central tendency (p. 437)
As the name suggests, the CENTRAL tendency is a single number that describes the CENTER of a distribution.
The purpose of reporting the central tendency is to summarize a group with the single number that is most typical or representative of the group.
While the average score is most commonly used, there are three different ways to measure central tendency: the mean, the median, and the mode.
- mean: the average, obtained by adding the total of all scores and dividing by the number of scores
- median: the score that divides the distribution in half, such that half the scores are higher than the median and half are lower
- mode: the score that appears most frequently
The average is used most often
If the sample is normally distributed, that is, a graph of the distribution looks like a normal or bell curve, then the median and the mean are about the same.
But sometimes they are very different, and in those cases it may be preferable to report both the mean and the median.
The median is used when the distribution is lopsided, that is, when there are many scores at one end of the distribution. For example, in 2004 the median annual household income in the US was about $43,400 and the mean income was $60,500!!
Why are they so different? Because among the half of households that earn more than $43,400, many earn A LOT more than $43,400; millionaires raise the mean.
The median is reported more often because it gives a more accurate idea of the 'typical' household.
The mode is most often used when the value of interest is non-numeric. For instance, we could say that the modal student at UCF is a psychology major because there are more psychology majors than any other type of major. We couldn't speak of a mean or median major because major is a nominal scale.
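Python's standard-library statistics module computes all three measures directly. A quick sketch with made-up scores, including one outlier to show how the mean and median can diverge:

```python
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 7, 9, 40]      # note the outlier, 40

print("mean:  ", statistics.mean(scores))     # 8.3 - pulled upward by the outlier
print("median:", statistics.median(scores))   # 5.0 - resistant to the outlier
print("mode:  ", statistics.mode(scores))     # 5   - the most frequent score
```

Notice how the single outlier drags the mean well above the median; this is the household-income example above in miniature.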
Measures of variability
Variability describes the spread of the scores, that is how different the typical score is from the mean.
When variability is low the scores are clustered around the mean.
When variability is high the scores are all over the place.
For instance, the set of scores 9, 9, 10, 10, 11, 11 has a mean of 10 and low variability, while the set of scores 1, 2, 3, 17, 18, 19 has a mean of 10 and high variability.
There are many ways to measure variability.
The two most common are the standard deviation and the variance, which are related to each other.
standard deviation
The standard deviation is, roughly, the average difference of the scores from the mean score (yes, the average difference from the average!).
The standard deviation is used as the measure of variability whenever a mean is used to measure the central tendency.
variance
The variance is the square of the standard deviation, so if the standard deviation is 5, then the variance is 5 squared, or 25.
How to calculate the variance and standard deviation*
*note that there is a slight difference between calculating the variance for a population vs. the variance for a sample...
For a population, the variance is calculated by doing the following:
1. calculate the mean of the scores
2. subtract each score from that mean - this is the score's deviation from the mean
3. square each deviation
4. add the squared deviations together
5. divide by the number of scores in the population (N)
5.* for a sample, divide by the number of scores in the sample minus 1 (n - 1)
For example, for the sample scores 1,3,5,7,9
1. the mean is 5: (1 + 3 + 5 + 7 + 9)/5 = 5
2. 5 - 1 = 4,  5 - 3 = 2,  5 - 5 = 0,  5 - 7 = -2,  5 - 9 = -4
3. 4 × 4 = 16,  2 × 2 = 4,  0 × 0 = 0,  (-2) × (-2) = 4,  (-4) × (-4) = 16
4. 16 + 4 + 0 + 4 + 16 = 40
5.* (this is a sample so we divide by n-1) 40/(5-1) = 10
so the variance is 10 and the standard deviation is the square root of 10 which is about 3.162
This means that the average score differs from the mean of 5 by about 3.162 points.
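For anyone who wants to check the arithmetic, the five steps translate almost line for line into Python; this sketch reproduces the worked example (variance 10, standard deviation about 3.162):

```python
import math

scores = [1, 3, 5, 7, 9]                     # the sample from the worked example
n = len(scores)

mean = sum(scores) / n                       # step 1: mean = 5
deviations = [x - mean for x in scores]      # step 2: deviations from the mean
squared = [d ** 2 for d in deviations]       # step 3: square each deviation
ss = sum(squared)                            # step 4: sum of squares = 40
variance = ss / (n - 1)                      # step 5*: sample, so divide by df = n - 1
sd = math.sqrt(variance)                     # standard deviation = sqrt(variance)

print(variance, round(sd, 3))                # 10.0 3.162
# The built-ins agree: statistics.variance(scores) and statistics.stdev(scores)
```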
Still confused? Check out this web page on standard deviation and variance: http://www.mathsisfun.com/data/standard-deviation.html
sample variance and degrees of freedom
Degrees of freedom is a complex topic.
In the discussion of variance it was noted that for a sample we divide the sum of squared deviations by 'n-1'
n-1 is the degrees of freedom.
In statistics, degrees of freedom is denoted by the italic letters df.
In the above example, df = 4.
The degrees of freedom refers to the number of values in a calculation that are free to vary... yes, that is as complicated as it sounds!
While you don't need to know much more than that for now, you do need to know two things:
1) ANY TIME statistics are computed we must know the value of df
(when you read a study, df is almost always reported in the results)
2) You also need to know its purpose: the degrees of freedom allows us to make the most accurate calculation of the variance.
Without the degrees-of-freedom correction we tend to underestimate the variance (the sum of the squared deviations divided by N will be smaller than the sum of squared deviations divided by n - 1; in the example above, 40/5 = 8 vs. 40/4 = 10)...
Someday, if you take graduate statistics, you will encounter a more detailed discussion of standard deviations.
Degrees of freedom will be revisited a little later, when hypothesis testing is discussed...
Describing interval and ratio data
The mean and standard deviation are typically used to describe numeric values of the sort obtained with interval and ratio scales of measurement.
You may have heard of a normal curve, or a bell curve.
The normal curve is a graphical representation of the distribution of scores in a population
The graph below shows the distribution of IQ scores.
IQ is "noramlly distributed" in the population.
Traits that are not 'normally distributed' have graphs with other shapes.
IQ and height are normally distributed. Weight and income are not (in the United States there are more people overweight than underweight, and more people at the low end than at the very high end for income).
(image of normal curve or bell curve showing distribution of IQ scores)
The standard deviation and shape of the distribution of scores tells us a lot about a population.
IQ scores have a mean of 100 and a standard deviation of 15.
In a normal curve, 68% of the population has a score within one standard deviation of the mean. For IQ scores, that means that 68% of the population has an IQ score of 85 to 115.
95% of the population has an IQ score within two standard deviations of the mean, that is, between 70 and 130. That means that about 2.5% have an IQ below 70 and another 2.5% have an IQ above 130.
An IQ score below 70 is required for diagnosing mental retardation, and an IQ above 130 is often regarded as the beginning point of 'giftedness.'
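If you have scipy available, you can verify the 68% and 95% figures directly from the normal curve; a minimal sketch:

```python
from scipy.stats import norm

mean, sd = 100, 15                        # IQ parameters

# Proportion within one standard deviation (85 to 115)
within_1sd = norm.cdf(115, mean, sd) - norm.cdf(85, mean, sd)

# Proportion within two standard deviations (70 to 130)
within_2sd = norm.cdf(130, mean, sd) - norm.cdf(70, mean, sd)

print(f"within 1 SD: {within_1sd:.3f}")              # about 0.683
print(f"within 2 SD: {within_2sd:.3f}")              # about 0.954
print(f"below 70:    {norm.cdf(70, mean, sd):.3f}")  # about 0.023
```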
Distributions with a low standard deviation will be taller and thinner than the normal curve, and those that have a large standard deviation will be flatter and more spread out.
Some traits have distributions whose graphs show multiple peaks and valleys.
Below is a graph showing the distribution of Republicans and Democrats based on education.
(graph illustrating the distribution of Republicans and Democrats by education level. The distribution of Republicans follows an approximately normal curve and the distribution of Democrats is bimodal)
Note that the distribution of Republicans by education follows a normal curve. That is, the average self-described Republican has an average level of education, with fewer people with little or a great deal of education describing themselves as Republican.
In contrast, the distribution of Democrats has two peaks. Those with a low level of education or a high level of education are more likely to describe themselves as Democrats, with relatively fewer individuals with an average level of education describing themselves as Democrats.
A distribution with two peaks is called a bimodal distribution.
There are also distributions with three or more peaks or no peaks.
The distribution of scores tells us something about the population or sample of interest.
Describing non-numerical data from nominal and ordinal scales of measurement
Describing non-numeric data is, in some ways, easier than describing numeric data because there are no means and standard deviations.
Instead, proportions are reported, for example, the sample included 51% women and 49% men, or the distribution of grades was 30% A's, 31% B's, 22% C's, 11% D's and 6% F's.
The mode is used to describe the central tendency, e.g., "the modal grade was a B+"
Standard deviation is not computable for non-numeric data.
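A short sketch of both ideas in Python, using a made-up distribution of letter grades (Counter is from the standard library):

```python
from collections import Counter

# Hypothetical letter grades for a small class
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "F"]

counts = Counter(grades)
n = len(grades)
for grade, count in counts.most_common():      # report proportions
    print(f"{grade}: {100 * count / n:.0f}%")

modal_grade, _ = counts.most_common(1)[0]      # mode = most frequent category
print("modal grade:", modal_grade)             # B
```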
Correlations (p. 446)
Correlation statistics and measures tell us several things about the relationship between two or more variables, including:
- The direction of the relationship
- The form of the relationship
- The degree of consistency or strength of the relationship
- The statistical significance of the relationship
If the above doesn't sound familiar, you may want to do a quick review of chapter 12 on the correlational research strategy...
The Pearson correlation coefficient, indicated by the italic letter r, can range from -1 to +1 and describes both the strength and the direction of the relationship.
A correlation of +1 or -1 means the variables are perfectly correlated, and an r value near zero means that they are unrelated.
A perfect correlation would occur if we were correlating temperature in degrees Celsius with degrees Fahrenheit, or the value of currency in dollars vs. euros vs. rupees vs. yen.
Uncorrelated variables might include the relationship between one's zodiac sign and shoe size, or IQ and weight.
Most correlations we encounter in psychology and biology are greater than zero and less than 1. For instance, there is a very strong relationship between height and shoe size, but it is about .8, not 1. And a child's IQ is correlated with the parents' IQ, with r ranging from .05 to .75 depending on the age of the child and the way IQ is measured.
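If you want to compute Pearson's r yourself, numpy can do it in one call; the paired height and shoe-size values below are hypothetical, chosen only to show a strong positive correlation:

```python
import numpy as np

# Hypothetical paired observations (height in inches, shoe size)
height = np.array([62, 64, 66, 68, 70, 72, 74])
shoe   = np.array([6.5, 7, 8.5, 9, 10, 11, 12])

r = np.corrcoef(height, shoe)[0, 1]   # off-diagonal entry of the 2x2 matrix
print(f"r = {r:.2f}")                 # close to +1: strong positive relationship
```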
Regression (p. 449)
Pearson's correlation coefficient (r) is useful when the relationship between variables is linear.
Regression is the process of finding the equation for the straight line that provides the best fit for the data points observed on a scatterplot in a correlational study.
The resulting equation is called a regression equation
A regression equation has the form Y = bX + a
If you think back to the days when you took algebra, this type of equation allows one to compute a value for Y given a value of X, and to construct a line on a graph using the equation.
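As a sketch of that idea, scipy's linregress function finds the best-fitting b (slope) and a (Y-intercept) for paired data; the x and y values here are made up:

```python
from scipy.stats import linregress

x = [1, 2, 3, 4, 5]             # predictor (X)
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # outcome (Y), roughly Y = 2X

result = linregress(x, y)
b, a = result.slope, result.intercept
print(f"Y = {b:.2f}X + {a:.2f}")        # the regression equation

# Predict Y for a new X value using the equation
x_new = 6
print("predicted Y:", b * x_new + a)
```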
Multiple Regression
Multiple regression is used when there is more than one predictor variable.
For instance, we could use high school GPA to predict college GPA. After making several observations with a large sample, we could calculate a regression equation of the form Y = bX + a, where Y = college GPA, X = high school GPA, and a and b are constants (a is the Y-intercept and b is the slope constant).
If we wanted a more accurate prediction, we might add a second predictor and use both high school GPA and SAT score to predict college grades.
This is multiple regression because there are multiple predictors in the regression equation.
Otherwise, it is the same as regression - the aim is to find the linear equation for Y so you can, given a value for each of the predictors, most accurately predict the value of Y.
As you may guess, the regression equation becomes more complicated as predictors are added.
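As a sketch, with two predictors the equation takes the form Y = b1X1 + b2X2 + a, and the coefficients can be found with ordinary least squares; the GPA and SAT numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical data: high school GPA and SAT score -> college GPA
hs_gpa  = np.array([3.0, 3.5, 2.8, 3.9, 3.2, 3.7])
sat     = np.array([1050, 1200, 980, 1350, 1100, 1280])
col_gpa = np.array([2.9, 3.3, 2.6, 3.8, 3.0, 3.5])

# Design matrix: one column per predictor plus a column of 1s for the intercept
X = np.column_stack([hs_gpa, sat, np.ones_like(hs_gpa)])

# Least-squares solution gives b1, b2, and a
(b1, b2, a), *_ = np.linalg.lstsq(X, col_gpa, rcond=None)
print(f"college GPA = {b1:.3f}*hsGPA + {b2:.5f}*SAT + {a:.3f}")
```

Statistical packages report these same coefficients, usually along with significance tests for each predictor.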
BTW, to predict college success most colleges use multiple variables: high school GPA (which may be weighted based on whether the high school is public or private, and in a good school district vs. a poor one), SAT or ACT score, and sometimes even more factors such as the difficulty of the courses taken, involvement in extracurricular activities, writing skill, etc.
Inferential Statistics (p. 451)
Most statistics are inferential in that we collect data from a SAMPLE and use that data to make INFERENCES about a POPULATION.
Of course, a sample can never precisely correspond to an entire population. For instance, I may collect data from 100 college freshmen at UCF, say about their political views, or interests, or worries; whatever outcome I obtain may be similar to that of the population of all UCF freshmen, but it will not be exactly the same as the data for the population.
The difference between the sample and the population is called sampling error: the difference between a statistic from a sample and the corresponding parameter from a population. I may have a sample of American women with a mean weight of 145 pounds and a mean IQ of 108. This sample would weigh less and have a higher IQ than the corresponding parameters for weight and IQ in the population of American women. That difference is presumed to be due to sampling error, and some error will be present in any sample we might take from a population.
The purpose of inferential statistics is to determine whether research results reflect relationships that can be generalized to the relevant population, or if the results are due to sampling error.
For instance, suppose that after treatment for depression, the group that received the experimental treatment has a score on a measure of depression that is 5% lower than the scores of participants in a no-treatment control group. The researcher must use statistics to determine whether the 5% difference is due to an effect of treatment or is due to chance, that is, to sampling error.
Hypothesis Test
A hypothesis is made about a population: depressed people treated with treatment xyz will show greater reductions in depression, as measured by the abc depression inventory, than people receiving no treatment; men will score higher than women on a test of spatial ability; women will score higher than men on a test of processing speed. The hypothesis applies to the entire population of interest.
However, the research results apply to a sample of 25, or 50 or 200 individuals...
A hypothesis test is a statistical test to determine how confident we can be that the observed results are due to a real difference rather than to chance, that is, to sampling error.
- Null hypothesis - a statement about the population that says there is no effect, change, difference, or relationship. For instance, the null hypothesis would be that people treated with xyz will not show changes in depression that differ from the control group.
- Sample statistic - the data from the sample used to test the hypothesis, for instance the mean scores of two groups, or the correlation between two variables.
- standard error - the average (or standard) difference between a sample statistic and a corresponding population parameter
- test statistic - a calculation that compares the sample mean to the mean predicted by the null hypothesis while taking the standard error into account. If the difference between the sample statistic and the value predicted by the null hypothesis is large, then we reject the null hypothesis
- alpha level - aka level of significance - the maximum probability that the results were due to chance (sampling error). If we set the alpha level at .05, that means we will accept at most a 5 in 100, or 5%, probability that the observed results are due to chance.
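To make these terms concrete, here is a minimal sketch of an independent-measures t test using scipy; the depression scores are made up, and the p value is compared to an alpha level of .05:

```python
from scipy.stats import ttest_ind

# Hypothetical depression scores after the study (lower = less depressed)
treatment = [12, 9, 14, 10, 8, 11, 13, 9]
control   = [16, 14, 18, 15, 13, 17, 15, 16]

t_stat, p_value = ttest_ind(treatment, control)  # test statistic and p value

alpha = 0.05
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the difference is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")
```

This is the independent-measures t test from the vocabulary list; a within-subjects design would use a repeated-measures test instead (scipy's ttest_rel).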
Make sure you understand the above concepts before taking the quiz and reading the rest of the chapter!
Reporting the results from a hypothesis test
The basic result of a hypothesis test is that the results either are or are not statistically significant.
We use what is called a p value to report significance, for instance, saying that the difference between the two treatments was significant at p < .05.
A significant result means the odds are less than 5% that the outcome is due to chance, so we reject the null hypothesis.
Errors in hypothesis testing
Note that the probability that the results are due to chance is not zero. Errors are possible. There are two types of errors that can be made in hypothesis testing.
Type I error - the researcher incorrectly rejects a null hypothesis that is actually true. These are also called 'false positives.' The results appeared to support the hypothesis but were only due to chance. For instance, an ineffective treatment appeared effective because the scores in one group differed from the other group's by chance.
Type II error - the null hypothesis is incorrectly accepted. These are also called 'false negatives.' That is, due to sampling error, the results appear to show no difference between the treatment and control conditions when the treatment actually is effective.
Researchers should always think about which is more costly, making a type I error or a type II error.
For example, in the early days of testing for HIV, one of the more common tests had a problem with Type I errors. Many people who did not have HIV had false positive tests; they thought they had HIV when they didn't. However, doctors knew about the false positives and followed up with a more accurate test that was very expensive.
The best thing would be an error-free test. However, in this case it was better to have a false positive than a false negative. Someone with a false negative, who believes they don't have HIV when they do, might infect others. Someone with a false positive would get further testing. (BTW, today's HIV tests are much more accurate.)
Factors that influence the outcome of a hypothesis test (p. 461)
You might think that a hypothesis is either true or false.
However, since a hypothesis test yields a probability statement rather than an absolute answer, its outcome can be affected by the conditions of the research.
Two factors that affect the outcome of a hypothesis test are the number of scores (sample size) and the size of the variance in the scores:
the number of scores in the sample
the size of the variance
Effect size and why it is very important
Examples of hypothesis tests
Comparing groups of scores in experimental, quasi-experimental, and nonexperimental designs
Test for mean differences
two-group between-subjects test
two-treatment within-subjects test
comparing more than two levels of a single factor
post hoc tests
Factorial Tests
Statistical tests for correlational designs
evaluating significance for a regression equation
evaluating relationships for non-numeric scores
Special statistics for research
Cronbach's Alpha