Chapter 15: Statistical Evaluation of Data

Study Aids and Important Terms and Definitions

Many of you have been inquiring about how to better prepare for the quizzes.

One way is to use the online resources that accompany the textbook.

The book website is located at:

http://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20bI&product_isbn_issn=9781111342258

Select the chapter you want to view.

Any item with a symbol of a lock next to it can only be viewed by instructors.

Other, non-locked items can be viewed by students. For each chapter there is a glossary, flash cards that you can set to show either a word or its definition first, a crossword puzzle, and a practice quiz.

Chapter 15 vocabulary words and terms you should know the definition of include:

Notice that this chapter has a very large number of vocabulary terms - statistics is like that!

descriptive statistics
inferential statistics
statistic
parameter
frequency distribution
histogram
polygon
bar graph
central tendency
mean
median
mode
bimodal distribution
multimodal distribution
variability
standard deviation
variance
degrees of freedom, df
line graph
correlation
scatter plot
Pearson correlation
Spearman correlation
regression
regression equation
slope constant
Y-intercept
multiple regression
multiple-regression equation
sampling error
hypothesis test
null hypothesis
standard error
test statistic
alpha level or level of significance
significant result or statistically significant result
Type I error
Type II error
effect size
Cohen's d
percentage of variance accounted for
confidence interval
independent-measures t test
repeated-measures t test
single-factor analysis of variance or one-way ANOVA
error variance
post hoc tests or post tests
two-factor analysis of variance or two-way ANOVA
parametric test
nonparametric test
chi-square test for independence
split-half reliability
Spearman-Brown formula
Kuder-Richardson formula 20, or K-R 20
Cronbach's alpha

The role of statistics in the research process (p.432)

You knew it was coming, didn't you?

It's just not possible to talk about research methods without some discussion of statistics!

The aim of statistics is to:

  • summarize the data to see what happened
  • summarize the data so it can be easily communicated to others
  • determine what conclusions are justified by the results (significance testing)

We can simplify these purposes to description and analysis.

Descriptive statistics are used to describe samples and summarize the results of studies.

Inferential statistics are used to make generalizations about populations based on results obtained from samples

Read the previous sentence again...

It is a good summary of the purpose of statistical testing.

The researcher conducts a study with a sample, and statistical analysis of the results tells the researcher (in part) if the findings can be generalized from a sample to a population

Plan your statistics earlier rather than later

This advice applies to all research - including your research proposal paper

Even though the results section of a study comes near the end, the researcher plans the statistical tests that will be used well in advance.

The researcher has to make sure that the method of the study is compatible with the planned data analysis: for instance, the number of groups, the number of variables, and the number of participants needed to conduct the study.

The research design and statistics used to summarize and analyze the results depend on the research question the investigator has in mind.

Statistics terminology ---- what is a statistic anyway? (p. 433)

A statistic is a summary value that describes a sample: for instance, the mean age of a sample, the median income of a sample, or the mean test score of a sample.

A parameter is a score or value that describes a population, while a statistic is a value or score that describes a sample selected from that population.

For instance, we may have a sample that obtains an average IQ score of 101, and conclude that - with respect to IQ - the sample closely corresponds to the parameter of an IQ of 100 in the population. On the other hand, the sample average would differ from the parameter for college students, where the average IQ is higher than 100.

And if our sample was composed of 65% women and 35% men, that would not closely match the parameter for sex ratio in the US population of 50.8% women to 49.2% men.

Descriptive Statistics (p. 434)

As was mentioned above, descriptive statistics help us organize scores or values into tables and graphs that allow researchers to see a set of scores and allow the researcher to compute summary values, such as averages or medians, that describe the entire sample

Frequency distributions

A frequency distribution in the form of a table or graph is used to organize and illustrate scores for a sample by showing the frequency of each score (or range of scores) in the entire sample or population.

For example, the image below is of a simple frequency distribution chart of people of different ages in the US population

(Image: frequency distribution chart showing the percentage of the US population in five age groups)

Here is the same frequency distribution of ages of people in the US population shown in the form of a graph, in this case a bar graph.

(Image: bar graph showing the percentage of the US population in five age groups)

Measures of central tendency (p. 437)

As the name suggests, the CENTRAL tendency is a single number that describes the CENTER of a distribution.

The purpose of reporting the central tendency is to summarize a group with the single number that is most typical or representative of the group.

While the average score is most commonly used, there are three different ways to measure central tendency: the mean, the median, and the mode.

  • mean: the average, obtained by adding up all the scores and dividing by the number of scores
  • median: the score that divides the distribution in half, such that half the scores are higher than the median and half are lower
  • mode: the score that appears most frequently

The average is used most often

If the sample is normally distributed, that is, a graph of the distribution looks like a normal or bell curve, then the median and the mean are about the same.

But sometimes they are very different, and in those cases it may be preferable to report both the mean and the median.

The median is used when the distribution is lopsided, that is when there are many scores at one end of the distribution. For example, in 2004 the median annual income in the US was about $43,400 and the mean income was $60,500!!

Why are they so different? Because among the half of households that earn more than $43,400, many earn A LOT more than $43,400 - millionaires raise the mean.

The median is reported more often because it gives a more accurate idea of the 'typical' household.

The mode is most often used when the value of interest is non-numeric. For instance, we could say that the modal student at UCF is a psychology major because there are more psychology majors than there are other types of majors. We couldn't speak of a mean or median major because major is a nominal scale.
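To make these three measures concrete, here is a minimal sketch in Python using the standard statistics module; the income figures and majors are invented for illustration, not real data.

```python
import statistics

# Made-up household incomes with one extreme value, mimicking the
# lopsided income distribution described above
incomes = [30_000, 38_000, 43_000, 45_000, 52_000, 1_000_000]

print(statistics.mean(incomes))    # 201333.33... - pulled way up by the millionaire
print(statistics.median(incomes))  # 44000.0 - closer to the 'typical' household

# The mode also works for non-numeric (nominal) values such as majors
majors = ["psychology", "biology", "psychology", "history", "psychology"]
print(statistics.mode(majors))     # psychology
```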

Measures of variability

Variability describes the spread of the scores, that is, how different the typical score is from the mean.

When variability is low the scores are clustered around the mean.

When variability is high the scores are all over the place.

For instance, the set of scores 9, 9, 10, 10, 11, 11 has a mean of 10 and low variability, while the set of scores 1, 2, 3, 17, 18, 19 has a mean of 10 and high variability.

There are many ways to measure variability.

The two most common are the standard deviation and variance, which are related to each other.

standard deviation

The standard deviation is, roughly, the average difference of the scores from the mean score (yes - the average difference from the average!)

The standard deviation is used as the measure of variability whenever a mean is used to measure the central tendency.

variance

The variance is the square of the standard deviation, so if the standard deviation is 5, then the variance is 5 squared, or 25.

How to calculate the variance and standard deviation*

         *note that there is a slight difference between calculating the variance for a population vs. the variance for a sample...

For a population, the variance is calculated as follows:

1. calculate the mean

2. find the difference between each score and the mean - this is the score's deviation from the mean

3. square each deviation

4. add the squared deviations together

5. divide by the number of scores in the population (N)

5.* for a sample, divide by the number of scores in the sample minus 1 (n - 1)

 

For example, for the sample scores 1,3,5,7,9

1. the mean is 5: (1 + 3 + 5 + 7 + 9)/5 = 5

2.  5 - 1 = 4     5 - 3 = 2     5 - 5 = 0     5 - 7 = -2     5 - 9 = -4

3.  4 x 4 = 16     2 x 2 = 4     0 x 0 = 0     -2 x -2 = 4     -4 x -4 = 16

4. 16 + 4 + 0 + 4 + 16 = 40

5.* (this is a sample so we divide by n-1)    40/(5-1) = 10

So the variance is 10, and the standard deviation is the square root of 10, which is about 3.162.

This means that the average score differs from the mean of 5 by about 3.162 points.
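The five steps above translate directly into code. Here is a minimal sketch in plain Python that follows them for the sample 1, 3, 5, 7, 9:

```python
import math

scores = [1, 3, 5, 7, 9]

# Step 1: calculate the mean
mean = sum(scores) / len(scores)                        # 5.0

# Steps 2 and 3: each score's deviation from the mean, squared
squared_deviations = [(mean - x) ** 2 for x in scores]  # [16, 4, 0, 4, 16]

# Step 4: add the squared deviations together
ss = sum(squared_deviations)                            # 40.0

# Step 5*: this is a sample, so divide by n - 1
variance = ss / (len(scores) - 1)                       # 10.0
std_dev = math.sqrt(variance)                           # about 3.162

print(variance, std_dev)
```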

Still confused? Check out this web page on standard deviation and variance: http://www.mathsisfun.com/data/standard-deviation.html

sample variance and degrees of freedom

Degrees of freedom is a complex topic.

In the discussion of variance it was noted that for a sample we divide the sum of squared deviations by 'n-1'

n-1 is the degrees of freedom.

In statistics, degrees of freedom is denoted by the italic letters df.

In the above example, df = 5 - 1 = 4.

The degrees of freedom refers to the number of values that are free to vary once something about the set (such as its mean) is fixed... yes, that is as complicated as it sounds!

While you don't need to know much more than that for now, you do need to know two things:

1) ANY TIME statistics are computed, we must know the value of df.

When you read a study, the df value is almost always reported in the results section.

2) You also need to know its purpose. Using the degrees of freedom allows us to make the most accurate estimate of the variance.

Without using the degrees of freedom we tend to underestimate the variance (the sum of the squared deviations divided by N will be smaller than the sum of squared deviations divided by n - 1)...
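You can see the difference the correction makes with Python's statistics module, which provides both versions:

```python
import statistics

scores = [1, 3, 5, 7, 9]

# Population variance: sum of squared deviations divided by N
print(statistics.pvariance(scores))  # 8.0  (40 / 5)

# Sample variance: divided by the degrees of freedom, n - 1
print(statistics.variance(scores))   # 10.0 (40 / 4)
```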

Someday, if you take graduate statistics, you will get a much more detailed discussion of degrees of freedom.

It will be revisited a little later when hypothesis testing is discussed...

Describing interval and ratio data

The mean and standard deviation are typically used to describe numeric values of the sort obtained in interval and ratio data.

You may have heard of a normal curve, or a bell curve.

The normal curve is a graphical representation of the distribution of scores in a population

The graph below shows the distribution of IQ scores.

IQ is "noramlly distributed" in the population.

Traits that are not 'normally distributed' have graphs with other shapes.

IQ and height are normally distributed. Weight and income are not (in the United States there are more people overweight than underweight, and more people at the low end than at the very high end for income).

(Image: normal curve, or bell curve, showing the distribution of IQ scores)

The standard deviation and shape of the distribution of scores tell us a lot about a population.

IQ scores have a mean of 100 and a standard deviation of 15.

In a normal curve, 68% of the population has a score within one standard deviation of the mean. For IQ scores, that means 68% of the population has an IQ between 85 and 115.

95% of the population has an IQ score within two standard deviations of the mean. That means about 2.5% have an IQ below 70 and about 2.5% have an IQ above 130.

An IQ score below 70 is required for diagnosing mental retardation, and an IQ above 130 is often regarded as the beginning point of 'giftedness.'
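If you want to check the 68% and 95% figures yourself, Python's statistics.NormalDist (available since Python 3.8) can compute them from the mean of 100 and standard deviation of 15:

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Proportion within one standard deviation of the mean (85 to 115)
print(iq.cdf(115) - iq.cdf(85))  # about 0.68

# Proportion within two standard deviations (70 to 130)
print(iq.cdf(130) - iq.cdf(70))  # about 0.95

# Proportion in the lower tail, below an IQ of 70
print(iq.cdf(70))                # about 0.023, roughly 2.5%
```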

Distributions with a low standard deviation will be taller and thinner than the normal curve, and those that have a large standard deviation will be flatter and more spread out.

Some variables have distributions with graphs that have multiple peaks and valleys.

Below is a graph showing the distribution of Republicans and Democrats based on education.

 

(Image: graph of the distribution of Republicans and Democrats by education level. The distribution of Republicans follows an approximately normal curve; the distribution of Democrats is bimodal.)

Note that the distribution of Republicans by education follows a normal curve. That is, the average self-described Republican has an average level of education, with fewer people with very little or a great deal of education describing themselves as Republican.

In contrast, the distribution of Democrats has two peaks. Those with a low level of education or a high level of education are more likely to describe themselves as Democrats, with relatively fewer individuals with an average level of education describing themselves as Democrats.

A distribution with two peaks is called a bimodal distribution.

There are also distributions with three or more peaks or no peaks.

The distribution of scores tells us something about the population or sample of interest.

Describing non-numerical data from nominal and ordinal scales of measurement

Describing non-numeric data is, in some ways, easier than describing numeric data because there are no means and standard deviations.

Instead, proportions are reported, for example, the sample included 51% women and 49% men, or the distribution of grades was 30% A's, 31% B's, 22% C's, 11% D's and 6% F's.

The mode is used to describe the central tendency, e.g., "the modal grade was a B+"

Standard deviation is not computable for non-numeric data. 

Correlations (p. 446)

Correlation statistics and measures tell us several things about the relationship between two or more variables, including:

  • The direction of the relationship
  • The form of the relationship
  • The degree of consistency or strength of the relationship
  • The statistical significance of the relationship

If the above doesn't sound familiar, you may want to do a quick review of chapter 12 on the correlational research strategy...

The Pearson correlation coefficient, indicated by the italic letter r, ranges from -1 to +1 and describes both the strength and the direction of the relationship.

A correlation of +1 or -1 means the variables are perfectly correlated, and an r value near zero means that they are unrelated.

A perfect correlation would occur if we were correlating temperature in degrees Celsius with degrees Fahrenheit, or the value of a currency in dollars vs. euros vs. rupees vs. yen.

Uncorrelated variables might include the relationship between one's zodiac sign and shoe size, or IQ and weight.

Most correlations we encounter in psychology and biology are greater than zero and less than 1. For instance, there is a very strong relationship between height and shoe size, but the correlation is about .8, not 1. And a child's IQ is correlated with its parents' IQ, with r ranging from about .05 to .75 depending on the age of the child and the way IQ is measured.
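As a quick illustration, Python's statistics module (3.10 or later) computes Pearson's r directly; the height and shoe-size pairs below are invented for illustration, not real measurements:

```python
import statistics

# Made-up paired data: heights in inches and shoe sizes
heights = [62, 64, 66, 68, 70, 72, 74]
shoe_sizes = [6.5, 7.0, 8.5, 9.0, 10.0, 10.5, 12.0]

r = statistics.correlation(heights, shoe_sizes)
print(round(r, 2))  # close to +1: a strong positive relationship
```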

Regression (p. 449)

Pearson's correlation coefficient (r) is useful when the relationship between variables is linear.

Regression is the process of finding the equation for the straight line that provides the best fit for the data points observed on a scatterplot in a correlational study.

The resulting equation is called a regression equation

A regression equation has the form Y = bX + a

If you recall back to the days when you took algebra, this type of equation allows one to compute a value for Y given a value of X, and to construct a line on a graph using the equation.
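Here is a minimal sketch of finding that best-fitting line with Python's statistics module (3.10 or later); the X and Y values are invented so the answer is easy to check:

```python
import statistics

# Made-up predictor (X) and outcome (Y) values
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Returns the slope constant (b) and Y-intercept (a) of Y = bX + a
slope, intercept = statistics.linear_regression(x, y)
print(slope, intercept)       # roughly b = 2 and a = 0

# Use the equation to predict Y for a new value of X
print(slope * 6 + intercept)  # roughly 12
```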

Multiple Regression

Multiple regression is used when there is more than one predictor variable.

For instance, we could use high school GPA to predict college GPA. After making several observations with a large sample, we could calculate a regression equation of the form Y = bX + a, where Y = college GPA, X = high school GPA, and a and b are constants (the Y-intercept and the slope constant).

If we wanted a more accurate prediction, we might add a second predictor and use both high school GPA and SAT score to predict college grades.

This is multiple regression because there are multiple predictors in the regression equation.

Otherwise, it is the same as regression - the aim is to find the linear equation for Y so you can, given a value for each of the predictors, most accurately predict the value of Y.

As you may guess, the regression equation becomes more complicated as predictors are added.
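The standard library has no multiple-regression function, but a minimal sketch with NumPy's least-squares solver (assuming NumPy is installed) shows the idea; the GPA and SAT numbers are fabricated purely for illustration:

```python
import numpy as np

# Made-up data: high school GPA, SAT score, and observed college GPA
hs_gpa = np.array([2.8, 3.0, 3.4, 3.6, 3.9])
sat = np.array([1050, 1100, 1220, 1280, 1400])
college_gpa = np.array([2.5, 2.7, 3.1, 3.3, 3.8])

# Design matrix: the two predictors plus a column of ones for the intercept
X = np.column_stack([hs_gpa, sat, np.ones(len(hs_gpa))])

# Solve for the coefficients in Y = b1*X1 + b2*X2 + a
coeffs, *_ = np.linalg.lstsq(X, college_gpa, rcond=None)
b1, b2, a = coeffs

# Predict college GPA for a hypothetical applicant
print(b1 * 3.2 + b2 * 1150 + a)
```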

BTW, to predict college success most colleges use multiple variables, including high school GPA (which may be weighted based on whether the high school is public or private, and in a good school district vs. a poor one), SAT or ACT score, and sometimes even more factors such as the difficulty of the courses taken, involvement in extracurricular activities, writing skill, etc.

Inferential Statistics (p. 451)

Most statistics are inferential in that we collect data from a SAMPLE and use that data to make INFERENCES about a POPULATION.

Of course, a sample can never precisely correspond to an entire population. For instance, I may collect data from 100 college freshmen at UCF, say about their political views, or interests, or worries, and whatever outcome I obtain may be similar to that of the population of all UCF freshmen, but it will not be exactly the same as the data for the population.

The difference between the sample and the population is called sampling error: the difference between a statistic from a sample and the corresponding parameter from a population. I may have a sample of American women with a mean weight of 145 pounds and a mean IQ of 108. This sample would weigh less and have a higher IQ than the corresponding parameters for weight and IQ in the population of American women. That difference is presumed to be due to sampling error, and some error will be present in any sample we might take from a population.

The purpose of inferential statistics is to determine whether research results reflect relationships that can be generalized to the relevant population, or if the results are due to sampling error.

For instance, suppose that after treatment for depression, the group that received the experimental treatment has a score on a measure of depression that is 5% lower than the scores of participants in a no-treatment control group. The researcher must use statistics to determine if the 5% difference is due to an effect of treatment or is due to chance - to sampling error.

Hypothesis Test

A hypothesis is made about a population. For example: depressed people treated with treatment xyz will show greater reductions in depression, as measured by the abc depression inventory, than people receiving no treatment; men will score higher than women on a test of spatial ability; women will score higher than men on a test of processing speed. The hypothesis applies to the entire population of interest.

However, the research results apply to a sample of 25, or 50 or 200 individuals...

A hypothesis test is a statistical test to determine how confident we can be that the observed results are due to a real difference rather than to chance, to sampling error.

  1. Null hypothesis - a statement about the population that says there is no effect, change, difference, or relationship. For instance, the null hypothesis would be that people treated with xyz will not show changes in depression that differ from the control group.
  2. Sample statistic - the data from the sample used to test the hypothesis, for instance the mean scores of two groups, or the correlation between two variables.
  3. Standard error - the average (or standard) difference between a sample statistic and the corresponding population parameter.
  4. Test statistic - a calculation to compare the sample mean to the mean predicted by the null hypothesis while taking the standard error into account. If the difference between the sample statistic and the value predicted by the null hypothesis is large, we reject the null hypothesis.
  5. Alpha level - aka level of significance - the maximum probability that the results are due to chance (sampling error). If we set the alpha level at .05, that means there is at most a 5 in 100, or 5%, probability that the observed results are due to chance.

Make sure you understand the above concepts for taking the quiz and understanding the rest of the chapter!

Reporting the results from a hypothesis test

The basic result of a hypothesis test is that the results are or are not statistically significant.

We use what is called a p value to report significance: for instance, saying that the difference between the two treatments was significant at p < .05.

A significant result means the odds are less than 5% that the outcome is due to chance, so we reject the null hypothesis.
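Putting these pieces together, here is a minimal sketch of an independent-measures t test using SciPy (assuming it is installed); the depression scores are invented for illustration:

```python
from scipy import stats

# Made-up post-treatment depression scores (lower = less depressed)
treatment = [12, 15, 11, 14, 10, 13, 12, 11]
control = [18, 16, 19, 17, 20, 15, 18, 17]

t_statistic, p_value = stats.ttest_ind(treatment, control)
print(t_statistic, p_value)

alpha = 0.05  # the level of significance chosen in advance
if p_value < alpha:
    print("Statistically significant: reject the null hypothesis")
else:
    print("Not significant: fail to reject the null hypothesis")
```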

Errors in hypothesis testing

Note that the probability that the results are due to chance is not zero. Errors are possible. There are two types of errors that can be made in hypothesis testing.

Type I error - the researcher incorrectly rejects a null hypothesis that is actually true. These are also called 'false positives.' The results appeared to support the hypothesis but were only due to chance. For instance, an ineffective treatment appeared effective because the scores in one group differed from those in another by chance.

Type II error - the null hypothesis is incorrectly accepted. These are also called 'false negatives.' That is, due to sampling error, the results appear to show no difference between the treatment and control conditions when the treatment actually is effective.

Researchers should always think about which is more costly, making a type I error or a type II error.

For example, in the early days of testing for HIV, one of the more common tests had a problem with Type I errors. Many people who did not have HIV had false positive tests: they thought they had HIV when they didn't. However, doctors knew about the false positives and followed up with a more accurate test that was very expensive.

The best thing would be an error-free test. However, in this case it was better to have a false positive than a false negative. Someone with a false negative - that is, someone who believes they don't have HIV when they do - might infect others. Someone with a false positive would get further testing. (BTW, today's HIV tests are much more accurate.)

Factors that influence the outcome of a hypothesis test (p. 461)

You might think that a hypothesis is either true or false.

However, since the outcome of a hypothesis test is a probability statement rather than an absolute, it can be affected by the conditions of the research.

Two factors that affect the outcome of a hypothesis test are the number of scores (sample size) and the size of the variance in the scores.

the number of scores in the sample

 

the size of the variance

Effect size and why it is very important

Examples of hypothesis tests

Comparing groups of scores in experimental, quasi-experimental, and nonexperimental designs

Test for mean differences

two-group between subjects test

Two treatment within subjects test

comparing more than two levels of a single factor

post hoc tests

Factorial Tests

Statistical tests for correlational designs

evaluating significance for a regression equation

evaluating relationships for non-numeric scores

Special statistics for research

Cronbach's Alpha