Chapter 12: The Correlational Research Strategy
Study Aids and Important Terms and Definitions
Many of you have been inquiring about how to better prepare for the quizzes.
One way is to use the online resources that accompany the textbook.
The book website is located at:
Select the chapter you want to view.
Any item with a lock symbol next to it can only be viewed by instructors.
Other, non-locked items can be viewed by students. For each chapter there is a glossary; flash cards that you can set to show either a word or its definition first; a crossword puzzle; and a practice quiz.
Chapter 12 vocabulary words and terms you should know the definition of include:
correlational research strategy
scatter plot
correlation (correlation coefficient)
positive relationship
negative relationship
monotonic relationship
linear relationship
Pearson correlation
Spearman correlation
predictor variable
criterion variable
regression
coefficient of determination
statistical significance of a correlation
third-variable problem
multiple regression
What is correlational research? (p. 344)
The aim of correlational research is to show that two or more variables are related.
Correlational research is NOT experimental. It can show that there is some kind of relationship between variables, but it cannot tell us a great deal about the nature of that relationship.
And correlational research certainly cannot be used to demonstrate a cause-and-effect relationship - you've probably heard the saying "correlation does not equal causation."
For example, as early as the 1930s it was known that smoking cigarettes was correlated with cancer. The tobacco companies answered, "Yes, but a mere correlation doesn't mean that smoking CAUSES cancer!"
So, what is correlational research good for then?
It is useful for preliminary research. We may want to first show that two variables are related before investing time and money into controlled experiments.
Other applications will be discussed a little later.
Differences between correlational research and nonexperimental designs
Most experimental designs can be modified and constructed as correlational designs. The statistics used to analyze them are different, but the outcomes are similar.
The major difference is that in a nonexperimental design participants are divided into groups and one score on the dependent variable is obtained for each participant, while in a correlational study there are no groups and two scores are obtained for each participant.
For example, in a nonexperimental design I might measure the IQ of each participant and use the score to divide them into two groups, IQ < 100 and IQ > 100.
After dividing them into groups we might measure some other variable such as GPA, income, or memory.
Our outcome would be a comparison of the two groups: we would determine whether the mean score on the dependent variable for one group differed significantly from the mean score for the other group - for example, that the group with IQ > 100 had a significantly higher average GPA, income, or score on a test of memory.
If we did the same study as a correlational design there would be no groups.
We would measure the IQ and the other variable for each participant. Using the pair of scores for each participant, we would then calculate a correlation coefficient with a value between -1 and +1 that would tell us the strength and direction of the relationship between IQ and the other variable. For example, we might conclude that there is a strong positive correlation between IQ and memory and a moderate positive correlation between IQ and income.
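To make the correlational version concrete, here is a minimal sketch in Python. The IQ and memory scores below are invented for illustration only; the formula is the standard Pearson correlation computed from each participant's pair of scores.

```python
# Hypothetical paired scores for ten participants (made-up numbers).
iq     = [95, 100, 103, 110, 115, 120, 98, 105, 112, 125]
memory = [22,  25,  27,  30,  33,  35, 24,  28,  31,  38]

def pearson_r(x, y):
    """Pearson correlation coefficient for paired scores (between -1 and +1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# A value near +1 would indicate a strong positive relationship.
print(round(pearson_r(iq, memory), 3))
```

Note that there is one group of participants and two scores per participant - no group means are compared.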
The data for a correlational study
The data for a correlational study include two or more scores for each participant.
In most of this chapter we will only consider cases with two variables. Studies with more than two variables are considered briefly near the end of the chapter.
The pair of scores can be labelled X and Y.
The data are arranged as a list with each pair of X and Y scores.
Each pair of scores can be graphed with the X score on the X axis and Y score on the Y axis.
The resulting graph is called a scatter plot.
A scatter plot allows for a visual inspection of the data.
Below is a scatter plot where X = # years education past high school and Y = annual income.
Notice that the points form a fairly straight line; this suggests a strong relationship between education and income.
A scatter plot of variables that we would not expect to be related would not form a straight line.
Below is a scatter plot where X = birth month (1 - 12) and Y = height in centimeters.
Notice that the points are very scattered and there is no straight line we could draw through most of them, which suggests that birth month is not related to height.
Measuring Relationships (p. 346)
In order to measure relationships between variables we can consider three aspects of the relationship: the direction, form, and consistency of the relationship.
Direction
The direction of a relationship (assuming there is one) is positive or negative.
If the relationship is positive, the scatter plot slopes up to the right.
That is, as the number on the X axis increases, so does the number on the Y axis.
If the relationship is negative, the line slopes down to the right: as the score on the X-axis gets higher, the number on the Y-axis gets lower.
The images below are of a scatter plot showing a positive relationship (on the left) and a negative relationship (on the right).
(images of a graph illustrating upward sloping positive correlation and graph showing a downward negative correlation)
Examples of positive relationships, where as one variable increases the other also increases:
- Income and years of education
- Years of smoking cigarettes and probability of lung cancer
- Shoe size and height
Examples of negative relationships, where as one variable increases the other decreases:
- Hours spent watching television and grades
- Minutes spent exercising and body mass index (BMI)
- Age and processing speed
If two variables are uncorrelated, then we do not speak of the direction of the relationship.
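The sign of the correlation coefficient captures direction directly. As a sketch with invented toy numbers: education/income should produce a positive r (upward slope), and TV hours/GPA a negative r (downward slope).

```python
def pearson_r(x, y):
    """Pearson correlation coefficient for paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Made-up data for illustration only.
education = [0, 2, 4, 4, 6, 8]          # years past high school
income    = [30, 38, 45, 50, 62, 75]    # thousands of dollars
tv_hours  = [1, 2, 3, 4, 5, 6]
gpa       = [3.9, 3.6, 3.4, 3.0, 2.7, 2.5]

print(pearson_r(education, income))  # positive sign: positive relationship
print(pearson_r(tv_hours, gpa))      # negative sign: negative relationship
```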
Form
By the form of the relationship we mean the form of the scatter plot.
If the relationship is linear, then the scatterplot looks like a straight line - as in the scatter plot above showing the relationship between education and income.
This is the simplest form.
When there is no relationship between the variables the scatter plot looks like a big jumble or 'blob' of data points.
Other relationships are more complex.
For example, a curvilinear relationship.
(image of scatter plot for X = anxiety and Y = performance)
The relationship between performance and anxiety/arousal is one of the better known examples (at least better known in psychology!) of a curvilinear relationship.
When anxiety is low, so is performance. As anxiety increases, performance increases to a point...and then the curve changes direction: as anxiety gets very high, performance starts to decrease. The thinking behind this phenomenon, known as "optimal arousal," is that when anxiety is very low a person may be bored, disinterested, uncaring, etc., and performance will suffer. When anxiety is very high a person will be tense and have clouded thinking, and performance will suffer. And when anxiety is mild to moderate, the person will be energized and focused, and performance will be enhanced.
One take-home message is that it is good to be a little anxious for a job interview or speech...the take-home message for correlational designs is that many relationships are not linear.
Here is another curvilinear example with a different form:
(image of curvilinear scatter plot for X = Doctors per million population and Y = life expectancy)
The form of the graph illustrated above is common: the line rises steeply when X is low and then flattens as X gets larger.
This example graphs the relationship between the number of doctors per million people in the population of a country and female life expectancy.
When the number of doctors is very low, people don't live as long; as the number of doctors increases, people in a country tend to live longer, until the number reaches about 2,000 per million people. Above that number, having more doctors adds little to life expectancy.
I hope you can see from these examples how correlational studies can be used to understand and clearly illustrate relationships between variables, and that the outcomes can guide behavior and even public policy!
Consistency/Strength
The strength or consistency of a correlation refers to how well the points on a scatter plot fit a straight line (or a curvilinear form such as a monotonic relationship).
In terms of measurement, the strength/consistency refers to how close the value of the correlation coefficient is to 1 or -1.
A correlation of zero or near zero indicates no relationship. (In practice, few correlation coefficients are exactly zero; a correlation coefficient such as r = .03 would also indicate no relationship between the variables.)
A correlation of -1 or +1 means that the relationship is as strong as it can possibly be: if you know the value of X, you can precisely predict the value of Y. A graph of temperature in Celsius vs. temperature in Fahrenheit would have this sort of one-to-one relationship. Most psychological phenomena do not have a perfect correlation, but the relationship can still be quite strong.
The graph below shows a strong relationship - the data points clearly form a straight line.
The graph below shows a little bit weaker correlation. We can draw a line on the scatter plot that would go through or near most of the data points. There is some consistency to the relationship between X and Y. In general as X increases, so does Y, but all the points are not as close to the line as in the example above of a stronger correlation.
Finally, in the scatter plot below, there is no relationship between X and Y: we could not draw a line that goes through or near most of the data points.
Here is a rule of thumb for interpreting the 'strength' of a relationship based on Pearson's correlation coefficient:
r = +.70 or higher   Very strong positive relationship
+.40 to +.69         Strong positive relationship
+.30 to +.39         Moderate positive relationship
+.20 to +.29         Weak positive relationship
+.01 to +.19         No or negligible relationship
-.01 to -.19         No or negligible relationship
-.20 to -.29         Weak negative relationship
-.30 to -.39         Moderate negative relationship
-.40 to -.69         Strong negative relationship
-.70 or lower        Very strong negative relationship
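A small helper function (a sketch, not from the textbook) can turn this rule of thumb into code:

```python
def describe_strength(r):
    """Rule-of-thumb label for a Pearson correlation coefficient."""
    direction = "positive" if r > 0 else "negative"
    a = abs(r)
    if a >= 0.70:
        return f"very strong {direction} relationship"
    if a >= 0.40:
        return f"strong {direction} relationship"
    if a >= 0.30:
        return f"moderate {direction} relationship"
    if a >= 0.20:
        return f"weak {direction} relationship"
    return "no or negligible relationship"

print(describe_strength(0.85))   # very strong positive relationship
print(describe_strength(-0.25))  # weak negative relationship
print(describe_strength(0.03))   # no or negligible relationship
```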
Evaluating Relationships for Non-Numerical Scores (p. 349)
When one of the variables is non-numeric, such as gender, ethnicity, or membership in a group, we can still do correlational research.
In the case of using gender as a variable, we would assign males and females values of 0 and 1.
In theory, we could assign values of 1 and 2, or 35 and 58...the actual numbers don't matter, but 0 and 1 is customary.
The Pearson correlation calculated when one variable is numeric and the other is non-numeric with two values is called a point-biserial correlation.
It has a value between 0 and 1 because the direction doesn't matter - it would not make sense to say something like, "as gender goes up so does reading score!" But it would make sense to say that gender is correlated with scores on reading tests.
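As a sketch (with invented reading scores), the point-biserial correlation is just an ordinary Pearson correlation computed after coding the two-category variable as 0/1. Swapping which category gets the 0 only flips the sign, which is why the magnitude is what matters.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient for paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: gender coded 0/1 (codes are arbitrary) and reading scores.
gender  = [0, 0, 0, 0, 1, 1, 1, 1]
reading = [60, 65, 70, 62, 75, 80, 78, 82]

r_pb = pearson_r(gender, reading)        # point-biserial correlation
r_flipped = pearson_r([1 - g for g in gender], reading)

# Reversing the codes flips only the sign, so report the magnitude.
print(round(abs(r_pb), 3))
```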
If both variables are non-numeric - say, a correlation between gender and voting, operationalized as answering yes or no to the question, "Did you vote in the last presidential election?" - then the data would be organized into a matrix with four cells, and we could evaluate it with a chi-square test.
       | voted | did not vote
female |  247  |     187
male   |  200  |     191
However, we could also calculate a correlation coefficient by assigning each variable - voting behavior and gender - a value of 0 or 1, then creating a pair of numbers for each participant and calculating the correlation coefficient, which is called a phi-coefficient when both variables are non-numeric.
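Using the counts from the table above, the phi-coefficient can be computed either from the standard 2x2 shortcut formula or by expanding the table into 0/1 pairs and running an ordinary Pearson correlation - the two routes agree up to the sign, which depends only on the arbitrary 0/1 coding. This is a sketch, not a procedure from the textbook.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Counts from the 2x2 table (rows: female, male; columns: voted, did not vote).
a, b = 247, 187
c, d = 200, 191

# Phi from the shortcut formula for a 2x2 table.
phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Same magnitude from coding every participant 0/1 and correlating.
gender = [0] * (a + b) + [1] * (c + d)        # 0 = female, 1 = male
voted  = [1] * a + [0] * b + [1] * c + [0] * d
print(round(phi, 3), round(abs(pearson_r(gender, voted)), 3))
```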
Comparing correlational, experimental and differential research
The biggest difference between a correlational design and an experiment is that a correlational study is limited to showing that there is a relationship between variables, while in an experiment we are trying to show that changes in one variable cause changes in another.
Since the correlational study only aims to show that the variables are related, no variables are manipulated or controlled. The researcher merely measures the value of two or more variables in a group of participants.
Correlational and nonexperimental designs are more similar. The major difference is that in nonexperimental designs the participants are divided into groups and group means are calculated to see if there are group differences, while in correlational designs there is a single group of participants, pairs of scores are obtained, and a correlation coefficient is calculated.
Applications of the correlational strategy (p. 351)
There are several applications of correlational designs, and your textbook describes three:
- prediction
- reliability and validity
- evaluating theories
Prediction
Correlational studies can show that one variable is useful for predicting another.
For example, if we know a high school student's SAT score, astrological sign, favorite television program, and musical aptitude, which do you think best predicts college performance? SAT scores are correlated with college grades, GRE scores with graduate school performance, MCAT scores with medical school performance, LSAT scores with law school performance, and so on. The correlation between each entry exam and school success is high enough (though not a perfect correlation of 1...) that college admissions people can use the test scores to predict who is likely to do well in school.
The military uses tests to predict who might be a good officer, and businesses use tests to predict who will make a good manager.
In this type of research the test score is called the predictor variable and the thing it predicts (such as grades, or suitability for management training) is called the criterion variable. The process of using one variable to predict another is called regression.
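As a minimal sketch of regression (the SAT scores and GPAs below are invented), we fit a least-squares line to the predictor/criterion pairs and then use it to predict the criterion for a new predictor score:

```python
def fit_line(x, y):
    """Slope and intercept of the least-squares line y = slope*x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return slope, my - slope * mx

sat = [1000, 1100, 1200, 1300, 1400]   # predictor variable (made-up scores)
gpa = [2.4, 2.7, 3.0, 3.2, 3.6]        # criterion variable (made-up GPAs)

slope, intercept = fit_line(sat, gpa)
predicted = slope * 1250 + intercept   # predicted GPA for a new applicant
print(round(predicted, 2))
```

This is exactly what admissions offices do in spirit: the line summarizes the relationship, and a new predictor score is plugged in to forecast the criterion.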
Reliability and validity
Correlational studies can be used to evaluate the reliability and validity of tests, assessment instruments, and other measurement procedures.
Validity means that a test measures what it claims to measure; reliability means that it measures consistently. Correlational research might be used to show, for example, that one IQ test is highly correlated with another (criterion validity), or that a measure of anxiety is only somewhat but not too strongly correlated with a measure of depression (divergent validity), or that when a person takes the test twice their scores are highly similar (test-retest reliability), or that when two different people administer the test the obtained scores are similar (inter-rater reliability).
Evaluating theories
Correlational research can be used to evaluate theories when experimental research isn't an option. For example, if we want to show that schizophrenia is genetic, we could compare the incidence of schizophrenia in people with and without relatives with schizophrenia, or in pairs of twins, or in offspring of people with schizophrenia.
Or if I wanted to evaluate a theory that some substance is linked to cancer I might correlate the amount of the substance individuals are exposed to with the incidence of cancer.
Interpreting a correlation (p. 353)
Strength of relationship
The correlation coefficient can tell us something about the strength of a relationship, but a better measure is the coefficient of determination, which is simply the squared value of the correlation coefficient. For example, if r = .7, then r squared = .49. This coefficient of determination, .49, tells us that 49% of the variance in the criterion variable is accounted for by the predictor variable.
When you hear about studies that say things like, 60% of IQ is genetic and the rest is environmental, or schizophrenia is at least 50% genetic, they are usually referring to the coefficient of determination, or strength of the relationship between the variables.
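As a quick sketch of the arithmetic:

```python
def coefficient_of_determination(r):
    """Proportion of variance in the criterion accounted for by the predictor."""
    return r ** 2

# r = .70 accounts for 49% of the variance...
print(round(coefficient_of_determination(0.70), 2))
# ...while a small r = .15 accounts for barely 2% of it.
print(round(coefficient_of_determination(0.15), 4))
```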
Significance of relationship
Statistical significance is different from, but related to, the strength of a relationship. Statistical significance is affected by both the size of the correlation coefficient and the sample size.
In a small sample, a very small correlation of, say, .15 will probably not be significant. However, in a sample of hundreds of participants it probably will be.
Statistical significance helps us predict how likely we would be to obtain similar scores if we repeated the procedure.
However, statistically significant does not necessarily mean important, or clinically significant.
A correlation of .15 may be statistically significant in a large sample, but consider the coefficient of determination: .15 x .15 = .0225. This means that only 2.25% of the variance in the criterion variable is accounted for by the score on the predictor variable...
It would not be very helpful to say, "I have a test that can predict 2% of the variance in college grades!" Even if it is statistically significant, it won't be useful to admissions staff.
On the other hand, maybe you would eat one more piece of fruit a day if it decreased your risk of heart disease by 2%, or you would want a new test that predicts major hurricanes with 2% more accuracy than the old version...
Statistical significance is math; it's in the numbers. Clinical significance, or importance, is in the eye of the beholder.
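One common way to test whether a correlation is significantly different from zero (a standard formula, though the chapter does not walk through it) is the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below shows how the same small r = .15 fails to reach the conventional cutoff (about 1.96 for large samples) with 20 participants but clears it with 500:

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

print(round(t_for_r(0.15, 20), 2))    # small sample: below ~1.96, not significant
print(round(t_for_r(0.15, 500), 2))   # large sample: above ~1.96, significant
```

Sample size, not just the correlation itself, drives the verdict - which is exactly why significance and importance can come apart.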
Strengths and weaknesses of the correlational strategy (p. 355)
Correlational designs are often used in preliminary research on topics that haven't been researched much previously. If two or more variables are shown to be correlated, then there might be good reason to do further research with more complex designs.
A second strength is that since the investigator does not manipulate or control anything, they can be confident that observed relationships are based on how the variables are related in nature rather than due to error or the artificial conditions of an experiment.
However, there are also several weaknesses to correlational designs.
The third-variable problem was discussed previously, and it is a problem in all correlational research. We can never be sure whether two correlated variables are directly related or if they are both related to a third variable. You may recall the example of the high correlation between crime and ice cream sales discussed in a previous chapter. Ice cream sales don't cause crime, do they? Of course not! However, both crime and ice cream sales are related to a third variable - warm weather!
The directionality problem also plagues correlational studies. When two variables are correlated, it is impossible to determine which variable is the cause and which is the effect using a correlational design. Does watching violent TV cause aggressive behavior? Or do people who behave aggressively watch more violent TV? A correlational study cannot answer this question.
Relationships with more than two variables (p. 358)
Most of chapter 12 is focused on correlations between two variables.
However, most psychological phenomena are related to many variables. For example, the probability of developing PTSD after exposure to a trauma is related to (that we know of...) the number of traumas the person has been exposed to, gender, the size of the hippocampus, family history of mental illness, history of anxiety disorders, coping skills, social support, and so on.
Multiple regression is a technique for considering correlations among multiple variables. Two advantages of this technique are that it can be used to (1) find the relative contribution of each variable to the phenomenon of interest, or (2) allow the researcher to control for the effect of one or more variables. For example, you may hear about studies that say things like, "people who eat more vegetables have a lower risk of heart attack after exercise and smoking are controlled for." This type of statement means that the researchers are using statistical procedures to account for the relationships among some variables in order to parcel out the effects of some other variable, where the relationship might be masked if a simple correlation rather than multiple regression were used.
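A bare-bones sketch of what multiple regression does: with two predictors, least squares finds a coefficient for each predictor that represents its contribution while holding the other constant. The data below (vegetable servings, exercise hours, and a heart-health score) are entirely made up; the solver uses the normal equations for the two-predictor case.

```python
def multiple_regression_2(x1, x2, y):
    """Return (b0, b1, b2) for y ~ b0 + b1*x1 + b2*x2 by least squares."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(v * v for v in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * v for u, v in zip(c1, cy))
    s2y = sum(u * v for u, v in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det     # effect of x1, controlling for x2
    b2 = (s11 * s2y - s12 * s1y) / det     # effect of x2, controlling for x1
    return my - b1 * m1 - b2 * m2, b1, b2

veggies  = [1, 2, 3, 4, 5, 2, 4]     # made-up servings per day
exercise = [0, 1, 1, 2, 3, 2, 0]     # made-up hours per week
health   = [50, 58, 62, 70, 80, 60, 66]

b0, b1, b2 = multiple_regression_2(veggies, exercise, health)
print(round(b1, 2), round(b2, 2))
```

The two slopes are the statistical version of "after exercise is controlled for": each one describes a predictor's relationship to the criterion with the other predictor held fixed.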