Measurement
(image of pencil and questionnaire)
After you identify a topic for research, along with an interesting question to explore or hypothesis to test, you must quickly decide how you are going to measure your variables of interest. You might consider:
- What exactly are the variables I want to study?
- What is the hypothesized relationship between those variables?
- Have others developed ways to measure the variable?
- ...and shown that their measures are reliable and valid? (see below)
- Given your lab facilities and resources, what modes of measurement are most practical for you? For example, can you afford expensive lab equipment for physiological measurement? Can the behavior be observed?
How we define and then measure our variables of interest is critical to doing good research.
Constructs and Operational Definitions
Constructs (p. 74-75)
'Construct' is a noun, and in order to understand it, think of the verb 'to construct'.
In psychology we study many hypothetical processes, attributes, and mechanisms (think 'intelligence', 'extraversion', 'psychopathy', 'depression'...) that help us describe, predict, and influence behavior.
They are 'constructed' in the mind of the thinker, and 'hypothetical' in that they can't be directly measured or observed. That is, we can't see or directly measure 'street smarts' or 'eating disorders' or 'stress'. Because these constructs are useful for describing, predicting, or influencing human behavior, we identify indirect ways to measure them.
Operational Definitions
Operational definitions are necessary since many constructs can’t be directly observed or measured...
Q: How does one measure intelligence, friendship, aggression, anxiety, good parenting, etc.?
A: We operationalize them! (textbook p.25)
My favorite description of an excellent operational definition is this:
You have done a good job of operationalizing a variable when two independent observers can agree that the phenomenon of interest did or did not occur.
Think about it…two independent observers might not agree whether a class is “easy” or “hard” by looking at the syllabus.
But they might agree about the number of hours students report studying, the average grade earned in the class, or the ‘difficulty rating’ that a large number of students give to the class.
We ‘operationalize’ a variable by describing it in a way that can be measured and agreed upon by anyone applying the same measure.
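One simple way to check whether two independent observers "agree" is to compute the proportion of observations on which their records match. The sketch below is only an illustration: the function name and the observation data are hypothetical, and percent agreement is just one of several possible agreement indices.

```python
def percent_agreement(observer_a, observer_b):
    """Return the proportion of observations on which two observers match."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("Both observers must code the same, non-empty set of observations.")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return matches / len(observer_a)


# Hypothetical data: for 10 observation intervals, each observer records
# whether the target behavior occurred (True) or did not occur (False).
observer_a = [True, True, False, True, False, False, True, True, False, True]
observer_b = [True, True, False, True, False, True, True, True, False, True]

agreement = percent_agreement(observer_a, observer_b)
print(f"Observers agreed on {agreement:.0%} of the intervals")
```

The closer this value is to 100%, the better the operational definition is doing its job of letting independent observers reach the same conclusion.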
Another example…
Suppose you want to study problem drinking. Two independent observers might not agree on whether a person “drinks too much.” But they can agree on the amount of alcohol consumed per occasion, day, week, or month, or on blood alcohol level. So we can operationalize problem drinking as consuming more than 10 drinks per week, or as having a blood alcohol level over .08.
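To see how concrete an operational definition is, it can be written as a rule that anyone (or any program) would apply the same way. The following is a minimal sketch using the two thresholds from the example above; the class name, function name, and sample records are hypothetical, and the cutoffs are illustrative rather than clinical criteria.

```python
from dataclasses import dataclass

# Thresholds from the example in these notes (illustrative, not clinical criteria).
DRINKS_PER_WEEK_CUTOFF = 10
BLOOD_ALCOHOL_CUTOFF = 0.08


@dataclass
class DrinkingRecord:
    drinks_per_week: float
    blood_alcohol: float


def is_problem_drinking(record: DrinkingRecord) -> bool:
    """Apply the operational definition: either threshold is exceeded."""
    return (record.drinks_per_week > DRINKS_PER_WEEK_CUTOFF
            or record.blood_alcohol > BLOOD_ALCOHOL_CUTOFF)


# Hypothetical participants.
records = [
    DrinkingRecord(drinks_per_week=3, blood_alcohol=0.02),
    DrinkingRecord(drinks_per_week=14, blood_alcohol=0.05),
    DrinkingRecord(drinks_per_week=6, blood_alcohol=0.11),
]
classifications = [is_problem_drinking(r) for r in records]
print(classifications)  # [False, True, True]
```

Two observers given the same records and the same rule will always produce the same classification, which is exactly what a good operational definition is supposed to guarantee.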
Operationalizing a variable allows us to observe constructs that we cannot measure directly.
And operationalizing is only the beginning -- we also must determine if our measures are valid and reliable.
Validity
Quick instructor comment: most textbooks discuss reliability first and then validity, because a measure that is not reliable can never be valid. In other words, a measure MUST be reliable in order to be valid. Your textbook covers validity first, so it is also considered first here.
Your measure demonstrates validity if it measures what it purports to measure.
There are several types of validity (textbook pp. 77 – 84)
Face validity
Technically, face validity is not truly a measure of validity, but the name has stuck. A face-valid measure looks like it measures what it claims to measure.
Suppose a measure consisted of four questions:
1. Who is Minerva McGonagall?
2. What is the relationship between Harry Potter and Sirius Black?
3. Name one of the co-founders of Hogwarts.
4. Which of the following is NOT a member of the Order of the Phoenix? Sturgis Podmore, Arabella Figg, Tom Riddle?
This measure would have poor face validity as a measure of intelligence, and high face validity as a measure of knowledge about characters in the Harry Potter Series.
Concurrent validity
Are scores obtained on the measure consistent with already established measures of the same construct? For example, are scores on my new measure of adult intelligence consistent with scores on the WAIS-IV?
Predictive Validity
The measure predicts behavior. For instance, the score on a paper-and-pencil measure of extraversion predicts behavior in a social setting, and the scores on a placement exam predict performance in school or at work.
Construct Validity
The measurement of a variable is associated with the same behavior we would predict according to theory. For instance, we would expect a valid measure of depression to predict depressive behavior such as crying and changes in appetite and sleep.
Convergent and Divergent Validity
The measure is correlated with related measures and uncorrelated with unrelated measures. For instance, we would expect a measure of intelligence to be correlated with - but not identical to - a measure of years of education (convergent validity), and uncorrelated with a measure of extraversion (divergent validity).
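Evidence of this kind is usually expressed as a correlation coefficient. The sketch below computes Pearson's r by hand for a small, entirely made-up data set; the scores, variable names, and the helper function `pearson_r` are assumptions for illustration only. The same calculation applies to concurrent validity (new measure vs. an established measure) and predictive validity (measure vs. later behavior).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two lists of the same length with at least 2 scores.")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    sum_products = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    ss_x = sum(dx ** 2 for dx in dev_x)
    ss_y = sum(dy ** 2 for dy in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Correlation is undefined when a variable has no variance.")
    return sum_products / sqrt(ss_x * ss_y)


# Hypothetical scores for six participants.
intelligence = [95, 100, 105, 110, 115, 120]
years_of_education = [12, 12, 14, 16, 16, 18]   # related construct
extraversion = [30, 22, 25, 24, 20, 31]          # unrelated construct

r_convergent = pearson_r(intelligence, years_of_education)
r_divergent = pearson_r(intelligence, extraversion)
print(f"convergent r = {r_convergent:.2f}, divergent r = {r_divergent:.2f}")
```

With these invented numbers the first correlation is strongly positive and the second is close to zero, which is the pattern we would hope to see for a valid measure of intelligence.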
Reliability
Reliability refers to the degree of stability or consistency of the measure. Just as there are different types of validity, there are different types of reliability that are reviewed in chapter 3 (pp. 54 - 88). Make sure you can identify the different types, including:
- test-retest reliability
- parallel forms reliability
- inter-rater reliability
- split-half reliability
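Two of these types can be sketched with a few lines of arithmetic. In the example below, test-retest reliability is the correlation between scores from two administrations of the same measure, and split-half reliability is the correlation between odd-item and even-item totals, adjusted with the Spearman-Brown formula. Note the assumptions: all data are made up, the function names are hypothetical, the odd/even split is only one way to divide a test, and the Spearman-Brown adjustment is a standard correction that is not covered in these notes, so check your textbook for the exact procedure it uses.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Correlation is undefined when a variable has no variance.")
    return sum_products / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per person."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, ...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, ...
    r_half = pearson_r(odd_totals, even_totals)
    # Spearman-Brown adjustment: estimates reliability of the full-length test.
    return 2 * r_half / (1 + r_half)


# Hypothetical test-retest data: the same five people measured twice.
time_1 = [10, 14, 18, 22, 25]
time_2 = [11, 13, 19, 21, 26]
test_retest = pearson_r(time_1, time_2)

# Hypothetical item-level data: five people, four items each.
items = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 4, 3, 3],
    [4, 4, 5, 4],
    [5, 4, 5, 5],
]
split_half = split_half_reliability(items)
print(f"test-retest r = {test_retest:.2f}, split-half = {split_half:.2f}")
```

In both cases, values closer to 1 indicate a more consistent (more reliable) measure.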
And for a nice example illustrating the relationship between validity and reliability, check out the following web page:
http://www.documentingexcellence.com/stat_tool/reliabilityvalidity.htm
Scales of Measurement
The scale of a measure is the set of categories the measure uses to classify individuals with respect to the construct being measured. For a simple example, a measure of sex classifies individuals as male or female. A scale of temperature would be used to classify things as hot or cold, a scale of IQ score points would be used to classify intelligence, etc.
Measures can be described as having a nominal, ordinal, interval or ratio scale (pp. 89 - 92)
Nominal - also called 'categorical' - the scale is in terms of membership in a group. For instance, the question 'which political party do you belong to?' has an answer that is on a nominal scale. The answer is usually the 'name' of a category or group (e.g., Democrat, Republican, Socialist, Libertarian) and it cannot easily be answered in terms of a number. Gender, ethnicity, and handedness are nominal measures.
Ordinal - categories are organized sequentially but the difference between one group and the next is not necessarily equal. For instance, if we rank the top 40 songs, we can't say for sure that the 10th song is twice as popular as the 20th song.
Interval and ratio - all categories are the same size, so we can measure the distance between them on a scale. A ratio scale has a true zero, while an interval scale has an arbitrary zero (for example, take temperature: the Kelvin scale has a true, absolute zero, while the Fahrenheit scale has an arbitrary zero). See pp. 90 - 91 for more info.
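The practical consequence of a true zero is that ratios are only meaningful on a ratio scale. The short sketch below illustrates this with the temperature example; the two temperatures are arbitrary values chosen for illustration, and the function name is hypothetical.

```python
def fahrenheit_to_kelvin(degrees_f):
    """Convert a Fahrenheit temperature to Kelvin."""
    return (degrees_f - 32) * 5 / 9 + 273.15


cool_f, warm_f = 40.0, 80.0  # arbitrary example temperatures

# On the Fahrenheit (interval) scale the ratio looks like "twice as hot"...
ratio_fahrenheit = warm_f / cool_f

# ...but on the Kelvin (ratio) scale the same two temperatures differ by only about 8%.
ratio_kelvin = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"Fahrenheit ratio: {ratio_fahrenheit:.2f}")
print(f"Kelvin ratio:     {ratio_kelvin:.2f}")
```

Because zero degrees Fahrenheit does not mean "no heat," the Fahrenheit ratio of 2 says nothing about how much warmer 80 degrees really is. Differences (80 minus 40) are still meaningful on both scales, which is what makes Fahrenheit an interval scale.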
(image and description of measurement scales)
Test Yourself
Go to the following link to complete a brief matching exercise on scales of measurement:
https://materia.ucf.edu/play/eNZMO/measurement-scales
Modalities of Measurement
The modality of a measure refers to the WAY we measure it (pp. 93 - 96)
Some common modes:
- Self-report: a person reports their behavior on a questionnaire or rating scale, e.g., measures of depression, political affiliation, attitudes, preferences.
- Physiological: we measure a construct according to a biological variable, e.g., testosterone level, heart rate, galvanic skin response.
- Behavioral: we observe a person and record their behavior, e.g., number of times they smile at someone, number of inches away from their neighbor they sit, number of steps they walk in a day.
Other Aspects of Measurement
Review pages 96 - 102 in your textbook such that you can discuss:
- Reasons why it is a good practice to have multiple measures
- What is meant by sensitivity and range effects
- Describe 'experimental bias' and different ways a researcher can influence a participant's behavior