Within-Subjects Designs

Chapter 9: Within-Subjects Experimental Designs

In contrast to between-subjects designs, where two or more groups of individuals each participate in a different treatment condition, in a within-subjects design each individual participates in all treatment conditions.

Characteristics of within-subjects designs (p. 254)

A within-subjects design is also known as a repeated-measures design, because each participant is measured repeatedly as he or she participates in each treatment condition. Compare this to the between-subjects, or independent-measures, design, where each participant has a single score.

Note that while within-subjects designs can be used for both experimental and non-experimental research, in this chapter only experimental designs are considered.

In a within-subjects design there is only one group of participants, and every one of them participates in every level of the independent variable. Imagine a between-subjects design in which each of two groups took part in a different weight loss program and weight loss was compared at the end of eight weeks.

In a corresponding within-subjects design, all participants would take part in one weight loss program for eight weeks and their weight loss would be measured. Then all of them would take part in the second weight loss program for eight weeks, and weight loss would be measured again.

The nature of your population of interest, the treatment conditions, and the variables being measured will all determine whether a between-subjects or a within-subjects design is the better choice.

Advantages

The single most important advantage of a within-subjects design is that you do not have to worry about individual differences confounding your results, because every treatment condition includes exactly the same participants. Individual differences also do not inflate the error variance the way they do in a between-subjects design. Each participant serves as his or her own control, so the variance due to individual differences can be measured and removed. For example, suppose two different participants, whom we'll call Romeo and Juliet, receive treatments A and B, respectively. Some of the observed difference on our dependent variable may be due to the treatments. But it could also be due to differences between Romeo and Juliet. Perhaps gender, height, or family history affects scores on the dependent variable. We can't be sure which differences are due to the treatment and which are due to individual differences. But if Romeo and Juliet each receive both treatments A and B, we can be confident that differences in Romeo's scores on the dependent variable are not the result of individual differences, because he has no individual differences from himself. The same can be said of Juliet. Make sure you understand this point. It is described on pages 255-258 in your textbook.
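The following small Python sketch puts made-up numbers on the Romeo and Juliet example. Purely for illustration, it assumes that each person's score is just a personal baseline plus a treatment effect, with no random error. Comparing Romeo under A with Juliet under B mixes the treatment effect with the gap between their baselines. Each person's own B-minus-A difference, by contrast, leaves only the treatment effect.

```python
# Hypothetical numbers for the Romeo and Juliet example.
# Assumption: score = personal baseline + treatment effect (no random error).
baseline = {"Romeo": 50, "Juliet": 80}   # individual differences (made up)
effect = {"A": 0, "B": 5}                # true treatment effects (made up)

def score(person, treatment):
    """Observed score on the dependent variable."""
    return baseline[person] + effect[treatment]

# Between-subjects comparison: Romeo gets A, Juliet gets B.
between_diff = score("Juliet", "B") - score("Romeo", "A")

# Within-subjects comparison: each person gets both A and B.
within_diffs = {p: score(p, "B") - score(p, "A") for p in baseline}

print(between_diff)   # 35: true effect (5) mixed with the baseline gap (30)
print(within_diffs)   # {'Romeo': 5, 'Juliet': 5}: the baselines cancel out
```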

Another advantage is that you need fewer participants, since every participant takes part in every treatment. This can be helpful if the population you are studying is small, such as people with a rare disorder.

Disadvantages

The main disadvantage of within-subjects designs is the potential for confounds due to time-related and environmental factors. Because each participant takes part in every condition, participants spend a longer time in the experiment. This creates a greater chance that they will quit (participant attrition), that conditions will differ from one treatment to the next, that the order in which they receive the treatments will affect the outcome, and so on. These are primarily threats to the internal validity of the study.

Threats to internal validity for within-subject designs (p. 261)

Confounding from environmental variables

These threats are similar to those for between-subjects designs. If, for example, one treatment is administered in a different room than the others, or at a different time of day than the others, then we cannot be certain if observed differences are due to the treatment or to environmental differences.

Confounding from time-related factors

Participants in a within-subjects design are measured over time. Since many events can affect a participant over time, there is always a chance that observed changes in scores on the dependent variable are due to a time-related effect rather than to treatment differences. Some common time-related effects include:

  •  History - events outside the study affect a participant; for example, stress may be higher after a terrorist attack or during final exams, regardless of the treatment
  •  Maturation - a third grader may read better after three months because she is maturing rather than because of a reading intervention
  •  Instrumentation - changes in measuring instruments, including the researchers themselves, may affect scores
  •  Testing effects - later test scores may be affected by earlier ones, as in 'practice effects'
  •  Statistical regression - extreme scores tend to become less extreme on later measurements (regression toward the mean)

Order effects (p. 262)

Order effects are especially noteworthy because participation in one condition of the experiment can directly affect participation in another treatment condition. Note that this is not a problem in a between-subjects design, where each participant experiences only one condition.

Consider practice effects. If our outcome measure is a reading test that we administer to participants after each of three reading interventions, their scores may improve on each administration simply because of the practice they get taking the test.

Alternatively, a person might perform worse on each administration due to fatigue.

Both practice effects and fatigue are examples of progressive error.

Carryover effects occur when the order of the treatments makes a difference. That is, the effects of participating in the first treatment carry over into the second treatment and make the outcome of the second treatment better or worse than if it had been the only treatment administered. One type of carryover effect is a contrast effect, which occurs when the perception of one treatment is affected by its contrast with another.

Order effects can be confounding variables because they can cause changes in the dependent variable from one treatment condition to the next that are not due to the treatment alone. Your textbook, on p. 263, provides a detailed example of how an order effect can be a confound.
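The following Python sketch uses hypothetical numbers to show how an order effect can masquerade as a treatment effect. It assumes that the two treatments have identical true effects and that taking the test a second time adds a practice effect of 4 points. Every participant gets A first and B second, so the entire practice effect shows up as an apparent B-minus-A difference.

```python
# Assumptions (hypothetical): the two treatments have identical true effects,
# and taking the test a second time adds a practice effect of +4 points.
PRACTICE = 4
baselines = [60, 72, 55, 81]           # made-up participant baselines
true_effect = {"A": 0, "B": 0}         # no real treatment difference

def session_score(base, treatment, session):
    """Score = baseline + treatment effect + practice (second session only)."""
    return base + true_effect[treatment] + (PRACTICE if session == 2 else 0)

# Every participant receives A first and B second (no counterbalancing).
diffs = [session_score(base, "B", 2) - session_score(base, "A", 1) for base in baselines]
mean_diff = sum(diffs) / len(diffs)
print(mean_diff)  # 4.0, entirely due to order, not to treatment
```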

Dealing with time-related and order effects (p. 265)

 Controlling time

Time-related effects such as history and maturation are mainly a problem in experiments that last a long time. If a study lasts a week, for example, it is unlikely that maturation will be a factor. History is another matter. Suppose we were measuring stress in participants and the study lasted only three days, but between our measurements on day 1 and day 3 an event such as a major terrorist attack or a hurricane occurred. In that case history would indeed be a confound. In general, however, history will have less of an effect in shorter experiments than in longer ones.

The main potential problem with making the experiment shorter is an increased risk of order effects. For example, if we want to compare the effects of two drugs on depression, we must allow enough time for the first drug to be completely out of the participants' systems before administering the second. Or, if we want to compare the impact of two political speeches on mood, we must make sure each person's mood is back to baseline and that they are no longer thinking about the first speech. Otherwise, there may be a contrast effect.

As always, the relative risks of a shorter vs. a longer experiment must be balanced.

Switch to a between-subjects design!

You may be surprised to see this as a way to deal with order effects. But knowing which kind of design to select is an important part of doing effective research. If order effects are likely to be a major problem, then a within-subjects experimental design may not be a good idea, and another design should be considered instead.

Counterbalancing and time-related effects

Counterbalancing is a way of matching treatments with respect to time.

In a study with two treatment conditions, we would counterbalance by having half of the participants undergo treatment condition 1 followed by treatment condition 2, and the other half undergo treatment condition 2 followed by treatment condition 1.
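Here is one way to sketch the assignment of the two orders in Python. The helper `counterbalance_two` is hypothetical, and the sketch assumes an even number of participants. It shuffles the participant list with a seeded random generator, so the assignment is random but reproducible.

```python
import random

def counterbalance_two(participants, seed=None):
    """Randomly assign half of the participants to order A-B, half to B-A.

    Hypothetical helper for illustration; assumes an even number of
    participants (with an odd number, the extra person gets B-A).
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {p: ("A", "B") for p in shuffled[:half]}
    orders.update({p: ("B", "A") for p in shuffled[half:]})
    return orders

orders = counterbalance_two(["P1", "P2", "P3", "P4", "P5", "P6"], seed=1)
for person, order in sorted(orders.items()):
    print(person, "-".join(order))
```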

Counterbalancing eliminates time effects as a confound because effects of time, such as effects of history, maturation, instrumentation and so on will be equal in each treatment condition.

Note that even though we appear to have two groups, it is still a within-subjects design, because all participants (one group) receive all levels of the independent variable. Only the order is different.

Counterbalancing and order effects

Counterbalancing does not eliminate order effects altogether, but it does eliminate them as a confound. It does this by spreading the order effects equally across the treatment conditions. Each condition is received first by half of the participants and second by the other half.
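The balancing can be checked numerically. In the following Python sketch the numbers are hypothetical. Treatment B truly scores 2 points higher than A, and being tested in the second session adds a 4-point practice effect. Half of the participants get A-B and half get B-A.

```python
# Hypothetical numbers: B truly scores 2 points higher than A, and being
# tested in the second session adds a 4-point practice effect.
TRUE_B_MINUS_A = 2
PRACTICE = 4
baselines = [60, 72, 55, 81]
orders = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "A")]  # counterbalanced

def scores(base, order):
    """Return {'A': score, 'B': score} for one participant's treatment order."""
    out = {}
    for session, treatment in enumerate(order, start=1):
        effect = TRUE_B_MINUS_A if treatment == "B" else 0
        practice = PRACTICE if session == 2 else 0
        out[treatment] = base + effect + practice
    return out

results = [scores(base, o) for base, o in zip(baselines, orders)]
mean_A = sum(r["A"] for r in results) / len(results)
mean_B = sum(r["B"] for r in results) / len(results)
print(mean_A, mean_B, mean_B - mean_A)  # 69.0 71.0 2.0
```

The mean difference comes out at 2.0, which is the true effect, because the practice effect lands on A for half of the participants and on B for the other half. Both treatment means, however, are 2 points higher (69 and 71) than they would be without practice (67 and 69). The order effect is still present in the scores. It has only been balanced.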

 Limitations of Counterbalancing (p. 269)

Counterbalancing is a routine procedure for controlling order effects in within-subjects designs, and like almost any research procedure it has important limitations. It can be especially problematic when:

  • the absolute value of the mean is important. Counterbalancing keeps the difference between the treatment condition means free of order effects. However, the order effects are still present in the scores, so they can raise or lower the absolute value of each treatment mean.

 

  • it increases the within-treatment variance by a large amount. When treatments are counterbalanced, some participants will have higher scores in one treatment condition than the other because of order effects as well as because of the difference between treatments. The variance due to order effects makes it harder to see a real difference between the treatments.

 

  • the order effects are asymmetrical. For instance, suppose that receiving treatment 1 first improves the effects of treatment 2, but receiving treatment 2 first has no effect on treatment 1. Since the order effects are not symmetrical, counterbalancing will not balance them. And in real practice we can't always predict whether or not order effects will be symmetrical.
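The following Python sketch shows an asymmetrical order effect, again with hypothetical numbers. B truly scores 2 points higher than A. Receiving A first adds 6 points to B, while receiving B first does nothing to A. Even with a complete A-B/B-A counterbalancing scheme, the carryover is not cancelled.

```python
# Hypothetical numbers: B truly scores 2 points higher than A. Getting A first
# adds 6 points to B (carryover), but getting B first has no effect on A.
TRUE_B_MINUS_A = 2
CARRYOVER_A_TO_B = 6
baselines = [60, 72, 55, 81]
orders = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "A")]  # fully counterbalanced

def scores(base, order):
    """Return {'A': score, 'B': score} for one participant."""
    a = base
    b = base + TRUE_B_MINUS_A
    if order == ("A", "B"):      # carryover occurs only in the A-then-B order
        b += CARRYOVER_A_TO_B
    return {"A": a, "B": b}

results = [scores(base, o) for base, o in zip(baselines, orders)]
mean_A = sum(r["A"] for r in results) / len(results)
mean_B = sum(r["B"] for r in results) / len(results)
print(mean_B - mean_A)  # 5.0, not the true 2.0
```

The counterbalanced mean difference is 5.0 instead of the true 2.0, because half of the 6-point carryover remains in the treatment comparison.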

Counterbalancing and the number of treatments (p. 270)

This is also a limitation of counterbalancing, and it is deserving of its own section since there are so many methods of dealing with it...

As the number of treatment conditions goes up the number of different ways to counterbalance increases dramatically.

The number of different possible sequences for n treatment conditions is n!

If you know your math, you may recall that ! is the symbol for 'factorial', which means n! = n x (n-1) x (n-2) x (n-3) x ... x 1

There is no problem for two treatment conditions: 2! = 2 x 1 = 2. With 2 conditions there are only 2 sequences, AB and BA.

For three treatment conditions, 3! = 3 x 2 x 1 = 6. With 3 conditions there are 6 sequences: ABC, ACB, BAC, BCA, CAB, CBA.

For only 4 treatment conditions, 4! = 4 x 3 x 2 x 1 = 24. With four conditions there are 24 sequences!

For five treatment conditions, 5! = 5 x 4 x 3 x 2 x 1 = 120. With five conditions there are 120 sequences.

For 6 treatment conditions there are 720 sequences, and... you get the idea.
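These counts can be checked with Python's standard library. `math.factorial` gives n!, and `itertools.permutations` lists every possible order of the conditions.

```python
import math
from itertools import permutations

# Number of complete sequences (n!) for 2 through 6 treatment conditions.
for n in range(2, 7):
    print(n, "conditions:", math.factorial(n), "sequences")

# All six orders of three conditions, in alphabetical order.
orders_abc = ["".join(p) for p in permutations("ABC")]
print(orders_abc)  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
```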

Even with only four treatment conditions, 24 groups (one for each sequence) would be needed for a completely counterbalanced design.

Obviously, this would be impractical, because too many participants would be needed to conduct such an experiment.

So researchers deal with this by using partial counterbalancing.

Partial Counterbalancing

In partial counterbalancing, the number of sequences is the minimum number needed for each treatment condition to occur in each position at least once. This sounds more complicated than it is.

For example, with four treatment conditions A, B, C, and D, only four sequences are needed. In ABCD, CADB, BDAC, and DCBA, each condition appears once in each of the first, second, third, and fourth positions.
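The partial counterbalancing requirement can be checked mechanically. The following Python sketch uses a function name of our own. It assumes there are as many sequences as conditions, so "at least once" in each position means exactly once.

```python
def each_in_every_position(sequences):
    """True if every condition appears exactly once in each serial position.

    Assumes the number of sequences equals the number of conditions.
    """
    conditions = sorted(set(sequences[0]))
    for pos in range(len(sequences[0])):
        column = sorted(seq[pos] for seq in sequences)
        if column != conditions:
            return False
    return True

print(each_in_every_position(["ABCD", "CADB", "BDAC", "DCBA"]))  # True
print(each_in_every_position(["ABCD", "BCDA", "ABDC", "DCBA"]))  # False (A is first twice)
```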

This is not a perfect solution. For one thing, there are many ways to arrange A, B, C, and D into four sequences in which each condition appears in each position once.

So how do we decide which combination of treatment orders to use?

One method is to use a Latin square.

A Latin square is a matrix of sequences in which each member of the sequence appears in each position once. One way to create a Latin square is to start with one sequence, say ABCDE for five treatment conditions, and make it the first row. Then start the second row with the LAST member of the sequence, in this example E, and continue from the beginning. The second row is EABCD, the third row is DEABC, the fourth row is CDEAB, and the fifth row is BCDEA. This produces a matrix in which each treatment condition appears once in every position of a sequence.
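The rotation procedure just described can be sketched in a few lines of Python. `rotation_latin_square` is a hypothetical helper name. It moves the last element of each row to the front to make the next row.

```python
def rotation_latin_square(conditions):
    """Build the square described above: each new row starts with the last
    element of the previous row, followed by the rest of that row."""
    rows = [list(conditions)]
    for _ in range(len(conditions) - 1):
        prev = rows[-1]
        rows.append([prev[-1]] + prev[:-1])
    return ["".join(r) for r in rows]

for row in rotation_latin_square("ABCDE"):
    print(row)
# ABCDE
# EABCD
# DEABC
# CDEAB
# BCDEA
```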

As your textbook authors point out, while the above is indeed a Latin square, it would not be very good for sequencing treatment conditions, because the sequences meet the minimum requirement of partial counterbalancing and no more. For example, even though treatment B appears in every position, it is always immediately followed by treatment C (or comes last). It is therefore never just before treatments A, D, or E. We could improve it by using a random process to rearrange the columns and then the rows.
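Both the weakness and the suggested fix can be sketched in Python. The helper names are our own. `followers` lists which treatments ever come right after a given one. `randomize_square` shuffles the columns and then the rows using Python's `random` module. Shuffling whole columns or whole rows keeps each condition in each position once, so the result is still a Latin square. However, any particular random arrangement is not guaranteed to fix every ordering problem.

```python
import random

square = ["ABCDE", "EABCD", "DEABC", "CDEAB", "BCDEA"]

def followers(rows, treatment):
    """Set of treatments that come immediately after `treatment` in any row."""
    return {row[i + 1] for row in rows for i in range(len(row) - 1)
            if row[i] == treatment}

print(followers(square, "B"))  # {'C'}: B is never just before A, D, or E

def randomize_square(rows, seed=None):
    """Shuffle the columns, then the rows; both steps keep it a Latin square."""
    rng = random.Random(seed)
    cols = list(range(len(rows[0])))
    rng.shuffle(cols)
    shuffled = ["".join(row[c] for c in cols) for row in rows]
    rng.shuffle(shuffled)
    return shuffled

shuffled_square = randomize_square(square, seed=42)
print(shuffled_square)
```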

A Latin square still has the problem of potential asymmetrical order effects, since not every possible sequence of treatment conditions is used. But it is a reasonable compromise for reducing order effects, in between no counterbalancing and full counterbalancing.

By the way, if all of this seems very complicated, you may understand Latin Squares better than you think...

(Image: a Sudoku puzzle)

If you have ever played Sudoku, then you understand Latin squares. A completed Sudoku grid is simply a special case of a Latin square. It is a 9 x 9 Latin square with the additional restriction that each of the nine smaller 3 x 3 boxes within it must contain all of the numbers 1-9.
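The two Sudoku rules can be written as checks. To keep it short, this Python sketch uses a 4 x 4 "mini Sudoku" with 2 x 2 boxes instead of a 9 x 9 grid. The function names are our own. A plain cyclic Latin square passes the first check but fails the box rule.

```python
def is_latin_square(grid):
    """True if every row and every column contains each symbol exactly once.

    Assumes a square grid whose first row lists each symbol once.
    """
    n = len(grid)
    symbols = sorted(grid[0])
    rows_ok = all(sorted(row) == symbols for row in grid)
    cols_ok = all(sorted(grid[r][c] for r in range(n)) == symbols for c in range(n))
    return rows_ok and cols_ok

def boxes_ok(grid, box):
    """Sudoku's extra rule: each box x box sub-square holds every symbol once."""
    n = len(grid)
    symbols = sorted(grid[0])
    for r0 in range(0, n, box):
        for c0 in range(0, n, box):
            cells = [grid[r][c] for r in range(r0, r0 + box) for c in range(c0, c0 + box)]
            if sorted(cells) != symbols:
                return False
    return True

mini = [[1, 2, 3, 4],
        [3, 4, 1, 2],
        [2, 1, 4, 3],
        [4, 3, 2, 1]]
cyclic = [[1, 2, 3, 4],
          [2, 3, 4, 1],
          [3, 4, 1, 2],
          [4, 1, 2, 3]]
print(is_latin_square(mini), boxes_ok(mini, 2))      # True True
print(is_latin_square(cyclic), boxes_ok(cyclic, 2))  # True False
```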

 Statistical Analysis of Two-Treatment Within-Subjects Designs

A two-treatment within-subjects design, in which one group of participants receives each of two treatment conditions, is the simplest case of a within-subjects experimental design.

If the dependent variable is measured on a scale that allows a numeric mean to be computed (an interval or ratio scale), then either a repeated-measures t test or a single-factor repeated-measures analysis of variance (ANOVA) can be used to analyze the data.

If the dependent variable is measured on an ordinal scale, then a Wilcoxon test is used to test for significant differences.

Finally, if the dependent variable is expressed as a direction of difference, for example whether participants do better or worse after treatment, then a sign test is used to see if the treatment differences are significant.

By the way, at this point in the class, you only need to know which test is used for which type of design/variable. Later chapters will spend more time discussing statistical analysis...
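For the curious, the following Python sketch implements the three two-treatment tests by hand, using made-up scores. It relies only on the standard library. In practice you would use statistical software; SciPy, for example, provides `scipy.stats.ttest_rel`, `scipy.stats.wilcoxon`, and `scipy.stats.binomtest`. The sketch computes the repeated-measures t statistic from difference scores, the Wilcoxon signed-ranks sums, and an exact two-tailed sign-test p-value. The p-values for t and for the Wilcoxon statistic would still come from tables or software.

```python
import math
import statistics

# Hypothetical scores for 8 participants measured under both treatments.
treatment_a = [12, 15, 11, 18, 14, 16, 13, 17]
treatment_b = [14, 18, 12, 21, 15, 19, 13, 20]
diffs = [b - a for a, b in zip(treatment_a, treatment_b)]   # B minus A

# Repeated-measures t: mean difference divided by its standard error.
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_stat = mean_d / se_d                      # df = len(diffs) - 1

# Wilcoxon signed-ranks: rank |differences| (zeros dropped, ties get average ranks).
nonzero = [d for d in diffs if d != 0]
ordered = sorted(abs(d) for d in nonzero)

def avg_rank(value):
    first = ordered.index(value) + 1        # 1-based rank of first occurrence
    return first + (ordered.count(value) - 1) / 2

w_plus = sum(avg_rank(abs(d)) for d in nonzero if d > 0)
w_minus = sum(avg_rank(abs(d)) for d in nonzero if d < 0)

# Sign test: only the direction of each difference counts (zeros dropped).
n = len(nonzero)
n_pos = sum(d > 0 for d in nonzero)
k = min(n_pos, n - n_pos)
# Exact two-tailed binomial p-value under the null hypothesis p = 0.5.
p_sign = min(1.0, 2 * sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n)

print(round(t_stat, 3), w_plus, w_minus, p_sign)
```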

The two-treatment within-subjects design has some of the same advantages and disadvantages as the two-treatment single-factor between-subjects design. Its simplicity is both its strength and its weakness.

It is easy to conduct and its results are easy to understand. Since there are only two conditions, the researcher can maximize the difference between treatments, so the results are more likely to be statistically significant.

On the other hand, with only two treatment conditions we can only say that the treatments are different; we cannot know exactly how the independent variable affects the dependent variable, that is, we can't say much about the functional relationship between the variables.

Multiple Treatment Within-Subject Designs (p. 273)

When the dependent variable in a multiple-treatment within-subjects design is measured on an interval or ratio scale, the appropriate statistical test is a repeated-measures analysis of variance, which tests for significant differences among the treatment means.
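The logic of the repeated-measures ANOVA can be sketched in Python with hypothetical data from four participants in three treatment conditions. The total variability is split into treatment, between-subjects, and error parts. The between-subjects part, which reflects individual differences, is removed before the F-ratio is formed. This is the "measure and remove" advantage of within-subjects designs. `rm_anova` is our own helper name.

```python
def rm_anova(data):
    """One-way repeated-measures ANOVA.

    data: list of rows, one per participant, each holding that participant's
    score in every treatment condition. Returns (F, df_treatment, df_error).
    """
    n = len(data)          # participants
    k = len(data[0])       # treatment conditions
    grand = sum(map(sum, data)) / (n * k)
    treat_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_treat = n * sum((m - grand) ** 2 for m in treat_means)
    # Individual-difference (between-subjects) variability, removed from error.
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_treat - ss_subj

    df_treat = k - 1
    df_error = (k - 1) * (n - 1)
    F = (ss_treat / df_treat) / (ss_error / df_error)
    return F, df_treat, df_error

# Rows = participants, columns = treatment conditions (hypothetical data).
data = [
    [3, 5, 7],
    [2, 4, 6],
    [4, 6, 9],
    [3, 5, 6],
]
F, df_treat, df_error = rm_anova(data)
print(f"F({df_treat}, {df_error}) = {F:.2f}")  # F(2, 6) = 72.00
```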

As with between-subjects designs that have multiple treatment conditions, a within-subjects experiment with multiple treatment conditions lets us establish more clearly whether there is a causal relationship between the independent and dependent variables. This is often accomplished by having participants experience a series of treatment conditions, each with a different level of the independent variable. If the researcher can show that the dependent variable reliably changes each time the level of the independent variable is changed, this provides compelling evidence of a functional relationship between the two variables.

The problem with this approach parallels that of the multiple-treatment between-subjects design. When there are many treatment conditions, the differences between adjacent treatments tend to be small, and small differences are more difficult to detect.

In within-subjects designs, multiple treatments can also mean that the study takes a long time to complete. As a result, attrition or fatigue may become problematic as participants drop out or grow tired of participating and lose motivation.

Also, as discussed in detail above, when there are many treatment conditions, counterbalancing becomes increasingly complex.

Comparing Within-Subject and Between-Subject Experimental Designs

The pros and cons of each design have been discussed above, and the final section of chapter 9 is more about differences between the two designs and what factors might make a researcher choose one design over the other.

There are three major factors that differentiate the designs.

  1. Individual differences
  2. Time-related factors
  3. Number of participants

Individual differences can be confounding variables in a between-subjects design, but they are never a confound in a within-subjects design, because each participant serves as his or her own control. This also makes it easier to detect a difference between treatments in a within-subjects design. If large individual differences are expected, a within-subjects design is often preferred.

Time-related factors and order effects can complicate within-subjects designs but are not much of a factor in between-subjects designs, because participants are measured only once. If there is any expectation that participation in one treatment condition will influence participation in another treatment condition, then a between-subjects design should be used.

Fewer participants are needed for a within-subjects design, since each participant is in all treatment conditions. If it is difficult to find participants, for example because they have a rare condition or are members of a small demographic group, then a within-subjects design may be preferable.

The specific research question being asked might also be more amenable to one design than the other.

Your textbook provides an example of this type on p. 275.

Consider another example. Imagine you want to study the effectiveness of a new treatment for anxiety disorders. If you want to know whether the treatment is effective at all, you might use a within-subjects design. One common design is called a waitlist control design. In this design, participants first receive nothing for some period of time while they are waiting for treatment (that is, they receive a zero level of the independent variable), and then they undergo treatment. There is a risk of time effects, since participants may get better during the waitlist condition simply because time has passed. The other advantages of a within-subjects design remain, however. The design is ethical when participants would be on a waitlist no matter what treatment they were seeking.

Suppose that instead of asking "Does the treatment have any effect?" we want to know "Does the new treatment work as well as or better than established treatments for anxiety disorders?" In that case a between-subjects design might be better, in which one group receives the experimental treatment and is compared to a group that receives an established treatment.

Matched-Subjects Designs

A matched-subjects design maintains some of the advantages of both types of design. It is used when a between-subjects design is desired and a particular individual-difference variable (or variables) is considered especially likely to influence the outcome.

There are two groups, and each participant in one group is matched with a participant in the other group with respect to the variable of interest.

In a more basic matching procedure, the researcher only makes sure that each group has the same number of participants at each level of the variable of interest. For example, the researcher might ensure that each group has the same number of participants with IQs of 90-100, 100-110, and 110-120.

In a matched-subjects design, each participant in one group would be matched with a participant in the other group who has the same IQ score, and their scores would be compared.

Participants can be matched with respect to more than one variable. For example, a male participant who is Asian-American with an IQ of 112 would be matched with another Asian-American male with an IQ of 112.
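One common way to create matched pairs is to rank participants on the matching variable, pair neighbors, and randomly send one member of each pair to each group. The following Python sketch assumes an even number of participants and uses made-up names and IQ scores. `matched_pairs` is a hypothetical helper, not the textbook's procedure.

```python
import random

def matched_pairs(participants, key, seed=None):
    """Pair participants with adjacent values of `key` (e.g. IQ) and randomly
    send one member of each pair to each group.

    Hypothetical helper; assumes an even number of participants.
    """
    rng = random.Random(seed)
    ranked = sorted(participants, key=key)
    group_1, group_2 = [], []
    for i in range(0, len(ranked), 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)               # random assignment within the pair
        group_1.append(pair[0])
        group_2.append(pair[1])
    return group_1, group_2

people = [("P1", 112), ("P2", 95), ("P3", 112), ("P4", 95), ("P5", 104), ("P6", 104)]
g1, g2 = matched_pairs(people, key=lambda p: p[1], seed=7)
for a, b in zip(g1, g2):
    print(a[0], "<->", b[0], "IQ", a[1])
```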

This strategy attempts to maintain the advantages of a between-subjects design while creating groups that are more equivalent. However, matched groups never approach the equivalence of a within-subjects design, because the participants are alike with respect to only a small number of individual differences. They are never as alike as one individual being measured twice! Also, it can be time consuming and expensive to find participants who are good 'matches.' That said, the strategy is useful when a fairly small number of individual differences may be very important and a between-subjects design is desired.

 

 

 

 

 

 

 

 

 

 

 
