by Emily Sholtis & Matthew Pepper, Basis Policy Research
Does playing chess result in better student outcomes? Whether those outcomes are cognitive, like mathematics achievement, or non-cognitive, like confidence, it seems like a straightforward question to answer. Unfortunately, like many educational interventions, rigorously evaluating the impact of chess in schools faces significant obstacles. In this week’s blog we review the strengths and limitations of three different methodological approaches to measuring the impact of chess on student achievement.
When determining the impact of an intervention, the key question is always “compared to what?” We can only understand the impact of teaching chess by comparing it to a scenario in which chess was not taught. Ideally, the difference in measured outcomes between receiving chess instruction and not receiving it can be ascribed to the impact of chess. Each of the three methodological approaches takes a different path to finding that counter-factual, the comparison scenario of no chess.
Randomized Control Trial
The ideal experimental design – the “gold standard” – is the randomized control trial (RCT). Randomized control trials are the most rigorous research approach because bias is eliminated by randomly assigning individuals from the same population to receive an intervention. This can be as simple as flipping a coin to determine which students from a particular grade will receive chess instruction (the treatment group) and which will not (the counter-factual, or control group). An example of an RCT is Vanderbilt’s evaluation of Tennessee’s Pre-K program. In this study, researchers collected lists of applicants wishing to enroll their children in a state-sponsored pre-K program. They then randomly assigned these applicants to a treatment group (accepted into the program) and a control group (not accepted into the program). Using these two truly random groups, researchers then compared the students’ performance as they grew up to assess whether the pre-K program was effective. While RCTs are superior from a research perspective, they can be challenging to facilitate because parents and schools may not buy in to a study in which half of the participants – the control group – will be denied access to a potentially beneficial intervention.
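The logic of random assignment can be sketched in a toy simulation. Every number below is hypothetical – a made-up “true effect” of 5 points and an unobserved “motivation” trait – but the sketch shows why randomization works: chance spreads unobserved traits evenly across the two groups, so a simple difference in average scores recovers the true effect.

```python
import random

random.seed(0)

# Hypothetical setup: chess instruction truly adds 5 points to a math
# score, and each student has an unobserved "motivation" trait that
# also raises the score.
TRUE_EFFECT = 5

students = [{"motivation": random.gauss(0, 1)} for _ in range(1000)]

# Random assignment: shuffle the roster, then split it in half.
random.shuffle(students)
treatment, control = students[:500], students[500:]

def score(student, treated):
    """Simulated math score: baseline + chess effect + motivation + noise."""
    return (70 + (TRUE_EFFECT if treated else 0)
            + 3 * student["motivation"] + random.gauss(0, 5))

def mean(values):
    return sum(values) / len(values)

# Because assignment was random, motivation is balanced across groups,
# so the raw difference in means estimates the true effect (up to noise).
estimate = (mean([score(s, True) for s in treatment])
            - mean([score(s, False) for s in control]))
print(round(estimate, 2))  # close to the true effect of 5
```

The same comparison without randomization would confound the chess effect with whatever traits led students into the treatment group, which is exactly the problem the next design must wrestle with.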
Quasi-Experimental Study
In contrast to RCTs, quasi-experimental studies compare a group of individuals who receive an intervention to a group that is statistically similar to the intervention group. For example, math test scores from one hundred students who receive chess instruction for six months would be compared to math scores from one hundred students who did not receive chess instruction but who have similar ages, family incomes, genders, and ethnicities. Note, however, that participation in the two groups depends on student and parent self-selection. As a result, unobserved characteristics, such as parental engagement or student motivation, can bias the study’s results. Stated differently, even if the two groups are similar on observable characteristics, students in a chess club may already be more motivated and interested in math than a statistically-matched group. Quasi-experimental approaches are often the only way to measure the results of an intervention retrospectively, or of one that is already widely applied, but results from these studies are not considered as rigorous as those from RCTs and often generate significant controversy, because researchers argue over the extent to which unobservable characteristics bias the results.
A good example of this method is the Center for Research on Education Outcomes’ Online Charter School Study. In this study, researchers sought to compare the performance of students in online charter schools to peers in other schools. Because researchers could not feasibly or ethically assign students randomly to schools, they had to rely on a quasi-experimental design. To do so, they used advanced statistics to match each virtual school student to a similar student in a traditional school based on academic and demographic characteristics. Differences between the groups were then ascribed to the impact of the online charter school.
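The self-selection problem described above can also be illustrated with a toy simulation (all numbers hypothetical). Here the true effect of chess is set to zero, but more-motivated students are more likely to join the chess club, and motivation also raises scores. Matching on an observable trait – family income bracket in this sketch – cannot remove the bias from the unobserved trait:

```python
import random

random.seed(0)

# Hypothetical setup: chess has NO true effect here, but an unobserved
# "motivation" trait both raises scores and makes joining more likely.
def make_student():
    motivation = random.gauss(0, 1)                   # unobserved
    income = random.choice(["low", "mid", "high"])    # observed
    joined_chess = motivation + random.gauss(0, 1) > 0.5
    score = 70 + 3 * motivation + random.gauss(0, 5)  # no chess effect
    return {"income": income, "chess": joined_chess, "score": score}

students = [make_student() for _ in range(4000)]
treated = [s for s in students if s["chess"]]
pool = [s for s in students if not s["chess"]]

# The quasi-experimental step: pair each chess student with a
# non-chess student from the same (observable) income bracket.
matched = []
for t in treated:
    for i, c in enumerate(pool):
        if c["income"] == t["income"]:
            matched.append((t, c))
            pool.pop(i)  # each control is used at most once
            break

def mean(values):
    return sum(values) / len(values)

estimate = (mean([t["score"] for t, _ in matched])
            - mean([c["score"] for _, c in matched]))
print(round(estimate, 2))  # positive, even though the true effect is zero
```

The matched comparison still shows a sizable “chess effect” because the groups differ on motivation, which the matching variables never captured. This is precisely the bias that makes quasi-experimental results controversial.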
Regression Discontinuity
A regression discontinuity approach is a special type of quasi-experimental design that depends on a natural or policy cutoff, allowing researchers to compare students who receive an intervention to a very similar group of students who did not. Patrick McEwan and Joseph S. Shapiro’s “The Benefits of Delayed Primary School Enrollment” is an excellent example. In the study they used the school system’s kindergarten enrollment cutoff to create treatment and control groups. For example, if a district’s kindergarten enrollment cutoff is September 1st, then students with birthdays from September 1st to 10th would be similar to students with birthdays from August 20th to 31st in every way except that one group started school a year earlier. These two groups are nearly identical except for the treatment. Using two similar groups like these, the researchers were able to assess whether delaying kindergarten enrollment improved student academic outcomes. Regression discontinuities like this represent an improvement over quasi-experimental designs that merely match students on observable characteristics, but they require large samples and can be used only when there is a policy threshold that can be exploited to create separate but nearly identical groups.
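A toy simulation (hypothetical numbers throughout) shows the core of the regression discontinuity idea: students born just before and just after the cutoff are essentially interchangeable, so any jump in outcomes at the cutoff can be attributed to the change in treatment – here, a made-up 4-point benefit from entering school a year later:

```python
import random

random.seed(0)

# Hypothetical setup: birthdays are measured in days relative to a
# September 1st cutoff. Students born on or after the cutoff wait a
# year and enter kindergarten older; suppose that delay truly adds
# 4 points to a later test score.
DELAY_EFFECT = 4

def make_student():
    days_from_cutoff = random.randint(-182, 182)  # birthday vs. Sept 1
    delayed = days_from_cutoff >= 0
    score = 70 + (DELAY_EFFECT if delayed else 0) + random.gauss(0, 10)
    return {"days": days_from_cutoff, "score": score}

students = [make_student() for _ in range(20000)]

def mean(values):
    return sum(values) / len(values)

# Compare students born within 10 days on either side of the cutoff:
# nearly identical kids, except one group started school a year later.
just_after = [s["score"] for s in students if 0 <= s["days"] < 10]
just_before = [s["score"] for s in students if -10 <= s["days"] < 0]
estimate = mean(just_after) - mean(just_before)
print(round(estimate, 2))  # close to the true delay effect of 4
```

Note the trade-off mentioned above: narrowing the window makes the groups more comparable but shrinks the sample, which is why regression discontinuity designs need large populations near the threshold.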
There are numerous other considerations in research design – such as validity, sample size, and fidelity of implementation – but here we have merely attempted to sketch the approaches to, and limitations of, measuring the impact of an educational intervention like chess. Determining the impact of chess is neither easy nor straightforward, but well-designed studies implemented with fidelity can contribute rigorously to the broader knowledge base. Continue following this blog for reviews of research studies that explore the relationship between chess and student outcomes.