Would You Like Some Data With That Learning Game?

UPDATE, JULY 5, 2015:

I went looking again for the research paper when I found that the pre-publication draft was no longer available through BrainQuake. A new version of the results is now downloadable through their “Backed by Science” page; after taking a look at it, I must disappointedly confess that I find it to be deliberately misleading regarding the kinds of conclusions that can be drawn from the study. It isn’t simply that language such as “dramatic math learning results that no one had believed were possible” is an outlandish overstatement. It is the hand-waving over the definition of “comparison group,” and the corresponding outright dishonesty about the study’s rigor.


The paper identifies the intervention (Wuzzit Trouble) and comparison (no game) groups, “the only difference between the two groups being the use of the new approach.” But this is not the only difference. The intervention group started with weaker math skills than the comparison group – they had been separated for that reason by the teacher. Since math skills are the exact variable the study attempts to isolate, this difference essentially renders the study’s data meaningless.

The current version of the paper on BrainQuake’s site lauds the study extensively for use of a comparison group and uses it to justify comparing the study to medical research. It states, “any study that does not involve the use of a comparison group is automatically suspect.”

I made the assumption that the work on Wuzzit Trouble was being done in good faith, and gave the authors the benefit of the doubt when blogging about it in May. After seeing the changes, I can no longer be so generous. The Wuzzit Trouble study certainly justified further investigation, but it absolutely is not comparable to medical research, and the claims being made about it are ridiculously inflated. If educational games hope to be taken seriously, the field must raise its research standards dramatically, and cease its absurd self-congratulation over the flimsiest of results.

ORIGINAL, MAY 22, 2015

Have you heard about that 2014 Stanford study on Wuzzit Trouble?

If you’re interested in the role games can play in education, then you probably have. The study findings suggest that even short (regular) sessions with a well-crafted math game can effect significant improvement in kids’ math proficiency.

Graph of study results from the summary paper

This article gets cited a lot — on Forbes Tech, on the Games and Learning blog, on Wuzzit Trouble’s press page, of course — but it isn’t easy to find, as is so frustratingly often the case with academic articles cited and re-cited all over the internet. The reference looks like this:

Holly Pope, Jo Boaler, & Charmaine Mangram, Wuzzit Trouble: The Influence of a Digital Math Game on Student Number Sense, Stanford University, February 2015

BrainQuake is Wuzzit Trouble’s publisher, and Stanford professor Dr. Keith Devlin designed the game; he offers a summary of the study findings through BrainQuake (no longer available). The closest I could get to the actual paper was a pre-publication draft, also through BrainQuake (no longer available on their site).

As usual, the headlines covering this research are a tad out-of-proportion with the significance of the findings, and I’m going to talk about that a bit more, but two important things need to be said first: (1) these research findings are exciting, and (2) our field needs a great deal more research that uses hard data and gives us quantifiable results.

Many teachers who use games as instructional tools have plenty of anecdotal evidence about how great it is, how much it engages students, and how it’s the future of education. As you know, I am one of those teachers: not only have my students enjoyed learning through games, but I enjoy teaching with them. I think we’ve reached a point, though, where we really need quantitative studies that can show us hard data about the benefits of games for our students. The Wuzzit Trouble study offers some confirmation of what we already strongly believed: there is something valuable about games, and that something is inherent in play. What games can teach is much more than numbers, dates, and other memorizable pieces of information: it is flexible, creative thinking.

How did the Wuzzit Trouble study try to measure that? They had one group of students study math without a game, and another group study the same math and also play the game for a total of 120 minutes. At the end of the study, they asked students a question (“question 4”) for which their gameplay and classroom work hadn’t explicitly prepared them, and looked for answers that demonstrated the kind of number sense and creative logic they hoped gameplay would instill in them. They also asked more standard questions and questions that did resemble problems students solved in the game, but they evaluated the data for these questions separately.

I am excited about these findings, and I am confident that further research will support them. However, the Wuzzit Trouble study has several characteristics that mean it isn’t really appropriate for making the kinds of generalized claims that people want to use it for. First, the Wuzzit Trouble study involved just 57 kids. These 57 kids were all third-graders at one school called the Big Dipper Academy, a school that requires students to pass an entrance exam before they are admitted, and whose student body is therefore “academically, racially, and socioeconomically uncharacteristic of the larger district” (from the pre-publication draft, page 6). The school is high-performing in mathematics, so its students already have a better-than-average grasp of the material in question.

Furthermore, and most frustratingly, the participating students didn’t start out on equal footing in math. The control group was a class of math “high performers,” while the treatment group was made up of those students who were having the most difficulty with math. This means that the treatment group had much more room for improvement, while the control group already had a good understanding of the math concepts being taught. Let’s talk about that improvement graph:

The y-axis is the mean total score on the test in question, either pre- or post-experiment. Devlin says:

“The graph summarizes the study’s key finding. In particular, the treatment group, the class that the teacher described as the weaker of the two, had almost caught up with their friends in the hitherto stronger comparison class” (from Devlin’s study summary, page 1).

Hooray for the treatment group kids! It certainly looks like the game kicked these kids into high gear, but what we would really need in order to draw that conclusion is data on another class (even better, several other classes) of students who were also struggling with math. As it stands, we don’t have any standard by which to gauge the improvement the graph shows us. If we’re comparing low-achievers and high-achievers, we’ll need to see how gameplay affects both groups. And what about students who don’t attend a high-achieving school, who are at a much greater disadvantage than these “strugglers” who are already scoring under two points behind the high achievers on the pre-test? Can the game help them?
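To make the “room for improvement” problem concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the 10-point test ceiling, the class averages, the class size of 1,000, and the assumption that every student learns the same amount from ordinary classroom teaching. None of the numbers come from the Wuzzit Trouble study. The sketch simulates two classes, neither of which plays any game, and compares their average gains from pre-test to post-test.

```python
import random
from statistics import mean

MAX_SCORE = 10  # hypothetical test ceiling, not taken from the study


def simulate_class(mean_skill, n_students, rng, growth=2.0, noise=1.0):
    """Return (pre, post) score lists for one class with NO game effect.

    Every student gains the same underlying `growth` from ordinary
    teaching; scores are capped to the range 0..MAX_SCORE.
    """
    pre, post = [], []
    for _ in range(n_students):
        skill = rng.gauss(mean_skill, 1.0)
        raw_pre = skill + rng.gauss(0, noise)
        raw_post = skill + growth + rng.gauss(0, noise)
        pre.append(min(MAX_SCORE, max(0, raw_pre)))
        post.append(min(MAX_SCORE, max(0, raw_post)))
    return pre, post


def mean_gain(pre, post):
    """Average post-test score minus average pre-test score."""
    return mean(post) - mean(pre)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up class averages: one class well below the ceiling, one near it.
weak_pre, weak_post = simulate_class(5.0, 1000, rng)
strong_pre, strong_post = simulate_class(9.0, 1000, rng)

weak_gain = mean_gain(weak_pre, weak_post)
strong_gain = mean_gain(strong_pre, strong_post)

print(f"weaker class gain:   {weak_gain:.2f}")
print(f"stronger class gain: {strong_gain:.2f}")
```

Under these assumptions the weaker class posts the larger average gain even though nobody played a game, because many students in the stronger class hit the top of the test. That is why a fair comparison needs a second group of struggling students who did not get the game.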

For me, this study looks like an inviting gateway to further studies. Let’s take our enthusiasm for its encouraging results and channel it into more opportunities to get data. If you’re using games in your classroom, keep some numbers; better yet, get in touch with some researchers who might be interested in your data. Check out research at the GlassLab (that’s games learning and assessment), where there’s even a contact link to the Center for Technology and Learning.

