Practice questions outperform passive review.
The single most replicated finding in the cognitive psychology of learning is this: students who spend their study time answering questions they could plausibly get wrong learn more, and forget less, than students who spend the same time re-reading their notes. The effect is large, durable, and has been observed in hundreds of experiments. It has a name (the testing effect, sometimes the retrieval-practice effect) and it has been replicated across age groups, subjects, and exam formats.
What follows is a tour of that evidence.
Seven major meta-analyses, in one table.
Each row below summarises a published meta-analysis (a study of studies) that pooled effect sizes from individual experiments comparing retrieval practice to passive control conditions. Sample sizes are total participants across all included experiments. Effect sizes are Cohen's d or Hedges's g; conventionally, 0.2 is small, 0.5 medium, 0.8 large.
| Year | Sample (N) | Effect size | Key finding |
|---|---|---|---|
| 2017 | 41,710 | g = 0.61 | Across 118 studies, testing reliably outperformed re-study; the effect held for multiple-choice formats. |
| 2014 | 12,193 | g = 0.50 | Tested vs re-studied items showed a medium positive effect across 159 effect sizes. |
| 2021 | 48,478 | g = 0.50 | Testing produced significantly better retention in classroom (not lab) settings. |
| 2017 | 3,309 | d = 0.49 | Testing effect specifically in higher-education contexts. |
| 2018 | 7,247 | d = 0.55 | MCQ practice with feedback outperformed re-study on cued recall. |
| 2016 | 2,890 | d = 0.40 | Retrieval practice produced reliable gains across STEM disciplines. |
| 2018 | 5,118 | d = 0.40 | Transfer of the testing effect to related but untested material. |
The headline number is unusually consistent: across more than 120,000 students in seven independent syntheses, retrieval practice produced effect sizes in the 0.40–0.61 range. That is a medium-to-large effect by Cohen's conventions, and it is one of the most robust findings in educational psychology.
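The effect-size arithmetic behind these syntheses can be made concrete. Below is a minimal sketch of Cohen's d with a pooled standard deviation, using toy recall scores invented for illustration (not data from any of the studies above; the group gap here is deliberately exaggerated, so the resulting d is far larger than the meta-analytic 0.40–0.61):

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardised mean difference: (mean1 - mean2) / pooled SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

# Toy recall scores (percent correct) for a tested vs a re-read group.
tested = [61, 70, 55, 66, 58, 63]
reread = [40, 48, 35, 44, 39, 42]
d = cohens_d(tested, reread)
```

Hedges's g, the other statistic in the table, is the same quantity with a small-sample correction applied, which is why the two are directly comparable.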
Re-reading and highlighting are low-utility techniques.
The 2013 Dunlosky review for the Association for Psychological Science graded ten common study strategies on a three-tier utility scale. Of the techniques students most commonly use (re-reading, highlighting, and summarisation), none reached the highest tier. The review rated re-reading as low utility: the appearance of progress without the underlying gain in retention. Highlighting was rated similarly.
The two techniques that did reach high utility: practice testing and distributed (spaced) practice. Both are core to this product.
How dramatic is the difference, in concrete numbers?
The classic demonstration is Roediger & Karpicke 2006. Students studied a passage and then either took a recall test on it or re-read it. Both groups recalled roughly the same amount five minutes later. One week later, the tested group recalled 61% of the material; the re-read group recalled 40%. The re-read group felt better prepared (more on that below). They weren't.
Multiple choice, specifically, works.
A common worry: surely recognition tests (MCQs) are weaker than free recall? The evidence says the gap is small when feedback is provided. Little et al. 2012 showed that competitive multiple-choice questions (those with carefully constructed, plausible distractors) produced learning gains comparable to short-answer practice, and superior to re-reading. Greving & Richter 2018 meta-analysed 11 studies of MCQ practice with feedback and found a medium positive effect (d = 0.55).
The distractor quality matters. Practice MCQs whose wrong answers are obvious produce weaker gains than MCQs whose wrong answers are plausible misconceptions. Every question in this bank is reviewed against this criterion.
Spread the practice. Don't cram.
A second large effect compounds with the first: distributing practice over time produces more durable learning than massing it into a single session. Cepeda et al. 2006 meta-analysed 184 distributed-practice experiments and found a robust spacing benefit. Lindsey et al. 2014 showed a 16.2% improvement on a year-end exam from a personalised spacing schedule, in a real classroom.
The product's spaced-review queue is not a feature added on top of the question bank. It is the mechanism by which the question bank produces durable learning rather than short-term performance.
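One common way a spaced-review queue is built is an expanding-interval schedule: a correct answer pushes the item's next review further out, a miss resets it to a short interval. The sketch below is illustrative only, with an assumed interval ladder; it is not the platform's actual algorithm:

```python
from datetime import date, timedelta

# Assumed interval ladder (days); real systems tune these per learner.
INTERVALS = [1, 3, 7, 14, 30, 60]

def next_review(level, correct, today):
    """A correct answer climbs the ladder; a miss resets to the bottom."""
    level = min(level + 1, len(INTERVALS) - 1) if correct else 0
    return level, today + timedelta(days=INTERVALS[level])

# Two consecutive correct answers push the item progressively further out.
level, due = next_review(0, correct=True, today=date(2024, 1, 1))
level, due = next_review(level, correct=True, today=due)
```

The design choice that matters is the reset on error: an item the learner still gets wrong returns to short intervals, so practice concentrates where retention is weakest.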
It transfers to real high-stakes assessments.
Lab effects are one thing; the question is whether they survive contact with a real, professional, high-stakes exam. Larsen et al. 2009 randomised pediatric residents to repeated testing or repeated study on clinical material; six months later, the tested group scored substantially higher. The participants weren't undergraduates, the material wasn't a word list, and the interval wasn't a week.
That is the closest existing evidence to what a candidate is doing on this platform: working professionals, technical material, multi-month preparation horizon.
Getting it wrong (with feedback) is part of how it works.
A frequent worry from learners: “Won't I just memorise the wrong answer?” The evidence is reassuring. Kornell et al. 2009 showed that incorrect attempts followed by feedback produced better long-term retention than passive study, even when the initial attempt was wrong. The effect is strongest when feedback is informative, specifically explaining why each option was right or wrong, not just marking it.
Every question on this platform carries an examiner-style explanation that addresses each option, not just the correct one. That is the form of feedback the literature finds is necessary.
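The data shape this implies is simple: every option carries its own explanation, so feedback can say why each distractor is wrong rather than just marking it. A hypothetical sketch (field names and the worked example are assumptions for illustration, not the platform's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Option:
    text: str
    correct: bool
    explanation: str  # why this option is right or wrong

@dataclass
class Question:
    stem: str
    options: list[Option]

    def feedback(self):
        """Examiner-style feedback covering every option, not just the key."""
        return "\n".join(
            f"{o.text} ({'correct' if o.correct else 'incorrect'}): {o.explanation}"
            for o in self.options
        )

example = Question(
    stem="2 + 2 = ?",
    options=[
        Option("4", True, "Basic addition."),
        Option("5", False, "An off-by-one slip, so a plausible distractor."),
    ],
)
fb = example.feedback()
```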
Why does passive study feel so productive?
Robert and Elizabeth Bjork's work on desirable difficulties documents what they call the fluency illusion: the smoother and easier study feels, the less learning is actually happening. Re-reading is fluent. Highlighting feels productive. Both produce a strong sense of mastery and relatively weak retention. Retrieval practice feels harder because it is harder, and that is why it works.
This is the most under-appreciated finding in the literature, because it predicts that the techniques that feel best are the techniques that work least. Trust the evidence over the feeling.
The synthesis, in one paragraph.
Forty years of evidence converges: retrieval practice with informative feedback, distributed across time, on questions whose distractors are plausible, is the single most effective evidence-based study technique we know of. It outperforms re-reading and highlighting at medium-to-large effect sizes, in classrooms, in labs, and in real high-stakes professional examinations. That is exactly what this platform delivers, which is why a single Question of the Day, answered honestly today, is more useful than another hour spent re-reading the textbook.