Practice questions outperform passive review.
The single most replicated finding in the cognitive psychology of learning is this: students who spend their study time answering questions they could plausibly get wrong learn more, and forget less, than students who spend the same time re-reading their notes. The effect is large, durable, and has been observed in hundreds of experiments. It has a name (the testing effect, sometimes the retrieval-practice effect) and it has been replicated across age groups, subjects, and exam formats.
What follows is a tour of that evidence.
Seven major meta-analyses, in one table.
Each row below summarises a published meta-analysis (a study of studies) that pooled effect sizes from individual experiments comparing retrieval practice to passive control conditions. Sample sizes are total participants across all included experiments. Effect sizes are Cohen's d or Hedges's g; conventionally, 0.2 is small, 0.5 medium, 0.8 large.
| Year | Sample (N) | Effect size | Key finding |
|---|---|---|---|
| 2017 | 41,710 | g = 0.61 | Across 118 studies, testing reliably outperformed re-study; the effect held for multiple-choice formats. |
| 2014 | 12,193 | g = 0.50 | Tested vs re-studied items showed a medium positive effect across 159 effect sizes. |
| 2021 | 48,478 | g = 0.50 | Testing produced significantly better retention in classroom (not lab) settings. |
| 2017 | 3,309 | d = 0.49 | Testing effect specifically in higher-education contexts. |
| 2018 | 7,247 | d = 0.55 | MCQ practice with feedback outperformed re-study on cued recall. |
| 2016 | 2,890 | d = 0.40 | Retrieval practice produced reliable gains across STEM disciplines. |
| 2018 | 5,118 | d = 0.40 | Transfer of the testing effect to related but untested material. |
The headline number is unusually consistent: across more than 120,000 students in seven independent syntheses, retrieval practice produced effect sizes in the 0.40–0.61 range. That is a medium-to-large effect by Cohen's conventions, and it is one of the most robust findings in educational psychology.
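The effect-size arithmetic behind these syntheses can be made concrete. Below is a minimal sketch of Cohen's d with a pooled standard deviation, using toy recall scores invented for illustration (not data from any of the studies above; the group gap here is deliberately exaggerated, so the resulting d is far larger than the meta-analytic 0.40–0.61):

```python
from statistics import mean, stdev

def cohens_d(treatment, control):
    """Standardised mean difference: (mean1 - mean2) / pooled SD."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treatment) - mean(control)) / pooled_sd

# Toy recall scores (percent correct) for a tested vs a re-read group.
tested = [61, 70, 55, 66, 58, 63]
reread = [40, 48, 35, 44, 39, 42]
d = cohens_d(tested, reread)
```

Hedges's g, the other statistic in the table, is the same quantity with a small-sample correction applied, which is why the two are directly comparable.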
Re-reading and highlighting are low-utility techniques.
The 2013 Dunlosky review for the Association for Psychological Science graded ten common study strategies on a three-tier utility scale. Of the techniques students most commonly use (re-reading, highlighting, and summarisation), none reached the highest tier. The review rated re-reading as low utility: the appearance of progress without the underlying gain in retention. Highlighting was rated similarly.
The two techniques that did reach high utility: practice testing and distributed (spaced) practice. Both are core to this product.
How dramatic is the difference, in concrete numbers?
The classic demonstration is Roediger & Karpicke 2006. Students studied a passage and then either took a recall test on it or re-read it. Both groups recalled roughly the same amount five minutes later. One week later, the tested group recalled 61% of the material; the re-read group recalled 40%. The re-read group felt better prepared (more on that below). They weren't.
Multiple choice, specifically, works.
A common worry: surely recognition tests (MCQs) are weaker than free recall? The evidence says the gap is small when feedback is provided. Little et al. 2012 showed that competitive multiple-choice questions (those with carefully constructed, plausible distractors) produced learning gains comparable to short-answer practice, and superior to re-reading. Greving & Richter 2018 meta-analysed 11 studies of MCQ practice with feedback and found a medium positive effect (d = 0.55).
The distractor quality matters. Practice MCQs whose wrong answers are obvious produce weaker gains than MCQs whose wrong answers are plausible misconceptions. Every question in this bank is reviewed against this criterion.
Spread the practice. Don't cram.
A second large effect compounds with the first: distributing practice over time produces more durable learning than massing it into a single session. Cepeda et al. 2006 meta-analysed 184 distributed-practice experiments and found a robust spacing benefit. Lindsey et al. 2014 showed a 16.2% improvement on a year-end exam from a personalised spacing schedule, in a real classroom.
The product's spaced-review queue is not a feature added on top of the question bank. It is the mechanism by which the question bank produces durable learning rather than short-term performance.
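One common way a spaced-review queue is built is an expanding-interval schedule: a correct answer pushes the item's next review further out, a miss resets it to a short interval. The sketch below is illustrative only, with an assumed interval ladder; it is not the platform's actual algorithm:

```python
from datetime import date, timedelta

# Assumed interval ladder (days); real systems tune these per learner.
INTERVALS = [1, 3, 7, 14, 30, 60]

def next_review(level, correct, today):
    """A correct answer climbs the ladder; a miss resets to the bottom."""
    level = min(level + 1, len(INTERVALS) - 1) if correct else 0
    return level, today + timedelta(days=INTERVALS[level])

# Two consecutive correct answers push the item progressively further out.
level, due = next_review(0, correct=True, today=date(2024, 1, 1))
level, due = next_review(level, correct=True, today=due)
```

The design choice that matters is the reset on error: an item the learner still gets wrong returns to short intervals, so practice concentrates where retention is weakest.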
It transfers to real high-stakes assessments.
Lab effects are one thing; the question is whether they survive contact with a real, professional, high-stakes exam. Larsen et al. 2009 randomised pediatric residents to repeated testing or repeated study on clinical material; six months later, the tested group scored substantially higher. The participants weren't undergraduates, the material wasn't a word list, and the interval wasn't a week.
That is the closest existing evidence to what a candidate is doing on this platform: working professionals, technical material, multi-month preparation horizon.
Getting it wrong (with feedback) is part of how it works.
A frequent worry from learners: “Won't I just memorise the wrong answer?” The evidence is reassuring. Kornell et al. 2009 showed that incorrect attempts followed by feedback produced better long-term retention than passive study, even when the initial attempt was wrong. The effect is strongest when feedback is informative, specifically explaining why each option was right or wrong, not just marking it.
Every question on this platform carries an examiner-style explanation that addresses each option, not just the correct one. That is the form of feedback the literature finds is necessary.
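The data shape this implies is simple: every option carries its own explanation, so feedback can say why each distractor is wrong rather than just marking it. A hypothetical sketch (field names and the worked example are assumptions for illustration, not the platform's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Option:
    text: str
    correct: bool
    explanation: str  # why this option is right or wrong

@dataclass
class Question:
    stem: str
    options: list[Option]

    def feedback(self):
        """Examiner-style feedback covering every option, not just the key."""
        return "\n".join(
            f"{o.text} ({'correct' if o.correct else 'incorrect'}): {o.explanation}"
            for o in self.options
        )

example = Question(
    stem="2 + 2 = ?",
    options=[
        Option("4", True, "Basic addition."),
        Option("5", False, "An off-by-one slip, so a plausible distractor."),
    ],
)
fb = example.feedback()
```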
Why does passive study feel so productive?
Robert and Elizabeth Bjork's work on desirable difficulties documents what they call the fluency illusion: the smoother and easier study feels, the less learning is actually happening. Re-reading is fluent. Highlighting feels productive. Both produce a strong sense of mastery and relatively weak retention. Retrieval practice feels harder because it is harder, and that is why it works.
This is the most under-appreciated finding in the literature, because it predicts that the techniques that feel best are the techniques that work least. Trust the evidence over the feeling.
The synthesis, in one paragraph.
Forty years of evidence converges: retrieval practice with informative feedback, distributed across time, on questions whose distractors are plausible, is the single most effective evidence-based study technique we know of. It outperforms re-reading and highlighting at medium-to-large effect sizes, in classrooms, in labs, and in real high-stakes professional examinations. That is exactly what this platform delivers, which is why a single Question of the Day, answered honestly today, is more useful than another hour spent re-reading the textbook.