Ora’s Physician-Reviewed AI Flashcards Have 449 Fewer Medical Errors than the Student-Crowdsourced AnKing Deck.
Ora AI Research Team. Independent random samples, blinded multi-model AI screening, strict top-tier literature adjudication.
A blinded audit of independently sampled flashcards (counting only items directly contradicted by a top-tier journal, current major-society guideline, or authoritative textbook) finds Ora's in-production flashcards at a 0.64% strict-confirmed error rate vs AnKing at 1.96%: a 3.07× ratio (Δ 1.32 pp; p ≈ 0.000056). The audited AnKing snapshot includes superseded content dating up to 19 years before the audit. Ora's regeneration pipeline closes the feedback loop in days; the 15 strict-flagged in-production cards are queued for physician review.
Strict-error-rate ratio: 3.07×
Ora: 0.64% (15 of 2,347 in-production cards)
AnKing: 1.96% (49 of 2,498 audited cards)
Absolute Δ 1.32 pp (95% CI 0.69–1.95); ratio 3.07×; p ≈ 0.000056. Extrapolated to AnKing's full ~34,000-card deck at the audited rates, the 1.32 pp gap projects to roughly 449 more strict-literature-contradicted cards than Ora would carry at its 0.64% rate (1.96% × 34,000 ≈ 666 vs 0.64% × 34,000 ≈ 217). Three of the 18 originally strict-flagged Ora cards had already been suspended by Ora's QA pipeline before the audit; the remaining 15 are in the physician-review queue.
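The headline arithmetic can be re-derived from the reported counts. The sketch below is illustrative only: the variable names are ours, and it assumes the projection is computed from the rates rounded to two decimals (in percent), which reproduces the figure of roughly 449. Using the unrounded rates instead lands within about one card of that figure.

```python
# Counts as reported in the audit summary. DECK_SIZE is the approximate
# deck size quoted for AnKing, so the projection is a rough estimate.
ORA_ERRORS, ORA_CARDS = 15, 2_347
ANKING_ERRORS, ANKING_CARDS = 49, 2_498
DECK_SIZE = 34_000

ora_rate = ORA_ERRORS / ORA_CARDS            # strict-confirmed error rate, Ora
anking_rate = ANKING_ERRORS / ANKING_CARDS   # strict-confirmed error rate, AnKing

ratio = anking_rate / ora_rate               # how many times higher AnKing's rate is
delta_pp = (anking_rate - ora_rate) * 100    # absolute gap in percentage points

# Assumption: the projection uses the rounded published percentages.
ora_pct = round(ora_rate * 100, 2)
anking_pct = round(anking_rate * 100, 2)
projected_gap = (anking_pct - ora_pct) / 100 * DECK_SIZE

print(f"Ora rate:      {ora_rate:.2%}")
print(f"AnKing rate:   {anking_rate:.2%}")
print(f"Ratio:         {ratio:.2f}x")
print(f"Delta:         {delta_pp:.2f} pp")
print(f"Projected gap: about {round(projected_gap)} cards per {DECK_SIZE:,}")
```

The projection scales a sample rate linearly to the full deck, so it inherits the sampling uncertainty expressed by the confidence interval on the gap.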
Figure: selected examples of superseded AnKing content, with bar widths scaled to a 19-year maximum. 35 of 49 AnKing strict-confirmed cards (71%) cite a current major-society guideline, regulatory labeling, or scientific statement as the contradicting source. Ora's 15 in-production strict-confirmed cards are first-generation factual issues, not cases of guideline supersession.
Under the strictest literature bar (top-tier journal, current society guideline, or current authoritative textbook only), Ora's in-production flashcards show a strict-confirmed error rate of 0.64%, against 1.96% for AnKing in the audited snapshots: AnKing's rate is 3.07× Ora's. 71% of AnKing's strict-confirmed errors are contradicted by a current major-society guideline or scientific statement, with documented drift of up to 19 years despite continuous community editing. Ora's 15 remaining in-production strict-confirmed cards are queued for physician review on a sub-week turnaround.
Why crowdsourced editing lags. Why an AI pipeline doesn't.
AnKing is the dominant shared deck in US medical education: roughly 34,000 cards (AnKing v12) revised through an AnkiHub workflow where students propose edits, the community reviews, and the maintainer publishes a release. The structural cost is latency: a new guideline must reach a contributor's attention, be translated into a card edit, survive review, and propagate through release cycles. Multi-year drift is the predictable consequence.
Ora's flashcards are drafted by a physician-trained AI from a continuously refreshed literature corpus and regenerated against current standards on a sub-week cadence. New guidelines flow into card content without waiting for a contributor to notice, propose, review, and release. The direction survives every sensitivity check: each grader independently flagged Ora at a lower rate, the ratio holds in both AnKing-mapped and unmapped subsets and across 14 of 18 NBME topics, and the strict-bar funnel tightens the headline at each layer: both rates fall, the gap widens.
Method
- Two independent random samples, one per corpus, drawn at the variant level and stratified across 18 depth-2 NBME organ-system topics; image-only cards excluded symmetrically.
- N. 2,500 Ora + 2,498 AnKing originally drawn. Headline denominator excludes 153 already-suspended Ora cards (not in circulation), yielding 2,347 in-production Ora vs 2,498 AnKing.
- Layer 1. Each card independently graded by three blinded subagents (Claude Opus 4.8, GPT 5.5, Gemini 3.1 Pro). A flag from any one grader advances the card to Layer 2.
- Layer 2. Each flagged card adjudicated against PubMed-indexed primary literature, current society guidelines, and authoritative references. Binary ruling.
- Layer 3 (audit). Twelve parallel subagents re-judged every Layer-2 confirmed error against a four-bucket rubric (real / guideline-update / pedantic / false alarm).
- Layer 4 (strict). Re-verified every retained real error against a top-tier-only bar: NEJM/JAMA/Lancet-tier journals, current major-society guidelines, or current authoritative textbooks. Default: the card stands unless the contradiction is unambiguous.
- Primary. Share of each arm with a Layer-4 strict-confirmed error. Two-proportion z-test; Wald 95% CIs.
- Sensitivity. Re-ran headline under each Layer-1 model as sole screener and on the AnKing-mapped-only subset. Direction unchanged in every case.
- Falsification. Probed sub-corpus concentration, per-grader asymmetry, topic reversals, deck-version drift, and tertiary-source leakage (zero observed).
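The primary analysis named above (two-proportion z-test with Wald 95% CIs) can be checked from the published counts alone. The following is a minimal sketch, not the audit's actual code: the function names are hypothetical, and it assumes a pooled standard error for the test, an unpooled standard error for the interval, and a two-sided p-value, which are the conventional choices but are not stated in the method.

```python
import math

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p) for H0: p1 == p2, using a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of the standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def wald_ci_difference(x1: int, n1: int, x2: int, n2: int,
                       z_crit: float = 1.959964) -> tuple[float, float]:
    """Return the Wald confidence interval for p2 - p1 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff - z_crit * se, diff + z_crit * se

# Arm 1: Ora in-production cards; arm 2: audited AnKing cards.
z, p = two_proportion_z_test(15, 2_347, 49, 2_498)
lo, hi = wald_ci_difference(15, 2_347, 49, 2_498)

print(f"z = {z:.2f}, two-sided p = {p:.6f}")
print(f"95% CI for the gap: {lo * 100:.2f} to {hi * 100:.2f} pp")
```

Under these assumptions the result agrees with the reported p ≈ 0.000056 and the 0.69–1.95 pp interval. With only 15 events in the Ora arm, an exact or score-based interval would be a reasonable cross-check on the Wald approximation.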
This is a point-in-time snapshot against current published standards, not a permanent claim about either deck. Both decks evolve: Ora regenerates against a refreshed corpus; AnKing reflects the community release current at audit time. Neither deck is error-free, and the absolute Ora rate (0.64%) is not zero. The Ora denominator excludes already-suspended cards; AnKing has no platform-level suspension flag. Three of 18 NBME topics reverse the headline (Nervous System the only reversal on a non-trivial denominator), reported transparently rather than excluded. Layer-4 strict verdicts are conservative (default-defended) and applied symmetrically across arms.
References
- Lu M, Farhat JH, Beck Dallaghan GL. Enhanced Learning and Retention of Medical Knowledge Using the Mobile Flash Card Application Anki. Med Sci Educ. 2021;31(6):1975–1981. doi:10.1007/s40670-021-01435-w
- American Heart Association. Diagnosis, Workup, Risk Reduction of TIA in the Emergency Department: AHA Scientific Statement. Stroke. 2023.
- Wilson W, et al. Prevention of Infective Endocarditis: AHA Guidelines. Circulation. 2007;116(15):1736–1754.
- CDC. Acanthamoeba Keratitis — Multiple States, 2005–2007. MMWR. 2007;56(21):532–534.
- National Board of Medical Examiners. Constructing Written Test Questions (NBME Item-Writing Guidelines). nbme.org IWG Gold Book
- AnkiHub Community. AnkiHub's Role in the AnKing Step Deck (community change-note process). community.ankihub.net