Ora AI Outperforms UWorld in the First Multi-Institutional RCT of a USMLE Question Bank
Phelps R, et al. Toward AI-Powered Precision Medical Education: A Multi-Institutional Randomized Controlled Trial of an Adaptive Question Bank for the USMLE. Preprint, pending peer review. June 2026.
In a preregistered, blinded-analysis, multi-institutional RCT of 155 medical students, at-risk students gained 2.4× as much with Ora as with UWorld over the 14-day intervention (+4.4 vs +1.8 questions on a 60-item assessment, p < .001). At the 10-month real-exam follow-up, 100% of Ora-assigned Step 1 respondents passed on their first attempt (vs 91% for UWorld, not significant), and students with predominant Ora exposure scored 11.0 points higher on first-attempt Step 2 CK (p = .027).
Step 1 first-attempt pass rate at 10-month follow-up: 100% (13/13) of Ora-assigned vs 91% (10/11) of UWorld-assigned respondents. Fisher's exact p = .46 (not significant; the sample is small). National average first-time MD pass rate: ~91% (USMLE.org).
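The Fisher's exact result can be reproduced from the counts given above. The following is a minimal sketch using only the Python standard library; the function name `fisher_exact_two_sided` is illustrative and is not taken from the study's R code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that x of the col1 first-column counts fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Rows: Ora, UWorld. Columns: passed, did not pass (counts from the text).
p_value = fisher_exact_two_sided(13, 0, 10, 1)
print(round(p_value, 2))  # 0.46
```

With a single non-passing respondent among 24, only two tables share the observed margins, so the p-value is simply 11/24, the chance that the one failure lands in the 11-person UWorld group.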
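Step 2 CK first-attempt score at 10-month follow-up, among students with predominant Ora exposure: Ora +11.0 points vs UWorld (263.4 vs 252.4), approximately 0.7 SD on the Step 2 CK scoring scale. National 2024-25 mean: 250 (USMLE Score Interpretation Guidelines).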
Short-term gain by population: UWorld's gain is essentially flat from all students to at-risk students (+1.9 → +1.8), while Ora's gain rises with student need (+3.2 → +4.4), the pattern expected from adaptive targeting. Adjusted Ora − UWorld effect among at-risk students from per-protocol ANCOVA: +2.61 questions, 95% CI 1.09 to 4.13, p < .001. At-risk = pretest < 36/60 (the approximate NBME first-attempt passing threshold); at-risk students are a subset of the all-students population.
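The headline figures in this paragraph can be cross-checked with simple arithmetic. The sketch below only restates numbers quoted above; the variable names and the `is_at_risk` helper are illustrative and do not come from the study's analysis code.

```python
# Figures quoted in the text above.
ora_gain_all, ora_gain_at_risk = 3.2, 4.4
uworld_gain_all, uworld_gain_at_risk = 1.9, 1.8
adjusted_effect, ci_low, ci_high = 2.61, 1.09, 4.13

AT_RISK_CUTOFF = 36  # at-risk = fewer than 36 of 60 pretest items correct

def is_at_risk(pretest_correct):
    """Return True when a pretest score falls below the at-risk cutoff."""
    return pretest_correct < AT_RISK_CUTOFF

# Ratio of raw gains among at-risk students (the "2.4x" figure).
gain_ratio = ora_gain_at_risk / uworld_gain_at_risk

# A symmetric 95% CI should be centered on the adjusted estimate.
ci_midpoint = (ci_low + ci_high) / 2
ci_half_width = (ci_high - ci_low) / 2

print(round(gain_ratio, 1))     # 2.4
print(round(ci_midpoint, 2))    # 2.61
print(round(ci_half_width, 2))  # 1.52
```

The raw difference in gains (4.4 − 1.8 = 2.6 questions) is close to the covariate-adjusted estimate of +2.61, and the interval midpoint matches the reported estimate, so the quoted numbers are internally consistent.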
Study design
- Two-arm parallel-group RCT comparing Ora AI vs UWorld.
- 1:1 randomization, computer-generated, stratified by exam level (Step 1 / Step 2 CK).
- Preregistered on AsPredicted (#246348).
- Blinded analysis: arm-coded (A/B) dataset analyzed by Stanford Department of Statistics co-authors, blinded to arm assignment.
- IRB exempt under 45 CFR 46.104(d)(1)-(2); all participants provided electronic consent.
- 155 US medical students enrolled (Ora: 77; UWorld: 78); 121 per-protocol; 51 long-term respondents at 10 months.
- 14-day intervention period during summer 2025.
- Engagement criteria: ≥10 study days and ≥400 questions; verified via platform logs (Ora) and self-report (UWorld).
- Baseline well-matched across pretest score, exam cohort, assessment order, and URiM proportion (all p > .38).
- Short-term outcome: 60-item NBME Free 120 posttest (counterbalanced; Block A↔B).
- Long-term outcome: actual first-attempt USMLE Step 1 pass and Step 2 CK score, self-reported at 10 months.
- Primary analysis: ANCOVA adjusted for pretest, exam cohort, and assessment order; arm × pretest interaction tested per protocol.
- Conducted in R 4.5.3 (emmeans, car, effectsize, mediation).
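To illustrate the 1:1 computer-generated randomization stratified by exam level, here is a minimal Python sketch. It assumes permuted blocks of four within each stratum, which the summary above does not specify; the function name, seed, and roster are hypothetical, and the study itself may have used a different procedure.

```python
import random

def stratified_block_randomize(participants, block_size=4, seed=2025):
    """Assign arms 1:1 within each stratum using permuted blocks.

    participants: list of (participant_id, stratum) tuples.
    Returns a dict mapping participant_id to an arm label.
    Block size and seed are illustrative assumptions.
    """
    if block_size % 2:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)

    # Group participant IDs by stratum, preserving enrollment order.
    by_stratum = {}
    for pid, stratum in participants:
        by_stratum.setdefault(stratum, []).append(pid)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        for start in range(0, len(ids), block_size):
            # Each block holds equal numbers of both arms, in random order.
            block = ["Ora", "UWorld"] * (block_size // 2)
            rng.shuffle(block)
            for pid, arm in zip(ids[start:start + block_size], block):
                assignment[pid] = arm
    return assignment

# Hypothetical roster: 8 Step 1 and 8 Step 2 CK participants.
roster = ([(f"S1-{i}", "Step 1") for i in range(8)]
          + [(f"S2-{i}", "Step 2 CK") for i in range(8)])
arms = stratified_block_randomize(roster)

for stratum_prefix in ("S1", "S2"):
    in_stratum = [arm for pid, arm in arms.items() if pid.startswith(stratum_prefix)]
    print(stratum_prefix, in_stratum.count("Ora"), in_stratum.count("UWorld"))
```

Because every complete block contains two of each arm, the arms stay balanced within each exam level. An incomplete final block can leave a stratum off by a participant or two, which is one way an overall split such as 77 vs 78 can arise.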
Limitations
The 14-day intervention covers only a fraction of a typical board prep cycle. The long-term follow-up sample (n = 51) was modest, and outcomes were self-reported on an anonymous survey. UWorld-arm engagement was self-reported while Ora-arm engagement was platform-logged, an asymmetry that may affect engagement-related interpretations. The study was not adequately powered for subgroup analyses (URiM students showed a directionally larger but non-significant benefit). The findings warrant confirmatory replication in a larger trial. Preprint pending peer review.