2Key Engineering Research Center of Mobile Health Management System, Ministry of Education, Hangzhou, China
Abstract
Introduction: Since its release, DeepSeek has attracted substantial global attention and has increasingly been explored as a tool for medical diagnosis, showing promising potential for clinical applications. This review aimed to comprehensively evaluate the effectiveness, potential, and limitations of DeepSeek in medical diagnosis, thereby informing future research and real-world implementation and supporting the development of AI-assisted diagnostic care.
Materials and Methods: We searched Web of Science Core Collection, Embase, MEDLINE, Scopus, IEEE Xplore, and medRxiv from inception to August 8, 2025. Two authors independently screened studies against predefined inclusion and exclusion criteria, extracted data, and assessed study quality using the Prediction model Risk of Bias Assessment Tool (PROBAST).
Results: Twenty-four studies were included, evaluating six DeepSeek model variants; DeepSeek-R1 was the most frequently assessed. Quality appraisal indicated a high risk of bias in 13 studies (54%). DeepSeek's performance varied across medical specialties. Overall performance did not differ significantly between DeepSeek and physicians (p = 0.07); however, DeepSeek did not reach physician-level performance, with diagnostic accuracy 7.7 percentage points lower than that of physicians.
Discussion and Conclusion: DeepSeek's diagnostic performance did not differ significantly from that of physicians, yet it remained numerically lower. At present, it should not replace expert clinicians. Nevertheless, it may serve as a valuable adjunct in non-specialist settings and as an educational tool for medical trainees.
