A Systematic Review and Meta-Analysis of Diagnostic Performance Comparison between DeepSeek and Physicians

Zeng, Jianwen; Zhu, Xule; Liu, Xin; Shen, Shiying; Li, Sixie; Cao, Shihua

doi:10.14744/lhhs.2026.81418

Jianwen Zeng¹, Xule Zhu¹, Xin Liu¹, Shiying Shen¹, Sixie Li¹, Shihua Cao²

¹School of Public Health and Nursing, Hangzhou Normal University, Hangzhou, China
²Key Engineering Research Center of Mobile Health Management System, Ministry of Education, Hangzhou, China

Lokman Hekim Health Sciences 2026; 6(2): 323-333 DOI: 10.14744/lhhs.2026.81418


Abstract

Introduction: Since its release, DeepSeek has attracted substantial global attention and has increasingly been explored as a tool for medical diagnosis, showing promising potential for clinical applications. This study aimed to comprehensively evaluate the effectiveness, potential, and limitations of DeepSeek in medical diagnosis, thereby informing future research and real-world implementation and supporting the development of AI-assisted diagnostic care.
Materials and Methods: We searched Web of Science Core Collection, Embase, MEDLINE, Scopus, IEEE Xplore, and medRxiv from inception to August 8, 2025. Two authors independently screened studies and extracted data according to predefined inclusion and exclusion criteria, and assessed study quality using the Prediction model Risk of Bias Assessment Tool (PROBAST).
Results: Twenty-four studies were included, evaluating six DeepSeek model variants; DeepSeek-R1 was the most frequently assessed. Quality appraisal indicated a high risk of bias in 13 studies (54%). DeepSeek’s performance varied across medical specialties. Overall performance did not differ significantly between DeepSeek and physicians (p=0.07); however, DeepSeek did not reach physician-level performance, with diagnostic accuracy 7.7 percentage points lower than that of physicians.
Discussion and Conclusion: DeepSeek’s diagnostic performance did not differ significantly from that of physicians, yet its accuracy remained numerically lower. At present, it should not replace expert clinicians. Nevertheless, it may serve as a valuable adjunct in non-specialist settings and as an educational tool for medical trainees.

Keywords: DeepSeek; Diagnosis; Large language model; Meta-analysis; Systematic review