A new study has raised concerns about the reliability of artificial intelligence chatbots in healthcare, finding that they fail to correctly identify or properly rank possible diagnoses in more than 80% of early-stage medical cases.
The findings suggest that while AI tools are increasingly used for symptom checking and health advice, they remain unreliable in situations where patient information is limited or symptoms are still developing.
Struggles With Early Clinical Reasoning
The research, conducted by scientists at Mass General Brigham and published in JAMA Network Open, evaluated 21 large language models across multiple clinical case scenarios.
According to the study, AI systems consistently struggled to generate accurate differential diagnoses when only early or partial patient information was available.
Researchers reported that the models failed to produce appropriate diagnostic reasoning in more than 80% of early-stage cases, although performance improved significantly when full clinical data were provided.
Expert Findings and Methodology
The study tested leading AI systems using structured clinical vignettes designed to simulate real-world diagnostic progression.
Researchers gradually introduced patient information, beginning with symptoms, then adding physical examination details and laboratory results.
The results showed that while many models were able to arrive at correct final diagnoses when given complete information, they struggled significantly with early-stage reasoning — a critical phase in real-world medical decision-making.
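The staged-disclosure design described above can be sketched in code. The following is a minimal, hypothetical illustration only: the vignette fields, the `model_diagnose` stub, and the hit-or-miss scoring rule are assumptions for demonstration, not the study's actual protocol, prompts, or data. In a real harness, `model_diagnose` would call a large language model.

```python
# Hypothetical sketch of a staged-disclosure evaluation:
# reveal progressively more of a clinical vignette and score
# the model's ranked differential diagnosis at each stage.
# All names and data here are illustrative assumptions.

STAGES = ["symptoms", "examination", "labs"]

def build_prompt(vignette, stage_index):
    """Concatenate vignette sections up to and including the given stage."""
    parts = [vignette[s] for s in STAGES[: stage_index + 1]]
    return "\n".join(parts)

def model_diagnose(prompt):
    # Placeholder for a real LLM call; returns a ranked differential.
    # This toy rule just demonstrates the harness mechanics.
    if "troponin elevated" in prompt:
        return ["myocardial infarction", "unstable angina"]
    return ["gastro-esophageal reflux", "musculoskeletal pain"]

def evaluate(vignette, true_diagnosis):
    """Score each disclosure stage: 1 if the true diagnosis
    appears anywhere in the ranked differential, else 0."""
    return {
        stage: int(true_diagnosis in model_diagnose(build_prompt(vignette, i)))
        for i, stage in enumerate(STAGES)
    }

vignette = {
    "symptoms": "55-year-old with chest discomfort",
    "examination": "diaphoretic, blood pressure 150/95",
    "labs": "troponin elevated",
}
scores = evaluate(vignette, "myocardial infarction")
```

Run on this toy vignette, the stub only recovers the correct diagnosis at the final (labs) stage, which mirrors the pattern the researchers reported: accuracy improves as more clinical data is disclosed.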
A study author noted that off-the-shelf large language models are not yet ready for unsupervised clinical deployment.
Accuracy Improves With More Data
Findings also showed a sharp contrast in performance depending on the amount of clinical information available.
When given full patient data, some models achieved high diagnostic accuracy. However, when only initial symptoms were provided, failure rates rose sharply.
Researchers said this reflects a key limitation of current AI systems, which perform better in structured environments than in uncertain, real-world clinical conditions.
Experts Call for Human Oversight
Medical experts involved in the study emphasised that AI should be used as a supportive tool rather than a replacement for professional diagnosis.
They warned that relying solely on chatbots for early medical guidance could increase the risk of misdiagnosis or missed warning signs, particularly in complex cases.
Conclusion
While AI chatbots are becoming increasingly common in healthcare interactions, the study highlights a major limitation in their clinical reasoning. Researchers stress that human medical judgment remains essential, especially in early-stage diagnosis, where uncertainty is highest.
Source
Mass General Brigham / JAMA Network Open study on large language models and clinical reasoning in early diagnostic tasks
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2847679
Ugochukwu, Senior Reporter/Editor at AIbase.ng