Schulich Medicine & Dentistry researchers asked ChatGPT for medical diagnoses. Here’s what they found.

A man using ChatGPT. (Matheus Bertelli/Pexels)


By Cynthia Fazio

Faced with symptoms like a runny nose and a cough, some people might assume they have a common cold that doesn’t require a doctor’s visit. In these cases, many turn to sites such as Google and WebMD for additional reassurance.

Now, with advancements in artificial intelligence (AI), some might be tempted to switch from “Dr. Google” to “Dr. ChatGPT.” But can OpenAI’s AI-powered chatbot provide accurate medical advice?

Researchers from Schulich School of Medicine & Dentistry set out to answer that question and explore whether ChatGPT can become a reliable resource in health-care and medical education.

Dr. Amrit Kirpalani, assistant professor, Department of Paediatrics

The study, led by Dr. Amrit Kirpalani, assistant professor in the Department of Paediatrics, was recently published in PLOS One. It found that ChatGPT provided the right diagnosis only 49 per cent of the time.

ChatGPT is not yet ready to be used as a reliable medical diagnostic tool for complicated cases. But Kirpalani’s study did find that it could take complex medical topics and synthesize them in an easy-to-understand manner, an ability that could benefit instructors and health-care providers seeking to deliver medical information in a digestible format.

“To me, the most relevant finding is that ChatGPT delivered its answers in a very simple and easy-to-understand way,” said Kirpalani. “I think that’s important because you can see the potential for it to be used as a great tool to help people learn and understand medical cases – but it can also be very convincing even when it’s wrong.”

The study asked ChatGPT to diagnose 150 cases drawn from Medscape Clinical Challenges, which are designed to test the diagnostic skills of health-care professionals. Medscape is a public platform with many complex cases, where clinicians vote on what they think is the right answer. The research team, which included third-year medical students Ali Hadi, Edward Tran and Branavan Nagarajan, created instructions asking ChatGPT to choose the correct diagnosis in a multiple-choice format and to provide a rationale.

The chatbot was given information, including patients’ histories, physical examination results, and laboratory or imaging test results. The researchers found it struggled with interpreting test results and sometimes overlooked critical information that was relevant to the diagnosis. However, the chatbot was helpful in providing next diagnostic steps and making medical information more accessible.

More research needed to ‘use AI responsibly’

It is clear from the study that further research and advancements are needed before AI can be used as another tool to help with medical diagnoses. And as new AI models advance and improve, Kirpalani emphasizes the importance of AI literacy.

“AI literacy is important for patients, for providers, for educators and for students because we need to understand how we can use AI responsibly and how it can be applied and leveraged for health-care and medical education purposes.”

Regardless of the accuracy of these online resources, Kirpalani stressed the need to evaluate and double-check responses from the internet against reliable, peer-reviewed sources to ensure people have the correct information.

“I would say we're maybe already at the point where we need guidance around prompt engineering – where instructions are developed that can be interpreted and understood by a generative AI model,” said Kirpalani. “We are going to need a lot of oversight on how it's being used to ensure patient safety and to make sure that [this kind of AI technology] will be thoughtfully rolled out.”