ChatGPT Aces US Medical Licensing Exam & Diagnoses Rare Condition in Seconds

Dr. Isaac Kohane, a computer scientist at Harvard and a physician, conducted an experiment with two colleagues to evaluate how well OpenAI's newest artificial intelligence model, GPT-4, performs in a medical setting. The primary objective was to determine how accurately the model could diagnose medical conditions, and the results were impressive. In the forthcoming book “The AI Revolution in Medicine,” co-authored with independent journalist Carey Goldberg and Microsoft vice president of research Peter Lee, Kohane reports that GPT-4 correctly answered US medical licensing exam questions more than 90% of the time, surpassing its predecessors, GPT-3 and GPT-3.5, as well as some licensed doctors.

Moreover, GPT-4 is not just a good test-taker and fact finder; it is also an excellent translator, able to render discharge instructions for patients who speak other languages, such as Portuguese, and to break down technical jargon into language a 6th grader could understand. The book's authors explain that GPT-4 can also offer doctors helpful bedside-manner suggestions, with guidance on how to discuss a patient's condition in a compassionate and understandable way. The model can read lengthy reports or studies and summarize them quickly, explaining its reasoning in a way that appears human-like.

However, GPT-4’s intelligence is “limited to patterns in the data and does not involve true understanding or intentionality,” as the model itself acknowledges. Although it can diagnose medical conditions, Kohane found that it also makes errors, which could be catastrophic when diagnosing patients or prescribing treatment. As a result, Kohane is anxious about GPT-4’s medical expertise being made available to millions of families when there is no guarantee or certification that its advice will be safe or effective.

The book is filled with examples of GPT-4’s mistakes, ranging from simple clerical errors to math mistakes. Sometimes the model “hallucinates,” making up answers or disobeying requests, which raises significant ethical concerns. While GPT-4 has the potential to free up valuable time and resources in the clinic, allowing clinicians to spend more time with their patients instead of their computer screens, the authors urge readers to consider the implications of ever-smarter machines that may eventually surpass human intelligence in almost every dimension. It is essential to reflect on how we want that world to work and to establish safeguards ensuring that the advice of AI models like GPT-4 is safe and effective.