ChatGPT-4, the latest version of OpenAI’s massively popular dialogue-based artificial intelligence (AI) model, delivered a “low performance” when tasked with answering questions about echocardiography, according to a new research letter published in JACC: Cardiovascular Imaging.[1]
“ChatGPT possesses the capacity to swiftly generate curated content, which lends to its utilization as a writing tool and, in some cases, as an author,” wrote co-author Arun Umesh Mahtani, MD, with the department of internal medicine at Richmond University Medical Center, and colleagues. “Currently, JACC journals prohibit the use of large language models (LLMs) and other AI methods for writing and authorship while requiring disclosure and responsibility for data integrity, if utilized. At the same time, the use of LLMs in publishing is expected to evolve.”
Mahtani et al. asked ChatGPT to answer 150 questions one might find on a board certification exam and to explain its answers in a way that demonstrated a full understanding of the knowledge contained in a popular textbook, Clinical Echocardiography Review: A Self-Assessment Tool. The questions included open-ended (OE) questions, multiple-choice questions without forced justification (MC-NJ) and multiple-choice questions with forced justification (MC-J). While some of the 150 questions were clinical vignettes, others were fact-based or formula-based.
Overall, ChatGPT provided acceptable answers to 47.3% of OE questions, 53.3% of MC-NJ questions and 55.3% of MC-J questions. Among the OE questions it answered acceptably, 81.6% were fact-based, 12.6% were clinical vignettes and 5.6% were formula-based. A similar breakdown was seen for the MC-NJ and MC-J questions.