Despite its meteoric rise in popularity, ChatGPT, the artificial intelligence chatbot developed by OpenAI, has come under scrutiny in a recent study. Researchers at Brigham and Women’s Hospital, a teaching affiliate of Harvard Medical School, found that cancer treatment plans generated by ChatGPT contained numerous errors. The study, published in JAMA Oncology and reported by Bloomberg, showed that roughly one-third of the chatbot’s responses about cancer treatment plans contained inaccuracies.
The study found that ChatGPT often mixed correct and incorrect information, making it difficult to gauge the reliability of its recommendations. Among the 104 queries examined, 98% of responses included at least one treatment suggestion aligned with the guidelines set by the National Comprehensive Cancer Network. However, the researchers expressed concern that erroneous information was woven in alongside accurate recommendations, making the errors difficult to detect, even for experts.
Dr. Danielle Bitterman, a coauthor of the study, noted the disconcerting blend of misinformation and accurate content, emphasizing that large language models are built to produce persuasive-sounding responses, not to offer precise medical advice. She stressed that error rates and response instability are critical safety concerns in the clinical domain and must be addressed.
ChatGPT shot to prominence after its November 2022 launch, amassing 100 million active users within two months. This study, however, underscores the ongoing challenges facing generative AI models such as ChatGPT, which are known to occasionally present misleading or factually incorrect information. Google’s rival AI model, Bard, for instance, triggered a $120 billion decline in the company’s market value after giving an inaccurate answer about the James Webb Space Telescope.
While AI is already being integrated into healthcare systems, particularly to streamline administrative work, the discrepancies found in models like ChatGPT may impede their readiness for clinical deployment. Notably, GPT-4, a more recent version of the model, has shown impressive clinical judgment, even surpassing some medical professionals in diagnostic accuracy. Nonetheless, the errors identified in ChatGPT highlight the need for further refinement before AI models can be considered substitutes for medical professionals.
The prevalence of inaccuracies in AI-generated responses points to the difficulty of merging cutting-edge technology with complex medical decision-making. OpenAI, the creator of ChatGPT, acknowledges that its models should not be used to diagnose or treat serious medical conditions. As the medical community and tech developers navigate this intricate landscape, accurate, reliable AI-powered healthcare solutions remain a paramount goal.