AI chatbots, such as OpenAI’s ChatGPT, are being sold as revolutionary tools that will help workers become more efficient in their jobs, and perhaps replace them entirely in the future. But a new study has found that ChatGPT gets programming questions wrong 52% of the time.
Key Findings of the Study
The Purdue University study was presented earlier this month at the Computer-Human Interaction Conference in Hawaii. The researchers took 517 programming questions from Stack Overflow and submitted them to ChatGPT.
“Our analysis found that 52% of ChatGPT responses contained incorrect information and 77% were verbose,” the new study explains. “However, participants in our study still preferred ChatGPT responses 35% of the time due to their completeness and well-articulated writing style.”
Surprisingly, the programmers in the study did not always catch the mistakes made by the AI chatbot. “In 39% of cases, they did not notice misinformation in ChatGPT responses,” the study says. “This speaks to the need to combat misinformation in ChatGPT’s answers to programming questions and raise awareness of the risks associated with seemingly correct answers.”
Industry Implications and Future Outlook
Of course, this is just one study, but it does highlight the problems that anyone using these tools can face. Big tech companies are pouring billions of dollars into artificial intelligence in an attempt to create the most reliable chatbots. Meta, Microsoft, and Google are in a race to dominate an emerging space that could radically change our relationship with the internet. But there are a number of obstacles on this path.
The main one is the unreliability of artificial intelligence, especially if the user asks a truly unique question. Google’s new AI-powered search consistently returns junk, often from unreliable sources. This week, for example, there were several cases where Google Search presented satirical articles from The Onion as reliable information.
For its part, Google defends itself by arguing that the incorrect answers are an anomaly. “The examples we’ve seen tend to be very rare queries and are not representative of most people’s experiences,” a Google spokesperson said early last week. “The vast majority of AI responses provide high-quality information with links that take you further online.”
But the defense that unusual queries simply produce unusual responses is hard to take seriously. Are users expected to ask these chatbots only the most mundane questions? How does that square with tools that are supposed to be revolutionary?
NIX Solutions concludes that while AI chatbots like ChatGPT offer significant promise, their current limitations must be acknowledged and addressed. We’ll keep you updated as more research and developments unfold in this rapidly evolving field.