In-Depth
'Microsoft Acquires Google' (& Other Copilot Hallucinations)
Hallucinations have plagued AI coding assistants since day one, as large language models (LLMs) are notorious for generating nonsensical and bizarre suggestions of all kinds -- and for asserting made-up facts with admirable confidence.
Coders are learning to live with this and double-check suggested code, and systems like GitHub Copilot are getting better at understanding context and providing more relevant suggestions. But hallucinations persist, and this reporter has found them to be both a problem and a source of amusement in one particular non-coding use case: writing prose in Visual Studio Code.
I earlier shared my scheme of using VS Code for journalism, having customized it for that task with a host of extensions for spell checking, HTML boilerplate code and so on, along with user-provided code snippets for things like displaying images. I even used AI to help me create my own .vsix extensions to serve as "super macros" to automate some work.
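For the curious, a "super macro" of that sort can be as simple as a small extension command that drops boilerplate at the cursor. Here's a minimal sketch in TypeScript, assuming a hypothetical superMacros.insertImage command ID and generic image markup -- it's illustrative only, not the actual .vsix code I built:

```typescript
// A minimal sketch of a "super macro" VS Code extension command.
// The command ID and the inserted markup are hypothetical examples.
import * as vscode from 'vscode';

export function activate(context: vscode.ExtensionContext) {
  // Register a command that inserts HTML image boilerplate at the cursor,
  // prompting for the file name and alt text.
  const insertImage = vscode.commands.registerCommand(
    'superMacros.insertImage', // hypothetical command ID
    async () => {
      const editor = vscode.window.activeTextEditor;
      if (!editor) {
        return;
      }
      const src = await vscode.window.showInputBox({ prompt: 'Image file name' });
      if (!src) {
        return;
      }
      const alt = (await vscode.window.showInputBox({ prompt: 'Alt text' })) ?? '';
      // A SnippetString leaves the cursor on the final tab stop after insertion.
      const snippet = new vscode.SnippetString(`<img src="${src}" alt="${alt}" />$0`);
      await editor.insertSnippet(snippet, editor.selection.active);
    }
  );
  context.subscriptions.push(insertImage);
}

export function deactivate() {}
```

Packaged into a .vsix (the vsce tool handles that) and bound to a keystroke, a command like this behaves just like any built-in editor action.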
I recently subscribed to GitHub Copilot and installed the companion GitHub Copilot Chat tool for some hands-on reporting projects, but also to see how AI worked for writing. I found it to be a great help in suggesting sentences and paragraphs.
It can also go wildly off the rails.
The problem has been documented extensively in GitHub issues such as "Github Copilot Chat often hallucinating and thinking it's the wrong language," "Copilot chat hallucinating and inventing contents of an entire non-existent private Github repo" and "GitHub Chat hallucinates contents of user's code."
A GitHub post about prompt engineering from last year explains: "But be warned: LLMs will also sometimes confidently produce information that isn't real or true, which are typically called 'hallucinations' or 'fabulations.'"
On the writing side of things, I found Copilot to be eerily prescient in suggesting sentences and paragraphs as I typed, seemingly able to read my mind and anticipate what I was going to write, especially after I had typed some words to provide context.
If that context was lacking, however, Copilot provided its own, almost seeming to enjoy its little hallucination jokes.
Sometimes I start an article by writing a headline, and sometimes I just start typing the article body and fill in the headline later. With no context provided by a headline, Copilot is liable to suggest just about anything ahead of my cursor.
I discovered this when I started an article by typing "Microsoft today announced ..." and Copilot suggested Microsoft had acquired GitHub for $7.5 billion in stock. After the first and each following paragraph, I would just press Enter twice and it would suggest the next paragraph, completing an entire article, though it eventually ran out of gas and abandoned paragraph breaks in favor of a massive stream-of-AI-consciousness blast of text.
That buyout actually happened in 2018, but it wasn't what I was going to write about, as it's kind of old news.
I decided to test the hallucination bounds and give the AI free rein to see what it would do with the headline: "Microsoft Acquires Google." It happily took the lead and went nuts, providing quotes from Microsoft CEO "Steve Ballmer" (he stepped down in 2014) and setting the price at $1.2 billion, which seems like the steal of the century to me.
Google CEO "Larry Page" (it's been Sundar Pichai since 2015) was "excited about this new chapter in Google's history" while Ballmer said Microsoft "sees a lot of opportunities in the search market."
I decided to see what Copilot would do with "OpenAI Announced ..." and it said the company announced GPT-3 (that happened in 2020, and the company now offers GPT-4 and GPT-4 Turbo).
Why Copilot appears to have a knowledge cutoff date instead of being able to browse the internet for more current info, as some other systems can, I don't know. Some of its outdated suggestions would probably be perfect if the latest data were available.
The AI also took the lead and suggested a lede (opening paragraph) while I was writing this article, for which I supplied the headline first. It's not half bad, as you can see, but it wasn't correct.
I discovered other hallucinations yesterday when I was writing about AI cloud security tools for sister publication Virtualization & Cloud Review. The AI kept trying to be helpful and for some reason insisted on offering URLs for more information that mapped to imaginary articles on another sister publication, RedmondMag.com. Both suggested text snippets below have completely fake URLs:
- Redmondmag.com reported on the release of a new AI-powered cloud security tool from Palo Alto Networks, which is designed to help organizations secure their cloud environments by using machine learning to identify and stop threats.
- just announced the release of Cloudflare AI Security, which uses machine learning to identify and stop threats in real time. The company says it's the first AI-powered security solution that can be deployed at the edge of the network, providing protection for cloud, on-premises and hybrid environments.
So at this point, Copilot for writing in VS Code is a mixed bag for me. It can offer completely accurate and helpful suggestions or it can write like Dr. Hunter S. Thompson on a bad acid trip.
I assume the experience is the same for coders. Let me know by weighing in with a comment below as we wait for some of these data scientist geniuses to figure out the hallucination problem.
About the Author
David Ramel is an editor and writer at Converge 360.