GitHub Copilot Chat Tackles Java 'One Billion Row Challenge'

Microsoft's Antonio Goncalves put the advanced GitHub Copilot Chat AI tool to work in a coding challenge, and he was impressed with the results.

The One Billion Row Challenge (1BRC) is a Java programming challenge announced early this year by Gunnar Morling. It involves processing a text file with 1 billion rows, calculating the min, mean and max temperature value per weather station, and displaying the results sorted alphabetically by station. The goal was to create the fastest implementation. Goncalves took a stab at the challenge to see how Copilot Chat could help. While the Copilot and Copilot Chat tools work with Visual Studio 2022 and Visual Studio Code, he chose a supported non-Microsoft IDE, IntelliJ IDEA from JetBrains.
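To make the task concrete, here is a minimal, unoptimized sketch of the aggregation step. It is not Morling's baseline or Goncalves' submission. It assumes each row has the form `StationName;12.3`, and the `StationAggregator` and `Stats` names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; class and method names are hypothetical.
class StationAggregator {

    // Running statistics for a single weather station.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        long count = 0;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return sum / count;
        }

        @Override
        public String toString() {
            // Printed as min/mean/max with one decimal place.
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, mean(), max);
        }
    }

    // Assumption: every line looks like "StationName;12.3".
    static Map<String, Stats> aggregate(Iterable<String> lines) {
        // TreeMap keeps station names in natural String order.
        Map<String, Stats> result = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double temperature = Double.parseDouble(line.substring(sep + 1));
            result.computeIfAbsent(station, k -> new Stats()).add(temperature);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "Hamburg;12.0", "Bulawayo;8.9", "Hamburg;34.2", "Bulawayo;10.1");
        System.out.println(aggregate(sample));
    }
}
```

A single-threaded loop like this spends most of its time creating strings and parsing doubles, which is the kind of overhead the faster entries set out to remove.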

Figure: Copilot (source: Microsoft).

The base algorithm that challenge creator Morling provided to start with took 4 minutes and 50 seconds to run. Some developers actually managed to process 1 billion rows of data in less than 2 seconds, with the top entry being 00:01.535 (in the minutes:seconds.milliseconds format).

Goncalves, a principal software engineer in Microsoft's Developer Division, didn't come close to two seconds, but he was nevertheless impressed that his code took less than 60 seconds to run on his M1 Mac running macOS Sonoma with 8 cores and 64GB of RAM. So he created a pull request on the 1BRC repository, and Morling merged it. His code ran slower on the target platform (a Hetzner AX161 server with eight cores): 1 minute and 9 seconds. He was disappointed that it ran slower than it did on his machine, but pleased with the overall experience and performance.

Figure: The Code & Copilot Chat (source: Microsoft).

"My algorithm is indeed slower than the top ones listed on the leader board," he said in a March 7 blog post. "But it only took me a couple of hours to write, and the code produced by GitHub Copilot is easy to read and to understand ... and still 4 times faster than the baseline."

He summarized his multi-step process, which included optimizing the algorithm and the JVM (Java Virtual Machine) itself. He also provided feedback on using GitHub Copilot Chat, noting that over a dialogue lasting several hours it effectively maintained the context of the conversation and provided relevant suggestions and solutions for the coding challenge:

During all the process I was the one in charge of the code. I was the one who decided to accept or reject the suggestions made by GitHub Copilot Chat, or to use a profiler or not. Sometimes GitHub Copilot would give me a suggestion that I would reject because I knew it would not improve the code. Sometimes I would just take control of the code and change it directly in the IDE. Sometimes I would impose my choices to GitHub Copilot (e.g. Being written in Java 21, please use records instead of classes). Sometimes GitHub Copilot gave me a suggestion that I knew wouldn't improve the code, so I rejected it with a thumbs down (which helps Copilot provide better responses in the future).

He also noted the fastest algorithms used different low-level techniques:

  • Partitioning the file into ranges equal to the number of available processors
  • Using sun.misc.Unsafe to extract and store the weather station names as sequences of integers
  • Using parallelism, branchless code and SWAR (SIMD Within A Register)
  • Implementing their own “very simple” HashMap backed by an array
  • Creating code without branches and instead performing a few complex arithmetic and bit operations
  • Compiling Java into native code using GraalVM
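
Two of the techniques listed above can be sketched in a few lines. The code below is an illustration, not code from any leaderboard entry. It works on an in-memory byte array rather than a memory-mapped file, and the `LowLevelSketch`, `chunkStarts` and `firstSemicolon` names are hypothetical. The first method splits the data into ranges that each start at the beginning of a line, and the second uses SWAR to locate a `;` delimiter (assumed here to separate station name and temperature) by testing eight bytes at once.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical.
class LowLevelSketch {

    // Returns parts + 1 offsets; range i is [starts[i], starts[i + 1]).
    // Each boundary is moved forward so every range begins at a line start.
    static int[] chunkStarts(byte[] data, int parts) {
        int[] starts = new int[parts + 1];
        starts[parts] = data.length;
        for (int i = 1; i < parts; i++) {
            int nominal = (int) ((long) data.length * i / parts);
            int pos = Math.max(starts[i - 1], nominal);
            while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
            starts[i] = pos;
        }
        return starts;
    }

    private static final long SEMICOLONS = 0x3B3B3B3B3B3B3B3BL; // ';' in every byte
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;

    // SWAR search: index (0..7) of the first ';' in a little-endian word, or -1.
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;                // bytes equal to ';' become 0x00
        long marks = (x - ONES) & ~x & HIGH_BITS;  // lowest set mark flags the first zero byte
        return marks == 0 ? -1 : Long.numberOfTrailingZeros(marks) >>> 3;
    }

    // Reads 8 bytes starting at offset as one little-endian long.
    static long wordAt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }

    public static void main(String[] args) {
        byte[] data = "a;1\nbb;2\nccc;3\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(chunkStarts(data, 2)));
        byte[] row = "Oslo;4.5".getBytes(StandardCharsets.US_ASCII);
        System.out.println(firstSemicolon(wordAt(row, 0)));
    }
}
```

In a real entry each range would typically be handed to its own thread, and the SWAR check would replace a byte-by-byte scan for the delimiter.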

More about the challenge can be found in its 1brc GitHub repo.

About the Author

David Ramel is an editor and writer at Converge 360.
