
GitHub Copilot Chat Tackles Java 'One Billion Row Challenge'

Microsoft's Antonio Goncalves put the advanced GitHub Copilot Chat AI tool to work in a coding challenge, and he was impressed with the results.

The One Billion Row Challenge (1BRC) is a Java programming challenge announced early this year by Gunnar Morling. It involves processing a text file with 1 billion rows, calculating the min, mean and max temperature value per weather station, and displaying the results sorted alphabetically by station. The goal was to create the fastest implementation. Goncalves took a stab at the challenge to see how Copilot Chat could help. While the Copilot and Copilot Chat tools work with Visual Studio 2022 and Visual Studio Code, he chose a supported non-Microsoft IDE, IntelliJ IDEA from JetBrains.
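To make the task concrete, here is a minimal single-threaded sketch of the aggregation in Java. It is not Morling's baseline or Goncalves' entry. It assumes each row has the form "<station>;<temperature>", and the class and method names (StationStatsSketch, Stats, aggregate) are hypothetical, chosen only for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class StationStatsSketch {

    // Running aggregate for one station; the mean is derived from sum and count.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        long count = 0;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return sum / count;
        }
    }

    // Aggregates rows of the assumed form "<station>;<temperature>".
    // A TreeMap keeps the stations sorted alphabetically by name.
    static Map<String, Stats> aggregate(Iterable<String> lines) {
        Map<String, Stats> byStation = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            byStation.computeIfAbsent(station, k -> new Stats()).add(value);
        }
        return byStation;
    }

    public static void main(String[] args) {
        // Tiny in-memory sample standing in for the one-billion-row file.
        List<String> sample = List.of("Oslo;-3.5", "Hamburg;12.0", "Hamburg;8.0");
        aggregate(sample).forEach((station, s) ->
            System.out.println(String.format(Locale.ROOT, "%s=%.1f/%.1f/%.1f",
                station, s.min, s.mean(), s.max)));
    }
}
```

A straightforward approach like this spends most of its time on string splitting, number parsing and map lookups, which is where the fastest entries concentrated their optimizations.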

[Figure: Copilot (source: Microsoft).]

The base algorithm that challenge creator Morling provided to start with took 4 minutes and 50 seconds to run. Some developers actually managed to process 1 billion rows of data in less than 2 seconds, with the top entry being 00:01.535 (in the minutes:seconds.milliseconds format).

Goncalves, a principal software engineer in Microsoft's Developer Division, didn't come close to two seconds, but he was nevertheless impressed that his code took less than 60 seconds to run on his M1 Mac running macOS Sonoma with 8 cores and 64GB of RAM. So he created a pull request on the 1BRC repository, and Morling merged it. His code ran slower on the target platform (a Hetzner AX161 server with eight cores): 1 minute and 9 seconds. He was disappointed that it ran slower than on his own machine, but pleased with the overall experience and performance.

[Figure: The Code & Copilot Chat (source: Microsoft).]

"My algorithm is indeed slower than the top ones listed on the leader board," he said in a March 7 blog post. "But it only took me a couple of hours to write, and the code produced by GitHub Copilot is easy to read and to understand ... and still 4 times faster than the baseline."

He summarized his multi-step process, which included optimizing both the algorithm and the JVM (Java Virtual Machine) itself. He also provided feedback on using GitHub Copilot Chat, noting that over a dialogue lasting several hours it effectively maintained the context of the conversation and provided relevant suggestions and solutions for the coding challenge:

During all the process I was the one in charge of the code. I was the one who decided to accept or reject the suggestions made by GitHub Copilot Chat, or to use a profiler or not. Sometimes GitHub Copilot would give me a suggestion that I would reject because I knew it would not improve the code. Sometimes I would just take control of the code and change it directly in the IDE. Sometimes I would impose my choices to GitHub Copilot (e.g. Being written in Java 21, please use records instead of classes). Sometimes GitHub Copilot gave me a suggestion that I knew wouldn't improve the code, so I rejected it with a thumbs down (which helps Copilot provide better responses in the future).

He also noted the fastest algorithms used different low-level techniques:

  • Partitioning the file into ranges equal to the number of available processors
  • Extracting and storing the weather station names using sun.misc.Unsafe as sequences of integers
  • Using parallelism, branchless code and SWAR (SIMD Within A Register)
  • Implementing their own “very simple” HashMap backed by an array
  • Creating code without branches and instead performing a few complex arithmetic and bit operations
  • Compiling Java into native code using GraalVM
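Two of the techniques listed above, partitioning the input into line-aligned ranges and SWAR, can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not code from any leaderboard entry: a plain byte array and ByteBuffer stand in for the memory-mapped file and sun.misc.Unsafe access the article mentions, and all names (LowLevelSketch, lineAlignedOffsets, indexOfSemicolon, wordAt) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LowLevelSketch {

    // Splits data into `parts` ranges, moving each interior boundary forward
    // to the start of the next line so that no row straddles two ranges.
    // Returns parts + 1 offsets; range i is [offsets[i], offsets[i + 1]).
    static int[] lineAlignedOffsets(byte[] data, int parts) {
        int[] offsets = new int[parts + 1];
        offsets[parts] = data.length;
        for (int i = 1; i < parts; i++) {
            int pos = Math.max(offsets[i - 1], (int) ((long) data.length * i / parts));
            while (pos > 0 && pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
            offsets[i] = pos;
        }
        return offsets;
    }

    // SWAR search: examines all 8 bytes of `word` at once and returns the
    // index (0..7, lowest-order byte first) of the first ';', or -1 if none.
    static int indexOfSemicolon(long word) {
        long x = word ^ 0x3B3B3B3B3B3B3B3BL;  // bytes equal to ';' become 0x00
        long hit = (x - 0x0101010101010101L) & ~x & 0x8080808080808080L;
        return hit == 0 ? -1 : Long.numberOfTrailingZeros(hit) >>> 3;
    }

    // Reads 8 bytes starting at `offset` as a little-endian long, so that
    // byte 0 of the text ends up in the lowest-order byte of the word.
    static long wordAt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }

    public static void main(String[] args) {
        // Tiny in-memory sample standing in for the real input file.
        byte[] data = "a;1.0\nbb;2.0\nccc;3.0\n".getBytes(StandardCharsets.US_ASCII);
        int[] offsets = lineAlignedOffsets(data, 2);
        System.out.println(Arrays.toString(offsets));
        // Each range starts at a line boundary, so a worker can scan it alone.
        System.out.println(indexOfSemicolon(wordAt(data, offsets[1])));
    }
}
```

In a real entry each range would go to its own thread, with the per-thread results merged at the end. The SWAR search replaces a byte-by-byte loop with a handful of arithmetic and bit operations per 8-byte word.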

More about the challenge can be found in its 1brc GitHub repo.

About the Author

David Ramel is an editor and writer at Converge 360.
