GitHub Copilot Chat Tackles Java 'One Billion Row Challenge'

Microsoft's Antonio Goncalves put the advanced GitHub Copilot Chat AI tool to work in a coding challenge, and he was impressed with the results.

The One Billion Row Challenge (1BRC) is a Java programming challenge announced early this year by Gunnar Morling. It involves processing a text file with 1 billion rows, calculating the min, mean and max temperature value per weather station, and displaying the results sorted alphabetically by station. The goal was to create the fastest implementation. Goncalves took a stab at the challenge to see how Copilot Chat could help. While the Copilot and Copilot Chat tools work with Visual Studio 2022 and Visual Studio Code, he chose a non-Microsoft IDE that is also supported: IntelliJ IDEA from JetBrains.
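To make the task concrete, here is a minimal, unoptimized sketch of the aggregation the challenge describes. It is not Morling's baseline or Goncalves's entry; the class and method names are hypothetical, and the `station;temperature` line format is an assumption made for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and input format are assumptions.
class StationStatsSketch {

    // Mutable running totals for a single weather station.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return sum / count;
        }

        @Override
        public String toString() {
            // Printed as min/mean/max with one decimal place.
            return String.format(Locale.ROOT, "%.1f/%.1f/%.1f", min, mean(), max);
        }
    }

    // Aggregates "station;temperature" lines (assumed format).
    // A TreeMap keeps the stations sorted alphabetically by name.
    static Map<String, Stats> aggregate(Iterable<String> lines) {
        Map<String, Stats> result = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double temperature = Double.parseDouble(line.substring(sep + 1));
            result.computeIfAbsent(station, k -> new Stats()).add(temperature);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("Oslo;3.5", "Accra;27.0", "Oslo;-1.5", "Accra;29.0");
        // Prints {Accra=27.0/28.0/29.0, Oslo=-1.5/1.0/3.5}
        System.out.println(aggregate(sample));
    }
}
```

A single-threaded loop like this is easy to read, but it parses every line into `String` and `double` objects, which is the kind of overhead the fastest entries worked to avoid.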

[Figure: Copilot (source: Microsoft)]

The base algorithm that challenge creator Morling provided to start with took 4 minutes and 50 seconds to run. Some developers actually managed to process 1 billion rows of data in less than 2 seconds, with the top entry being 00:01.535 (in the minutes:seconds.milliseconds format).

Goncalves, a principal software engineer in Microsoft's Developer Division, didn't come close to two seconds, but he was nevertheless impressed that his code took less than 60 seconds to run on his M1 Mac running macOS Sonoma with 8 cores and 64GB of RAM. So he created a pull request on the 1BRC repository, and Morling merged it. His code ran slower on the target platform (a Hetzner AX161 server with eight cores): 1 minute and 9 seconds. He was disappointed that it ran slower than on his own machine, but pleased with the overall experience and performance.

[Figure: The Code & Copilot Chat (source: Microsoft)]

"My algorithm is indeed slower than the top ones listed on the leader board," he said in a March 7 blog post. "But it only took me a couple of hours to write, and the code produced by GitHub Copilot is easy to read and to understand ... and still 4 times faster than the baseline."

He summarized his multi-step process, which included optimizing both the algorithm and the JVM (Java Virtual Machine) itself. He also provided feedback on using GitHub Copilot Chat, which over a dialogue of several hours effectively maintained the context of the conversation and offered relevant suggestions and solutions based on the current coding challenge:

"During all the process I was the one in charge of the code. I was the one who decided to accept or reject the suggestions made by GitHub Copilot Chat, or to use a profiler or not. Sometimes GitHub Copilot would give me a suggestion that I would reject because I knew it would not improve the code. Sometimes I would just take control of the code and change it directly in the IDE. Sometimes I would impose my choices to GitHub Copilot (e.g. Being written in Java 21, please use records instead of classes). Sometimes GitHub Copilot gave me a suggestion that I knew wouldn't improve the code, so I rejected it with a thumbs down (which helps Copilot provide better responses in the future)."

He also noted the fastest algorithms used different low-level techniques:

  • Partitioning the file into ranges equal to the number of available processors
  • Extracting and storing the weather station names using sun.misc.Unsafe as sequences of integers
  • Using parallelism, branchless code and implementing SWAR (SIMD Within A Register)
  • Implementing their own “very simple” HashMap backed by an array
  • Creating code without branches and instead performing a few complex arithmetic and bit operations
  • Compiling Java into native code using GraalVM
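As a rough illustration of the SWAR idea in the list above, the sketch below checks eight bytes at once for a `;` delimiter using plain `long` arithmetic instead of a byte-by-byte loop. It is a generic textbook bit trick, not code taken from any leaderboard entry; the class name, helper names and the `;` delimiter are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Illustrative SWAR sketch; names and the ';' delimiter are assumptions.
class SwarSketch {

    private static final long ONES = 0x0101010101010101L;
    private static final long HIGH_BITS = 0x8080808080808080L;
    private static final long SEMICOLONS = ONES * ';'; // 0x3B repeated in every byte

    // Reads 8 bytes starting at offset as one little-endian long,
    // so the first byte of the text ends up in the lowest 8 bits.
    static long wordAt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }

    // Returns the index (0..7) of the first ';' in the word, or 8 if there is none.
    static int firstSemicolon(long word) {
        long x = word ^ SEMICOLONS;               // bytes equal to ';' become 0x00
        long match = (x - ONES) & ~x & HIGH_BITS; // lowest set bit marks the first zero byte
        return Long.numberOfTrailingZeros(match) >>> 3; // bit position to byte index
    }

    public static void main(String[] args) {
        byte[] line = "Oslo;3.5".getBytes(StandardCharsets.US_ASCII);
        // Prints 4: the ';' is the fifth byte of the line.
        System.out.println(firstSemicolon(wordAt(line, 0)));
    }
}
```

The function contains no `if` or loop, which is what makes this style attractive when the same check runs a billion times. Higher bytes of `match` can contain false positives, so only the lowest set bit is used.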

More about the challenge can be found in its 1brc GitHub repo.

About the Author

David Ramel is an editor and writer for Converge360.
