
GitHub Copilot Chat Tackles Java 'One Billion Row Challenge'

Microsoft's Antonio Goncalves put the advanced GitHub Copilot Chat AI tool to work in a coding challenge, and he was impressed with the results.

The One Billion Row Challenge (1BRC) is a Java programming challenge announced early this year by Gunnar Morling. It involves processing a text file with 1 billion rows, calculating the min, mean and max temperature value per weather station, and printing the results sorted alphabetically by station name. The goal was to create the fastest implementation. Goncalves took a stab at the challenge to see how Copilot Chat could help. While the Copilot and Copilot Chat tools work with Visual Studio 2022 and Visual Studio Code, he chose a non-Microsoft IDE that is also supported: JetBrains' IntelliJ IDEA.
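To make the task concrete, here is a minimal sequential sketch of the kind of aggregation the challenge asks for. It is not Morling's baseline code; it assumes each row has the form "station;temperature" (for example "Hamburg;12.0") and that results are printed as {station=min/mean/max, ...} with one decimal place. The class name BaselineSketch and its helpers are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.stream.Stream;

public class BaselineSketch {

    // Running aggregate for one station; the mean is derived as sum / count at the end.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }
    }

    // Aggregates "station;temperature" lines per station.
    // A TreeMap keeps station names in alphabetical order for the final output.
    static Map<String, Stats> aggregate(Stream<String> lines) {
        Map<String, Stats> result = new TreeMap<>();
        lines.forEach(line -> {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            result.computeIfAbsent(station, k -> new Stats()).add(value);
        });
        return result;
    }

    // Renders {station=min/mean/max, ...} with one decimal place, independent of the default locale.
    static String format(Map<String, Stats> stats) {
        StringJoiner joiner = new StringJoiner(", ", "{", "}");
        stats.forEach((station, s) -> joiner.add(String.format(Locale.ROOT,
                "%s=%.1f/%.1f/%.1f", station, s.min, s.sum / s.count, s.max)));
        return joiner.toString();
    }

    public static void main(String[] args) {
        // In-memory sample; a real run would stream the lines of the 1-billion-row file instead.
        Stream<String> sample = Stream.of("Hamburg;12.0", "Bulawayo;8.9", "Hamburg;34.2");
        System.out.println(format(aggregate(sample)));
        // prints {Bulawayo=8.9/8.9/8.9, Hamburg=12.0/23.1/34.2}
    }
}
```

For a real file, the same aggregate method could be fed by java.nio.file.Files.lines(path); the file name would be whatever the challenge's data generator produced. Reading and parsing one line at a time on a single thread is exactly the cost that faster entries attack.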

Copilot (source: Microsoft).

The baseline implementation that challenge creator Morling provided as a starting point took 4 minutes and 50 seconds to run. Some developers managed to process the 1 billion rows in less than 2 seconds, with the top entry clocking 00:01.535 (in minutes:seconds.milliseconds format).

Goncalves, a principal software engineer in Microsoft's Developer Division, didn't come close to two seconds, but he was nevertheless impressed that his code ran in less than 60 seconds on his M1 Mac running macOS Sonoma with 8 cores and 64 GB of RAM. So he created a pull request on the 1BRC repository, and Morling merged it. On the official target platform (a Hetzner AX161 server with eight cores), however, his code ran slower: 1 minute and 9 seconds. He was disappointed that it ran slower than on his own machine, but pleased with the overall experience and performance.
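Converting the reported timings to seconds gives a rough sense of scale. This is simple arithmetic on the figures stated in this article, assuming the baseline, Goncalves' 1:09 run and the top entry were all measured on the same evaluation server:

```latex
\begin{aligned}
t_{\text{baseline}} &= 4\ \text{min}\ 50\ \text{s} = 290\ \text{s} \\
t_{\text{Goncalves}} &= 1\ \text{min}\ 9\ \text{s} = 69\ \text{s} \\
t_{\text{top}} &= 1.535\ \text{s} \\[4pt]
\frac{t_{\text{baseline}}}{t_{\text{Goncalves}}} &= \frac{290}{69} \approx 4.2 \\
\frac{t_{\text{baseline}}}{t_{\text{top}}} &= \frac{290}{1.535} \approx 189
\end{aligned}
```

So his entry ran roughly four times faster than the baseline on the target server, while the winning entry was close to 190 times faster.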

The Code & Copilot Chat (source: Microsoft).

"My algorithm is indeed slower than the top ones listed on the leader board," he said in a March 7 blog post. "But it only took me a couple of hours to write, and the code produced by GitHub Copilot is easy to read and to understand ... and still 4 times faster than the baseline."

He summarized his multi-step process, which included optimizing the algorithm and tuning the JVM (Java Virtual Machine) itself. He also gave feedback on working with GitHub Copilot Chat in a dialogue spanning several hours, noting that the tool kept track of the conversation's context and offered relevant suggestions and solutions based on the coding task at hand:

During all the process I was the one in charge of the code. I was the one who decided to accept or reject the suggestions made by GitHub Copilot Chat, or to use a profiler or not. Sometimes GitHub Copilot would give me a suggestion that I would reject because I knew it would not improve the code. Sometimes I would just take control of the code and change it directly in the IDE. Sometimes I would impose my choices to GitHub Copilot (e.g. Being written in Java 21, please use records instead of classes). Sometimes GitHub Copilot gave me a suggestion that I knew wouldn't improve the code, so I rejected it with a thumbs down (which helps Copilot provide better responses in the future).
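Goncalves' instruction to use records instead of classes refers to a Java language feature finalized in Java 16 and therefore available in Java 21. A record declares its components once, and the compiler generates the constructor, accessors, equals, hashCode and toString that a plain class would need written by hand. The sketch below shows what a record for one parsed row might look like; the Measurement name and its parse helper are hypothetical illustrations, not taken from his code.

```java
// Hypothetical record for one parsed "station;temperature" row.
public record Measurement(String station, double value) {

    // Compact canonical constructor: validates components before they are assigned.
    public Measurement {
        if (station == null || station.isEmpty()) {
            throw new IllegalArgumentException("station must not be empty");
        }
    }

    // Parses a line such as "Hamburg;12.0" (assumed format).
    public static Measurement parse(String line) {
        int sep = line.indexOf(';');
        return new Measurement(line.substring(0, sep), Double.parseDouble(line.substring(sep + 1)));
    }

    public static void main(String[] args) {
        Measurement m = Measurement.parse("Hamburg;12.0");
        // Accessors, equals/hashCode and toString are generated by the compiler.
        System.out.println(m);                                          // Measurement[station=Hamburg, value=12.0]
        System.out.println(m.station());                                // Hamburg
        System.out.println(m.equals(new Measurement("Hamburg", 12.0))); // true
    }
}
```

Because records are shallowly immutable and get value-based equality for free, they suit small data carriers like a parsed row, which is presumably why the shorter form was preferred over a hand-written class.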

He also noted that the fastest implementations relied on a variety of low-level techniques:

  • Partitioning the file into ranges equal to the number of available processors
  • Using sun.misc.Unsafe to extract the weather station names and store them as sequences of integers
  • Using parallelism, branchless code and SWAR (SIMD Within A Register)
  • Implementing their own “very simple” HashMap backed by an array
  • Avoiding branches by performing a few complex arithmetic and bit operations instead
  • Compiling Java into native code using GraalVM
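The first and third techniques in the list, partitioning the input by processor count and processing the partitions in parallel, can be sketched in plain Java. This is a simplified illustration, not code from any 1BRC entry: it works on an in-memory byte[] rather than a memory-mapped file, uses standard HashMap and TreeMap instead of a custom array-backed map, and the names ChunkedSketch, partition and processRange are hypothetical. Each tentative boundary is pushed forward to just after the next newline so no row is split between two ranges.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ChunkedSketch {

    // Per-station aggregate; merge() folds in a partial result from another range.
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double v) { min = Math.min(min, v); max = Math.max(max, v); sum += v; count++; }

        void merge(Stats o) {
            min = Math.min(min, o.min); max = Math.max(max, o.max); sum += o.sum; count += o.count;
        }
    }

    // Splits data into roughly `parts` [start, end) ranges. Each tentative boundary is moved
    // forward until it sits just after a '\n', so no row straddles two ranges.
    static List<int[]> partition(byte[] data, int parts) {
        List<int[]> ranges = new ArrayList<>();
        int chunk = Math.max(1, data.length / parts);
        int start = 0;
        while (start < data.length) {
            int end = Math.min(data.length, start + chunk);
            while (end < data.length && data[end - 1] != '\n') {
                end++;
            }
            ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }

    // Aggregates every "station;temperature" row inside [start, end) into a private map.
    static Map<String, Stats> processRange(byte[] data, int start, int end) {
        Map<String, Stats> local = new HashMap<>();
        int lineStart = start;
        for (int i = start; i <= end; i++) {
            boolean atLineEnd = (i == end) || data[i] == '\n';
            if (atLineEnd && i > lineStart) {
                int sep = lineStart;
                while (data[sep] != ';') {
                    sep++;
                }
                String station = new String(data, lineStart, sep - lineStart, StandardCharsets.UTF_8);
                double value = Double.parseDouble(
                        new String(data, sep + 1, i - sep - 1, StandardCharsets.US_ASCII));
                local.computeIfAbsent(station, k -> new Stats()).add(value);
            }
            if (atLineEnd) {
                lineStart = i + 1;
            }
        }
        return local;
    }

    // One range per available processor, processed on the common fork/join pool, then merged
    // sequentially into a TreeMap so stations come out in alphabetical order.
    static Map<String, Stats> aggregate(byte[] data) {
        int cores = Runtime.getRuntime().availableProcessors();
        List<Map<String, Stats>> partials = partition(data, cores).parallelStream()
                .map(r -> processRange(data, r[0], r[1]))
                .toList();
        Map<String, Stats> merged = new TreeMap<>();
        for (Map<String, Stats> partial : partials) {
            partial.forEach((station, s) -> merged.computeIfAbsent(station, k -> new Stats()).merge(s));
        }
        return merged;
    }

    public static void main(String[] args) {
        byte[] data = "Hamburg;12.0\nBulawayo;8.9\nHamburg;34.2\nPalembang;38.8\n"
                .getBytes(StandardCharsets.UTF_8);
        aggregate(data).forEach((station, s) -> System.out.printf(Locale.ROOT,
                "%s=%.1f/%.1f/%.1f%n", station, s.min, s.sum / s.count, s.max));
    }
}
```

Because every range ends on a row boundary, the worker threads share no mutable state; only the final merge touches a common map, and it runs on a single thread. The top entries went much further, replacing the String keys, HashMap and Double.parseDouble with the Unsafe, SWAR and custom hash map tricks listed above.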

More about the challenge can be found in its 1brc GitHub repo.

About the Author

David Ramel is an editor and writer at Converge 360.
