
Analysis of GitHub C# Code Reveals Most Popular NuGet Packages, Tabs vs. Spaces, More

Perhaps answering the tabs-vs.-spaces indentation question forever, a developer has used Google's BigQuery analytics tool to investigate all things related to the C# programming language in GitHub's vast trove of open source code projects.

To get the most pressing question out of the way first: It's spaces over tabs by a wide margin.

Moving on to perhaps more practical findings, the most often used NuGet package is Newtonsoft.Json, used -- as the name suggests -- for working with JSON data. Some developers find it especially useful for wrestling with the results of a REST API call to get the returned JSON response in whatever format is needed, such as a string array to populate a ListView.

Those findings and many more were published last week by London-based C# jockey Matt Warren -- a Microsoft MVP who contributes to the BenchmarkDotNet project -- in a post titled "Analyzing C# code on GitHub with BigQuery."

Figure: Top NuGet Packages (source: Matt Warren).
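As a rough illustration of how such a ranking can be produced, the following Python sketch tallies package ids across the text of several "packages.config" files. It is not Warren's BigQuery query; the function name and the two sample files are made up for illustration, and it assumes each file is well-formed XML with `package` elements carrying an `id` attribute.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_package_ids(config_texts):
    """Tally NuGet package ids across the text of several packages.config files."""
    counts = Counter()
    for text in config_texts:
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            continue  # skip malformed files rather than abort the whole tally
        for package in root.iter("package"):
            package_id = package.get("id")
            if package_id:
                counts[package_id] += 1
    return counts


# Two hypothetical packages.config files, for illustration only.
samples = [
    '<packages>'
    '<package id="Newtonsoft.Json" version="9.0.1" />'
    '<package id="jQuery" version="3.1.1" />'
    '</packages>',
    '<packages><package id="Newtonsoft.Json" version="8.0.3" /></packages>',
]

counts = count_package_ids(samples)
top = counts.most_common(1)
print(top)
```

Counting one entry per `package` element means a package referenced by many projects in the same repository is counted many times, which may or may not match how the published figures were computed.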

"Just over a year ago Google made all the open source code on GitHub available for querying within BigQuery and as if that wasn't enough you can run a terabyte of queries each month for free!" he said in last week's post.

"So in this post I am going to be looking at all the C# source code on GitHub and what we can find out from it. Handily a smaller, C# only, dataset has been made available (in BigQuery you are charged per byte read), called fh-bigquery:github_extracts.contents_net_cs and has:

  • 5,885,933 unique '.cs' files
  • 792,166,632 lines of code (LOC)
  • 37.17 GB (37,174,783,891 bytes) of data

Which is a pretty comprehensive set of C# source code!"
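To put those numbers in perspective, here is a small Python sketch that derives per-file averages from the quoted figures and estimates how many full scans of the dataset would fit in the free monthly allowance. It assumes "a terabyte" means 10^12 bytes and that a query reads the whole dataset; both are assumptions, not details from Warren's post.

```python
# Figures quoted for the fh-bigquery:github_extracts.contents_net_cs dataset.
FILES = 5_885_933
LINES = 792_166_632
BYTES = 37_174_783_891

# Assumption: the free monthly allowance is one decimal terabyte (10**12 bytes).
FREE_BYTES_PER_MONTH = 10**12

avg_lines_per_file = LINES / FILES
avg_bytes_per_file = BYTES / FILES

# Assumption: each query reads every byte of the dataset.
full_scans_per_month = FREE_BYTES_PER_MONTH // BYTES

print(f"average lines per file: {avg_lines_per_file:.1f}")
print(f"average bytes per file: {avg_bytes_per_file:.0f}")
print(f"full scans within the free allowance: {full_scans_per_month}")
```

Under those assumptions, a typical file runs to roughly 135 lines, and the free tier covers a couple of dozen full passes over the data each month.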

Following are some of his findings.

  • Nearly 83 percent of files (4.6 million total -- those with more than 10 lines starting with a tab or a space) use spaces for indentation. This has been covered before, with a Stack Overflow analysis even resulting in the claim that developers who use spaces make more money than their tabbing brethren. A larger BigQuery analysis of even more GitHub code (way beyond just C#, to the tune of 1 billion files) also showed a developer preference for spaces.
  • Json.NET is the clear winner in the most-popular-NuGet-package contest, found in an analysis of "packages.config" files with 104,808 entries. Newtonsoft.Json was first with a count of 45,055 entries, followed by Microsoft.Web.Infrastructure (16,022) and Microsoft.AspNet.Razor (15,109). In fact, after Newtonsoft, all other leading packages came from Microsoft until jQuery at No. 10 (10,646 entries).
  • The most prevalent "using" statement included at the top of a C# file (after weeding out those that come in automatically with every new project and are unlikely to be removed by many developers) is for using System.IO (a count of 407,848), followed by System.Collections (289,867) and System.Reflection (218,369). NUnit.Framework is No. 1 among those that aren't System, Microsoft or Windows namespaces, followed by UnityEngine and Xunit.
    Figure: Top Using Statements (that aren't System, Microsoft or Windows namespaces) (source: Matt Warren).
  • The most widely thrown Exception is ArgumentNullException (a count of 699,526), followed by ArgumentException (361,616) and NotImplementedException (340,361).
  • There are 218,643 files (out of 5,885,933, or about 3.7 percent) that have at least one usage of async or await in them.
  • 90 percent of the repositories (with C# files) have 95 files or fewer. 95 percent have 170 files or fewer and 99 percent have 535 files or fewer.
  • The top three largest repositories, by number of C# files, are: https://github.com/xen2/mcs -- C# components of the Mono project -- (23,389 files); https://github.com/mater06/LEGOChimaOnlineReloaded -- server for LEGO Legends of Chima Online! -- (14,241); and https://github.com/Microsoft/referencesource -- .NET Reference Source -- (13,051).
  • The most popular repository with C# code in it is, ironically, a Google repository: https://github.com/grpc/grpc, which contains code in several languages for gRPC, Google's open source remote procedure call (RPC) system. With 11,075 stars and 237 files, it beat out https://github.com/dotnet/coreclr -- the .NET Core runtime, called CoreCLR -- (8,576 stars, 6,503 files) and https://github.com/dotnet/roslyn -- the .NET Compiler Platform -- (8,422 stars, 6,351 files).
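The tabs-vs.-spaces finding rests on a simple per-file test: only files with more than 10 lines starting with a tab or a space are counted. A minimal Python sketch of that idea follows. The majority-vote rule, the tie-breaking toward spaces and the function name are assumptions made for illustration, not details confirmed by Warren's post.

```python
def classify_indentation(source, min_indented_lines=10):
    """Label a file 'tabs' or 'spaces' by how its indented lines begin.

    Returns None when the file has too few indented lines to judge,
    mirroring the "more than 10 lines" threshold mentioned in the findings.
    """
    tabs = 0
    spaces = 0
    for line in source.splitlines():
        if line.startswith("\t"):
            tabs += 1
        elif line.startswith(" "):
            spaces += 1
    if tabs + spaces <= min_indented_lines:
        return None
    # Assumption: a simple majority decides; ties fall to spaces.
    return "tabs" if tabs > spaces else "spaces"


# Synthetic C#-like snippets, for illustration only.
spaced = "class A\n{\n" + "    int x;\n" * 12 + "}\n"
tabbed = "class B\n{\n" + "\tint x;\n" * 12 + "}\n"
short = "class C { }\n"

results = [classify_indentation(s) for s in (spaced, tabbed, short)]
print(results)
```

Looking only at the first character of each line keeps the check cheap, at the cost of misreading files that mix tabs and spaces within a single line's indentation.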

Read Warren's post to find data on the most popular C# class names, most common file names, how many lines of code are in a typical C# file, preference for functional code (measured by use of the lambda operator) and more.

About the Author

David Ramel is an editor and writer at Converge 360.
