Visual Studio Toolbox
Semantic Code Highlighting
We often take syntax highlighting for granted -- it just makes code so much more readable that it's become hard to live without. I'll take a look at a little history behind it and some new developments that might make code coloring even more useful.
- By Terrence Dorsey
Confession time: I have a bit of an obsession with code editor fonts, themes and syntax highlighting. So my interest was piqued recently when I came across a sudden flurry of blog posts about semantic code highlighting -- a technique for coloring code based not on fixed keywords, but on the parsed meaning of the code I'd just written.
I'll get back to just how cool this is in a minute. First, I'll step back and take a quick look at why syntax highlighting or code coloring is helpful to programmers, and where the idea for it came from in the first place.
Ido Gendel puts it pretty succinctly in the article "A Brief Overview on Syntax Highlighting" (translated from the Israeli online magazine nana10.co.il by "BenBE"): "Excluding, perhaps, programs written in Assembly, software code text is intended for the people writing it much more than for the computers running it. ... The only reason to 'prettify' the code and to comment it is to make it more readable for humans."
Making code readable is incredibly important -- hence the endless debates about naming things, where to put braces, whether semicolons are evil and so on. With that in mind, here's where the idea of syntax highlighting came from and why we might be on the verge of a major step forward in code readability today.
A Little History
When I was a kid, I loved to visit the Lawrence Hall of Science in Berkeley. On one of the lower levels, they had a computer lab for us to play with. The "computers" were really just paper-based teletype terminals connected to a mainframe in another room. We'd type on the keyboard, the machine would clack-clack-clack on the paper roll, and it was just incredible. That was my first computing experience.
The output was even more primitive than the green-screen boxes that followed, but it worked and you didn't know any better. Or I didn't, anyway. Turns out, smart folks were already frustrated by how hard it was to write and read code on these devices and were busy inventing the polychromatic code formatting you know today.
If you head over to the Wikipedia entry on Syntax highlighting, you'll learn that the idea of incorporating syntax-aware features into programming environments dates back to the 1960s.
Wilfred Hansen started creating the Emily code editor for his Stanford doctoral dissertation. Emily built on the concept of hierarchic text to provide syntax-conforming options to the programmer while editing code, somewhat similar to what we'd recognize today as code completion, or IntelliSense. You can read more about it in Hansen's own 1971 abstract, "Emily - An Editor for Structured Text."
The important finding was that "the user took slightly longer with Emily, but made fewer mistakes."
Ido Gendel covers more of the history of syntax highlighting in "A Brief Overview on Syntax Highlighting," the article I mentioned previously. One important point is that "it seems reasonable that the first highlighted code actually appeared in print" -- in the form of capitalization, boldface, italics and probably different typefaces in the books and magazines from which we old-timers often typed out our first programs.
The Wikipedia article states that computer scientist and human-computer interaction researcher Ben Shneiderman discussed "color coding of text strings to suggest meaning" in the 1985 edition of H. Rex Hartson's "Advances in Human-Computer Interaction," but I haven't been able to track this down yet.
At about this same time, the LEXX editor became possibly the first to use syntax-specific coloring (see Figure 1). As M. F. Cowlishaw explains in "LEXX -- A Programmable Structured Editor" (PDF available here):
"[T]here are many advantages in using an editor that has specialized knowledge of the data being edited. The editor can improve the presentation of the data in a variety of ways, using appropriate formatting, color cuing, and fonts; it can provide checking of the syntax of the data, or even of the semantics ... All of these improve the usability of the tool."
The rest, as you know, is history. Syntax-based coloring and formatting turned out to be a brilliant idea. I know WordPerfect had certainly blossomed with a spectrum of markup-related coloring by the early '90s, and while the programmers I worked with at the time preferred (and indoctrinated me in the effective use of) the blue-and-white world of Brief, the wonderful world of colorized code was just around the corner.
Today, code coloring is mostly taken for granted, except for those rare moments when your favorite editor can't figure out what to do with an unfamiliar file type and you end up looking at a sea of undifferentiated characters.
The highlighting you're probably most accustomed to seeing is "syntax" highlighting, which typically colors elements of the code based on straightforwardly parsed word lists and the positions of words in relation to keywords, operators, symbols and other syntactic or grammatical elements of the code.
To make this clearer, take a look at some code examples. Figure 2 shows a C++ code example from SmokeParticles.cpp, part of the Doom 3 code repository on GitHub. This particular example has no code highlighting or coloring at all.
Figure 3 shows the same C++ code, this time with common syntax highlighting applied. In this case I'm using the iPlastic theme provided by default with Sublime Text.
Note that coloring here is, for the most part, only applied to keywords known ahead of time by the editor: void, if, for, int and so on. This is fine, but I know what a for loop looks like. I don't need the editor to point out every example.
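To show how little the editor needs to know for this kind of coloring, here's a minimal Python sketch of keyword-list highlighting. It isn't how Sublime Text or any particular editor does it; the tiny CPP_KEYWORDS set, the highlight_keywords function and the [kw]...[/kw] markers (standing in for a color) are all made up for illustration.

```python
import re

# Hypothetical, deliberately tiny word list; a real editor ships a far larger one.
CPP_KEYWORDS = {"void", "if", "else", "for", "while", "int", "return"}


def highlight_keywords(source: str) -> str:
    """Wrap every known keyword in [kw]...[/kw]; leave all other text untouched."""
    out = []
    # Split the text into runs of word characters and runs of everything else,
    # so that "format" is one token and never matches the keyword "for".
    for token in re.findall(r"\w+|\W+", source):
        if token in CPP_KEYWORDS:
            out.append(f"[kw]{token}[/kw]")
        else:
            out.append(token)
    return "".join(out)


print(highlight_keywords("for (int i = 0; i < count; i++)"))
```

The only names that get marked are the ones the editor already knew about. My own names, i and count here, come out exactly as they went in.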
Semantic coloring, on the other hand, attempts to parse out what's important about your code -- what it means. I don't know whether the idea originated here, but the 2009 blog post, "C++ IDE Evolution: From Syntax Highlighting to Semantic Highlighting," discussing semantic highlighting features for the KDevelop IDE, seems to have inspired much subsequent work on this subject.
Syntax-based coloring "can only highlight by what the code looks like, not by what the code means, since that requires wider knowledge," notes the post's author, Zwabel. The real power of semantic coloring is "Local Variable Colorization [which] assigns a semi-unique color to each variable in a local context. This allows much easier ways of distinguishing those variables, largely without reading their full name at all."
To be more specific, syntax coloring puts the highlight on language-specific keywords, operators and similar elements, which have the same meaning in anyone's code. Semantic coloring puts the highlight on the elements you're adding to the code: your function and variable names, for instance. It's less useful to see every instance of a for loop than it is to highlight every instance of your own super important variable throughout the code. That's what helps you better understand the code and follow logic and data through it.
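One simple way to get a "semi-unique" color per name is to hash the identifier into a small palette. The Python sketch below assumes that approach; it is not the KDevelop algorithm, and KEYWORDS, PALETTE_SIZE, color_index and semantic_colors are hypothetical names. It also works on raw text, where a real semantic highlighter would rely on the parser's knowledge of scopes and symbols.

```python
import re
import zlib
from typing import Dict

# Hypothetical word list: names the language owns, which we skip.
KEYWORDS = {"void", "if", "else", "for", "while", "int", "float", "return"}

# Assumed palette size; an editor theme would map each index to an actual color.
PALETTE_SIZE = 12


def color_index(name: str) -> int:
    """Map a name to a palette slot. CRC32 is stable across runs,
    unlike Python's built-in hash() for strings."""
    return zlib.crc32(name.encode("utf-8")) % PALETTE_SIZE


def semantic_colors(source: str) -> Dict[str, int]:
    """Return a palette slot for every identifier that is not a keyword."""
    names = re.findall(r"[A-Za-z_]\w*", source)
    return {name: color_index(name) for name in names if name not in KEYWORDS}


colors = semantic_colors("for (int i = 0; i < numParticles; i++) { alpha = fade(i); }")
print(colors)
```

Every occurrence of a name lands on the same slot, so the same variable gets the same color wherever it appears. With only a dozen slots, two different names can collide, which is why the color is semi-unique rather than unique.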
Figure 4 shows an example of that C++ file from Doom 3, this time using the same iPlastic theme, but using semantic parsing of the code.
More recently, Evan Brooks revisited the idea of coding in color in a blog post that clearly illustrates the benefits of adding a much broader spectrum of coloration to code. Brooks also walks through the logic of semantic coloring in a straightforward and understandable manner. "Each variable has its own color, so I can see where it's used at a glance. Even when skimming the code, I can see how data flows through the function."
In fact, the discussion around Brooks' post led directly to some fantastic, real-world developments. I'll get back to that in a moment.
Semantic Coloring in Visual Studio
The Visual Studio development team was pretty quick to jump on the semantic highlighting bandwagon. The top item in Sumit Kumar's 2011 post, "First Look at the New C++ IDE Productivity Features in Visual Studio 11" (the release you now know as Visual Studio 2012), was Semantic Colorization, helpfully bundled under the headline "Code Understanding Enhancements." Even better, this feature is turned on by default, as shown in Figure 5.
A useful and related feature in Visual Studio 2012 was Reference Highlighting. "When you place your text cursor on a symbol, all the instances of that symbol in the file get highlighted." Pretty slick and, again, very helpful for extending the readability and navigation of data and logic flow through your code.
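The behavior described in that quote can be approximated in a few lines. This Python sketch is a text-only stand-in, not the Visual Studio implementation: references_at and IDENT_RE are hypothetical names, and matching is by spelling alone, so two unrelated variables that share a name in different scopes would both light up.

```python
import re
from typing import List, Tuple

IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def references_at(source: str, cursor: int) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every whole-word occurrence of the
    identifier touching the cursor offset, or [] if there is none."""
    target = None
    for match in IDENT_RE.finditer(source):
        # The cursor counts as "on" a symbol if it sits inside it or at its edge.
        if match.start() <= cursor <= match.end():
            target = match.group()
            break
    if target is None:
        return []
    return [m.span() for m in IDENT_RE.finditer(source) if m.group() == target]


code = "count = 0; total = count + 1"
print(references_at(code, 2))
```

An editor would take the returned offsets and paint a background behind each span; a cursor parked on the literal 0 yields nothing to highlight.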
In my humble opinion, the highlighting in Visual Studio, even with these semantic enhancements, remains a touch on the conservative side. However, Visual Studio includes extensive in-app settings and extensible APIs for highlighting that enable you to tweak away. It's certainly a step in the right direction.
Semantic Coloring in Other Editors
As I mentioned earlier, the Evan Brooks post touched off an explosion of recent development efforts around semantic highlighting. One of the first I saw, and one I've been using regularly over the past few months, is the Colorcoder package for Sublime Text, by Valerij Primachenko.
One of its great features is that Colorcoder doesn't replace your existing themes and syntax settings, but instead adds its code parsing and highlighting capabilities to them. Colorcoder is what I used, on top of the existing theme, to create Figure 4.
For Xcode users, Kolin Krewinkel created Polychromatic, a plug-in that gets the gold-star treatment as a must-have Xcode plug-in from none other than Mattt Thompson and NSHipster.
Even the Vim and Emacs crowds are getting in on the action (see the "Emacs Code Coloring Is Outdated" post by Philippe Faes if you don't believe me).
Rainbow Road Ahead
O.K., maybe semantic highlighting isn't the most revolutionary thing to happen in programming environments, but I can definitely vouch for the added productivity it's brought to my work in the few months I've been using it actively. Considering how much time I spend trying to make sense of other people's code -- and anything I wrote more than a week ago, for that matter -- anything that helps bring clarity to the process is welcome.
Give it a try and see for yourself.
Terrence Dorsey is a technical writer, editor and content strategist specializing in technology and software development. Over the last 25-plus years he has worked on developer-focused projects at ESPN, The Code Project, and Microsoft. Read his blog at http://terrencedorsey.com or follow @tpdorsey on Twitter.