On VB

When Hexadecimal is Just Not Enough

Joe Kunk looks at how to manage a numeric system that extends to the entire alphabet.

One of the things I really enjoy is taking a familiar concept and seeing just how far I can push it. It is one of those boyhood traits that I have never outgrown. Like the time that I discovered that ice-fishing with a gas lantern in a well-ventilated and well-insulated ice-fishing shanty would keep the seating area comfortably t-shirt warm for hours. Add a portable music player and a thermos full of my favorite beverage and I had one mighty fine ice-fishing experience. Did I mention that I don't eat fish?

As a computer science student at Michigan State University, I was quickly taught the binary, octal, and hexadecimal number systems. As we know, hexadecimal is the base 16 numeric system and is represented by the standard 0-9 numeric sequence, extended by the letters A-F to provide the needed 16 digits. I remember wondering why they stopped at F and didn't go all the way to Z. The reason, of course, is that each hexadecimal digit can be represented in exactly 4 bits, which aligns well with the word boundaries of most, if not all, computer systems.
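
As a quick illustration of that 4-bit alignment, the following sketch expands each hexadecimal digit into its own 4-bit group using the .NET Convert class. The module and function names are illustrative only and are not part of the article's download.

```vb
Imports System
Imports System.Collections.Generic

Module HexNibbleSketch
    ' Expands each hexadecimal digit into its own 4-bit group (nibble).
    Function HexToNibbles(hex As String) As String
        Dim groups As New List(Of String)
        For Each ch As Char In hex
            ' Parse one hex digit (0-9, A-F, case-insensitive) as a number 0-15
            Dim nibble As Integer = Convert.ToInt32(ch.ToString(), 16)
            ' Write it in binary, padded on the left to exactly 4 bits
            groups.Add(Convert.ToString(nibble, 2).PadLeft(4, "0"c))
        Next
        Return String.Join(" ", groups.ToArray())
    End Function
End Module
```

For example, HexToNibbles("B7") returns "1011 0111". Because 16 = 2^4, each hex digit converts independently of its neighbors. Base 36 has no such property, since 36 is not a power of 2, so converting to it requires repeated division instead.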


A colleague recently asked my advice on how to satisfy a challenging business requirement. He needed to store many large numbers on a handheld device that had limited memory and might not be synchronized with a computer for up to 24 hours at a time. Barcodes could be used, but the device user must also type the number from the barcode into the handheld to verify that it was read correctly, since the processes applied to the material could damage the barcode and cause it to misread. The handheld device has a full alphanumeric keyboard.

Alphadecimal as a Numeric Representation
You may have already guessed my recommendation: why not represent the numbers in base 36, using the digits 0-9 extended by the letters A-Z? I called it "alphadecimal" since it uses the full alphabet. Just as I was feeling clever, I typed "alphadecimal" into the Bing search engine and it brought up the Wikipedia article on Base 36; so much for coining a new term. That article indicates that the Service Tag on Dell computers is a five- or seven-digit alphadecimal number and that the URL-shortening service TinyURL uses alphadecimal as well. For larger numbers, alphadecimal has the advantage of being shorter, and therefore easier to type, than the decimal equivalent.

To use alphadecimal, the programming task consists primarily of a pair of routines that convert between decimal and alphadecimal. Any mathematical operation on alphadecimal values can be performed by converting the operands to decimal, performing the operation, and converting the result back to alphadecimal.
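
.NET does not offer base 36 out of the box: Convert.ToString(Int64, Int32) accepts only bases 2, 8, 10, and 16. The following is a minimal sketch of the convert, operate, convert-back approach, assuming non-negative values and uppercase digits. The module and function names are hypothetical, and overflow is left to Visual Basic's default integer overflow checking.

```vb
Imports System

Module AlphaArithmeticSketch
    ' Same digit sequence as AlphaDecimalCharacters in Listing 1
    Private Const Digits As String = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Private Const Radix As Long = 36

    ' Minimal decoder: assumes a non-empty, non-negative, uppercase base-36 string
    Function ToLong(value As String) As Long
        Dim result As Long = 0
        For Each ch As Char In value
            Dim index As Integer = Digits.IndexOf(ch)
            If index < 0 Then Throw New FormatException("Invalid digit: " & ch)
            result = result * Radix + index
        Next
        Return result
    End Function

    ' Minimal encoder: assumes value >= 0
    Function ToAlpha(value As Long) As String
        If value = 0 Then Return "0"
        Dim result As String = ""
        While value > 0
            result = Digits(CInt(value Mod Radix)) & result
            value \= Radix ' integer division keeps full Long precision
        End While
        Return result
    End Function

    ' Arithmetic happens on the Long values; only the result is re-encoded
    Function AddAlpha(a As String, b As String) As String
        Return ToAlpha(ToLong(a) + ToLong(b))
    End Function

    Function MultiplyAlpha(a As String, b As String) As String
        Return ToAlpha(ToLong(a) * ToLong(b))
    End Function
End Module
```

For example, AddAlpha("ZZ", "1") returns "100", the base 36 counterpart of 999 + 1 in decimal.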

I have prepared a Visual Studio 2010 Windows Forms demo application, written in Visual Basic, that converts between decimal and alphadecimal values; you can download it using the Download link at the top of this article. The demo screen is shown in Figure 1. Enter a decimal number on the left and click the "To Alphadecimal" button to convert it to alphadecimal, or enter an alphadecimal value on the right and click "To Decimal" to convert in the other direction.


Figure 1. The Alphadecimal Demo screen.

Note that in base 36, "10" equals the numeric base or radix value (decimal 36), just as "10" in hexadecimal equals decimal 16. I used the Consolas font to provide high visual contrast between the digit zero and the letter "O" in the displayed values.
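
The place-value rule behind this can be written out as a worked equation, where $d_i$ is the index in AlphaDecimalCharacters of the $i$-th digit counted from the right, starting at 0, and $n$ is the number of digits:

$$
\text{value} = \sum_{i=0}^{n-1} d_i \cdot 36^{i}
$$

$$
\texttt{10}_{36} = 1 \cdot 36^{1} + 0 \cdot 36^{0} = 36, \qquad
\texttt{10}_{16} = 1 \cdot 16^{1} + 0 \cdot 16^{0} = 16, \qquad
\texttt{ZZ}_{36} = 35 \cdot 36^{1} + 35 \cdot 36^{0} = 1295
$$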

Figure 2 shows the maximum signed 64-bit integer that the demo application can represent, along with its alphadecimal equivalent. You can produce this number by double-clicking the Decimal textbox in the demo application. Base 36 notation needs six fewer digits than decimal to represent this figure (13 instead of 19), and the alphadecimal value is much easier to write or type than the decimal one.


Figure 2. The maximum 64-bit value supported by the demo.
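
To check the 13-versus-19 digit comparison for the maximum 64-bit value, this sketch counts how many digits a non-negative value needs in a given radix by dividing repeatedly with integer division. The module and function names are illustrative.

```vb
Imports System

Module DigitCountSketch
    ' Counts the digits a non-negative value needs in the given radix by
    ' dividing with integer division (\) until nothing is left.
    Function DigitCount(value As Long, radix As Long) As Integer
        If value < 0 Then Throw New ArgumentOutOfRangeException("value")
        If value = 0 Then Return 1
        Dim count As Integer = 0
        While value > 0
            value \= radix
            count += 1
        End While
        Return count
    End Function
End Module
```

DigitCount(Long.MaxValue, 10) returns 19 and DigitCount(Long.MaxValue, 36) returns 13; for comparison, hexadecimal needs 16 digits.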

Coding the AlphaDecimal Class
Following recommended practice, I created a class library project named AlphaDecimal that contains the main processing logic, leaving the Windows Forms class solely responsible for managing screen interactions. Each function is implemented as a Shared method, which makes access easier by eliminating the need to instantiate the AlphaDecimal class. I set both Option Explicit and Option Strict to On for the AlphaDecimal project.

The class-level variable AlphaDecimalCharacters defines the actual digits, and the exact sequence of those digits, that make up the alphadecimal notation. The numeric base, or radix, is defined as a Long constant with the value 36. See Listing 1.

    'Defines the digit sequence for the alphadecimal numbering
    Public Shared AlphaDecimalCharacters As Char() = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ".ToCharArray
    'The numeric base (radix) of the notation
    Public Const Radix As Long = 36

The Decode method, unlike Encode, has no overloads; it returns the Long equivalent of the supplied alphadecimal value. It supports negative numbers and ignores dollar signs, commas, and plus signs. The heavily commented routine shows how the numeric value is computed from each digit's index in the AlphaDecimalCharacters character array and the digit's position within the value. The core logic of the Decode routine is shown in Listing 2.

    'Decode one digit at a time from left to right
    For i As Integer = 0 To AlphaDecimalValue.Length - 1
        'Get the digit
        Dim Digit As Char = AlphaDecimalValue(i)

        'Determine which position it is in the array defined at top of class
        Dim index As Integer = Array.IndexOf(AlphaDecimalCharacters, Digit)
        'IndexOf returns -1 for a character that is not a defined digit
        If index < 0 Then Throw New FormatException("Invalid alphadecimal digit: " & Digit)

        'The leftmost digit of the value is string position 0
        'PlaceValue is its "power of 36" exponent based on its position
        Dim PlaceValue As Long = AlphaDecimalValue.Length - i - 1

        'Numeric value of the digit is its position in the AlphaDecimalCharacters array
        'times 36 raised to the PlaceValue power.
        Dim DigitValue As Long = Convert.ToInt64(index * (Radix ^ PlaceValue))

        'Add its numeric value to the accumulated total
        ReturnValue += DigitValue
    Next
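
Listing 2 computes each place value with the ^ operator, which in Visual Basic always returns a Double. An alternative that stays entirely in Long arithmetic is Horner's method, which multiplies the running total by 36 and adds the next digit. The sketch below is not the author's download code: it also applies the behavior described above (ignoring $, commas, and plus signs, and accepting a leading minus sign), and accepting lowercase input is an added assumption. The module and function names are hypothetical.

```vb
Imports System

Module DecodeSketch
    ' Same digit sequence as AlphaDecimalCharacters in Listing 1
    Private Const Digits As String = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Private Const Radix As Long = 36

    ' Decodes a base-36 string to a Long with Horner's method:
    ' result = result * 36 + digit, once per digit, left to right.
    Function Decode(text As String) As Long
        ' Ignore dollar signs, commas, and plus signs
        Dim cleaned As String = text.Replace("$", "").Replace(",", "").Replace("+", "").Trim()
        Dim negative As Boolean = cleaned.StartsWith("-", StringComparison.Ordinal)
        If negative Then cleaned = cleaned.Substring(1)
        If cleaned.Length = 0 Then Throw New FormatException("No digits to decode.")

        ' Accumulate as a negative number so that Long.MinValue (whose magnitude
        ' exceeds Long.MaxValue) can be decoded; VB's default overflow checking
        ' throws OverflowException for anything outside the Long range.
        Dim result As Long = 0
        For Each ch As Char In cleaned.ToUpperInvariant()
            Dim index As Integer = Digits.IndexOf(ch)
            If index < 0 Then Throw New FormatException("Invalid digit: " & ch)
            result = result * Radix - index
        Next
        Return If(negative, result, -result)
    End Function
End Module
```

For example, Decode("$1,0") returns 36 and Decode("1Y2P0IJ32E8E7") returns Long.MaxValue.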

The overloaded AlphaDecimal.Encode method accepts several parameter types, including String and the supported 16-bit, 32-bit, and 64-bit integer types. The overload that takes a signed 64-bit Long contains the actual encoding logic. The core logic of the Encode routine is shown in Listing 3.

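    If (DecimalValue < Radix) Then
        'A single digit: look it up directly (assumes DecimalValue is not negative here)
        ResultValue = AlphaDecimalCharacters(Convert.ToInt32(DecimalValue))
    Else
        While (DecimalValue <> 0)
            'The rightmost remaining digit is the remainder after dividing by 36
            Dim Remainder = DecimalValue Mod Radix
            ResultValue = AlphaDecimalCharacters(Convert.ToInt32(Remainder)) & ResultValue
            'Integer division (\) keeps full 64-bit precision; "/" would round through Double
            DecimalValue = DecimalValue \ Radix
        End While
    End If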
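
The listing shows only the core loop. Here is a sketch of a complete Long overload that also handles negative values, including Long.MinValue, whose magnitude cannot be held in a positive Long. In Visual Basic the / operator always performs floating-point division, and a Double cannot represent every Long above 2^53, so the sketch uses the integer division operator \ throughout. The module and function names are hypothetical and not taken from the article's download.

```vb
Imports System
Imports System.Text

Module EncodeSketch
    ' Same digit sequence as AlphaDecimalCharacters in Listing 1
    Private Const Digits As String = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Private Const Radix As Long = 36

    ' Encodes any Long, including negatives and Long.MinValue, as base 36.
    Function Encode(value As Long) As String
        If value = 0 Then Return "0"
        Dim negative As Boolean = value < 0
        Dim sb As New StringBuilder()
        While value <> 0
            ' Mod takes the sign of the dividend, so a negative value yields a
            ' remainder in -35..-1; Math.Abs turns it into a digit index safely.
            Dim remainder As Integer = CInt(Math.Abs(value Mod Radix))
            sb.Insert(0, Digits(remainder))
            ' Integer division truncates toward zero, so the sign is preserved
            ' until the value reaches 0; it never overflows for a divisor of 36.
            value \= Radix
        End While
        If negative Then sb.Insert(0, "-"c)
        Return sb.ToString()
    End Function
End Module
```

For example, Encode(Long.MaxValue) returns "1Y2P0IJ32E8E7" and Encode(Long.MinValue) returns "-1Y2P0IJ32E8E8". Working on the negative value directly, rather than calling Math.Abs on the whole number first, is what avoids an overflow for Long.MinValue.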

Extending the Alphadecimal Concept
I introduced alphadecimal as a logical extension of hexadecimal notation, taking base 16 up to base 36. Is that the end of the road, or are there still interesting ways to extend the concept in new directions? Actually, base 36 is just the beginning of the fun we can have.

Base 36 uses only the uppercase letters of the alphabet. Nothing prevents us from using the lowercase letters as well, giving base 36 + 26 = base 62. Is that the limit? No; the only real limit is the number of characters that are guaranteed to be on every user's keyboard and that can be reliably distinguished when printed. Depending on your preferences, that could add another 20 characters for a base 82 notation. I admit that may be taking it to an extreme, but it certainly can be done.
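
The radix does not need to be hard-coded: it is simply the number of characters in the digit set. The sketch below takes the digit set as a parameter; Base62Digits uses one possible ordering (0-9, then A-Z, then a-z) and, like the module and function names, is a hypothetical choice rather than a standard.

```vb
Imports System
Imports System.Text

Module AnyBaseSketch
    ' Hypothetical base-62 digit order: 0-9, then A-Z, then a-z
    Public Const Base62Digits As String = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

    ' Encodes a non-negative value using any digit set of two or more
    ' characters; the radix is simply digits.Length.
    Function EncodeWith(value As Long, digits As String) As String
        If value < 0 Then Throw New ArgumentOutOfRangeException("value")
        If digits.Length < 2 Then Throw New ArgumentException("Need at least two digits.", "digits")
        Dim radix As Long = digits.Length
        If value = 0 Then Return digits(0).ToString()
        Dim sb As New StringBuilder()
        While value > 0
            sb.Insert(0, digits(CInt(value Mod radix)))
            value \= radix
        End While
        Return sb.ToString()
    End Function

    ' How many distinct values n digits can hold in this digit set (radix ^ n),
    ' computed with Long multiplication rather than the Double-valued ^ operator.
    Function Capacity(digits As String, n As Integer) As Long
        Dim total As Long = 1
        For i As Integer = 1 To n
            total *= digits.Length
        Next
        Return total
    End Function
End Module
```

With this digit set, EncodeWith(62, Base62Digits) returns "10", and six base 62 digits cover 62^6 = 56,800,235,584 values, compared with 2,176,782,336 for six base 36 digits. The trade-off is that users must then enter the value with the correct letter case.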

You may have noticed that in Listing 1, the digits used in the notation and their order are explicitly defined by the AlphaDecimalCharacters string. Is there any requirement that the digits be in this particular sequence? With the algorithm used in the demo application, they could be in any sequence. Thus, if so inclined, you could define the character "1" to be the 10th digit in the numeric sequence, for example. In effect, you are scrambling the digits so that numerical sequences would make no logical sense to the casual observer. Think of it as the 21st-century version of a child's secret decoder ring. A direct character mapping like this is relatively easy to decipher, but it can still be a simple and useful numeric obfuscation technique.
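
Because the routines look up digits purely by position, encoding with a scrambled digit string gives the same result as encoding with the standard string and then substituting each character with the one at the same position in the scrambled string. The sketch below shows that substitution step in both directions. ScrambledDigits is an arbitrary, hypothetical permutation of the 36 standard digits, and the module and function names are illustrative.

```vb
Imports System
Imports System.Text

Module ScrambleSketch
    Public Const StandardDigits As String = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ' Hypothetical scrambled order: the same 36 characters, each used once
    Public Const ScrambledDigits As String = "Q7M2XK9AZ0TRC4HWJ1PEGN8SFBY5LUI3VD6O"

    ' Replaces each character with the one at the same index in the other set.
    Private Function Translate(text As String, fromSet As String, toSet As String) As String
        Dim sb As New StringBuilder(text.Length)
        For Each ch As Char In text
            Dim index As Integer = fromSet.IndexOf(ch)
            If index < 0 Then Throw New FormatException("Unknown digit: " & ch)
            sb.Append(toSet(index))
        Next
        Return sb.ToString()
    End Function

    ' Standard base-36 text -> scrambled text
    Function Obscure(standard As String) As String
        Return Translate(standard, StandardDigits, ScrambledDigits)
    End Function

    ' Scrambled text -> standard base-36 text
    Function Reveal(scrambled As String) As String
        Return Translate(scrambled, ScrambledDigits, StandardDigits)
    End Function
End Module
```

For example, Obscure("10") returns "7Q", and Reveal("7Q") returns "10". The scrambled string must contain each digit exactly once, or decoding becomes ambiguous.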

Conclusion
I hope you found the concept of alphadecimal both interesting and useful. We saw how an understanding of numeric base notation could be used to represent decimal numbers in new ways.

I finish with a word of caution. One of the side effects of using a numerical representation that utilizes the full alphabet is that certain numeric values form English words when encoded. If you are using alphadecimal numbering in a production application, there are certain decimal values that you may want to avoid using. I will leave the exercise of determining the alphadecimal value for the decimal numbers 739172, 1329077, 23508730562, and 44250925282014 to the reader as an illustration.


Reader Comments:

Sun, Oct 3, 2010 Vijay Jagdale

tut, tut Joe ;) FYI, javascript has built-in radix converter. Copy this into your browser address bar: javascript:(739172).toString(36) and javascript:parseInt('fooledyou',36)

Fri, Sep 24, 2010 Ray Cassick Buffalo

One issue I always seem to have about using larger numbering systems like this is the confusion when using other characters. I always try to set my systems up to not use things like 0 and O and 1 and i or l and so on. It cuts down a bit on the available ranges and makes the code a bit harder to deal with because you have the gaps to deal with, but I think it adds to the reliability.

Thu, Sep 9, 2010 Tony Anaheim, CA

I know this is already obvious to many readers, but I'll post it anyway: This is a great technique for representing numeric values with fewer digits than what would be required when using the decimal system. I've used systems of various 'bases' in several projects over the years. What would be a terrible mistake is to think this is a good way to save storage space in a database (or file-stream, or whatever). I know that the article's author was not suggesting that, but I can still imagine someone mistakenly coming to that conclusion based on the observation that the "alphadecimal" representation of a number often has fewer 'digits' than the decimal version (that we normally see). What must be remembered is that the binary representation (of integers, in their native in-memory format) is essentially just the "base-256" version of the same idea, and is the *most* efficient for storage (in memory or in files). For Unsigned Integers, just FOUR chars [BYTES!] gives a range from zero to over 4-Billion. It must be stressed that the techniques in this article are only advantageous when you are limited in the range of characters you can use as 'numerals' for some reason (eg: human readable alpha-numerics, or available chars on a keyboard, as mentioned in the article). As an aside: I can't tell you how many times I've seen database tables with fields defined as CHAR(10), populated with purely decimal integers, when the range of INT (4-bytes only) would have sufficed.

Fri, Jul 23, 2010 Jason

Interesting article. I've toyed around with this concept in a manufacturing package generating unique serial numbers. A six-digit, case sensitive alphanumeric key presents you with a usable range of a bit over 56.8 billion unique combinations. That's pretty good bang for the buck.

Thu, Jul 22, 2010 Eric Vogel Okemos, MI

28152101 23449975106 25502

Wed, Jul 21, 2010 Scott Kilgore Atlanta, GA

I would like to see a base 35 representation of latitude/longitude. (Consider zero and the letter O to be the same thing to avoid the inevitable ambiguity). A locational accuracy of 2.6 feet can be achieved with 11 alphadecimal characters. Imagine how much easier it would be to enter a waypoint into a GPS. Just tell the truck driver that the end of your driveway is at 3A2-HV1-8J4N0. No more lat/long, no more street addresses, zip codes, etc. Now if we can get ISO to standardize it, and GPS manufacturers to go along...

Wed, Jul 21, 2010 James WA

A friend of mine and I wrote a program to do arbitrary number base conversions in QBasic almost 20 years ago. Note that you can simplify a lot by knowing that the highest exponent needed to convert to an arbitrary base is the floor of exponent = log( number ) / log( base ). Also, if limiting to 0-9 & A-Z, you can skip the array and do some cute math with ASCII character values to get the symbol.

Wed, Jul 21, 2010 Joe Kunk Okemos Michigan USA

Using numbering systems with a different radix is a useful technique and I am pleased to make it easily available to Visual Basic .Net developers with this article. It is interesting to hear how others have used this concept - I appreciate your comments and I am curious to know of any other uses among our readers.

Wed, Jul 21, 2010 Richard

I did something similar in VB6 eight years ago:

http://www.devx.com/vb2themax/Tip/19316

Tue, Jul 20, 2010 Rob Perkins

I did it as well, using a 20 radix, to reduce the size of machine identifying numbers to something people could actually remember and type.

Tue, Jul 20, 2010 Jerry E. Shepherd West Valley Utah

It was interesting to see this article. I developed my own system about 5 years ago that I incorporated into an educational program I was developing. I used it primarily to be able to address any screen location with two digits, which resulted in a big savings in memory and storage requirements.
