Classic VB Corner

Searching Within Byte Arrays

Still using Instr to search within binary arrays? Stop that! Use InstrB instead.

It's funny sometimes how long old habits take to die. Prior to VB4, the only documented method for working with binary data was within String variables. But the 32-bit version of VB4 introduced us to a new type of character data: Unicode, which used not one, but two bytes per character.

Unfortunately, Microsoft set a horrific precedent and chose to redefine the fundamental String data type, rather than provide a new one. UniMess, as it came to be known, effectively broke every line of BASIC binary file I/O code ever written.

But that was 1995. We all know better now, right? If you need to handle binary data today, you use a Byte array, right? Surprisingly, not everyone does. And even among those who do, not all are aware of some very expedient techniques for working with them.

One of the things all BASIC programmers have always enjoyed was just how blazingly fast the Instr function could scan a string for a matching set of characters. I think this one feature, perhaps more than any other, is why many folks still use String buffers, rather than Byte arrays, to work with file data. But did you know that there's a Byte-specific version of Instr that's every bit as fast?

A very common need when working with binary files is scanning for a given signature pattern of bytes that signal the start or end of some particular data you're looking for. It really can be this easy:

' InstrB returns a one-based offset (or 0 if there's no
' match), so we need to correct for using zero-based arrays.
Offset = InStrB(1, Buffer, Search, vbBinaryCompare) - 1
If Offset >= 0 Then
   ' We have our match...
End If

Here, Buffer and Search are both 0-based Byte arrays: the first holds the content to be searched, and the second holds the pattern of bytes being searched for. InstrB, like Instr, is a 1-based function, however, and pays no attention at all to the lower bounds of either of your input Byte arrays, so we need to adjust accordingly. You can start the search at a different location within the Buffer simply by adjusting the first parameter, remembering that it, too, is 1-based.
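Because the first parameter controls where InstrB begins looking, you can find every occurrence of a pattern by restarting the search one byte past each hit. Here's a minimal sketch. CountMatches is a hypothetical name (it's not a built-in), and as written it counts overlapping matches:

```vb
' Hypothetical helper: counts every (possibly overlapping) match of
' Search within Buffer by restarting InStrB one byte past each hit.
Private Function CountMatches(Buffer() As Byte, Search() As Byte) As Long
   Dim Start As Long, Pos As Long
   Start = 1                      ' InStrB positions are 1-based
   Do
      Pos = InStrB(Start, Buffer, Search, vbBinaryCompare)
      If Pos = 0 Then Exit Do     ' no more matches
      ' Pos - 1 would be the zero-based offset of this match in Buffer.
      CountMatches = CountMatches + 1
      Start = Pos + 1             ' resume just past the start of this hit
   Loop
End Function
```

To skip overlapping matches instead, advance Start by the length of the pattern (UBound(Search) + 1 for a 0-based array) rather than by one.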

Another task that can seem somewhat inelegant is filling the Search buffer with the sequence of bytes you're looking for. Say you need to find &H4A, &H46, &H49, &H46. You could do something like this:

Dim b(0 To 3) As Byte
b(0) = &H4A
b(1) = &H46
b(2) = &H49
b(3) = &H46

But that's sort of ugly, especially if there are more than a handful of bytes involved. Given the loss of the DATA statement, we need to invent our own method to replicate its utility for cases like this. My first inclination was to throw all the hexadecimal codes together in a string and pass that to a function that breaks it down into a Byte array:

Private Function ByteArray(ByVal HexValues As String, _
   Data() As Byte) As Boolean
   
   Dim i As Long, n As Long
   n = Len(HexValues)
   ' Input string must be a multiple of two chars.
   If (n > 0) And (n Mod 2 = 0) Then
      ReDim Data(0 To n \ 2 - 1) As Byte
      For i = 0 To UBound(Data)
         Data(i) = Val("&h" & Mid$(HexValues, i * 2 + 1, 2))
      Next i
      ByteArray = True
   End If
End Function

I agree, it looks ugly, but it's the very essence of elegance to use. You'd call it like this:

   Dim Search() As Byte
   If ByteArray("4A464946", Search) Then

Another option would be to pass the bytes as part of a ParamArray to a function like this:

Private Function ByteArray(Data() As Byte, ParamArray Bytes()) _
   As Boolean
   
   Dim i As Long
   If UBound(Bytes) >= 0 Then
      ReDim Data(0 To UBound(Bytes))
      For i = 0 To UBound(Bytes)
         ' Mask off all but lowest byte.
         Data(i) = Bytes(i) And &HFF
      Next i
      ByteArray = True
   End If
End Function

In this case, you'd build your Byte array by passing all the bytes directly as part of the function's parameter list:

   Dim Search() As Byte
   If ByteArray(Search, &h4A, &h46, &h49, &h46) Then

I chose to pass the buffer to be filled as one of the parameters, and return a Boolean indicating success or failure as the return code, because there are simply situations where it may not succeed. But if you're willing to accept the responsibility to pass valid parameters, and especially if you're using VB6, which can directly return an array as a function result, there are shortcuts you can take with that design:

Private Function ByteArray(ParamArray Bytes()) As Byte()
   ReDim Data(0 To UBound(Bytes)) As Byte
   Dim i As Long

   For i = 0 To UBound(Bytes)
      ' Mask off all but lowest byte.
      Data(i) = Bytes(i) And &HFF
   Next i
   ByteArray = Data
End Function

Constructing the function like that would allow direct assignments to arrays, or direct use within the InstrB (or other) function(s). In such a case, you'd probably want it to throw an error if the incoming ParamArray were empty, so there's no need to check the UBound. And finally, just to cover all the bases, reading an entire file into a byte array is incredibly simple:

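Private Function ReadFileB(ByVal FileName As String, _
   Data() As Byte) As Boolean

   Dim hFile As Long
   On Error GoTo Hell
   hFile = FreeFile
   Open FileName For Binary Access Read As #hFile
      ' A zero-length file makes this ReDim fail (0 To -1),
      ' which sends us to the error handler below.
      ReDim Data(0 To LOF(hFile) - 1) As Byte
      Get #hFile, , Data
   Close #hFile
Hell:
   ReadFileB = Not CBool(Err.Number)
   ' On failure, don't leave the file handle dangling.
   If Not ReadFileB Then Close #hFile
End Function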

Provided it's a reasonable size, of course. Please don't try this with files larger than a couple hundred megabytes, okay? ;-)
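With ReadFileB and the VB6 (array-returning) version of ByteArray in hand, scanning a file for a signature takes only a few lines. The sketch below assumes both functions live in the same module. FindJfif is a hypothetical name, and the four bytes are the same &H4A, &H46, &H49, &H46 sequence used earlier:

```vb
' Hypothetical helper built from ReadFileB and the VB6 ParamArray
' version of ByteArray. Returns the zero-based offset of the first
' match in the file, or -1 if the file can't be read or has no match.
Private Function FindJfif(ByVal FileName As String) As Long
   Dim Buffer() As Byte
   FindJfif = -1
   If ReadFileB(FileName, Buffer) Then
      ' The array returned by ByteArray goes straight into InStrB;
      ' subtracting 1 converts InStrB's 1-based result (0 = no match).
      FindJfif = InStrB(1, Buffer, ByteArray(&H4A, &H46, &H49, &H46), _
         vbBinaryCompare) - 1
   End If
End Function
```

Direct assignment works the same way in VB6: after Dim Search() As Byte, the statement Search = ByteArray(&H4A, &H46, &H49, &H46) fills Search with those four bytes.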

About the Author

Karl E. Peterson wrote Q&A, Programming Techniques, and various other columns for VBPJ and VSM from 1995 onward, until Classic VB columns were dropped entirely in favor of other languages. Similarly, Karl was a Microsoft BASIC MVP from 1994 through 2005, until such community contributions were no longer deemed valuable. He is the author of VisualStudioMagazine.com's new Classic VB Corner column. You can contact him through his Web site if you'd like to suggest future topics for this column.
