Classic VB Corner

Searching Within Byte Arrays

Still using Instr to search within binary arrays? Stop that! Use InstrB instead.

It's funny sometimes how long old habits take to die. Prior to VB4, the only documented method for working with binary data was within String variables. But the 32-bit version of VB4 introduced us to a new type of character data -- Unicode, which used not one, but two bytes per character.

Unfortunately, Microsoft set a horrific precedent and chose to redefine the fundamental String data type, rather than provide a new one. UniMess, as it came to be known, effectively broke every line of BASIC binary file I/O code ever written.

But that was 1995. We all know better now, right? If you need to handle binary data today, you use a Byte array, right? Surprisingly, not everyone does. And even among those who do, not all are aware of some very expedient techniques for working with them.

One of the things all BASIC programmers have always enjoyed was just how blazingly fast the Instr function could scan a string for a matching set of characters. I think this one feature, perhaps more than any other, is why many folks still use String buffers, rather than Byte arrays, to work with file data. But did you know that there's a Byte-specific version of Instr that's every bit as fast?

A very common need when working with binary files is scanning for a given signature pattern of bytes that signal the start or end of some particular data you're looking for. It really can be this easy:

' InstrB returns a one-based offset, so we need to 
' correct for using zero-based arrays.
Offset = InStrB(1, Buffer, Search, vbBinaryCompare) - 1
If Offset >= 0 Then
   ' We have our match… 

Here, Buffer and Search are both 0-based Byte arrays: the first holds the content to be searched, and the second holds the pattern of bytes being searched for. InstrB, like Instr, is a 1-based function, however, and pays no attention at all to the lower bounds of either of your input Byte arrays, so we need to adjust accordingly. You can start the search at a different location within Buffer by simply adjusting the first parameter.
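As a rough sketch of how adjusting that first parameter might be used, the routine below walks through every occurrence of a pattern rather than stopping at the first. ListMatches is a hypothetical name, not something from this column, and the routine assumes both arrays have already been dimensioned and filled. It adds LBound(Buffer) when converting the position, so it should also cope with arrays that aren't 0-based.

```vb
' Hypothetical helper (not part of the original column).
' Prints the array index of every occurrence of Search within Buffer.
Private Sub ListMatches(Buffer() As Byte, Search() As Byte)
   Dim Pos As Long   ' one-based byte position reported by InStrB

   Pos = InStrB(1, Buffer, Search, vbBinaryCompare)
   Do While Pos > 0
      ' Convert the one-based position to an index in Buffer's
      ' own index space, whatever its lower bound happens to be.
      Debug.Print "Match at index"; LBound(Buffer) + Pos - 1
      ' Resume one byte past the start of this match, so that
      ' overlapping occurrences are found as well.
      Pos = InStrB(Pos + 1, Buffer, Search, vbBinaryCompare)
   Loop
End Sub
```

Restarting at Pos + 1 is a design choice. If overlapping matches don't matter for your data, restarting at Pos plus the length of the pattern would skip past each match entirely.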

Another task that can seem somewhat inelegant is filling the Search buffer with the sequence of bytes you're looking for. Say you need to find &h4A, &h46, &h49, &h46. You could do something like this:

Dim b(0 To 3) As Byte
b(0) = &H4A
b(1) = &H46
b(2) = &H49
b(3) = &H46

But that's sort of ugly, especially if there are more than a handful of bytes involved. Given the loss of the DATA statement, we need to invent our own method to replicate its utility for cases like this. My first inclination was to throw all the hexadecimal codes together in a string, and pass that to a function that broke that down into a Byte array:

Private Function ByteArray(ByVal HexValues As String, _
   Data() As Byte) As Boolean
   
   Dim i As Long, n As Long
   n = Len(HexValues)
   ' Input string must be a multiple of two chars.
   If (n > 0) And (n Mod 2 = 0) Then
      ReDim Data(0 To n \ 2 - 1) As Byte
      For i = 0 To UBound(Data)
         Data(i) = Val("&h" & Mid$(HexValues, i * 2 + 1, 2))
      Next i
      ByteArray = True
   End If
End Function

I agree, it looks ugly, but it's the very essence of elegance to use. You'd call it like this:

   Dim Search() As Byte
   If ByteArray("4A464946", Search) Then

Another option would be to pass the bytes as part of a ParamArray to a function like this:

Private Function ByteArray(Data() As Byte, ParamArray Bytes()) _
   As Boolean
   
   Dim i As Long
   If UBound(Bytes) >= 0 Then
      ReDim Data(0 To UBound(Bytes))
      For i = 0 To UBound(Bytes)
         ' Mask off all but lowest byte.
         Data(i) = Bytes(i) And &HFF
      Next i
      ByteArray = True
   End If
End Function

In this case, you'd build your Byte array by passing all the bytes directly as part of the function's parameter list:

   Dim Search() As Byte
   If ByteArray(Search, &h4A, &h46, &h49, &h46) Then

I chose to pass the buffer to be filled as one of the parameters, and to return a Boolean indicating success or failure, because there are simply situations where the call may not succeed. But if you're willing to accept the responsibility to pass valid parameters, and especially if you're using VB6, which can directly return an array as a function result, there are shortcuts you can take with that design:

Private Function ByteArray(ParamArray Bytes()) As Byte()
   ReDim Data(0 To UBound(Bytes)) As Byte
   Dim i As Long

   For i = 0 To UBound(Bytes)
      ' Mask off all but lowest byte.
      Data(i) = Bytes(i) And &HFF
   Next i
   ByteArray = Data
End Function

Constructing the function like that would allow direct assignments to arrays, or direct use within the InstrB (or other) function(s). In such a case, you'd probably want it to throw an error if the incoming ParamArray were empty, so there's no need to check the UBound. And finally, just to cover all the bases, reading an entire file into a byte array is incredibly simple:

Private Function ReadFileB(ByVal FileName As String, _
   Data() As Byte) As Boolean

   Dim hFile As Long
   On Error GoTo Hell
   hFile = FreeFile
   Open FileName For Binary As #hFile
      ReDim Data(0 To LOF(hFile) - 1) As Byte
      Get #hFile, , Data
   Close #hFile
Hell:
   ReadFileB = Not CBool(Err.Number)
End Function

Given it's a reasonable size, of course. Please don't try this with files more than a couple hundred megabytes, okay? ;-)
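To tie the pieces together, here's a minimal sketch of how they might be combined. It assumes that ReadFileB and the array-returning version of ByteArray shown above live in the same module. FindSignatureDemo and the file path are made-up placeholders, so substitute your own.

```vb
' Hypothetical demo (not part of the original column). Assumes
' ReadFileB and the array-returning ByteArray from this article
' are present in the same module.
Private Sub FindSignatureDemo()
   Dim Buffer() As Byte
   Dim Search() As Byte
   Dim Offset As Long

   ' Build the pattern by direct assignment from the function result.
   Search = ByteArray(&H4A, &H46, &H49, &H46)

   ' Placeholder path; point this at a real file.
   If ReadFileB("C:\temp\sample.bin", Buffer) Then
      ' InStrB is one-based, Buffer is zero-based.
      Offset = InStrB(1, Buffer, Search, vbBinaryCompare) - 1
      If Offset >= 0 Then
         Debug.Print "Signature found at offset"; Offset
      Else
         Debug.Print "Signature not found"
      End If
   Else
      Debug.Print "Could not read file"
   End If
End Sub
```

One thing to keep in mind with a sketch like this: opening a file For Binary creates it if it doesn't already exist, so you may want to confirm the file is there before calling ReadFileB.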

About the Author

Karl E. Peterson wrote Q&A, Programming Techniques, and various other columns for VBPJ and VSM from 1995 onward, until Classic VB columns were dropped entirely in favor of other languages. Similarly, Karl was a Microsoft BASIC MVP from 1994 through 2005, until such community contributions were no longer deemed valuable. He is the author of VisualStudioMagazine.com's new Classic VB Corner column. You can contact him through his Web site if you'd like to suggest future topics for this column.
