The Data Science Lab

Artificial Immune Systems for Intrusion Detection Using C#

Dr. James McCaffrey from Microsoft Research presents a demonstration program that models biological immune systems to identify network intrusion threats. The demo illustrates challenges with artificial immune systems as well as promising new approaches.

An artificial immune system (AIS) for intrusion detection is a software system that loosely models some parts of the behavior of the human immune system to protect computer networks from viruses and similar cyber-attacks.

This article presents a demo program that illustrates the main ideas of artificial immune systems. The demo is not a practical system for intrusion detection. The demo is intended to help you understand how commercial systems work.

The best way to see where this article is headed is to take a look at the screenshot of the demo program in Figure 1. The demo program begins by loading a set of six normal data patterns:

Loading antigen self-set ('normal' patterns)
0: 100101101001
1: 110010101100
2: 101100110101
3: 001101011011
4: 010101001101
5: 001010100100

These patterns represent normal, non-threat incoming TCP/IP network packets in binary form. This is called the self-set in AIS terminology. Of course, in a real AIS, the self-set would likely contain tens or hundreds of thousands of patterns, and each pattern would be much larger (typically 48-256 bits) than the 12 bits used in the demo.

Next the demo creates three artificial lymphocytes:

Creating lymphocyte set
0: antibody = 1111 age = 0  stimulation = 0
1: antibody = 1000 age = 0  stimulation = 0
2: antibody = 1110 age = 0  stimulation = 0

Each lymphocyte has a simulated antibody that has four bits (again artificially small), and an age and a stimulation field. The antibody is essentially a detector of patterns that are suspicious. The lymphocytes are created so that none of them detect any of the patterns in the self-set. For example, lymphocyte [0] has antibody = 1111 but none of the six items in the self-set has four consecutive 1s.
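The detection rule used in the demo amounts to asking whether the antibody bits appear as a contiguous run inside a pattern. The following is a minimal sketch of that rule, not the demo's implementation: it uses strings instead of the demo's char arrays, and the NaiveDetection class and its methods are hypothetical names. It can be used to confirm that none of the three demo antibodies matches anything in the self-set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: naive substring-based detection, for illustration only
public static class NaiveDetection
{
  // An antibody "detects" a pattern if its bits occur contiguously in it
  public static bool Detects(string antibody, string pattern)
  {
    return pattern.IndexOf(antibody, StringComparison.Ordinal) >= 0;
  }

  // True if the antibody detects at least one self-set pattern
  public static bool DetectsAnySelf(string antibody, List<string> selfSet)
  {
    foreach (string s in selfSet)
      if (Detects(antibody, s))
        return true;
    return false;
  }
}
```

For example, DetectsAnySelf("1111", selfSet) returns false for the six demo patterns, while Detects("1111", "111011111000") returns true.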

Figure 1: Artificial Immune System Demo

After the system has been initialized, the demo program begins a tiny simulation with four input patterns. The first incoming pattern is:

Incoming pattern = 111011111000

Detected by lymphocyte 0 new stimulation = 1
Lymphocyte 0 not over stimulation threshold
Detected by lymphocyte 1 new stimulation = 1
Lymphocyte 1 not over stimulation threshold
Detected by lymphocyte 2 new stimulation = 1
Lymphocyte 2 not over stimulation threshold

The incoming pattern is detected by the antibody in lymphocyte [0] because the incoming pattern has "1111." The incoming pattern is also detected by lymphocyte [1] ("1000") and lymphocyte [2] ("1110"). A single detection does not trigger an alert by a lymphocyte. Instead, each lymphocyte has a threshold number of detections that must be reached before an alert is triggered. The demo lymphocytes all have a threshold of 3.

The second incoming pattern is:

Incoming pattern = 011100011100

Incoming pattern not detected by lymphocyte 0
Detected by lymphocyte 1 new stimulation = 2
Lymphocyte 1 not over stimulation threshold
Detected by lymphocyte 2 new stimulation = 2
Lymphocyte 2 not over stimulation threshold

The incoming pattern is not detected by lymphocyte [0] so its stimulation value stays at 1. The incoming pattern is detected by lymphocytes [1] and [2] so both of their stimulation values increment to 2.

The third incoming pattern is:

Incoming pattern = 110000011001

Incoming pattern not detected by lymphocyte 0
Detected by lymphocyte 1 new stimulation = 3
Lymphocyte 1 stimulated! Check incoming as possible intrusion!
Incoming pattern not detected by lymphocyte 2

The incoming pattern is detected by lymphocyte [1] so its stimulation value is incremented to 3 and an alert is triggered that the incoming pattern is suspicious and should be examined.

The final, fourth incoming pattern is:

Incoming pattern = 111110110101

Detected by lymphocyte 0 new stimulation = 2
Lymphocyte 0 not over stimulation threshold
Incoming pattern not detected by lymphocyte 1
Detected by lymphocyte 2 new stimulation = 3
Lymphocyte 2 stimulated! Check incoming as possible intrusion!

The incoming pattern is detected by lymphocyte [2], so its stimulation reaches the threshold value of 3, and an alert is triggered.
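The four time steps above can be replayed in a few lines to check the stimulation bookkeeping. This is a sketch under stated assumptions, not the demo code: the TraceReplay class is hypothetical, the four incoming patterns are copied from the run shown above rather than generated randomly, and detection is a plain substring test instead of the demo's Knuth-Morris-Pratt search.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical replay of the demo trace: counts detections per antibody and
// records an alert whenever a count is at or above the threshold.
public static class TraceReplay
{
  public static int[] Replay(string[] antibodies, string[] incoming,
    int threshold, List<string> alerts)
  {
    int[] stimulation = new int[antibodies.Length];
    for (int t = 0; t < incoming.Length; ++t)
    {
      for (int i = 0; i < antibodies.Length; ++i)
      {
        if (incoming[t].IndexOf(antibodies[i], StringComparison.Ordinal) < 0)
          continue;  // not detected: stimulation unchanged
        ++stimulation[i];
        if (stimulation[i] >= threshold)
          alerts.Add("time " + t + ": lymphocyte " + i);
      }
    }
    return stimulation;
  }
}
```

With antibodies 1111, 1000 and 1110, the four incoming patterns above, and a threshold of 3, the final stimulation values are 2, 3 and 3, and alerts are raised for lymphocyte [1] at time 2 and lymphocyte [2] at time 3, matching the demo output.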

This article assumes you have at least intermediate level programming skill with a C-family language, preferably C#, but does not assume you know anything about artificial immune systems.

The code for the demo program is a bit too long to be presented in its entirety in this article. The complete code is available in the accompanying file download, and is also available online.

The Human Immune System
The key elements of the human immune system are illustrated in Figure 2. Harmful items are proteins called antigens. In the figure, the antigens are colored red and have sharp corners. The human body also contains many non-harmful antigens called self-antigens, or just self-items. These are naturally occurring proteins and in the figure are colored green and have rounded sides.

Antigens are detected by lymphocytes. Each lymphocyte has several antibodies which can be thought of as detectors. Each antibody is specific to a particular antigen. Typically, because antibody-antigen matching is only approximate, a lymphocyte will not trigger a reaction when a single antibody detects a single antigen. Only after several antibodies detect their corresponding antigens will a lymphocyte become stimulated and trigger some sort of defensive reaction.

Figure 2: Simplified Immune System

Notice that no lymphocyte has antibodies that detect a self-item. In the real immune system, lymphocytes mature in the thymus and bone marrow, and any lymphocytes that would react to self-items are destroyed before being released into the bloodstream. The destroyed cells die through a process called apoptosis (programmed cell death).

In terms of an intrusion detection system, antigens correspond to TCP/IP network packets that contain some sort of harmful data, such as a computer virus. Self-antigens correspond to normal, non-harmful network packets. An antibody corresponds to a bit pattern that approximately matches an unknown, potentially harmful network packet. A lymphocyte represents one or more antibodies/detectors; in the demo, each lymphocyte has exactly one. The destruction of self-detecting lymphocytes is modeled using a technique called negative selection.
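To connect the analogy to the demo's data format, raw packet bytes have to be turned into a bit pattern somehow. The article does not describe an encoding, so the following is purely a hypothetical sketch: the PacketEncoding class is an illustrative name, and it simply writes each byte as eight '0'/'1' characters, most significant bit first.

```csharp
using System;

// Hypothetical encoding: converts raw packet bytes into the '0'/'1'
// char-array form used by the demo, most significant bit first.
public static class PacketEncoding
{
  public static char[] ToBitPattern(byte[] bytes)
  {
    char[] result = new char[bytes.Length * 8];
    for (int i = 0; i < bytes.Length; ++i)
      for (int b = 0; b < 8; ++b)
        result[i * 8 + b] = ((bytes[i] >> (7 - b)) & 1) == 1 ? '1' : '0';
    return result;
  }
}
```

For example, the two bytes 0xA5 and 0x0F become the 16-character pattern 1010010100001111. A real system would more likely extract selected header fields or features than use raw payload bits.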

Overall Program Structure
I used Visual Studio 2022 (Community Free Edition) for the demo program. I created a new C# console application and checked the "Place solution and project in the same directory" option. I specified .NET version 8.0. I named the project ArtificialImmuneSystem. I checked the "Do not use top-level statements" option to avoid the program entry point shortcut syntax.

The demo has no significant .NET dependencies and any relatively recent version of Visual Studio with .NET (Core) or the older .NET Framework will work fine. You can also use the Visual Studio Code program if you like.

After the template code loaded into the editor, I right-clicked on file Program.cs in the Solution Explorer window and renamed the file to the slightly more descriptive ArtificialImmuneSystemProgram.cs. I allowed Visual Studio to automatically rename class Program.

The overall program structure is presented in Listing 1. All the control logic of the demo simulation is in the Main() method. All of the lymphocyte and antibody functionality is in a Lymphocyte class.

Listing 1: Artificial Immune System Demo Program Structure

using System;
using System.Collections.Generic;

namespace ArtificialImmuneSystem
{
  class ArtificialImmuneSystemProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Begin Artificial Immune System" +
        " for Intrusion Detection demo\n");

      Random rnd = new Random(0);
      int numPatternBits = 12;
      int numAntibodyBits = 4;
      int numLymphocytes = 3;
      int stimulationThreshold = 3;
      int time = 0;
      int maxTime = 4;

      // load the self-set of non-threat patterns
      // create three Lymphocyte objects
      // loop until maxTime is reached
      //   generate random incoming pattern
      //   check if incoming pattern detected
      //   update each Lymphocyte
      // end-loop

      Console.WriteLine("End artificial immune" +
        " system demo");
      Console.ReadLine();
    }  // Main

    public static List<char[]> 
      LoadSelfSet(string dataSource) { . . }

    public static void 
      ShowSelfSet(List<char[]> selfSet) { . . }

    public static List<Lymphocyte> 
      CreateLymphocyteSet(List<char[]> selfSet,
      int numAntibodyBits, int numLymphocytes,
      Random rnd) { . . }

    private static bool 
      DetectsAny(List<char[]> selfSet,
      Lymphocyte lymphocyte) { . . }

    public static void 
      ShowLymphocyteSet(List<Lymphocyte> 
      lymphocyteSet) { . . }

    public static char[] RandomCharArray(int numBits,
      Random rnd) { . . }
  }  // class ArtificialImmuneSystemProgram

  public class Lymphocyte
  {
    public char[] antibody;    // one per lymphocyte
    public int[] searchTable;  // fast detection
    public int age;            // not used; determine death
    public int stimulation;    // controls triggering

    public Lymphocyte(char[] antibody) { . . }

    private int[] BuildTable() { . . }

    public bool Detects(char[] pattern) { . . }

    public override int GetHashCode() { . . }

    public override string ToString() { . . }
  }  // class Lymphocyte

}  // ns

The demo program begins by setting up the parameters for the simulation:

Random rnd = new Random(0);
int numPatternBits = 12;
int numAntibodyBits = 4;
int numLymphocytes = 3;
int stimulationThreshold = 3;
int time = 0;
int maxTime = 4;

The Random object is used to generate random incoming bit patterns. The meaning of the other parameters should be clear from their names.

Creating the Self-Set and the Lymphocytes
The Main() method creates the self-set of historical, non-threat patterns using these statements:

Console.WriteLine("Loading antigen self-set" +
  " ('normal' historical patterns)");
List<char[]> selfSet = LoadSelfSet(null);
ShowSelfSet(selfSet);

The LoadSelfSet() function has six hard-coded patterns where each pattern is an array of char values:

public static List<char[]> LoadSelfSet(string dataSource)
{
  // dataSource is not used; the demo hard-codes the patterns
  List<char[]> result = new List<char[]>();
  result.Add("100101101001".ToCharArray());
  result.Add("110010101100".ToCharArray());
  result.Add("101100110101".ToCharArray());
  result.Add("001101011011".ToCharArray());
  result.Add("010101001101".ToCharArray());
  result.Add("001010100100".ToCharArray());
  return result;
}

In a non-demo scenario, the self-set would be loaded from a text file, or perhaps be supplied by some other system. The demo uses arrays of type char, but there are several other ways to represent the patterns.
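One alternative to char arrays is to pack each pattern into the bits of an integer and test for an embedded antibody with shifts and masks. The following is a minimal sketch assuming patterns of at most 64 bits; the PackedBits class and its methods are hypothetical and are not part of the demo.

```csharp
using System;

// Hypothetical alternative representation: a '0'/'1' pattern packed into
// the low bits of a ulong (at most 64 bits).
public static class PackedBits
{
  public static ulong Pack(string bits)
  {
    if (bits.Length > 64)
      throw new ArgumentException("At most 64 bits supported");
    ulong v = 0;
    foreach (char c in bits)
      v = (v << 1) | (c == '1' ? 1UL : 0UL);
    return v;
  }

  // True if the r-bit antibody occurs at any offset in the n-bit pattern
  public static bool Detects(ulong antibody, int r, ulong pattern, int n)
  {
    // C# masks shift counts to 6 bits, so 1UL << 64 would not work
    ulong mask = r == 64 ? ulong.MaxValue : (1UL << r) - 1;
    for (int shift = 0; shift <= n - r; ++shift)
      if (((pattern >> shift) & mask) == antibody)
        return true;
    return false;
  }
}
```

Each shift examines one contiguous r-bit window, so the check costs (n - r + 1) integer comparisons and avoids per-character work entirely.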

The Lymphocyte objects are created like so:

Console.WriteLine("Creating lymphocyte set using" +
  " negative selection" +
  " and r-chunks detection");
List<Lymphocyte> lymphocyteSet = 
  CreateLymphocyteSet(selfSet, numAntibodyBits,
  numLymphocytes, rnd);
ShowLymphocyteSet(lymphocyteSet);

The Lymphocytes are created so that none of them detect any of the patterns in the self-set. This is called negative selection. Each Lymphocyte object has an antibody whose length (4) is less than the length of the self-set patterns and of the incoming patterns (both 12). This is called r-chunks detection.
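Negative selection can be sketched as a generate-and-reject loop: propose a random short antibody and keep it only if it detects none of the self patterns. The code below is an illustrative sketch, not the demo's CreateLymphocyteSet() method. The NegativeSelection class and the maxTries guard are assumptions, patterns are strings for brevity, and unlike the demo this version does not reject duplicate antibodies.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical negative selection sketch: random r-bit candidates survive
// only if they match no self-set pattern.
public static class NegativeSelection
{
  public static List<string> Generate(List<string> selfSet,
    int numAntibodyBits, int count, Random rnd, int maxTries = 10000)
  {
    var result = new List<string>();
    for (int tries = 0; tries < maxTries && result.Count < count; ++tries)
    {
      char[] bits = new char[numAntibodyBits];
      for (int i = 0; i < bits.Length; ++i)
        bits[i] = rnd.Next(2) == 0 ? '0' : '1';
      string candidate = new string(bits);

      bool matchesSelf = false;
      foreach (string s in selfSet)
        if (s.IndexOf(candidate, StringComparison.Ordinal) >= 0)
        {
          matchesSelf = true;
          break;
        }

      if (!matchesSelf)  // candidate survives negative selection
        result.Add(candidate);
    }
    return result;  // may hold fewer than count if maxTries is exhausted
  }
}
```

The maxTries guard matters in practice: as the self-set grows, fewer short bit strings avoid every self pattern, and an unguarded loop could run forever.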

The Simulation
The artificial immune system simulation is essentially a while-loop:

while (time < maxTime) {
  Console.WriteLine("======");
  Console.WriteLine("time = " + time);
  char[] incoming = RandomCharArray(numPatternBits, rnd);
  Console.WriteLine("Incoming pattern = " + 
    new String(incoming) + "\n");
  // process each Lymphocyte
  ++time;
  Console.WriteLine("======");
} // end loop
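The loop calls a RandomCharArray() helper whose body isn't shown. Its signature comes from Listing 1; the body below is a minimal sketch of what such a helper might look like and will not necessarily reproduce the exact patterns in the demo run. The Patterns wrapper class is a hypothetical name used only to make the snippet self-contained.

```csharp
using System;

public static class Patterns
{
  // Returns numBits characters, each '0' or '1' with equal probability
  public static char[] RandomCharArray(int numBits, Random rnd)
  {
    char[] result = new char[numBits];
    for (int i = 0; i < numBits; ++i)
      result[i] = rnd.Next(0, 2) == 0 ? '0' : '1';
    return result;
  }
}
```

Because the demo seeds its Random object with 0, a given implementation produces the same sequence of incoming patterns on every run, which makes the simulation repeatable.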

An incoming pattern is an array of 12 random "0" or "1" characters. Each of the three Lymphocyte objects in the lymphocyteSet List collection is examined:

for (int i = 0; i < lymphocyteSet.Count; ++i) {
  if (lymphocyteSet[i].Detects(incoming) == true) {
    Console.Write("Detected by lymphocyte " + i);
    ++lymphocyteSet[i].stimulation;
    Console.WriteLine(" new stimulation = " +
      lymphocyteSet[i].stimulation);
    . . .  // check stimulation threshold
  }
  else {
    Console.WriteLine("Incoming pattern not detected" +
      " by lymphocyte " + i);
  }
}

The critical code occurs in the Lymphocyte.Detects() method. If a Lymphocyte detects the current incoming pattern, the Lymphocyte's stimulation counter is incremented. If the new stimulation value reaches the stimulation threshold value, an alert is triggered:

if (lymphocyteSet[i].stimulation >= stimulationThreshold)
  Console.WriteLine("Lymphocyte " + i + " stimulated!" +
    " Check incoming as possible intrusion!");
else
  Console.WriteLine("Lymphocyte " + i +
    " not over stimulation threshold");

The demo program does not use the Lymphocyte age field. Some non-demo AIS systems kill off a simulated lymphocyte if it reaches a maximum age, if its stimulation value hasn't changed over a specified number of simulation time steps, or if it has triggered a specified number of consecutive alerts.
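The three removal rules just described can be expressed as a single culling pass run once per time step. This is a hypothetical sketch: the demo has no such code, and the SimLymphocyte class, its field names, and the three limit parameters are assumptions chosen for illustration. The caller is expected to reset StepsSinceStimulated to 0 whenever a lymphocyte's stimulation changes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical lymphocyte record with the bookkeeping needed for culling
public class SimLymphocyte
{
  public string Antibody;
  public int Age;
  public int StepsSinceStimulated;  // caller resets to 0 on detection
  public int ConsecutiveAlerts;
  public SimLymphocyte(string antibody) { Antibody = antibody; }
}

public static class Culling
{
  // Ages every lymphocyte by one step, then removes any that hit a limit.
  // Returns the number of lymphocytes removed.
  public static int AgeAndCull(List<SimLymphocyte> set,
    int maxAge, int maxIdleSteps, int maxConsecutiveAlerts)
  {
    foreach (SimLymphocyte lym in set)
    {
      ++lym.Age;
      ++lym.StepsSinceStimulated;
    }
    return set.RemoveAll(lym =>
      lym.Age >= maxAge ||
      lym.StepsSinceStimulated >= maxIdleSteps ||
      lym.ConsecutiveAlerts >= maxConsecutiveAlerts);
  }
}
```

In a full system, removed lymphocytes would typically be replaced by new ones that pass negative selection, so the detector population stays roughly constant.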

The Lymphocyte Detects() Method
In a non-demo artificial immune system, incoming bit patterns can be very long, and simulated antibodies can be long too. Therefore, it's important to use an efficient algorithm to determine whether an antibody pattern, such as "1000," is contained in an incoming pattern such as "111011111000."

The Detects() method of the Lymphocyte class uses the Knuth-Morris-Pratt (KMP) pattern matching algorithm. This requires a search table that is constructed from the antibody pattern. Other efficient pattern matching algorithms can be used, depending on how the bit patterns are stored. The demo uses arrays of type char, which is simple but not particularly efficient.
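A standard KMP search over char arrays looks roughly like the following. This is a generic textbook sketch, not the demo's BuildTable() and Detects() code: the static Kmp class and its method signatures are hypothetical, whereas the demo stores the table in each Lymphocyte's searchTable field.

```csharp
using System;

// Generic Knuth-Morris-Pratt sketch for char arrays
public static class Kmp
{
  // table[i] = length of the longest proper prefix of pattern[0..i]
  // that is also a suffix of pattern[0..i]
  public static int[] BuildTable(char[] pattern)
  {
    int[] table = new int[pattern.Length];
    int k = 0;
    for (int i = 1; i < pattern.Length; ++i)
    {
      while (k > 0 && pattern[i] != pattern[k])
        k = table[k - 1];
      if (pattern[i] == pattern[k])
        ++k;
      table[i] = k;
    }
    return table;
  }

  // True if pattern occurs in text; table must come from BuildTable(pattern)
  public static bool Contains(char[] text, char[] pattern, int[] table)
  {
    if (pattern.Length == 0) return true;
    int k = 0;  // number of pattern chars currently matched
    for (int i = 0; i < text.Length; ++i)
    {
      while (k > 0 && text[i] != pattern[k])
        k = table[k - 1];  // fall back without re-reading text
      if (text[i] == pattern[k])
        ++k;
      if (k == pattern.Length)
        return true;
    }
    return false;
  }
}
```

The table is built once per antibody, which is why the demo computes it in the Lymphocyte constructor. Each detection check then takes time proportional to the length of the incoming pattern, because the search never moves backward in the text.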

Because each Lymphocyte object has a single antibody bit pattern, there's a one-to-one correspondence between Lymphocyte and antibody, and so the terms are often used interchangeably. When the set of Lymphocyte objects is created by the CreateLymphocyteSet() helper method, the demo program prevents duplicates by computing a hash value based on the antibody.
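One simple way to implement hash-based duplicate prevention is to treat a '0'/'1' antibody as a binary number. The sketch below illustrates that idea; the Dedupe class and AntibodyHash() function are hypothetical, and the demo's actual GetHashCode() may compute its value differently. For equal-length antibodies of up to 31 bits this hash is collision-free, but for longer antibodies distinct patterns can share a hash, so a robust version would also compare the arrays themselves.

```csharp
using System;
using System.Collections.Generic;

public static class Dedupe
{
  // Interprets the antibody as a binary number (exact for <= 31 bits)
  public static int AntibodyHash(char[] antibody)
  {
    int h = 0;
    foreach (char c in antibody)
      h = h * 2 + (c == '1' ? 1 : 0);
    return h;
  }

  // Keeps only the first antibody seen for each distinct hash value
  public static List<char[]> RemoveDuplicates(List<char[]> antibodies)
  {
    var seen = new HashSet<int>();
    var result = new List<char[]>();
    foreach (char[] ab in antibodies)
      if (seen.Add(AntibodyHash(ab)))  // Add returns false if already present
        result.Add(ab);
    return result;
  }
}
```

For example, the list 1111, 1000, 1111, 1110 reduces to the three distinct antibodies 1111, 1000 and 1110.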

So, What Does It All Mean?
Roughly 20 years ago, there was quite a bit of optimism in the research community that intrusion detection systems modeled on biological immune systems could lead to great advances in cybersecurity. Unfortunately, the first wave of AIS systems did not succeed nearly as well as hoped for.

The demo program presented in this article reveals several problems. Systems where data is represented using bit patterns are appealing from a research point of view, but such systems just don't seem to work well in practice. It's one thing to imagine a hypothetical scenario of incoming bit patterns, but it's quite another thing to try to implement a bit-based system in the day-to-day chaos of a modern IT environment.

But in recent years, artificial immune systems have been receiving renewed attention. Instead of assuming that data is represented as bit patterns, newer AIS algorithms assume that information is represented by traditional integer and floating-point data. One promising idea is to use a self-set to generate a non-self-set with an evolutionary algorithm, and then use the self and non-self sets to train a neural binary classification system for intrusion detection.
