In-Depth

Text Processing, Type Definition, I/O and Visualization in F#

What you can do with most programming languages can be accomplished in F#'s functional programming paradigm. Here's how to handle some simple operations, which might look familiar to you already.

In prior articles I described some interesting and useful features that the functional programming paradigm -- and specifically F# -- can offer when developing computer applications. I'd like to delve into a bit of detail on some useful examples this time. Specifically, let's look at text processing, type definition, input/output and visualization.

Text Processing
In computer science, text is represented as strings. Put simply, a string is a finite sequence of characters belonging to an alphabet. All text processing tools and operations rely on strings as their basic processing unit, and one of these tools is the regular expression. A regular expression is a pattern that implicitly describes a set of strings belonging to a regular language; thus, regular expressions represent regular languages. This powerful string processing tool is part of the .NET platform, so it is also available from F#.

To use regular expressions in F#, one first needs to open (import) the System.Text.RegularExpressions namespace. Here's an example that creates a regular expression recognizing any word in lowercase letters:

open System
open System.Text.RegularExpressions

[<EntryPoint>]
let main argv = 
    let reg = new Regex("[a-z]+") 
    printf "%A" (reg.IsMatch("jordan"))
    // Read (and discard) a line of user input before exiting
    Console.ReadLine() |> ignore
    0 // return an integer exit code

The IsMatch method returns true if any substring of the string supplied as an argument matches the regular expression pattern ([a-z]+). Hence, after executing that code the output will be true. If the argument is changed from "jordan" to "23", no match is found and the return value is false.
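Because IsMatch succeeds on any matching substring, it can be worth seeing how it differs from matching the whole string. The following is a minimal sketch; the names containsLower and isAllLower and the anchored pattern ^[a-z]+$ are my own additions for illustration and are not part of the original example.

```fsharp
open System.Text.RegularExpressions

// Unanchored pattern: succeeds if ANY substring is made of lowercase letters
let lower = Regex("[a-z]+")
// Anchored pattern (an assumption added for illustration): the WHOLE string must match
let wholeLower = Regex("^[a-z]+$")

let containsLower (s: string) = lower.IsMatch(s)
let isAllLower (s: string) = wholeLower.IsMatch(s)

printfn "%b" (containsLower "jordan")    // true
printfn "%b" (containsLower "23")        // false
printfn "%b" (containsLower "Jordan23")  // true, because "ordan" matches
printfn "%b" (isAllLower "Jordan23")     // false, the capital J and the digits do not match
```

Anchoring with ^ and $ is the usual way to validate an entire input rather than search inside it.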

Text transcription is an operation that can be easily achieved using regular expressions. This code illustrates a transcrip function that takes the arguments str (the string to be transcribed) and pattern (the pattern to be matched) and replaces every match with the fixed string "23":

let transcrip str pattern = (new Regex(pattern)).Replace(str, "23") 

printf "%A" (transcrip "Hi Mr. Jordan!" "Jordan")

After executing this code the result is shown in Figure 1.

Figure 1: Text Transcription Result
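The transcription function hard-codes its replacement text. As a sketch of one possible generalization, the replacement can be passed in as a parameter. The name transcribe and its parameter order are assumptions of mine; the static Regex.Replace method is part of .NET.

```fsharp
open System.Text.RegularExpressions

// Hypothetical generalization: the replacement text is a parameter
// instead of the fixed string "23"
let transcribe (pattern: string) (replacement: string) (input: string) =
    Regex.Replace(input, pattern, replacement)

printfn "%s" (transcribe "Jordan" "23" "Hi Mr. Jordan!")    // Hi Mr. 23!
printfn "%s" (transcribe "[0-9]+" "#" "Room 101, floor 3")  // Room #, floor #
```

Putting the input string last lets the function be partially applied and used in a pipeline, for example "Hi Mr. Jordan!" |> transcribe "Jordan" "23".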

Regular expressions can also be used to separate, split or divide text according to a certain criterion. One may need to split text at the occurrence of numbers, specific letters, etc. In such cases a regular expression and the Split method, which returns an array of strings, can serve as a solution.

In this code, the regular expression splitreg matches a single space or any digit:

let splitreg = new Regex(" |[0-9]")
let splitted = splitreg.Split("1 2 3 Mr.Jazz!")
    
printfn "%A" splitted

The Split method divides the input string at every space or digit it finds. Reading from left to right, three digits and three spaces occur consecutively before the substring "Mr.Jazz!". Because these separators are adjacent, nothing lies between them, so the result contains six empty strings followed by "Mr.Jazz!", as you can see in Figure 2.

Figure 2: Results of Split Method
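The empty strings in the result are often unwanted. The sketch below shows two ways one might deal with them: filtering them out afterward, or treating a whole run of separators as a single delimiter. The names raw, cleaned and merged and the pattern [ 0-9]+ are assumptions added for illustration.

```fsharp
open System.Text.RegularExpressions

let input = "1 2 3 Mr.Jazz!"

// Same pattern as in the article: every single space or digit is a separator
let raw = Regex(" |[0-9]").Split(input)

// Option 1: drop the empty strings produced by adjacent separators
let cleaned = raw |> Array.filter (fun s -> s <> "")

// Option 2 (assumed alternative pattern): a run of spaces and digits counts as ONE separator
let merged = Regex("[ 0-9]+").Split(input)

printfn "%d pieces: %A" raw.Length raw   // 7 pieces: six empty strings, then "Mr.Jazz!"
printfn "%A" cleaned                     // [|"Mr.Jazz!"|]
printfn "%A" merged                      // [|""; "Mr.Jazz!"|]
```

Note that the second option still yields one leading empty string, because the input begins with a separator.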

After exploring the possibilities of text processing in F# let us dive into the opportunities for defining custom types.

Type Definition
F# provides the possibility of defining custom types by using the type keyword followed by the type name, an equal sign and the definition itself. A type definition can be used to define aliases for existing types, including tuple types. Such aliases can make code more meaningful, as shown in the next example, where a type person consisting of a 2-tuple of strings is created and later used as a type annotation:

type person = string * string
let f (p : person) =
        printfn "%A" p

The syntax x1 * x2 * … * xn indicates an n-tuple type.
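To show how such an alias might be used in practice, here is a short sketch that takes a person apart by pattern matching on the tuple. The function fullName and the value duke are hypothetical names introduced only for this example.

```fsharp
type person = string * string

// The tuple pattern in the parameter list splits the person into its two parts
let fullName ((first, last): person) = first + " " + last

let duke : person = ("Duke", "Ellington")

printfn "%s" (fullName duke)   // Duke Ellington
printfn "%s" (fst duke)        // Duke
```

Since person is only an alias, any string * string value is accepted wherever a person is expected.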

Another type definable in F# is the record type. It is similar to a tuple in that it groups several types into a single type, but differs in that a name must be provided for each field. The next example illustrates the creation of a record type named album and its later use.

type album = { year: int list; artist: string list }
let jazz = { year = [1998; 1999; 2000]; artist = ["Sting"; "Duke Ellington"; "Chris Botti"] }
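Once a record value exists, its fields are read with dot notation, and a modified copy can be made with the with keyword. The following sketch assumes current F# record syntax (field = value pairs separated by semicolons); the names firstArtist, extended and pairs are my own.

```fsharp
type album = { year: int list; artist: string list }

let jazz = { year = [1998; 1999; 2000]; artist = ["Sting"; "Duke Ellington"; "Chris Botti"] }

// Fields are accessed by name
let firstArtist = List.head jazz.artist

// Copy-and-update: records are immutable, so this builds a new value
let extended = { jazz with year = 2001 :: jazz.year }

// Pair each year with the artist at the same position
let pairs = List.zip jazz.year jazz.artist

printfn "%s" firstArtist      // Sting
printfn "%A" extended.year    // [2001; 1998; 1999; 2000]
printfn "%A" pairs
```

The original jazz value is left untouched by the copy-and-update expression.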

Records play a role loosely similar to that of structs and classes in C#. Apart from aliases and records, there is another kind of type that one may define in F#: the union or sum type. Union types represent a way of uniting types with different structures. Their definition consists of a set of constructors separated by vertical bars. A constructor is composed of a name (beginning with a capital letter and unique within the type), optionally the of keyword, and finally the types that form the constructor, separated by asterisks if there is more than one. This code illustrates how to define a Binary Tree union type:

type BinaryTree =
    | Node of int * BinaryTree * BinaryTree
    | Empty 

Logically, a binary tree is either an Empty tree or a Node with an integer value and two children, which are also binary trees. Consider the binary tree in Figure 3.

Figure 3: Binary Tree

This structure can be declared as follows:

let tree = Node(2, Node(3, Node(4,Empty,Empty), Empty), Node(5,Empty,Empty))

A function defining an in-order printing of node values in the tree is presented in the next lines of code:

let rec printInOrder tree =
        match tree with
        | Node (data,left,right) -> printInOrder left
                                    printfn "Node %d" data
                                    printInOrder right
        | Empty -> ()

printInOrder tree

The output after executing the previous function is shown in Figure 4.

Figure 4: PrintInOrder Function Result

An in-order traversal of a binary tree recursively visits the left subtree, then the current node and finally the right subtree; thus, for the previous tree the in-order sequence is 4, 3, 2, 5.
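Printing is a side effect, so a more functional variant might return the visited values as a list instead. The sketch below follows the same left, node, right order; the function name inOrder is an assumption of mine, and the type and tree are repeated so the block stands on its own.

```fsharp
type BinaryTree =
    | Node of int * BinaryTree * BinaryTree
    | Empty

// Collect the values in in-order sequence: left subtree, current node, right subtree
let rec inOrder tree =
    match tree with
    | Empty -> []
    | Node (data, left, right) -> inOrder left @ [data] @ inOrder right

let tree = Node(2, Node(3, Node(4, Empty, Empty), Empty), Node(5, Empty, Empty))

printfn "%A" (inOrder tree)   // [4; 3; 2; 5]
```

List concatenation with @ keeps the sketch short; for large trees an accumulator-based version would avoid the repeated copying.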

Input / Output
I/O features are not highly related to the functional programming paradigm. But F# is not purely functional; it's a hybrid, and accordingly it incorporates features from the imperative programming paradigm. Several I/O functions provided in the language corroborate this statement. The printfn function used in this and prior articles is probably the most common way of printing values of various types in F#; it is a simple function that provides simple formatting. Here are its main format specifiers:

  • %d print an int
  • %s print a string
  • %A print any type using the built-in printer
  • %f print a float
  • %e print a float in scientific notation
  • %g print a float in the more compact of decimal or scientific notation
  • %O print any type using the ToString() method

In the next example (with result shown in Figure 5), a string, an int and a float are printed using the printfn function:

printfn "String: %s int: %d, float: %f" "NBA" 5 2.3

Figure 5: PrintF Result
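The same format specifiers also work with sprintf, which returns the formatted text as a string instead of printing it. The sketch below uses that to build a line first and then shows a few of the other specifiers on sample values of my own choosing.

```fsharp
// sprintf formats exactly like printfn but returns the string
let line = sprintf "String: %s int: %d, float: %f" "NBA" 5 2.3
printfn "%s" line                // String: NBA int: 5, float: 2.300000

// %A uses the built-in structural printer, handy for lists and tuples
printfn "%A" [1; 2; 3]           // [1; 2; 3]
printfn "%A" ("Sting", 1998)

// %O calls ToString() on the value
printfn "%O" 5                   // 5

// %e always uses exponent notation; %g picks the more compact form
printfn "%e" 1234.5
printfn "%g" 1234.5
```

A mismatch between a specifier and its argument, such as passing a string to %d, is reported by the compiler rather than at run time.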

Reading from and writing to files can be achieved in F# thanks to the functionality that the .NET platform offers through the System.IO namespace. The function in the following code opens the file named by the filename parameter and yields each line until the end of the stream is reached. Because all of this happens inside a sequence expression, the final result is a sequence containing the lines of the file.

open System.IO

let readfile filename =
        seq { use stream = File.OpenRead filename
              use reader = new StreamReader(stream)
              while not reader.EndOfStream do
                yield reader.ReadLine() }

let lines = readfile @"F:\text.txt"
    
printfn "%A" lines

The use binding is equivalent to the let binding, except that use disposes the object when it goes out of scope. Inside a sequence expression that happens once the enumeration over the sequence has completed, which closes the file at the end.
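Since the text mentions writing as well as reading, here is a small round-trip sketch: it writes two lines to a temporary file, reads them back with a reader-based sequence, and deletes the file. The file name fsharp-io-sketch.txt and the sample lines are placeholders of mine; File.WriteAllLines, Path.GetTempPath and File.Delete are standard System.IO members.

```fsharp
open System.IO

// Lazily yield the lines of a file; the reader is disposed when enumeration ends
let readfile (filename: string) =
    seq { use reader = new StreamReader(filename)
          while not reader.EndOfStream do
              yield reader.ReadLine() }

// Hypothetical scratch file in the system's temporary directory
let path = Path.Combine(Path.GetTempPath(), "fsharp-io-sketch.txt")

File.WriteAllLines(path, [| "first line"; "second line" |])

// Seq.toList forces the whole sequence, so the file is closed afterward
let lines = readfile path |> Seq.toList
printfn "%A" lines    // ["first line"; "second line"]

File.Delete path
```

Forcing the sequence with Seq.toList before deleting the file matters here, because a sequence expression does no work until it is enumerated.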

Visualization
Providing visualization tools and libraries is not really the strong suit of functional programming languages. Then again, F# is not purely functional and is a .NET language, so it can work with Windows Forms and with graphics libraries such as DirectX and OpenGL. To start building a GUI application in F#, one first needs to open the namespaces System.Windows.Forms and System.Drawing, adding references to them in the project if necessary, and then create the form (the result is shown in Figure 6):

let form = new Form(BackColor = Color.LightBlue, Visible=true, Text="My F# form")
Application.Run(form)

Figure 6: Simple Form With Light Blue Background

Once the form is created, this code adds a picture box as a new control (see Figure 7). Because Application.Run blocks until the form is closed, these lines belong before that call:

let picturebox = new PictureBox(Width = 200, Height = 200, BackColor = Color.LightGray)
form.Controls.Add(picturebox)

Figure 7: Addition of Picture Box

Finally, using a solid brush, a black ellipse is drawn inside the picture box, and an almost effortless painting in F# is completed (see Figure 8):

let brush = new SolidBrush(Color.Black)
picturebox.Paint.Add(fun e ->
    e.Graphics.FillEllipse(brush, 50, 50, 50, 50))

Figure 8: Addition of Ellipse
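For reference, the three steps can be assembled into one program. This is only a sketch under some assumptions: it targets Windows with references to System.Windows.Forms and System.Drawing, and the helper name buildForm and the STAThread entry point are my own additions. All controls are set up before Application.Run, since that call does not return until the form is closed.

```fsharp
open System
open System.Drawing
open System.Windows.Forms

// Hypothetical helper: build the form and all of its controls up front
let buildForm () =
    let form = new Form(BackColor = Color.LightBlue, Text = "My F# form")
    let picturebox = new PictureBox(Width = 200, Height = 200, BackColor = Color.LightGray)
    let brush = new SolidBrush(Color.Black)
    // Draw the ellipse every time the picture box repaints itself
    picturebox.Paint.Add(fun e -> e.Graphics.FillEllipse(brush, 50, 50, 50, 50))
    form.Controls.Add(picturebox)
    form

[<EntryPoint; STAThread>]
let main _ =
    // Application.Run shows the form and blocks until it is closed
    Application.Run(buildForm ())
    0
```

Drawing inside the Paint event, rather than once at startup, means the ellipse is redrawn whenever the window is resized or uncovered.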

In this article I described the capabilities that F# provides for text processing, type definition, I/O operations and visualization through Windows Forms. Now it's time for the reader to keep exploring the fascinating, elegant world of functional programming.

About the Author

Arnaldo Perez Castano is a computer scientist based in Cuba. He is the author of a series of programming books -- JavaScript Facil, HTML y CSS Facil, and Python Facil (Marcombo S.A). His expertise includes Visual Basic, C#, .NET Framework, and artificial intelligence, and he offers his services as a freelancer through nubelo.com. Cinema and music are some of his passions. Contact him at [email protected].
