The Data Science Lab

R Language OOP Using S3

The S3 OOP model is still widely used, so let's write S3-style OOP code in the R language.

The R language was created primarily to perform statistical analyses in an interactive environment using hundreds of built-in functions, such as chisq.test() for goodness of fit analysis and lm() for linear model analysis. From the very beginning R has had a basic scripting language with loop control structures, if-then decision control, and so on. Oddly, R has several completely different object-oriented programming models.

In this article I'll show you how to write object-oriented programming (OOP) code using the S3 model. Although newer OOP models, notably S4 (the successor to S3), "reference classes" (RC) and R6 are superior in many ways to S3, S3 is still widely used, and many of the most common R language functions were created using S3. Even if you never write S3 code, understanding S3 can help you use the R language more effectively and give you insights into OOP in other languages.
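
To make the point about built-in functions concrete, here is a small sketch using only functions and data that ship with a default R installation (lm, summary, getS3method and the cars data set). The variable names are mine, chosen for illustration.

```r
# Fit a simple linear model on the built-in 'cars' data set.
fit <- lm(dist ~ speed, data = cars)

# The fitted model is an S3 object whose class attribute is "lm".
fitClass <- class(fit)
print(fitClass)

# summary() is an S3 generic; for an "lm" object it dispatches to summary.lm.
summaryMethod <- getS3method("summary", "lm")
print(is.function(summaryMethod))

# The result of summary() is itself an S3 object with its own class.
summaryClass <- class(summary(fit))
print(summaryClass)
```

The same naming pattern (generic name, a dot, then the class name) is what the rest of this article uses for the Person class.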

Although the documentation for S3 is quite good, it's very detailed, which obscures some of the main ideas. Also, the documentation assumes you're an experienced R user. I'll explain programming with S3 (technically, it's scripting because R is interpreted rather than compiled) from the point of view of a .NET developer who is relatively new to R.

To get an idea of where this article is headed, take a look at the screenshot of a demo R session in Figure 1. The outer container window is the Rgui.exe shell. Inside the shell, the window on the left is the R Console where you can issue interactive commands. Here, I use the setwd function to set the working directory to the location of my demo R file, then I use the source function to execute the oopDemo.R script.

Figure 1. OOP with S3 Demo Session

The right window inside the shell is the source R code for the oopDemo.R script. That script uses an S3 class that defines a Person object. In most realistic scenarios an S3 class would contain numeric arrays and matrices, but a Person class is easy to understand and is a common "Hello World" example for OOP.

Installing R
If you don't have R on your system, installing R (and uninstalling) is very easy. Do an Internet search for "Install R" and you'll find a page on the cran.r-project.org Web site with a link labeled something like "Download R 3.3.1 for Windows." Click that link and you'll get an option to run a self-extracting installer file named something like R-3.3.1-win.exe. Click on the Run button presented to you by your browser to launch the installer. You can accept all the configuration defaults and installation is very quick.

The demo code has no significant R version dependencies so you can use R version 3.0 or later. The libraries needed to write S3 are included with a default R installation.

To launch the Rgui program, open a file explorer and navigate to the C:\Program Files\R\R-3.3.1\bin\x64 directory. Then double-click on file Rgui.exe and the Rgui shell with an R Console window will launch. You can clear away the wordy start-up messages with a Ctrl+L.

S3 Goals
An S3 class definition is quite a bit different from a class definition in other programming languages. A good way for .NET developers to get a grasp of S3 is to take a C# class definition and then see what a roughly equivalent definition looks like in R.

Consider this C# Person class definition skeleton:

public class Person
{
  public string lastName;
  public DateTime birthDate;

  public Person() { . . } // default ctor
  public Person(string ln, DateTime dob) { . . }
  public void display() { . . }
  public int ageOf() { . . }
}

The C# class encapsulates data fields lastName and birthDate with two constructor methods, a display method and an ageOf method. A roughly equivalent R language S3 class definition skeleton is:

Person = function(ln="NONAME", dob="1900-01-01") { . . }
display = function(obj) { . . }
display.Person = function(obj) { . . }
ageOf = function(obj) { . . }
ageOf.Person = function(obj) { . . }

An S3 class doesn't encapsulate data fields and methods; instead, an S3 class is a collection of ordinary R functions that have some special characteristics. Also, each S3 class method (display and ageOf in the example) is defined by two related functions. Very strange indeed.

The calling code for an S3 class is also quite different from the calling code for a C# class. In C# you could write:

Person p1 = new Person();
Person p2 = new Person("Barker", DateTime.Parse("1990/06/15"));
p2.lastName = "Chang";  // Change name
p2.display();
int age = p2.ageOf();

The equivalent code for the S3 class looks like:

p1 <- Person()
p2 <- Person("Barker", "1990-06-15")
p2$lastName <- "Chang"  # change name
display(p2)
age <- ageOf(p2)

Notice that in R, class methods are called using a pattern of methodName(objectName) instead of the C# pattern of objectName.methodName. Also, public data fields in an S3 object are accessed using the "$" operator rather than the "." operator used for C# objects.

The S3 Person Class Definition Code
The complete R code for the demo program is presented in Listing 1.

Listing 1: S3 OOP Demo Program
# oopDemo.R
# R 3.3.1

# S3 OOP
Person = function(ln="NONAME", dob="1900-01-01") {
  this <- list(
    lastName = ln,
    dateBirth = dob
  )
  class(this) <- append(class(this), "Person")
  return(this)
}

display = function(obj) {
  UseMethod("display", obj)
}

display.Person = function(obj) {
  cat("Last name     : ", obj$lastName, "\n")
  cat("Date of birth : ", obj$dateBirth, "\n")
}

ageOf = function(obj) {
  UseMethod("ageOf", obj)
}

ageOf.Person = function(obj) {
  dob <- as.POSIXlt(obj$dateBirth)
  today <- as.POSIXlt(Sys.Date())
  age <- today$year - dob$year
  if (today$mon < dob$mon ||
   (today$mon == dob$mon && today$mday < dob$mday)) {
    age <- age - 1
  } 
  return(age)
}

# =====

cat("\nBegin OOP with S3 demo \n\n")

# S3 initialization
cat("Initializing a default S3 Person object p1 \n") 
p1 <- Person()  # default param values
cat("Person 'p1' is: \n")
display(p1)
cat("\n")

cat("Initializing an S3 Person object p2 \n")
p2 <- Person("Barker", "1990-06-15")
cat("Person 'p2' is: \n")
display(p2)
cat("\n")

# S3 accessing
cat("Changing p2 lastName to 'Chang' \n")
p2$lastName = "Chang"
cat("Person 'p2' is now: \n")
display(p2)

age <- ageOf(p2)
cat("Person 'p2' age is: ", age, "\n")

cat("\nEnd OOP with S3 demo \n\n")

The first part of the S3 Person class definition is:

Person = function(ln="NONAME", dob="1900-01-01") {
  this <- list(
    lastName = ln,
    dateBirth = dob
  )
  class(this) <- append(class(this), "Person")
  return(this)
}

An S3 class is essentially a constructor function that returns an R list tagged with a class name. Here, lastName and dateBirth are the class member variables. Notice that you can't specify the data types. Here, default values of "NONAME" and "1900-01-01" are specified for the lastName and dateBirth fields. The word "this" is not a keyword so I could have used "me" (as in VB), "self" (as in Python) or even "foo." All class member variables have public scope; there's no way to make class variables private in S3.
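
A short sketch of what "public and untyped" means in practice. It repeats the Person constructor from Listing 1 so it can run on its own; the variable names p and fieldNames are just for illustration.

```r
# Person constructor, as in Listing 1.
Person <- function(ln = "NONAME", dob = "1900-01-01") {
  this <- list(
    lastName = ln,
    dateBirth = dob
  )
  class(this) <- append(class(this), "Person")
  return(this)
}

p <- Person("Barker", "1990-06-15")

# The fields are just named elements of the underlying list.
fieldNames <- names(p)
print(fieldNames)

# Fields are public, so they can be read and written directly.
p$lastName <- "Chang"
print(p[["lastName"]])

# No types are enforced, so nothing stops a number being stored in lastName.
p$lastName <- 42
print(is.numeric(p$lastName))
```

If you want type checking, you have to add it yourself inside the constructor, for example with a call to stopifnot.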

After the list sets up the class fields, the class is wired up using the class and append functions. In my opinion, it's best to think of this part of an S3 class definition as a magic incantation or pattern because it's always the same.
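
To see what the incantation actually does, you can inspect the object with the built-in class, inherits, is.list and unclass functions. This is a sketch that repeats the constructor from Listing 1; the variable names are hypothetical.

```r
# Person constructor, as in Listing 1.
Person <- function(ln = "NONAME", dob = "1900-01-01") {
  this <- list(
    lastName = ln,
    dateBirth = dob
  )
  class(this) <- append(class(this), "Person")
  return(this)
}

p <- Person()

# append() adds "Person" after the list's existing class name.
classVector <- class(p)
print(classVector)

# inherits() tests whether a class name appears in the class vector.
isPerson <- inherits(p, "Person")
print(isPerson)

# The object is still an ordinary list underneath.
stillList <- is.list(p)
print(stillList)

# unclass() strips the class attribute, leaving a plain list.
bare <- unclass(p)
print(inherits(bare, "Person"))
```

In other words, an S3 "class" is nothing more than a character vector attached to an ordinary R object.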

Another weird aspect of S3 classes is that each class method is defined by a pair of functions rather than a single function. The Person class display function is defined as:

display = function(obj) {
  UseMethod("display", obj)
}

display.Person = function(obj) {
  cat("Last name     : ", obj$lastName, "\n")
  cat("Date of birth : ", obj$dateBirth, "\n")
}

Every S3 class method that accesses a class field must have a parameter that represents the object. Here, I use "obj," but it isn't a keyword and I could have used "object" or "o." The first function defines "display" as a generic function: its call to UseMethod tells R to look at the class of the object passed in and hand the call off to the matching implementation. The second function defines the behavior of the method for Person objects. Notice the required naming convention with the name of the method ("display") followed by a dot and then the name of the class ("Person").
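
The two-function pattern is easier to follow with a method that returns a value. The sketch below uses a hypothetical generic named describe (it is not part of the demo program) plus a describe.default fallback, which R calls when no class-specific function is found.

```r
# Person constructor, as in Listing 1.
Person <- function(ln = "NONAME", dob = "1900-01-01") {
  this <- list(
    lastName = ln,
    dateBirth = dob
  )
  class(this) <- append(class(this), "Person")
  return(this)
}

# Generic function: hands the call off based on the class of obj.
describe <- function(obj) {
  UseMethod("describe", obj)
}

# Implementation used for Person objects.
describe.Person <- function(obj) {
  paste("Person with last name", obj$lastName)
}

# Fallback used when no class-specific implementation exists.
describe.default <- function(obj) {
  paste("Object of class", class(obj)[1])
}

r1 <- describe(Person("Barker", "1990-06-15"))
r2 <- describe(3.14)
print(r1)
print(r2)
```

The calling code never names describe.Person directly; R picks the implementation from the object's class.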

The demo R program defines a method named ageOf that returns the age, in years, of a Person object. The registration function for method ageOf is:

ageOf = function(obj) {
  UseMethod("ageOf", obj)
}

The behavior part of the method is defined:

ageOf.Person = function(obj) {
  dob <- as.POSIXlt(obj$dateBirth)
  today <- as.POSIXlt(Sys.Date())
  age <- today$year - dob$year
  if (today$mon < dob$mon ||
   (today$mon == dob$mon && today$mday < dob$mday)) {
    age <- age - 1
  } 
  return(age)
}

The function uses the built-in as.POSIXlt function to convert the dateBirth string variable to a structure that has year, mon and mday fields. The calculation just subtracts year values, and then subtracts 1 if this year's birthday hasn't occurred yet, that is, if the current month and day fall before the month and day of birth.
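
Because ageOf.Person depends on today's date, its result changes over time. The sketch below pulls the same arithmetic into a hypothetical helper named ageOn that takes the reference date as a parameter, which makes the birthday adjustment easy to check.

```r
# Same calculation as ageOf.Person, but with an explicit reference date.
# ageOn is a hypothetical helper, not part of the demo program.
ageOn <- function(dateBirth, asOf) {
  dob <- as.POSIXlt(dateBirth)
  ref <- as.POSIXlt(asOf)
  age <- ref$year - dob$year   # year fields count years since 1900
  if (ref$mon < dob$mon ||
      (ref$mon == dob$mon && ref$mday < dob$mday)) {
    age <- age - 1             # birthday not yet reached in the reference year
  }
  return(age)
}

dayBefore  <- ageOn("1990-06-15", "2024-06-14")
onBirthday <- ageOn("1990-06-15", "2024-06-15")
print(dayBefore)   # 33
print(onBirthday)  # 34
```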

Using an S3 Object
The demo program creates a first Person object like so:

cat("\nBegin OOP with S3 demo \n\n")
cat("Initializing a default S3 Person object p1 \n") 
p1 <- Person()  # default param values
cat("Person 'p1' is: \n")
display(p1)
cat("\n")

Because an S3 class is defined using R functions, creating an S3 object looks just like calling a function. Here, because no arguments are passed to the Person constructor, the default values of "NONAME" and "1900-01-01" are used for the lastName and dateBirth data fields.

A second Person object is created with this code:

cat("Initializing an S3 Person object p2 \n")
p2 <- Person("Barker", "1990-06-15")
cat("Person 'p2' is: \n")
display(p2)
cat("\n")

For Person object p2, values for the lastName and dateBirth fields are passed by position to the constructor. R supports named parameters so the code could have been written as:

p2 <- Person(ln="Barker", dob="1990-06-15")
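
Named parameters can also be given in any order, and any you omit fall back to their defaults. A brief sketch, repeating the constructor from Listing 1 (the names pA, pB and pC are just for illustration):

```r
# Person constructor, as in Listing 1.
Person <- function(ln = "NONAME", dob = "1900-01-01") {
  this <- list(
    lastName = ln,
    dateBirth = dob
  )
  class(this) <- append(class(this), "Person")
  return(this)
}

pA <- Person("Barker", "1990-06-15")            # by position
pB <- Person(dob = "1990-06-15", ln = "Barker") # by name, order reversed
pC <- Person(ln = "Barker")                     # dob takes its default

print(identical(pA, pB))
print(pC$dateBirth)
```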

The last part of the demo program illustrates accessing an S3 class field and calling a class method:

cat("Changing p2 lastName to 'Chang' \n")
p2$lastName = "Chang"
cat("Person 'p2' is now: \n")
display(p2)
age <- ageOf(p2)
cat("Person 'p2' age is: ", age, "\n")
cat("\nEnd OOP with S3 demo \n\n")

Because all S3 class fields are public they can be accessed directly using the "$" operator. It's possible to write C#-style get and set methods, but in R there's no advantage in doing so.

Object Assignment
An important difference between C# objects and S3 objects is that C# objects are reference objects, but S3 objects are value objects. For example, in C#, if you wrote this code:

p1 = new Person();
p2 = p1;

Then both p1 and p2 point to the same object in memory, so a change to either affects both. But the same code for S3 objects will create an independent copy in p2 and so a change to either p1 or p2 will have no effect on the other object:

p1 <- Person()
p2 <- p1
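
The sketch below demonstrates the copy behavior, and one consequence of it: a function that changes a field is working on its own copy, so it has to return the modified object. The rename function is a hypothetical helper, not part of the demo program.

```r
# Person constructor, as in Listing 1.
Person <- function(ln = "NONAME", dob = "1900-01-01") {
  this <- list(
    lastName = ln,
    dateBirth = dob
  )
  class(this) <- append(class(this), "Person")
  return(this)
}

p1 <- Person()
p2 <- p1
p2$lastName <- "Chang"        # changes p2 only
afterAssign <- p1$lastName
print(afterAssign)            # p1 still holds "NONAME"

# A function receives a copy, so it must return the changed object.
rename <- function(obj, newName) {
  obj$lastName <- newName
  return(obj)
}

rename(p1, "Lee")             # result discarded; p1 is not modified
afterCall <- p1$lastName
p3 <- rename(p1, "Lee")       # capture the returned copy instead
print(afterCall)
print(p3$lastName)
```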

Wrapping Up
The explanation presented in this article should give you all the information you need to get up and running with S3 classes. Whenever I need to write OOP code in R, I have a difficult time deciding whether to use the S3, S4 or RC model. The RC model is much more like the C# OOP model I'm used to, but based on my experiences, most R programmers come from a strictly R programming background and feel more comfortable with S3 and S4. So, if I'm writing code intended for my own use only, then I'll almost always use the RC model, but if I'm writing code that might be used by R programmers, I'll usually use S3 or S4.

About the Author

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].
