1. OO (Object-oriented) field guide
R has four object oriented systems:
- base type: are mostly manipulated using C code
- S3 type: S3 is a very casual system. It has no formal definition of classes.
- S4 type: S4 has formal class definitions and multiple dispatch
- RC (Reference Classes) type
A class defines the behaviour of objects by describing their attributes and their relationship to other classes.
pryr
provides tools to pry back the surface of R and dig into the details.
Base types
new base types are added very rarely. determine the object’s base type with typeof().
# The type of a function is "closure"
f <- function() {}
typeof(f)
## [1] "closure"
is.function(f)
## [1] TRUE
# The type of a primitive function is "builtin"
typeof(sum)
## [1] "builtin"
is.primitive(sum)
## [1] TRUE
Functions for different base types are almost always written in C. S3 objects can be built on top of any base type, S4 objects use a special base type, and RC objects are a combination of S4 and environments.
To see if an object is a pure base type, check that is.object(x) returns FALSE.
if the object x has the R internal OBJECT, the is.object()
return True
is.object(f)
## [1] FALSE
is.object(1) # FALSE
## [1] FALSE
is.object(as.factor(1:3)) # TRUE
## [1] TRUE
S3 type
S3 is the only OO system used in the base and stats packages.
pryr::otype():
can be used to test if an object is an S3 object in base R.
library(pryr)
##
## Attaching package: 'pryr'
## The following object is masked _by_ '.GlobalEnv':
##
## f
## The following objects are masked from 'package:purrr':
##
## compose, partial
df <- data.frame(x = 1:10, y = letters[1:10])
otype(df) # A data frame is an S3 class
## [1] "S3"
otype(df$x) # A numeric vector isn't
## [1] "base"
otype(df$y) # A factor is
## [1] "S3"
In S3, methods belong to functions, called generic functions. S3 methods do not belong to objects or classes.
ftype()
can describes the object system and to determinate if a function is an S3 method or generic:
ftype(mean)
## [1] "s3" "generic"
ftype(t.data.frame) # data frame method for t()
## [1] "s3" "method"
ftype(t.test) # generic function for t tests
## [1] "s3" "generic"
Defining classes and creating objects
# Create and assign class in one step
foo <- structure(list(), class = "foo")
# ftype(foo)
# Create, then set class
foo <- list()
class(foo) <- "foo"
foo
## list()
## attr(,"class")
## [1] "foo"
determine the class of any object using class(x)
, and see if an object inherits from a specific class using inherits(x, "classname")
.
class(foo)
## [1] "foo"
inherits(foo, "foo")
## [1] TRUE
we can change the class of existing objects:
Creating new methods and generics
To add a new generic, just need to create a function that calls UseMethod().
f <- function(x) UseMethod("f")
f.a <- function(x) "Class a"
f.b <- function(x) "Class b"
a <- structure(list(), class = "a")
class(a)
## [1] "a"
f(a)
## [1] "Class a"
f.a(a)
## [1] "Class a"
f.b(a)
## [1] "Class b"
b <- structure(list(), class = "b")
class(b)
## [1] "b"
f(b)
## [1] "Class b"
f.a(b)
## [1] "Class a"
f.b(b)
## [1] "Class b"
f <- function(x) UseMethod("f")
f.a <- function(x) "Class a"
f.default <- function(x) "Unknown class"
# Method for a class
f(structure(list(), class = "a"))
## [1] "Class a"
# No method for b class, so uses method for a class
f(structure(list(), class = c("b", "a")))
## [1] "Class b"
# No method for c class, so falls back to default
f(structure(list(), class = "c"))
## [1] "Unknown class"
# Force R to call the wrong method, f.a should only use for a class
f.a(structure(list(), class = "c"))
## [1] "Class a"
S4 type
Classes have formal definitions which describe their fields and inheritance structures (parent classes).
Method dispatch can be based on multiple arguments to a generic function
There is a special operator, @, for extracting slots from an S4 object.
Recognising objects, generic functions, and methods
No S4 classes in the base packages
library(stats4)
# From example(mle)
y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
nLL <- function(lambda) - sum(dpois(y, lambda, log = TRUE))
fit <- mle(nLL, start = list(lambda = 5), nobs = length(y))
# An S4 object
isS4(fit)
## [1] TRUE
otype(fit)
## [1] "S4"
ftype(nobs)
## [1] "s4" "generic"
# nobs is a function to extract the number of observations from a Fit.
Defining classes and creating objects
An S4 class has three key properties:
- A name: an alpha-numeric class identifier.
- A named list of slots (fields), which defines slot names and permitted classes.
- A string giving the class it inherits from, or, in S4 terminology, that it contains.
setClass("Person",
slots = list(name = "character", age = "numeric"))
setClass("Employee",
slots = list(boss = "Person"),
contains = "Person")
alice <- new("Person", name = "Alice", age = 40)
john <- new("Employee", name = "John", age = 20, boss = alice)
alice
## An object of class "Person"
## Slot "name":
## [1] "Alice"
##
## Slot "age":
## [1] 40
alice@name
## [1] "Alice"
john@boss; slot(john, "boss")
## An object of class "Person"
## Slot "name":
## [1] "Alice"
##
## Slot "age":
## [1] 40
## An object of class "Person"
## Slot "name":
## [1] "Alice"
##
## Slot "age":
## [1] 40
(@ is equivalent to $, and slot() to [[.) but the $ and slot() doesn't work for S4
# john$boss
Creating new methods and generics
we could take union(), which usually just works on vectors, and make it work with data frames:
alist <- c("a", "b", "a")
blist <- c("a", "c", "d")
union(alist, blist)
## [1] "a" "b" "c" "d"
x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))
y <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Alice"))
union(x, y)
## Warning: Column `Name` joining factors with different levels, coercing to
## character vector
## SN Age Name
## 1 2 15 Alice
## 2 2 15 Dora
## 3 1 21 John
setGeneric("union")
## [1] "union"
setMethod("union",
c(x = "data.frame", y = "data.frame"),
function(x, y) {
unique(rbind(x, y))
}
)
union(x, y)
## SN Age Name
## 1 1 21 John
## 2 2 15 Dora
## 4 2 15 Alice
2. Environments
Environment basics
an environment serve as a bag of names: Each name points to an object stored in memory:
e <- new.env()
e$a <- FALSE
e$b <- "a"
e$c <- 2.3
e$d <- 1:3
ls(e) # show all object in the environment
## [1] "a" "b" "c" "d"
multiple names can point to the same object, Every environment has a parent, another environment.
an environment is similar to a list, with four important exceptions:
Every name in an environment is unique.
The names in an environment are not ordered (i.e., it doesn’t make sense to ask what the first element of an environment is).
An environment has a parent.
Environments have reference semantics.
four special environments:
globalenv(), or global environment, is the interactive workspace. This is the environment in which you normally work.
baseenv(), or base environment, is the environment of the base package. Its parent is the empty environment.
emptyenv(), or empty environment, is the ultimate ancestor of all environments
environment() is the current environment.
search() lists all parents of the global environment.
search()
## [1] ".GlobalEnv" "package:stats4" "package:pryr"
## [4] "package:stats" "package:graphics" "package:grDevices"
## [7] "package:utils" "package:datasets" "package:magrittr"
## [10] "package:forcats" "package:stringr" "package:dplyr"
## [13] "package:purrr" "package:readr" "package:tidyr"
## [16] "package:tibble" "package:ggplot2" "package:tidyverse"
## [19] "package:methods" "Autoloads" "package:base"
Autoloads is used to save memory by only loading package objects
use new.env() to create an environment, use parent.env() to see its parent
e <- new.env()
# the default parent provided by new.env() is environment
parent.env(e)
## <environment: R_GlobalEnv>
e$a <- 1
e$b <- 2
ls(e)
## [1] "a" "b"
e$a
## [1] 1
useful way to view an environment is ls.str()
ls.str(e)
## a : num 1
## b : num 2
extract the value from a name
e$a; e[["a"]]; get("a", envir = e)
## [1] 1
## [1] 1
## [1] 1
use rm() to remove the binding.
rm("a", envir = e)
ls.str(e)
## b : num 2
use exists() to determine if a variable in an environment
x <- 10
exists("x", envir = e)
## [1] TRUE
identical(globalenv(), environment())
## [1] TRUE
Recursing over environments
where() finds the environment where that name is defined, using R’s regular scoping rules:
library(pryr)
x <- 5
where("x")
## <environment: R_GlobalEnv>
where("mean")
## <environment: base>
definition of where()
where
## function (name, env = parent.frame())
## {
## stopifnot(is.character(name), length(name) == 1)
## env <- to_env(env)
## if (identical(env, emptyenv())) {
## stop("Can't find ", name, call. = FALSE)
## }
## if (exists(name, env, inherits = FALSE)) {
## env
## }
## else {
## where(name, parent.env(env))
## }
## }
## <bytecode: 0x7ff93ebb6e40>
## <environment: namespace:pryr>
Not Interesting |
Explicit environments
the original list is not changed because modifying a list actually creates and modifies a copy.
modify <- function(x) {
x$a <- 2
invisible()
}
x_l <- list()
x_l$a <- 1
modify(x_l)
x_l$a
## [1] 1
apply it to an environment, the original environment is modified:
x_e <- new.env()
x_e$a <- 1
modify(x_e)
x_e$a
## [1] 2
3. Functional programming
R is a functional programming (FP) language.
three building blocks of functional programming
- anonymous functions
- closures (functions written by functions)
- lists of functions.
# Generate a sample dataset
set.seed(1014)
df <- data.frame(replicate(6, sample(c(1:10, -99), 6, rep = TRUE)))
names(df) <- letters[1:6]
df
## a b c d e f
## 1 1 6 1 5 -99 1
## 2 10 4 4 -99 9 3
## 3 7 9 5 4 1 4
## 4 2 9 3 8 6 8
## 5 1 10 5 9 8 6
## 6 6 2 1 3 8 5
fix_missing <- function(x) {
x[x == -99] <- NA
x
}
df[] <- lapply(df, fix_missing)
df
## a b c d e f
## 1 1 6 1 5 NA 1
## 2 10 4 4 NA 9 3
## 3 7 9 5 4 1 4
## 4 2 9 3 8 6 8
## 5 1 10 5 9 8 6
## 6 6 2 1 3 8 5
code have more advantages
Anonymous functions
# the anonymous function is same
(function(x) x + 3)(10)
## [1] 13
f <- function(x) x + 3
f(10)
## [1] 13
Closures
closure have two
# two variable inside
power <- function(exponent) {
function(x) {
x ^ exponent
}
}
square <- power(2) # exponent is 2
square(3) # x is 3
## [1] 9
print a closure, can see anything useful:
square
## function(x) {
## x ^ exponent
## }
## <environment: 0x7ff93f8d1b70>
the function itself doesn’t change, the difference is the enclosing environment
environment(square) %>% as.list()
## $exponent
## [1] 2
another way to see the structure
library(pryr)
unenclose(square)
## function (x)
## {
## x^2
## }
<<-
the double arrow operator will keep looking up the chain of parent environments until it finds a matching name
new_counter <- function() {
i <- 0
function() {
i <<- i + 1
i
}
}
counter_one <- new_counter()
unenclose(counter_one)
## function ()
## {
## 0 <<- 0 + 1
## 0
## }
counter_one()
## [1] 1
unenclose(counter_one)
## function ()
## {
## 1 <<- 1 + 1
## 1
## }
counter_one()
## [1] 2
Lists of functions
functions can be stored in lists.
compute_mean <- list(
base = function(x) mean(x),
sum = function(x) sum(x) / length(x),
manual = function(x) {
total <- 0
n <- length(x)
for (i in seq_along(x)) {
total <- total + x[i] / n
}
total
}
)
to call each function use lapply() and need an anonymous function or a new named function,
x <- runif(1e5) # return a uniform distribution data
lapply(compute_mean, function(f) f(x))
## $base
## [1] 0.4994771
##
## $sum
## [1] 0.4994771
##
## $manual
## [1] 0.4994771
lapply(compute_mean, function(f) system.time(f(x)))
## $base
## user system elapsed
## 0.001 0.000 0.001
##
## $sum
## user system elapsed
## 0.002 0.000 0.002
##
## $manual
## user system elapsed
## 0.006 0.000 0.006
x <- 1:10
funs <- list(
sum = sum,
mean = mean,
median = median
)
lapply(funs, function(f) f(x))
## $sum
## [1] 55
##
## $mean
## [1] 5.5
##
## $median
## [1] 5.5
# put a function in the aggrement of a function
midpoint <- function(f, a, b) {
(b - a) * f((a + b) / 2)
}
trapezoid <- function(f, a, b) {
(b - a) / 2 * (f(a) + f(b))
}
midpoint(sin, 0, pi)
## [1] 3.141593
trapezoid(sin, 0, pi)
## [1] 1.923671e-16
4. Functionals
A higher-order function is a function that takes a function as an input or returns a function as output. The complement to a closure is a functional
randomise <- function(f) f(runif(1e3)) # f is a function as input
randomise(mean)
## [1] 0.5078925
lapply() takes a function, applies it to each element in a list, and returns the results in the form of a list.
lapply2 <- function(x, f, ...) {
out <- vector("list", length(x))
for (i in seq_along(x)) {
out[[i]] <- f(x[[i]], ...)
}
out
}
# Create some random data
l <- replicate(20, runif(sample(1:10, 1)), simplify = FALSE)
# With lapply
unlist(lapply(l, length))
## [1] 8 3 7 4 2 2 8 4 7 6 1 6 7 8 2 10 8 6 8 3
# What class is each column?
unlist(lapply(mtcars, class))
## mpg cyl disp hp drat wt qsec
## "numeric" "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
## vs am gear carb
## "numeric" "numeric" "numeric" "numeric"
# rcauchy return a cauchy distribution
trims <- c(0, 0.1, 0.2, 0.5)
x <- rcauchy(1000)
unlist(lapply(trims, function(trim) mean(x, trim = trim)))
## [1] 0.31527829 0.07795881 0.04608098 0.04652580
sapply and vapply
vapply() takes an additional argument specifying the output type.
sapply(mtcars, is.numeric)
## mpg cyl disp hp drat wt qsec vs am gear carb
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
args(vapply)
## function (X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
## NULL
vapply(mtcars, is.numeric, FUN.VALUE = logical(1))
## mpg cyl disp hp drat wt qsec vs am gear carb
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
df <- data.frame(x = 1:10, y = letters[1:10])
sapply(df, class)
## x y
## "integer" "factor"
vapply(df, class, character(1))
## x y
## "integer" "factor"
df2 <- data.frame(x = 1:10, y = Sys.time() + 1:10)
sapply(df2, class)
## $x
## [1] "integer"
##
## $y
## [1] "POSIXct" "POSIXt"
Multiple inputs: Map (and mapply)
# Generate some sample data
xs <- replicate(5, runif(10), simplify = FALSE)
ws <- replicate(5, rpois(10, 5) + 1, simplify = FALSE)
compute the unweighted means:
sapply(xs, mean)
## [1] 0.6038653 0.5604356 0.5082535 0.3454798 0.4212558
calculate weighted.mean()
unlist(lapply(seq_along(xs), function(i) {
weighted.mean(xs[[i]], ws[[i]])
}))
## [1] 0.5771461 0.5824162 0.4680431 0.3401923 0.4564215
# or
unlist(lapply(1:5, function(i) {
weighted.mean(xs[[i]], ws[[i]])
}))
## [1] 0.5771461 0.5824162 0.4680431 0.3401923 0.4564215
Map, a variant of lapply()
Map(weighted.mean, xs, ws) %>% unlist() # Map(f, ...)
## [1] 0.5771461 0.5824162 0.4680431 0.3401923 0.4564215
a <- matrix(1:20, nrow = 5)
apply(a, 1, mean)
## [1] 8.5 9.5 10.5 11.5 12.5
Function operators
chatty <- function(f) {
function(x, ...) {
res <- f(x, ...)
cat("Processing ", x, "\n", sep = "")
res
}
}
f <- function(x) x ^ 2
s <- c(3, 2, 1)
chatty(f)(1)
## Processing 1
## [1] 1
vapply(s, chatty(f), numeric(1))
## Processing 3
## Processing 2
## Processing 1
## [1] 9 4 1
Last modified on 2018-12-06