Notes From Advanced R Part One

Liang / 2018-12-02


1. Data structure

The three properties of a vector are type, length, and attributes.

All objects can have arbitrary additional attributes, used to store metadata about the object.

Attributes can be accessed individually with attr() or all at once (as a list) with attributes().

y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")
## [1] "This is a vector"
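
A small sketch of both accessors. The vector z and its note attribute are made up for illustration; it also shows that most attributes are lost when a vector is subset:

```r
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"

# attributes() returns every attribute as a named list
attrs <- attributes(y)
names(attrs)

# structure() returns a new object with the given attributes attached
z <- structure(1:3, note = "made with structure()")
attr(z, "note")

# custom attributes are dropped by subsetting
attributes(y[1])
```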

2. Subsetting

The outer() function applies a function to every combination of elements from two arrays.

x <- c(1, 2.3, 2, 3, 4, 8, 12, 43)
y <- c(2, 4)
outer(x, y, "+")
##      [,1] [,2]
## [1,]  3.0  5.0
## [2,]  4.3  6.3
## [3,]  4.0  6.0
## [4,]  5.0  7.0
## [5,]  6.0  8.0
## [6,] 10.0 12.0
## [7,] 14.0 16.0
## [8,] 45.0 47.0

Multiply array x elements with array y elements:

x %o% y  # equivalent to outer(x, y, "*")
##      [,1]  [,2]
## [1,]  2.0   4.0
## [2,]  4.6   9.2
## [3,]  4.0   8.0
## [4,]  6.0  12.0
## [5,]  8.0  16.0
## [6,] 16.0  32.0
## [7,] 24.0  48.0
## [8,] 86.0 172.0

outer() also accepts other functions by name and passes extra arguments such as sep on to them:

vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
vals
##      [,1]  [,2]  [,3]  [,4]  [,5] 
## [1,] "1,1" "1,2" "1,3" "1,4" "1,5"
## [2,] "2,1" "2,2" "2,3" "2,4" "2,5"
## [3,] "3,1" "3,2" "3,3" "3,4" "3,5"
## [4,] "4,1" "4,2" "4,3" "4,4" "4,5"
## [5,] "5,1" "5,2" "5,3" "5,4" "5,5"
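
One detail worth a sketch: outer() calls FUN once on the fully expanded index vectors rather than once per pair, so FUN needs to be vectorized. The functions below (including pick_max) are illustrative only:

```r
# FUN receives two equal-length vectors covering every (i, j) combination
m <- outer(1:3, 1:2, function(i, j) i * 10 + j)
m
dim(m)

# a non-vectorized function can be wrapped with Vectorize() first
pick_max <- function(i, j) if (i > j) i else j   # hypothetical helper
m2 <- outer(1:3, 1:2, Vectorize(pick_max))
m2
```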

By default, [ simplifies the result to the lowest possible dimensionality. Add drop = FALSE to keep the original dimensions:

vals[, 1]
## [1] "1,1" "2,1" "3,1" "4,1" "5,1"
vals[, 1, drop = FALSE]
##      [,1] 
## [1,] "1,1"
## [2,] "2,1"
## [3,] "3,1"
## [4,] "4,1"
## [5,] "5,1"

Subsetting operators differ between S3 and S4 objects. S4 objects have two additional operators: @ (equivalent to $) and slot() (equivalent to [[). @ is more restrictive than $ in that it returns an error if the slot does not exist.
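
A minimal sketch of the two S4 operators, assuming a made-up class called Person:

```r
library(methods)

# "Person" is a hypothetical class used only for this example
setClass("Person", slots = c(name = "character", age = "numeric"))
p <- new("Person", name = "Ann", age = 30)

p@name            # like $ for S4 objects
slot(p, "age")    # like [[ for S4 objects

# asking for a slot that does not exist is an error, not NULL
res <- try(p@height, silent = TRUE)
inherits(res, "try-error")
```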

upper.tri() returns a logical matrix of the same size as a given matrix, with TRUE in the entries of the upper triangle.

x <- outer(1:5, 1:5, FUN = "*")
upper.tri(x)
##       [,1]  [,2]  [,3]  [,4]  [,5]
## [1,] FALSE  TRUE  TRUE  TRUE  TRUE
## [2,] FALSE FALSE  TRUE  TRUE  TRUE
## [3,] FALSE FALSE FALSE  TRUE  TRUE
## [4,] FALSE FALSE FALSE FALSE  TRUE
## [5,] FALSE FALSE FALSE FALSE FALSE
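
The logical matrix is mostly useful as an index. A short sketch on the same x:

```r
x <- outer(1:5, 1:5, FUN = "*")

# select the entries above the diagonal, column by column
upper <- x[upper.tri(x)]
upper

# diag = TRUE also counts the diagonal itself
n_with_diag <- sum(upper.tri(x, diag = TRUE))
n_with_diag
```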

diag() function extracts or replaces the diagonal of a matrix, or constructs a diagonal matrix.

diag(3)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
diag(10, 3, 4) # a 3 x 4 matrix with 10 on the diagonal
##      [,1] [,2] [,3] [,4]
## [1,]   10    0    0    0
## [2,]    0   10    0    0
## [3,]    0    0   10    0
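
The examples above only construct matrices. A sketch of the extract and replace forms, using a made-up matrix m:

```r
m <- matrix(1:9, nrow = 3)

d <- diag(m)     # extract the diagonal
d

diag(m) <- 0L    # replace the diagonal
m

diag(c(2, 3))    # construct a 2 x 2 matrix with 2 and 3 on the diagonal
```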

For factors, drop = TRUE drops any unused levels.

z <- factor(c("a", "b"))
z[1]
## [1] a
## Levels: a b
z[1, drop = TRUE]
## [1] a
## Levels: a
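
droplevels() gives the same result after the fact. A small sketch with a made-up factor f:

```r
f <- factor(c("a", "b", "c"))
kept <- f[f != "b"]

levels(kept)               # the unused level "b" is still there
levels(droplevels(kept))   # only the levels that actually occur
```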

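$ differs from [[: the name after $ is taken literally, so a variable that holds the column name cannot be used with it.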

var <- "cyl"
# Doesn't work: mtcars$var is translated to mtcars[["var"]]
mtcars$var
## NULL
# mtcars@var raises an error, because a data frame is not an S4 object
# Use [[ when the name is stored in a variable
mtcars[[var]]
##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

$ does partial matching while [[ doesn’t

x <- list(abc = 1)
x
## $abc
## [1] 1
x$a
## [1] 1
x[["a"]]
## NULL
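
For completeness, [[ can opt in to partial matching through its exact argument. A short sketch on the same list:

```r
x <- list(abc = 1)

x$a                       # partial matching: finds abc
x[["a"]]                  # exact matching by default: NULL
x[["a", exact = FALSE]]   # partial matching on request

# when the name is stored in a variable, use [[
var <- "abc"
x[[var]]
```
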
mod <- lm(mpg ~ wt, data = mtcars)
mod
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Coefficients:
## (Intercept)           wt  
##      37.285       -5.344

summary() is a generic function that produces summaries of the results of various model-fitting functions.

summary(mod)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Subsetting with nothing can be useful in conjunction with assignment because it will preserve the original object class and structure.

mtcars[] <- lapply(mtcars, as.integer)
mtcars
##                     mpg cyl disp  hp drat wt qsec vs am gear carb
## Mazda RX4            21   6  160 110    3  2   16  0  1    4    4
## Mazda RX4 Wag        21   6  160 110    3  2   17  0  1    4    4
## Datsun 710           22   4  108  93    3  2   18  1  1    4    1
## Hornet 4 Drive       21   6  258 110    3  3   19  1  0    3    1
## Hornet Sportabout    18   8  360 175    3  3   17  0  0    3    2
## Valiant              18   6  225 105    2  3   20  1  0    3    1
## Duster 360           14   8  360 245    3  3   15  0  0    3    4
## Merc 240D            24   4  146  62    3  3   20  1  0    4    2
## Merc 230             22   4  140  95    3  3   22  1  0    4    2
## Merc 280             19   6  167 123    3  3   18  1  0    4    4
## Merc 280C            17   6  167 123    3  3   18  1  0    4    4
## Merc 450SE           16   8  275 180    3  4   17  0  0    3    3
## Merc 450SL           17   8  275 180    3  3   17  0  0    3    3
## Merc 450SLC          15   8  275 180    3  3   18  0  0    3    3
## Cadillac Fleetwood   10   8  472 205    2  5   17  0  0    3    4
## Lincoln Continental  10   8  460 215    3  5   17  0  0    3    4
## Chrysler Imperial    14   8  440 230    3  5   17  0  0    3    4
## Fiat 128             32   4   78  66    4  2   19  1  1    4    1
## Honda Civic          30   4   75  52    4  1   18  1  1    4    2
## Toyota Corolla       33   4   71  65    4  1   19  1  1    4    1
## Toyota Corona        21   4  120  97    3  2   20  1  0    3    1
## Dodge Challenger     15   8  318 150    2  3   16  0  0    3    2
## AMC Javelin          15   8  304 150    3  3   17  0  0    3    2
## Camaro Z28           13   8  350 245    3  3   15  0  0    3    4
## Pontiac Firebird     19   8  400 175    3  3   17  0  0    3    2
## Fiat X1-9            27   4   79  66    4  1   18  1  1    4    1
## Porsche 914-2        26   4  120  91    4  2   16  0  1    5    2
## Lotus Europa         30   4   95 113    3  1   16  1  1    5    2
## Ford Pantera L       15   8  351 264    4  3   14  0  1    5    4
## Ferrari Dino         19   6  145 175    3  2   15  0  1    5    6
## Maserati Bora        15   8  301 335    3  3   14  0  1    5    8
## Volvo 142E           21   4  121 109    4  2   18  1  1    4    2
# Without [], the assignment replaces the data frame with a plain list:
mtcars <- lapply(mtcars, as.integer)
mtcars
## $mpg
##  [1] 21 21 22 21 18 18 14 24 22 19 17 16 17 15 10 10 14 32 30 33 21 15 15
## [24] 13 19 27 26 30 15 19 15 21
## 
## $cyl
##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
## 
## $disp
##  [1] 160 160 108 258 360 225 360 146 140 167 167 275 275 275 472 460 440
## [18]  78  75  71 120 318 304 350 400  79 120  95 351 145 301 121
## 
## $hp
##  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230
## [18]  66  52  65  97 150 150 245 175  66  91 113 264 175 335 109
## 
## $drat
##  [1] 3 3 3 3 3 2 3 3 3 3 3 3 3 3 2 3 3 4 4 4 3 2 3 3 3 4 4 3 4 3 3 4
## 
## $wt
##  [1] 2 2 2 3 3 3 3 3 3 3 3 4 3 3 5 5 5 2 1 1 2 3 3 3 3 1 2 1 3 2 3 2
## 
## $qsec
##  [1] 16 17 18 19 17 20 15 20 22 18 18 17 17 18 17 17 17 19 18 19 20 16 17
## [24] 15 17 18 16 16 14 15 14 18
## 
## $vs
##  [1] 0 0 1 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 0 1 0 0 0 1
## 
## $am
##  [1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1
## 
## $gear
##  [1] 4 4 4 3 3 3 3 4 4 4 4 3 3 3 3 3 3 4 4 4 3 3 3 3 3 4 5 5 5 5 5 4
## 
## $carb
##  [1] 4 4 1 1 2 1 4 2 2 4 4 3 3 3 4 4 4 1 2 1 1 2 2 4 2 1 2 2 4 6 8 2

3. Vocabulary

on.exit() registers an expression that runs when the function exits, which makes it useful for side effects and clean-up. For example, in addition to returning a value, the following function uses on.exit() to print two messages.

myfun = function(x){
        on.exit(print("first"))
        on.exit(print("second"), add = TRUE)
        return(x)
}
myfun(2)
## [1] "first"
## [1] "second"
## [1] 2

Without add = TRUE, the second on.exit() call overwrites the first, so only "second" is printed:

fun = function(x){
        on.exit(print("first"))
        on.exit(print("second"))
        return(x)
}
fun(2)
## [1] "second"
## [1] 2

A function exits at its first return(), so the later calls are never reached:

fun = function(x){
        return(print("first"))
        return(print("second"))
        return(x)
}
fun(2)
## [1] "first"
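
A more typical use of on.exit() is restoring global state. The sketch below assumes a hypothetical helper, with_digits(), which is not a base R function:

```r
# with_digits() is a hypothetical helper, not part of base R
with_digits <- function(n, code) {
  old <- options(digits = n)          # options() returns the previous values
  on.exit(options(old), add = TRUE)   # restore them however the function exits
  force(code)                         # evaluate the lazy argument while digits = n
}

before <- getOption("digits")
res <- with_digits(3, format(pi))
res
getOption("digits") == before   # the old setting is back
```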

invisible() returns a (temporarily) invisible copy of an object.

f1 <- function(x) x
f2 <- function(x) invisible(x)
f1(1)  # prints
## [1] 1
f2(1)  # does not
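
invisible() only matters for the value a function actually returns. Here its result is discarded and the function falls through to the last expression:

f1 <- function(x){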
  if( x > 0 ){
     invisible("bigger than 0")
  }else{
     return("negative number")
  }
  "something went wrong"
}
f1(1)
## [1] "something went wrong"

Wrapping invisible() in return() gives the intended behavior:

f2 <- function(x){
  if( x > 0 ){
     return(invisible("bigger than 0"))
  }else{
     return("negative number")
  }
}

f2(1) # not visible

invisible() suppresses printing of the return value but still passes the value on as usual:

a <- f2(1)
a
## [1] "bigger than 0"

Wrapping the call in parentheses forces an invisible value to be printed:

f1 <- function() 1
f2 <- function() invisible(1)
f1()
## [1] 1
f2()
(f2())
## [1] 1
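
Visibility can also be inspected directly with withVisible(). The sketch also shows the most common invisible return value, the one from assignment:

```r
f2 <- function() invisible(1)

# withVisible() reports the value together with its visibility flag
vis <- withVisible(f2())
vis$visible

# assignment returns its value invisibly, which is what makes chaining work
a <- b <- 2
c(a, b)
```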

4. Style

Nothing New.

Commenting guidelines: use commented lines of - to break the file into readable chunks.

# Load data ---------------------------

# Plot data ---------------------------

5. Functions

An R function has three parts: body(), formals(), and environment().

f <- function(x) x^2
f
## function(x) x^2
formals(f)
## $x
body(f)
## x^2
environment(f)
## <environment: R_GlobalEnv>

For primitive functions, body(), formals(), and environment() all return NULL.
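
A quick check with sum(), which is a primitive:

```r
is.primitive(sum)
formals(sum)
body(sum)
environment(sum)

# args() still shows the argument list of a primitive
args(sum)
```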

Primitive functions are found only in the base package. They call C code directly, so they can be more efficient.

objs <- mget(ls("package:base"), inherits = TRUE)
objs[1]
## $`-`
## function (e1, e2)  .Primitive("-")
funs <- Filter(is.function, objs)
funs[1]
## $`-`
## function (e1, e2)  .Primitive("-")

Functions used above:

  • is.function() tests whether an object is a function.
  • Filter() (base R, not to be confused with filter() in dplyr) extracts the elements of a vector or list for which a predicate (logical) function returns TRUE.
  • mget() returns the values of several named objects as a list.
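
A small sketch of Filter() and mget() on made-up objects (mixed, p and q are illustrative names):

```r
# Filter() keeps the elements for which the predicate returns TRUE
big <- Filter(function(v) v > 2, 1:5)
big

mixed <- list(a = 1, f = mean, b = "x", g = sum)   # made-up list
fun_names <- names(Filter(is.function, mixed))
fun_names

# mget() looks up several names at once and returns a named list
p <- 1
q <- "two"
vals <- mget(c("p", "q"))
str(vals)
```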

There are four basic principles behind R’s implementation of lexical scoping:

  • name masking
  • functions vs. variables
  • a fresh start
  • dynamic lookup
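
Name masking: if a name is not defined inside a function, R looks one level up, all the way to the global environment.

x <- 1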
h <- function() {
  y <- 2
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i()
}
h()
## [1] 1 2 3
rm(x, h) # clean up

The same rules apply to closures, functions that are created by other functions:

j <- function(x) {
  y <- 2
  function() {
    c(x, y)
  }
}
k <- j(1)
k()
## [1] 1 2
rm(j, k)
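
The notes show name masking only. A sketch of the last two principles, a fresh start and dynamic lookup, using made-up functions fresh() and g():

```r
# a fresh start: every call gets a new environment, so `a` never persists
fresh <- function() {
  if (!exists("a", inherits = FALSE)) {
    a <- 1
  } else {
    a <- a + 1
  }
  a
}
first_call <- fresh()
second_call <- fresh()
c(first_call, second_call)   # the same value both times

# dynamic lookup: values are looked up when the function runs,
# not when it is created
x <- 15
g <- function() x
x <- 20
g()
```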

There is one small tweak to the rule: when a name is used where a function is expected, as in n(n), R ignores objects that are not functions while searching for it.

n <- function(x) x / 2
o <- function() {
  n <- 10
  n(n)
}
o()
## [1] 5

codetools::findGlobals() lists all the external dependencies of a function:

f <- function() x + 1
codetools::findGlobals(f)
## [1] "+" "x"

The backtick, `, can be used to refer to functions or variables that have otherwise reserved or illegal names:

for (i in 1:2) print(i)
## [1] 1
## [1] 2
`for`(i, 1:2, print(i))
## [1] 1
## [1] 2
{print(1); print(2); print(3)}
## [1] 1
## [1] 2
## [1] 3
`{`(print(1), print(2), print(3))
## [1] 1
## [1] 2
## [1] 3

Because [ is itself a function, its name can be passed to sapply():

x <- list(1:3, 4:9, 10:12)
sapply(x, "[", 2)
## [1]  2  5 11
# equivalent to
sapply(x, function(x) x[2])
## [1]  2  5 11

Arguments are matched first by exact name (perfect matching), then by prefix matching, and finally by position.
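
A sketch of the three matching steps. The function f and its argument names are made up to make the prefixes visible:

```r
# f and its argument names are illustrative only
f <- function(abcdef, bcde1, bcde2) {
  list(a = abcdef, b1 = bcde1, b2 = bcde2)
}

str(f(1, 2, 3))              # by position
str(f(2, 3, abcdef = 1))     # exact name first, the rest by position
str(f(2, 3, a = 1))          # unique prefix "a" matches abcdef

# an ambiguous prefix is an error: "b" matches both bcde1 and bcde2
res <- try(f(1, 3, b = 1), silent = TRUE)
inherits(res, "try-error")
```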

stop() stops execution of the current expression and executes an error action.

iter <- 12
try(if(iter > 10) stop("too many iterations"))

tst1 <- function(...) stop("dummy error")
try(tst1(1:10, long, calling, expression))

tst2 <- function(...) stop("dummy error", call. = FALSE)
try(tst2(1:10, longcalling, expression, but.not.seen.in.Error))
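
The try() calls above print the error and carry on. A sketch of how the error can be captured and inspected instead, using try() and tryCatch() from base R:

```r
# try() returns an object of class "try-error" instead of stopping the script
res <- try(stop("too many iterations"), silent = TRUE)
class(res)

# tryCatch() hands the condition object to a handler
msg <- tryCatch(
  stop("dummy error", call. = FALSE),
  error = function(e) conditionMessage(e)
)
msg
```
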

A closure example: add() returns a function that remembers x.

add <- function(x) {
  function(y) x + y
}
adders <- lapply(1:10, add) # make a list of functions, one for each x from 1 to 10
adders[[1]](10)  # get the first function and supply y with 10
## [1] 11

Default arguments are evaluated inside the function.

rm(list = ls()) # remove all variables
f <- function(x = ls()) {
  a <- 1
  x
}
# ls() evaluated inside f:
f()
## [1] "a" "x"
# ls() evaluated in the global environment:
f(ls())
## [1] "f"
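
Because defaults are evaluated inside the function, they can refer to other arguments or even to local variables. A sketch with made-up functions g() and h():

```r
# a default can refer to another argument
g <- function(a = 1, b = a * 2) c(a, b)
g()
g(10)

# a default can even use a variable created inside the function,
# because it is only evaluated when x is first used
h <- function(x = y * 2) {
  y <- 10
  x
}
h()
```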

... will match any arguments not otherwise matched, and can be easily passed on to other functions.

f <- function(...) {
  names(list(...))
}
f(a = 1, b = 2)
## [1] "a" "b"

Using ... comes at a price: any misspelled arguments will not raise an error, and any arguments after ... must be fully named. This makes it easy for typos to go unnoticed:

sum(1, 2, NA, na.mr = TRUE)
## [1] NA
sum(1, 2, NA, na.rm = TRUE)
## [1] 3
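
One way to guard against such typos is to reject unexpected named arguments before passing ... on. This is only a sketch, and strict_sum() is a hypothetical wrapper, not part of base R:

```r
# strict_sum() is a hypothetical wrapper around sum()
strict_sum <- function(..., na.rm = FALSE) {
  arg_names <- names(list(...))
  extra <- arg_names[arg_names != ""]   # empty when nothing in ... is named
  if (length(extra) > 0) {
    stop("unexpected named argument(s): ", paste(extra, collapse = ", "))
  }
  sum(..., na.rm = na.rm)
}

strict_sum(1, 2, NA, na.rm = TRUE)

# the misspelled na.mr now fails loudly instead of being summed
res <- try(strict_sum(1, 2, NA, na.mr = TRUE), silent = TRUE)
inherits(res, "try-error")
```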

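Replacement functions act as if they modify their argument in place. They have the special name `xxx<-`, so the name must be quoted with backticks when defining them: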

`second<-` <- function(x, value) {
  x[2] <- value
  x
}
x <- 1:10
second(x) <- 5L
x
##  [1]  1  5  3  4  5  6  7  8  9 10
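# A plain second() is not a replacement function. second(x) <- 5L below still
# works only because `second<-` from above is still defined.
second <- function(x, value) {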
  x[2] <- value
  x
}
x <- 1:10
second(x) <- 5L
x
##  [1]  1  5  3  4  5  6  7  8  9 10

pryr::address() shows the memory address of an object. After the replacement function is called the address is different, so x was copied rather than modified in place:

library(pryr)
## 
## Attaching package: 'pryr'
## The following object is masked _by_ '.GlobalEnv':
## 
##     f
## The following objects are masked from 'package:purrr':
## 
##     compose, partial
x <- 1:10
address(x)
## [1] "0x7fb188c4c498"
second(x) <- 6L
address(x)
## [1] "0x7fb18ab0fa38"

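In principle the built-in [<- is a primitive and can modify a vector in place, but in this run the address changes as well, so in-place modification is not guaranteed: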

x <- 1:10
address(x)
## [1] "0x7fb186530188"
x[2] <- 7L
address(x)
## [1] "0x7fb18b96dfd8"

Replacement functions can take additional arguments, which go between x and value:

x <- 1:10
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}
modify(x, 1) <- 10
x
##  [1] 10  2  3  4  5  6  7  8  9 10
x <- 1:10
x <- `modify<-`(x, 1, 10)
x
##  [1] 10  2  3  4  5  6  7  8  9 10
x <- 1:10
# modify(get("x"), 1) <- 10 does not work, because it is the same as
# get("x") <- `modify<-`(get("x"), 1, 10), which is invalid code
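
If the variable really has to be addressed by name, one possible workaround (a sketch, not from the book) is to call the replacement function directly and store the result with assign():

```r
`modify<-` <- function(x, position, value) {
  x[position] <- value
  x
}

x <- 1:10
# call the replacement function as an ordinary function,
# then write the result back under the name "x"
assign("x", `modify<-`(get("x"), 1, 10))
x
```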

Last modified on 2018-12-02