
2020-04-17


References:
https://wly.supernum.tech/2019/06/r%E8%AF%AD%E8%A8%80%E7%BC%96%E7%A8%8B%E7%AF%87%E9%9D%A2%E5%90%91%E8%AF%AD%E8%A8%80%E7%9A%84%E7%BC%96%E7%A8%8B/
https://wutaoblog.com.cn/2021/01/14/r-meta/#%E5%9F%BA%E4%BA%8E%E8%AF%AD%E8%A8%80%E7%9A%84%E8%AE%A1%E7%AE%97-%E5%85%83%E7%BC%96%E7%A8%8B
https://shixiangwang.github.io/home/cn/post/2019-11-20-meta-programming/

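Metaprogramming means that a program treats its own code as data, either before or during execution, which extends the expressive power of the language at a higher level.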


Metaprogramming in R can be done with base R functions or with rlang functions (the implementation of tidy evaluation); data.table also has its own metaprogramming facilities.

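A program manipulates its own code by operating on commands (expressions) and their execution environments. In R, the term "expression" has a narrow and a broad sense. In the narrow sense it refers to an object of class expression, as created by the expression() function. In the broad sense it covers both the expression class and the language class. expression and language are two special data classes in R: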

## Class "expression" [package "methods"]
## No Slots, prototype of class "expression"
## Extends: "vector"
## Virtual Class "language" [package "methods"]
## No Slots, prototype of class "name"
## Known Subclasses: 
## Class "name", directly
## Class "call", directly
## Class "{", directly
## Class "if", directly
## Class "<-", directly
## Class "for", directly
## Class "while", directly
## Class "repeat", directly
## Class "(", directly
## Class ".name", by class "name", distance 2, with explicit coerce
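As the output shows, the expression class extends vector, while language is a virtual class. Its subclasses include the familiar control-flow keywords and symbols, as well as name and call.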
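To make the class hierarchy above concrete, the following minimal sketch captures a few pieces of code with quote() and expression() and inspects their classes. The object names (sym_obj, call_obj, and so on) are made up for this illustration.

```r
# Capture different kinds of code without evaluating them (base R only).
sym_obj  <- quote(x)               # a bare variable name
call_obj <- quote(f(x, 1))         # a function call
if_obj   <- quote(if (a) b)        # a control-flow construct
expr_obj <- expression(x + 1, y)   # a vector holding two pieces of code

class(sym_obj)    # "name"
class(call_obj)   # "call"
class(if_obj)     # "if"
class(expr_obj)   # "expression"

# is.language() is TRUE for names, calls and expression vectors.
is.language(sym_obj)
is.language(expr_obj)
```

None of the captured code is evaluated here, so f, x, a, b and y do not need to exist.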

base R


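The call() function builds a command (a function call). Its first argument must be a character string naming the function to be called; all remaining arguments are passed on as arguments of the newly built call.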

cl <- call("round", 1.11)
class(cl)
## [1] "call"
is.call(cl) && is.language(cl)
## [1] TRUE
(cl_list <- as.list(cl))
## [[1]]
## round
## [[2]]
## [1] 1.11
as.call(cl_list)
## round(1.11)
mode(cl_list) <- "call"; cl_list
## round(1.11)
do.call(what, args, quote = FALSE, envir = parent.frame()), in contrast, builds the call and evaluates it immediately in envir.
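The difference between building a call and executing it can be sketched as follows. The variable names (arg_list, via_eval, via_do_call) are only illustrative; the sketch builds a call object by hand, evaluates it with eval(), and compares the result with do.call().

```r
# Arguments kept in a list, as do.call() expects them.
arg_list <- list(3.14159, digits = 2)

# Build the call object first; nothing is executed yet.
cl <- as.call(c(list(as.name("round")), arg_list))
print(cl)

# Evaluate the stored call explicitly ...
via_eval <- eval(cl)

# ... or let do.call() build and evaluate it in one step.
via_do_call <- do.call("round", arg_list)

identical(via_eval, via_do_call)
```

Keeping the call object around is useful when the code needs to be inspected or modified before it runs; do.call() is the shorter route when only the result matters.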



quote(1:9 + 2)
enquote(1:9 + 2)
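To capture code while replacing some variable names with their values, use substitute(expr, env). Besides the code to capture, substitute() accepts a substitution environment env (a list, a data frame, or an environment). Any variable name in the code that is bound in env is replaced by its value, unless env is the global environment, in which case the names are left unchanged.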


substitute(a + b, list(b = 1))
substitute(a + b, baseenv())
b <- 1;substitute(a + b, globalenv())
bquote(x <- .(x) + 1, list(x = 1:9))
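A common use of substitute() is inside a function, where it returns the expression the caller typed for an argument. The sketch below uses a hypothetical helper, label_of(), and also shows bquote(), which quotes its argument but evaluates anything wrapped in .().

```r
# substitute() inside a function returns the caller's expression for x.
label_of <- function(x) {
  deparse(substitute(x))
}
lab <- label_of(a + b)   # a and b are never evaluated
lab

# With an explicit list, bound names are replaced by their values.
replaced <- substitute(a + b, list(b = 1))
replaced

# bquote() quotes everything except the parts wrapped in .()
n <- 3
filled <- bquote(x^.(n))
filled
```

This is the mechanism that lets plotting functions pick up axis labels from the expressions passed to them.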
str(y ~ x)
## Class 'formula'  language y ~ x
##   ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
f <- y~x+z
terms(f)
## y ~ x + z
## attr(,"variables")
## list(y, x, z)
## attr(,"factors")
##   x z
## y 0 0
## x 1 0
## z 0 1
## attr(,"term.labels")
## [1] "x" "z"
## attr(,"order")
## [1] 1 1
## attr(,"intercept")
## [1] 1
## attr(,"response")
## [1] 1
## attr(,".Environment")
## <environment: R_GlobalEnv>
(ex <- expression(x = 1, 1 + sqrt(2)))
## expression(1 + sqrt(2))
## [1] "expression"
## $x
## [1] 1
## [[2]]
## 1 + sqrt(2)
## $x
## [1] 1
## [[2]]
## [1] 2.414214
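eval(expr, envir, enclos) evaluates captured code. envir is the first place where variable names are looked up; names not found in envir are then looked up in enclos (which is used only when envir is a list or a data frame).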
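The lookup order described above can be checked with a small sketch. The names code, r_default, r_list and r_env are assumptions made for this illustration.

```r
code <- quote(x + y)
x <- 1
y <- 2

# Default: evaluate where eval() was called, so x = 1 and y = 2.
r_default <- eval(code)

# envir as a list: x is found in the list, y falls through to enclos
# (by default the environment eval() was called from).
r_list <- eval(code, list(x = 10))

# envir as an environment: x is found there, y in its parent environment.
env <- new.env()
assign("x", 100, envir = env)
r_env <- eval(code, env)

c(r_default, r_list, r_env)
```

The same captured code yields three different results purely because of where its variable names are resolved.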


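# # Evaluate an R expression in a given environment
# eval(1+1,envir = globalenv())
# ## [1] 2
# # By default, local() runs code in a temporary environment. This discards intermediate variables created along the way and returns the value of the last expression, much like a function.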
# local({
#   a <- 1:9;
#   b <- a
# },envir = new.env())
# a;b
# ## Error in eval(expr, envir, enclos): object 'a' not found
# ## [1] 1



(ex <- parse(text = "local({a <- 1;1})"))
## expression(local({
##     a <- 1
##     1
## }))
deparse(quote(x <- 1))
## [1] "expression(local({" "    a <- 1"         "    1"             
## [4] "}))"
## [1] "function (formula, data, subset, weights, na.action, method = \"qr\", " 
## [2] "    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, "
## [3] "    contrasts = NULL, offset, ...) "                                    
## [4] "NULL"
Overall, the relationships among these base R functions are roughly as follows:


Tidy evaluation

Book: the Metaprogramming chapters of Advanced R.




library(rlang)
library(tidyverse)
expr(mean(x, na.rm = TRUE))
expr(10 + 100 + 1000)
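capture_it <- function(x) {
  expr(x)    # quotes its own argument literally, so the result is the symbol x
}
capture_it(a + b + c)
capture_it <- function(x) {
  enexpr(x)  # captures the expression the caller supplied
}
capture_it(a + b + c)
capture_it <- function(x) {
  enquo(x)   # captures the expression together with its environment (a quosure)
}
capture_it(a + b + c)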
Almost every programming language represents code as a tree, usually called the abstract syntax tree (AST). In R, you can inspect this tree with lobstr::ast(x).

lobstr::ast(f1(f2(a = 1+2*3, b), f3(1, f4(2))))
## █─f1 
## ├─█─f2 
## │ ├─a = █─`+` 
## │ │     ├─1 
## │ │     └─█─`*` 
## │ │       ├─2 
## │ │       └─3 
## │ └─b 
## └─█─f3 
##   ├─1 
##   └─█─f4 
##     └─2
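The same tree structure can be explored without lobstr, because a call object behaves like a list whose first element is the function and whose remaining elements are the arguments. The recursive helper show_tree() below is a hypothetical name, written as a rough base R sketch rather than a replacement for lobstr::ast(); it does not handle calls with empty arguments such as x[, 1].

```r
# Print a call as an indented tree: calls are branches, everything else a leaf.
show_tree <- function(x, depth = 0) {
  indent <- strrep("  ", depth)
  if (is.call(x)) {
    # The first element of a call is the function being called.
    cat(indent, "call: ", deparse(x[[1]]), "\n", sep = "")
    for (arg in as.list(x)[-1]) {
      show_tree(arg, depth + 1)
    }
  } else {
    # Symbols and constants are the leaves of the tree.
    cat(indent, class(x)[1], ": ", deparse(x), "\n", sep = "")
  }
}

show_tree(quote(1 + 2 * 3))
```

The output nests the multiplication under the addition, which reflects operator precedence as it is encoded in the tree.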
Base R provides call() for generating code, whereas rlang uses call2() and unquoting.

rlang::call2("+", 1, call2("*", 2, 3))
rlang's unquote operator !! (pronounced "bang bang") inserts a stored code tree into the expression being captured:

xx <- expr(x + x)
yy <- expr(y + y)

expr(!!xx / !!yy)
cv <- function(var) {
  var <- enexpr(var)
  expr(sd(!!var) / mean(!!var))
}

cv(x + y)
xs <- exprs(1, a, -b)
expr(f(!!!xs, y))
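eval_tidy(expr, data = NULL, env = caller_env()) is a variant of eval(). It uses as_data_mask() to add a data mask layer, so objects in the data argument of eval_tidy() take precedence over objects in the calling environment.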
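A minimal sketch of the data mask, assuming the rlang package is installed; the data frame scores and the result names are made up for this illustration. It also uses the .data and .env pronouns that the mask provides to state explicitly where a name should be looked up.

```r
library(rlang)

scores <- data.frame(x = 1:3)
x <- 100   # same name as the column, defined in the calling environment
y <- 10    # only defined in the calling environment

# x is taken from the data mask, y from the environment.
masked <- eval_tidy(expr(x + y), data = scores)
masked

# The pronouns remove the ambiguity between the two x's.
both <- eval_tidy(expr(.env$x + .data$x), data = scores)
both
```

The first call uses the column x because the data mask is searched before the environment; the second one combines both x's on purpose.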

Advanced R gives an example that explains why you must always use enquo() rather than enexpr() when working with a data mask:

with2 <- function(df, expr) {
  a <- 1000
  eval_tidy(enexpr(expr), df)
}
df <- data.frame(x = 1:3)
a <- 10
with2(df, x + a)
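As the result shows, a in the captured expression is evaluated as 1000 (the value local to with2) rather than the global value 10. rlang solves this problem with a new data structure, the quosure, which bundles an expression together with its environment.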

with2 <- function(df, expr) {
  a <- 1000
  eval_tidy(enquo(expr), df)
}

with2(df, x + a)
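To see what a quosure carries, the following sketch (assuming rlang is installed; make_quo() is a hypothetical helper) creates a quosure inside a function and then evaluates it from outside, once as a quosure and once as a bare expression.

```r
library(rlang)

make_quo <- function() {
  a <- 1000
  quo(a + 1)   # remembers both the code and this function's environment
}

a <- 10
q <- make_quo()

quo_get_expr(q)                              # the code part: a + 1
from_quo  <- eval_tidy(q)                    # uses a = 1000 from make_quo()
from_expr <- eval_tidy(quo_get_expr(q))      # bare expression: uses a = 10 here

c(from_quo, from_expr)
```

Stripping the environment with quo_get_expr() changes the result, which is exactly the difference between enquo() and enexpr() in the with2() example.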
parse_expr(x) parses a string into an expression, similar to parse(), while expr_text() is similar to deparse().

chr <- "y <- x + 10"
(z <- parse_expr(chr))
path <- tempfile("my-file.R")
# tempfile() returns a character vector of names that can be used for temporary files.
cat("1; 2; mtcars", file = path)
parse_exprs(file(path))
## [[1]]
## [1] 1
## [[2]]
## [1] 2
## [[3]]
## mtcars
e2 = "vs + am ; am +vs";
mtcars %>% mutate(!!!parse_exprs(e2))
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb vs + am
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       1
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       1
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1       2
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1       1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2       0
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1       1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4       0
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2       1
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2       1
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4       1
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4       1
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3       0
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3       0
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3       0
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4       0
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4       0
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4       0
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1       2
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2       2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1       2
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1       1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2       0
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2       0
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4       0
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2       0
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1       2
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2       1
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2       2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4       1
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6       1
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8       1
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2       2
##                     am + vs
## Mazda RX4                 1
## Mazda RX4 Wag             1
## Datsun 710                2
## Hornet 4 Drive            1
## Hornet Sportabout         0
## Valiant                   1
## Duster 360                0
## Merc 240D                 1
## Merc 230                  1
## Merc 280                  1
## Merc 280C                 1
## Merc 450SE                0
## Merc 450SL                0
## Merc 450SLC               0
## Cadillac Fleetwood        0
## Lincoln Continental       0
## Chrysler Imperial         0
## Fiat 128                  2
## Honda Civic               2
## Toyota Corolla            2
## Toyota Corona             1
## Dodge Challenger          0
## AMC Javelin               0
## Camaro Z28                0
## Pontiac Firebird          0
## Fiat X1-9                 2
## Porsche 914-2             1
## Lotus Europa              2
## Ford Pantera L            1
## Ferrari Dino              1
## Maserati Bora             1
## Volvo 142E                2




for_evaluation <- quote(2+2)
for_evaluation1 <- quote(a <- 2+2)


tpt_string <- "1:5"

tpt_expression <- parse(text = tpt_string)
eval(tpt_expression)
## [1] 1 2 3 4 5

deparse(for_evaluation)
## [1] "2 + 2"





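Converting between R variable names and strings

Using the assign and get functions

get(): returns the value of the variable whose name is the given string. assign(): assigns a value to the variable whose name is the given string.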

The get function searches and calls a data object. The assign R function assigns values to a variable name.
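A short sketch of both functions, using made-up variable names of the form var_1, var_2, var_3: assign() creates variables whose names are built from strings, and get() or mget() reads them back by name.

```r
# Create var_1, var_2, var_3 from strings.
for (i in 1:3) {
  assign(paste0("var_", i), i^2)
}

# Look up a single variable by its name.
second <- get("var_2")
second

# mget() fetches several variables at once and returns a named list.
vals <- mget(paste0("var_", 1:3))
vals

# exists() checks a name before calling get(), which would otherwise error.
exists("var_4")
```

In most cases a named list is easier to work with than variables generated this way, but get() and assign() are handy when the names already exist as strings.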


get R Function (5 Example Codes) | Call Vector or Column of Data Frame (statisticsglobe.com)

assign Function in R (2 Examples) | How to Store Values in Variable Name (statisticsglobe.com)

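Forcing operators (!! and !!!)

Most people know ! as the logical negation operator. It also has another role, as part of the forcing operators. There are two of them: !! is the bang-bang operator and !!! is the big-bang operator. Searching for !! in the RStudio help pane brings up the following description of both: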

The bang-bang operator !! forces a single object. One common case for !! is to substitute an environment-variable (created with <-) with a data-variable (inside a data frame).

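The big-bang operator !!! forces-splice a list of objects. The elements of the list are spliced in place, meaning that they each become one single argument.

I searched for a long time and could not find an established Chinese translation for these two operators. Both the phonetic and the literal renderings sound rather funny, so for now I simply refer to them collectively as forcing operators.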

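sym() converts an R string into a symbol. The bang-bang operator !! then unquotes that symbol inside a quoting function, which is equivalent to typing the variable name directly.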
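As a rough illustration, assuming rlang is installed, the sketch below turns a column name stored as a string into a symbol, unquotes it into a call, and evaluates that call against the built-in mtcars data. The variable names (col, col_sym, e, res) are assumptions made for this example.

```r
library(rlang)

col <- "mpg"            # column name held as a string
col_sym <- sym(col)     # now a symbol: mpg

# !! inserts the symbol into the quoted call, as if mpg had been typed.
e <- expr(mean(!!col_sym))
e

# Evaluate with mtcars as the data mask.
res <- eval_tidy(e, data = mtcars)
res
```

Without !!, expr(mean(col_sym)) would keep the literal name col_sym in the call instead of the column it refers to.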


We need some way to explicitly unquote the input to tell cement() to remove the automatic quote marks. Here we need time and name to be treated differently to Good. Quasiquotation gives us a standard tool to do so: !!, called “unquote”, and pronounced bang-bang. !! tells a quoting function to drop the implicit quotes:

name <- "Hadley"
time <- "morning"

paste("Good", time, name)
## [1] "Good morning Hadley"
cement <- function(...) {
  args <- ensyms(...)
  paste(purrr::map(args, as_string), collapse = " ")
}

cement(Good, time, name)
## [1] "Good time name"

cement() quotes its arguments, so we must unquote where needed.

cement(Good, !!time, !!name)
## [1] "Good morning Hadley"

!! is a one-to-one replacement. !!! (called “unquote-splice”, and pronounced bang-bang-bang) is a one-to-many replacement. It takes a list of expressions and inserts them at the location of the !!!:

xs <- exprs(1, a, -b)
expr(f(!!!xs, y))
## f(1, a, -b, y)
df <- data.frame(A=1:5, B=(1:5)*2)
df %>% select(A,B)
##   A  B
## 1 1  2
## 2 2  4
## 3 3  6
## 4 4  8
## 5 5 10
df %>% select("A", "B")
##   A  B
## 1 1  2
## 2 2  4
## 3 3  6
## 4 4  8
## 5 5 10
# equivalent to
columns <- c("A","B")
columns_en <- syms(columns)
df %>% select(!!!columns_en)
##   A  B
## 1 1  2
## 2 2  4
## 3 3  6
## 4 4  8
## 5 5 10
df %>% select(columns)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(columns)` instead of `columns` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
##   A  B
## 1 1  2
## 2 2  4
## 3 3  6
## 4 4  8
## 5 5 10
dfs <- list(
  a = data.frame(x = 1, y = 2),
  b = data.frame(x = 3, y = 4)
)
dfs
## $a
##   x y
## 1 1 2
## $b
##   x y
## 1 3 4
dplyr::bind_rows(!!!dfs)
##   x y
## 1 1 2
## 2 3 4

another example

test_df <- tibble(a = 1, b = 1, c = 1, d = 1)

test_df %>%
  select(1, 2, 3)
## # A tibble: 1 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     1     1
our_list <- list(1, 2, 3)

# test_df %>% select(our_list) does not work

test_df %>% select(c(1, 2, 3)) # this works
## # A tibble: 1 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     1     1
# This code
test_df %>%
  select(!!!our_list)
## # A tibble: 1 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     1     1
# Translates to this
test_df %>%
  select(1, 2, 3)
## # A tibble: 1 x 3
##       a     b     c
##   <dbl> <dbl> <dbl>
## 1     1     1     1



Last modified on 2020-04-17