R整洁代码
KJY
2022-12-05
数据屏蔽:使得可以不用带数据框(环境变量)名字,就能使用数据框内的变量(数据变量),便 于在数据集内计算值
环境变量(env-variables) ,一般你在Rstuido右上角的Environment中发现它。比如n <- 10这里的n
数据变量(data-variables),一般指数据框的某个变量。比如data <- data.frame(x = 1, n = 2)中的data$n
但作为函数参数时的间接使用,正常是环境变量,要想作为数据变量使用,则需要用两个大括号括起来 {{var}}:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## Warning: package 'readr' was built under R version 4.0.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
var_summary = function(data, var) {
data %>% summarise(n = n(), mean = mean({{var}}))
}
mtcars %>%
group_by(cyl) %>%
var_summary(mpg)
## # A tibble: 3 × 3
## cyl n mean
## <dbl> <int> <dbl>
## 1 4 11 26.7
## 2 6 7 19.7
## 3 8 14 15.1
若是字符向量形式,想作为数据变量,则需要在函数体中使用 .data[[var]],这里 .data 是代替 数据集的代词:
var_summary = function(data, var) { data %>%
summarise(n = n(), mean = mean(.data[[var]]))
}
mtcars %>%
group_by(cyl) %>%
var_summary("mpg")
## # A tibble: 3 × 3
## cyl n mean
## <dbl> <int> <dbl>
## 1 4 11 26.7
## 2 6 7 19.7
## 3 8 14 15.1
同样地,整洁选择作为函数参数时的间接使用,也需要用两个大括号括起来 {{vars}}:
summarise_mean = function(data, vars) { data %>%
summarise(n = n(), across({{vars}}, mean))
}
mtcars %>%
group_by(cyl) %>%
summarise_mean(where(is.numeric))
## # A tibble: 3 × 12
## cyl n mpg disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 4 11 26.7 105. 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55
## 2 6 7 19.7 183. 122. 3.59 3.12 18.0 0.571 0.429 3.86 3.43
## 3 8 14 15.1 353. 209. 3.23 4.00 16.8 0 0.143 3.29 3.5
若是字符向量形式,则需要借助函数 all_of() 或 any_of(),取决于你的选择:
vars = c("mpg", "vs")
mtcars %>% select(all_of(vars))
## mpg vs
## Mazda RX4 21.0 0
## Mazda RX4 Wag 21.0 0
## Datsun 710 22.8 1
## Hornet 4 Drive 21.4 1
## Hornet Sportabout 18.7 0
## Valiant 18.1 1
## Duster 360 14.3 0
## Merc 240D 24.4 1
## Merc 230 22.8 1
## Merc 280 19.2 1
## Merc 280C 17.8 1
## Merc 450SE 16.4 0
## Merc 450SL 17.3 0
## Merc 450SLC 15.2 0
## Cadillac Fleetwood 10.4 0
## Lincoln Continental 10.4 0
## Chrysler Imperial 14.7 0
## Fiat 128 32.4 1
## Honda Civic 30.4 1
## Toyota Corolla 33.9 1
## Toyota Corona 21.5 1
## Dodge Challenger 15.5 0
## AMC Javelin 15.2 0
## Camaro Z28 13.3 0
## Pontiac Firebird 19.2 0
## Fiat X1-9 27.3 1
## Porsche 914-2 26.0 0
## Lotus Europa 30.4 1
## Ford Pantera L 15.8 0
## Ferrari Dino 19.7 0
## Maserati Bora 15.0 0
## Volvo 142E 21.4 1
mtcars %>% select(!all_of(vars))
## cyl disp hp drat wt qsec am gear carb
## Mazda RX4 6 160.0 110 3.90 2.620 16.46 1 4 4
## Mazda RX4 Wag 6 160.0 110 3.90 2.875 17.02 1 4 4
## Datsun 710 4 108.0 93 3.85 2.320 18.61 1 4 1
## Hornet 4 Drive 6 258.0 110 3.08 3.215 19.44 0 3 1
## Hornet Sportabout 8 360.0 175 3.15 3.440 17.02 0 3 2
## Valiant 6 225.0 105 2.76 3.460 20.22 0 3 1
## Duster 360 8 360.0 245 3.21 3.570 15.84 0 3 4
## Merc 240D 4 146.7 62 3.69 3.190 20.00 0 4 2
## Merc 230 4 140.8 95 3.92 3.150 22.90 0 4 2
## Merc 280 6 167.6 123 3.92 3.440 18.30 0 4 4
## Merc 280C 6 167.6 123 3.92 3.440 18.90 0 4 4
## Merc 450SE 8 275.8 180 3.07 4.070 17.40 0 3 3
## Merc 450SL 8 275.8 180 3.07 3.730 17.60 0 3 3
## Merc 450SLC 8 275.8 180 3.07 3.780 18.00 0 3 3
## Cadillac Fleetwood 8 472.0 205 2.93 5.250 17.98 0 3 4
## Lincoln Continental 8 460.0 215 3.00 5.424 17.82 0 3 4
## Chrysler Imperial 8 440.0 230 3.23 5.345 17.42 0 3 4
## Fiat 128 4 78.7 66 4.08 2.200 19.47 1 4 1
## Honda Civic 4 75.7 52 4.93 1.615 18.52 1 4 2
## Toyota Corolla 4 71.1 65 4.22 1.835 19.90 1 4 1
## Toyota Corona 4 120.1 97 3.70 2.465 20.01 0 3 1
## Dodge Challenger 8 318.0 150 2.76 3.520 16.87 0 3 2
## AMC Javelin 8 304.0 150 3.15 3.435 17.30 0 3 2
## Camaro Z28 8 350.0 245 3.73 3.840 15.41 0 3 4
## Pontiac Firebird 8 400.0 175 3.08 3.845 17.05 0 3 2
## Fiat X1-9 4 79.0 66 4.08 1.935 18.90 1 4 1
## Porsche 914-2 4 120.3 91 4.43 2.140 16.70 1 5 2
## Lotus Europa 4 95.1 113 3.77 1.513 16.90 1 5 2
## Ford Pantera L 8 351.0 264 4.22 3.170 14.50 1 5 4
## Ferrari Dino 6 145.0 175 3.62 2.770 15.50 1 5 6
## Maserati Bora 8 301.0 335 3.54 3.570 14.60 1 5 8
## Volvo 142E 4 121.0 109 4.11 2.780 18.60 1 4 2
以下则不合适
vars = c("mpg", "vs")
mtcars %>% select(vars)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(vars)` instead of `vars` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## mpg vs
## Mazda RX4 21.0 0
## Mazda RX4 Wag 21.0 0
## Datsun 710 22.8 1
## Hornet 4 Drive 21.4 1
## Hornet Sportabout 18.7 0
## Valiant 18.1 1
## Duster 360 14.3 0
## Merc 240D 24.4 1
## Merc 230 22.8 1
## Merc 280 19.2 1
## Merc 280C 17.8 1
## Merc 450SE 16.4 0
## Merc 450SL 17.3 0
## Merc 450SLC 15.2 0
## Cadillac Fleetwood 10.4 0
## Lincoln Continental 10.4 0
## Chrysler Imperial 14.7 0
## Fiat 128 32.4 1
## Honda Civic 30.4 1
## Toyota Corolla 33.9 1
## Toyota Corona 21.5 1
## Dodge Challenger 15.5 0
## AMC Javelin 15.2 0
## Camaro Z28 13.3 0
## Pontiac Firebird 19.2 0
## Fiat X1-9 27.3 1
## Porsche 914-2 26.0 0
## Lotus Europa 30.4 1
## Ford Pantera L 15.8 0
## Ferrari Dino 19.7 0
## Maserati Bora 15.0 0
## Volvo 142E 21.4 1
使用数据屏蔽或整洁选择同时修改列名的用法:
my_summarise = function(data, mean_var, sd_var) { data %>%
summarise("mean_{{mean_var}}" := mean({{mean_var}}), "sd_{{sd_var}}" := sd({{sd_var}}))
}
mtcars %>%
group_by(cyl) %>%
my_summarise(mpg, disp)
## # A tibble: 3 × 3
## cyl mean_mpg sd_disp
## <dbl> <dbl> <dbl>
## 1 4 26.7 26.9
## 2 6 19.7 41.6
## 3 8 15.1 67.8
my_summarise = function(data, group_var, summarise_var) { data %>%
group_by(across({{group_var}})) %>% summarise(across({{summarise_var}}, mean, .names = "mean_{.col}")) #整洁选择作为函数参数时的间接使用,也需要用两个大括号括起来 {{vars}}:
#环境变量要想作为数据变量使用,则需要用两个大括号括起来
}
mtcars %>%
my_summarise(c(am, cyl), where(is.numeric))
## `summarise()` has grouped output by 'am'. You can override using the `.groups`
## argument.
## # A tibble: 6 × 11
## # Groups: am [2]
## am cyl mean_mpg mean_disp mean_hp mean_…¹ mean_wt mean_…² mean_vs mean_…³
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 4 22.9 136. 84.7 3.77 2.94 21.0 1 3.67
## 2 0 6 19.1 205. 115. 3.42 3.39 19.2 1 3.5
## 3 0 8 15.0 358. 194. 3.12 4.10 17.1 0 3
## 4 1 4 28.1 93.6 81.9 4.18 2.04 18.4 0.875 4.25
## 5 1 6 20.6 155 132. 3.81 2.76 16.3 0 4.33
## 6 1 8 15.4 326 300. 3.88 3.37 14.6 0 5
## # … with 1 more variable: mean_carb <dbl>, and abbreviated variable names
## # ¹mean_drat, ²mean_qsec, ³mean_gear
环境变量在函数中当作mtcars中的数据变量,即当做mtcars的一个列的名字来使用,就需要引用(quote)和解引用(unquote)两个工序:
第一步,用 enquo()把用户传递过来的参数引用起来(引用可以理解为冷冻起来)
第二步,用 !! 解开这个引用(解引用可以理解为解冷),然后使用参数的内容
这个quote-unquote的过程让环境变量名变成了数据变量,也可以理解为在函数评估过程中,数据变量(data-variable)遮盖了环境变量(env-variable),即数据遮盖(data masking),看到cyl,正常情况下,本来应该是到环境变量里去找这个cyl对应的值,然而,数据遮盖机制,插队了,让代码去数据变量中去找cyl以及对应的值。
group_var <- quote(cyl)
summary_var <- quote(mpg)
rlang::qq_show(
data %>%
group_by(!!group_var) %>%
summarise(mean = mean(!!summary_var))
)
## data %>% group_by(cyl) %>% summarise(mean = mean(mpg))
group_var <- quote(cyl)
summary_var <- quote(mpg)
mtcars %>%
group_by(!!group_var) %>%
summarise(mean = mean(!!summary_var))
## # A tibble: 3 × 2
## cyl mean
## <dbl> <dbl>
## 1 4 26.7
## 2 6 19.7
## 3 8 15.1
grouped_mean = function(data, summary_var, group_var) {
summary_var = enquo(summary_var)
group_var = enquo(group_var)
data %>%
group_by(!!group_var) %>%
summarise(mean = mean(!!summary_var))
}
grouped_mean(mtcars, mpg, cyl)
## # A tibble: 3 × 2
## cyl mean
## <dbl> <dbl>
## 1 4 26.7
## 2 6 19.7
## 3 8 15.1
是一样的,但是使用{{}}的代码更加简单些
grouped_mean = function(data, summary_var, group_var) {
data %>%
group_by({{group_var}}) %>%
summarise(mean = mean({{summary_var}}))
}
grouped_mean(mtcars, mpg, cyl)
## # A tibble: 3 × 2
## cyl mean
## <dbl> <dbl>
## 1 4 26.7
## 2 6 19.7
## 3 8 15.1
要想修改结果列名,可借助 as_label() 函数从引用中提取名字:
grouped_mean = function(data, summary_var, group_var) {
summary_var = enquo(summary_var)
group_var = enquo(group_var)
summary_nm = str_c("mean_", as_label(summary_var))
group_nm = str_c("group_", as_label(group_var))
data %>%
group_by(!!group_nm := !!group_var) %>%
summarise(!!summary_nm := mean(!!summary_var))
}
grouped_mean(mtcars, mpg, cyl)
## # A tibble: 3 × 2
## group_cyl mean_mpg
## <dbl> <dbl>
## 1 4 26.7
## 2 6 19.7
## 3 8 15.1
我们希望输出的统计结果中,统计参数名加一个前缀 “avg_”, 可以分三步完成
获取引用参数的默认名 修改参数的默认名,比如加前缀或者后缀 !! 解引用并放在 := 左边
grouped_mean2 <- function(.data, .summary_var, ...) {
summary_var <- enquo(.summary_var)
group_vars <- enquos(...)
# Get and modify the default name
summary_nm <- as_label(summary_var)
summary_nm <- paste0("avg_", summary_nm)
.data %>%
group_by(!!!group_vars) %>%
summarise(!!summary_nm := mean(!!summary_var)) # Unquote the name
}
grouped_mean2(mtcars, disp, cyl, am)
## `summarise()` has grouped output by 'cyl'. You can override using the `.groups`
## argument.
## # A tibble: 6 × 3
## # Groups: cyl [3]
## cyl am avg_disp
## <dbl> <dbl> <dbl>
## 1 4 0 136.
## 2 4 1 93.6
## 3 6 0 205.
## 4 6 1 155
## 5 8 0 358.
## 6 8 1 326
或者更简洁的办法
my_summarise <- function(data, group_var, summarise_var) {
data %>%
group_by(across({{ group_var }})) %>%
summarise(across({{ summarise_var }}, mean, .names = "mean_{col}"))
}
my_summarise(starwars, species, height)
## # A tibble: 38 × 2
## species mean_height
## <chr> <dbl>
## 1 Aleena 79
## 2 Besalisk 198
## 3 Cerean 198
## 4 Chagrian 196
## 5 Clawdite 168
## 6 Droid NA
## 7 Dug 112
## 8 Ewok 88
## 9 Geonosian 183
## 10 Gungan 209.
## # … with 28 more rows
最后一次修改于 2022-12-05