Programming with dplyr in R [Practice Notes]
Liang
2020-12-14
The official documentation does not explain this very clearly, and I found it hard to follow.
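dplyr is an R package for data analysis, similar to pandas in Python: it makes processing and analysing data frames very convenient. At first I was also puzzled by the odd name dplyr. One explanation I found:
- d stands for data frame
- plyr sounds like "pliers" (the tool)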
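Like most R packages, dplyr follows a functional programming style, which is quite different from object-oriented programming in Python. One advantage is that beginners find this functional way of thinking easy to pick up. It resembles an assembly line: each function is a workshop, and several workshops together complete one production (data analysis) task.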
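dplyr verbs use one of two kinds of non-standard evaluation: data masking or tidy selection.
- Data masking (used by arrange(), count(), filter(), group_by(), mutate(), summarise() and others) lets you refer to columns of the data frame as if they were ordinary variables; a column "masks" any environment variable with the same name.
- Tidy selection (used by across(), relocate(), rename(), select(), pull() and others) lets you choose columns by name, position or type.
Note that "data masking" here has nothing to do with the privacy term of the same name (also called data bleaching, de-identification or data transformation).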
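A small sketch of the two kinds of evaluation. The toy tibble `df` and the global `x <- 100` are assumptions made up for illustration: the column `x` masks the global `x` inside filter(), the `.env` pronoun reaches the global one explicitly, and select() uses tidy selection.

```r
library(dplyr)

df <- tibble(x = 1:3, y = c(10, 20, 30))
x <- 100  # environment variable that shares its name with a column

# Data masking: inside filter(), `x` means the column df$x, not the global x
masked <- filter(df, x > 1)

# The .env pronoun refers explicitly to the environment variable
from_env <- filter(df, y < .env$x)

# Tidy selection: columns are chosen by name, position or a helper such as where()
selected <- select(df, where(is.numeric))
```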
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# base R: rows where the condition is NA come back as all-NA rows
starwars[starwars$homeworld == "Naboo" & starwars$species == "Human", ]
## # A tibble: 13 x 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Palpat… 170 75 grey pale yellow 82 male mascu…
## 2 <NA> NA NA <NA> <NA> <NA> NA <NA> <NA>
## 3 <NA> NA NA <NA> <NA> <NA> NA <NA> <NA>
## 4 <NA> NA NA <NA> <NA> <NA> NA <NA> <NA>
## 5 <NA> NA NA <NA> <NA> <NA> NA <NA> <NA>
## 6 Gregar… 185 85 black dark brown NA male mascu…
## 7 Cordé 157 NA brown light brown NA female femin…
## 8 Dormé 165 NA brown light brown NA female femin…
## 9 <NA> NA NA <NA> <NA> <NA> NA <NA> <NA>
## 10 <NA> NA NA <NA> <NA> <NA> NA <NA> <NA>
## 11 <NA> NA NA <NA> <NA> <NA> NA <NA> <NA>
## 12 <NA> NA NA <NA> <NA> <NA> NA <NA> <NA>
## 13 Padmé … 165 45 brown light brown 46 female femin…
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# tidyverse (dplyr): filter() drops rows where the condition is NA
starwars %>% filter(homeworld == "Naboo", species == "Human")
## # A tibble: 5 x 14
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Palpati… 170 75 grey pale yellow 82 male mascul…
## 2 Gregar … 185 85 black dark brown NA male mascul…
## 3 Cordé 157 NA brown light brown NA fema… femini…
## 4 Dormé 165 NA brown light brown NA fema… femini…
## 5 Padmé A… 165 45 brown light brown 46 fema… femini…
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
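The two outputs differ because logical indexing in base R turns an NA condition into an all-NA row, while filter() treats NA as FALSE. A minimal sketch of making base R match, assuming wrapping the condition in which() (which drops NA indices) is acceptable:

```r
library(dplyr)

# Logical indexing keeps NA conditions as all-NA rows; which() drops them
idx <- which(starwars$homeworld == "Naboo" & starwars$species == "Human")
base_rows <- starwars[idx, ]

# filter() treats NA conditions as FALSE, so the two results should agree
tidy_rows <- filter(starwars, homeworld == "Naboo", species == "Human")
```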
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
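If a function argument holds a data variable (a column name the caller types without quotes), "embrace" it with {{ }}: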
var_summary <- function(data, var) {
  data %>%
    summarise(n = n(), min = min({{ var }}), max = max({{ var }}))
}
mtcars %>%
  group_by(cyl) %>%
  var_summary(mpg)
## # A tibble: 3 x 4
## cyl n min max
## <dbl> <int> <dbl> <dbl>
## 1 4 11 21.4 33.9
## 2 6 7 17.8 21.4
## 3 8 14 10.4 19.2
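Embracing also works on the left-hand side of `:=`, where dplyr accepts glue-style names. A sketch with a hypothetical helper `mean_by()` that names its output column after the input column:

```r
library(dplyr)

# Hypothetical helper: group by one column, average another, and name the
# result after the input column using glue syntax on the left of :=
mean_by <- function(data, group_var, value_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise("mean_{{ value_var }}" := mean({{ value_var }}), .groups = "drop")
}

res <- mean_by(mtcars, cyl, mpg)
```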
If instead the environment variable is a character string holding the column name, use the .data pronoun with [[ ]]:
for (var in names(mtcars)) {
  mtcars %>% count(.data[[var]]) %>% print()
}
## mpg n
## 1 10.4 2
## 2 13.3 1
## 3 14.3 1
## 4 14.7 1
## 5 15.0 1
## 6 15.2 2
## 7 15.5 1
## 8 15.8 1
## 9 16.4 1
## 10 17.3 1
## 11 17.8 1
## 12 18.1 1
## 13 18.7 1
## 14 19.2 2
## 15 19.7 1
## 16 21.0 2
## 17 21.4 2
## 18 21.5 1
## 19 22.8 2
## 20 24.4 1
## 21 26.0 1
## 22 27.3 1
## 23 30.4 2
## 24 32.4 1
## 25 33.9 1
## cyl n
## 1 4 11
## 2 6 7
## 3 8 14
## disp n
## 1 71.1 1
## 2 75.7 1
## 3 78.7 1
## 4 79.0 1
## 5 95.1 1
## 6 108.0 1
## 7 120.1 1
## 8 120.3 1
## 9 121.0 1
## 10 140.8 1
## 11 145.0 1
## 12 146.7 1
## 13 160.0 2
## 14 167.6 2
## 15 225.0 1
## 16 258.0 1
## 17 275.8 3
## 18 301.0 1
## 19 304.0 1
## 20 318.0 1
## 21 350.0 1
## 22 351.0 1
## 23 360.0 2
## 24 400.0 1
## 25 440.0 1
## 26 460.0 1
## 27 472.0 1
## hp n
## 1 52 1
## 2 62 1
## 3 65 1
## 4 66 2
## 5 91 1
## 6 93 1
## 7 95 1
## 8 97 1
## 9 105 1
## 10 109 1
## 11 110 3
## 12 113 1
## 13 123 2
## 14 150 2
## 15 175 3
## 16 180 3
## 17 205 1
## 18 215 1
## 19 230 1
## 20 245 2
## 21 264 1
## 22 335 1
## drat n
## 1 2.76 2
## 2 2.93 1
## 3 3.00 1
## 4 3.07 3
## 5 3.08 2
## 6 3.15 2
## 7 3.21 1
## 8 3.23 1
## 9 3.54 1
## 10 3.62 1
## 11 3.69 1
## 12 3.70 1
## 13 3.73 1
## 14 3.77 1
## 15 3.85 1
## 16 3.90 2
## 17 3.92 3
## 18 4.08 2
## 19 4.11 1
## 20 4.22 2
## 21 4.43 1
## 22 4.93 1
## wt n
## 1 1.513 1
## 2 1.615 1
## 3 1.835 1
## 4 1.935 1
## 5 2.140 1
## 6 2.200 1
## 7 2.320 1
## 8 2.465 1
## 9 2.620 1
## 10 2.770 1
## 11 2.780 1
## 12 2.875 1
## 13 3.150 1
## 14 3.170 1
## 15 3.190 1
## 16 3.215 1
## 17 3.435 1
## 18 3.440 3
## 19 3.460 1
## 20 3.520 1
## 21 3.570 2
## 22 3.730 1
## 23 3.780 1
## 24 3.840 1
## 25 3.845 1
## 26 4.070 1
## 27 5.250 1
## 28 5.345 1
## 29 5.424 1
## qsec n
## 1 14.50 1
## 2 14.60 1
## 3 15.41 1
## 4 15.50 1
## 5 15.84 1
## 6 16.46 1
## 7 16.70 1
## 8 16.87 1
## 9 16.90 1
## 10 17.02 2
## 11 17.05 1
## 12 17.30 1
## 13 17.40 1
## 14 17.42 1
## 15 17.60 1
## 16 17.82 1
## 17 17.98 1
## 18 18.00 1
## 19 18.30 1
## 20 18.52 1
## 21 18.60 1
## 22 18.61 1
## 23 18.90 2
## 24 19.44 1
## 25 19.47 1
## 26 19.90 1
## 27 20.00 1
## 28 20.01 1
## 29 20.22 1
## 30 22.90 1
## vs n
## 1 0 18
## 2 1 14
## am n
## 1 0 19
## 2 1 13
## gear n
## 1 3 15
## 2 4 12
## 3 5 5
## carb n
## 1 1 7
## 2 2 10
## 3 3 3
## 4 4 10
## 5 6 1
## 6 8 1
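The same pattern packaged in a function. `count_col()` is a hypothetical name; the caller passes the column as a string, so `.data[[ ]]` is used rather than {{ }}:

```r
library(dplyr)

# Hypothetical helper: the column name arrives as a string, so use .data[[ ]]
count_col <- function(data, col_name) {
  data %>% count(.data[[col_name]])
}

res <- count_col(mtcars, "cyl")
```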
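For tidy-selection verbs such as across(), an argument that the caller fills with a selection is likewise wrapped in {{ }}: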
summarise_mean <- function(data, vars) {
  data %>% summarise(n = n(), across({{ vars }}, mean))
}
mtcars %>%
  group_by(cyl) %>%
  summarise_mean(where(is.numeric))
## # A tibble: 3 x 12
## cyl n mpg disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 4 11 26.7 105. 82.6 4.07 2.29 19.1 0.909 0.727 4.09 1.55
## 2 6 7 19.7 183. 122. 3.59 3.12 18.0 0.571 0.429 3.86 3.43
## 3 8 14 15.1 353. 209. 3.23 4.00 16.8 0 0.143 3.29 3.5
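The same embracing works with select(). A sketch with a hypothetical helper `pick_cols()`; because `cols` is forwarded through {{ }}, the caller can pass bare names, `c(...)` or selection helpers:

```r
library(dplyr)

# Hypothetical helper: `cols` is forwarded to select() through {{ }}, so the
# caller may pass bare names, c(...), or helpers such as starts_with()
pick_cols <- function(data, cols) {
  data %>% select({{ cols }})
}

res <- pick_cols(mtcars, starts_with("d"))
```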
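But when the column names are stored in a character vector, use all_of() (every name must exist) or any_of() (names that are missing are silently ignored):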
vars <- c("mpg", "vs")
mtcars %>% select(all_of(vars))
## mpg vs
## Mazda RX4 21.0 0
## Mazda RX4 Wag 21.0 0
## Datsun 710 22.8 1
## Hornet 4 Drive 21.4 1
## Hornet Sportabout 18.7 0
## Valiant 18.1 1
## Duster 360 14.3 0
## Merc 240D 24.4 1
## Merc 230 22.8 1
## Merc 280 19.2 1
## Merc 280C 17.8 1
## Merc 450SE 16.4 0
## Merc 450SL 17.3 0
## Merc 450SLC 15.2 0
## Cadillac Fleetwood 10.4 0
## Lincoln Continental 10.4 0
## Chrysler Imperial 14.7 0
## Fiat 128 32.4 1
## Honda Civic 30.4 1
## Toyota Corolla 33.9 1
## Toyota Corona 21.5 1
## Dodge Challenger 15.5 0
## AMC Javelin 15.2 0
## Camaro Z28 13.3 0
## Pontiac Firebird 19.2 0
## Fiat X1-9 27.3 1
## Porsche 914-2 26.0 0
## Lotus Europa 30.4 1
## Ford Pantera L 15.8 0
## Ferrari Dino 19.7 0
## Maserati Bora 15.0 0
## Volvo 142E 21.4 1
mtcars %>% select(!all_of(vars))
## cyl disp hp drat wt qsec am gear carb
## Mazda RX4 6 160.0 110 3.90 2.620 16.46 1 4 4
## Mazda RX4 Wag 6 160.0 110 3.90 2.875 17.02 1 4 4
## Datsun 710 4 108.0 93 3.85 2.320 18.61 1 4 1
## Hornet 4 Drive 6 258.0 110 3.08 3.215 19.44 0 3 1
## Hornet Sportabout 8 360.0 175 3.15 3.440 17.02 0 3 2
## Valiant 6 225.0 105 2.76 3.460 20.22 0 3 1
## Duster 360 8 360.0 245 3.21 3.570 15.84 0 3 4
## Merc 240D 4 146.7 62 3.69 3.190 20.00 0 4 2
## Merc 230 4 140.8 95 3.92 3.150 22.90 0 4 2
## Merc 280 6 167.6 123 3.92 3.440 18.30 0 4 4
## Merc 280C 6 167.6 123 3.92 3.440 18.90 0 4 4
## Merc 450SE 8 275.8 180 3.07 4.070 17.40 0 3 3
## Merc 450SL 8 275.8 180 3.07 3.730 17.60 0 3 3
## Merc 450SLC 8 275.8 180 3.07 3.780 18.00 0 3 3
## Cadillac Fleetwood 8 472.0 205 2.93 5.250 17.98 0 3 4
## Lincoln Continental 8 460.0 215 3.00 5.424 17.82 0 3 4
## Chrysler Imperial 8 440.0 230 3.23 5.345 17.42 0 3 4
## Fiat 128 4 78.7 66 4.08 2.200 19.47 1 4 1
## Honda Civic 4 75.7 52 4.93 1.615 18.52 1 4 2
## Toyota Corolla 4 71.1 65 4.22 1.835 19.90 1 4 1
## Toyota Corona 4 120.1 97 3.70 2.465 20.01 0 3 1
## Dodge Challenger 8 318.0 150 2.76 3.520 16.87 0 3 2
## AMC Javelin 8 304.0 150 3.15 3.435 17.30 0 3 2
## Camaro Z28 8 350.0 245 3.73 3.840 15.41 0 3 4
## Pontiac Firebird 8 400.0 175 3.08 3.845 17.05 0 3 2
## Fiat X1-9 4 79.0 66 4.08 1.935 18.90 1 4 1
## Porsche 914-2 4 120.3 91 4.43 2.140 16.70 1 5 2
## Lotus Europa 4 95.1 113 3.77 1.513 16.90 1 5 2
## Ford Pantera L 8 351.0 264 4.22 3.170 14.50 1 5 4
## Ferrari Dino 6 145.0 175 3.62 2.770 15.50 1 5 6
## Maserati Bora 8 301.0 335 3.54 3.570 14.60 1 5 8
## Volvo 142E 4 121.0 109 4.11 2.780 18.60 1 4 2
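A sketch of the difference between the two helpers. The name `"not_a_column"` is deliberately fake so that it triggers the strict behaviour of all_of(); the last line shows that all_of() also works inside across():

```r
library(dplyr)

vars <- c("mpg", "vs", "not_a_column")

# any_of() silently skips names that are not in the data
lenient <- mtcars %>% select(any_of(vars))

# all_of() is strict: a missing name is an error
strict <- tryCatch(
  mtcars %>% select(all_of(vars)),
  error = function(e) "error"
)

# all_of() also works inside across(), e.g. to summarise string-named columns
means <- mtcars %>% summarise(across(all_of(c("mpg", "vs")), mean))
```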
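Inside a package, writing it this way triggers R CMD check notes ("no visible binding for global variable"), because x, grp and y are not defined anywhere in the function: they are columns of data.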
my_summary_function <- function(data) {
  data %>%
    filter(x > 0) %>%
    group_by(grp) %>%
    summarise(y = mean(y), n = n())
}
We can solve this with the .data pronoun from rlang, importing it in the package via the roxygen tag below:
#' @importFrom rlang .data
my_summary_function <- function(data) {
  data %>%
    filter(.data$x > 0) %>%
    group_by(.data$grp) %>%
    summarise(y = mean(.data$y), n = n())
}
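The .data pronoun pairs with .env, which removes the ambiguity when a function argument shares its name with a column. A sketch with a hypothetical helper `filter_above()` and a toy tibble:

```r
library(dplyr)

# Hypothetical helper whose argument `x` has the same name as a column:
# .data$x is the column, .env$x is the function argument
filter_above <- function(data, x) {
  data %>% filter(.data$x > .env$x)
}

df <- tibble(x = 1:5)
res <- filter_above(df, 3)
```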
# Pass ... straight through to group_by(); dots need no {{ }}
my_summarise <- function(.data, ...) {
  .data %>%
    group_by(...) %>%
    summarise(mass = mean(mass, na.rm = TRUE), height = mean(height, na.rm = TRUE))
}
starwars %>% my_summarise(homeworld)
## # A tibble: 49 x 3
## homeworld mass height
## <chr> <dbl> <dbl>
## 1 Alderaan 64 176.
## 2 Aleen Minor 15 79
## 3 Bespin 79 175
## 4 Bestine IV 110 180
## 5 Cato Neimoidia 90 191
## 6 Cerea 82 198
## 7 Champala NaN 196
## 8 Chandrila NaN 150
## 9 Concord Dawn 79 183
## 10 Corellia 78.5 175
## # … with 39 more rows
starwars %>% my_summarise(sex, gender)
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
## # A tibble: 6 x 4
## # Groups: sex [5]
## sex gender mass height
## <chr> <chr> <dbl> <dbl>
## 1 female feminine 54.7 169.
## 2 hermaphroditic masculine 1358 175
## 3 male masculine 81.0 179.
## 4 none feminine NaN 96
## 5 none masculine 69.8 140
## 6 <NA> <NA> 48 181.
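Because `...` is forwarded unchanged, callers can also pass named expressions that create grouping columns on the fly. A sketch with a hypothetical helper `count_by()`, where `heavy = wt > 3` is an illustrative grouping expression:

```r
library(dplyr)

# Hypothetical helper: forward ... to group_by(), so callers can pass several
# bare column names or new named expressions
count_by <- function(data, ...) {
  data %>%
    group_by(...) %>%
    summarise(n = n(), .groups = "drop")
}

res <- count_by(mtcars, cyl, heavy = wt > 3)
```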
my_summarise <- function(data, summary_vars) {
  data %>%
    summarise(across({{ summary_vars }}, ~ mean(., na.rm = TRUE)))
}
starwars %>%
  group_by(species) %>%
  my_summarise(c(mass, height))
## # A tibble: 38 x 3
## species mass height
## <chr> <dbl> <dbl>
## 1 Aleena 15 79
## 2 Besalisk 102 198
## 3 Cerean 82 198
## 4 Chagrian NaN 196
## 5 Clawdite 55 168
## 6 Droid 69.8 131.
## 7 Dug 40 112
## 8 Ewok 20 88
## 9 Geonosian 80 183
## 10 Gungan 74 209.
## # … with 28 more rows
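across() also takes a `.names` argument, so the summarised columns can be renamed in the same step. A sketch with the hypothetical name `my_summarise_named()`, prefixing each output with `mean_`:

```r
library(dplyr)

# Hypothetical variant of my_summarise() that uses .names to prefix each
# output column with "mean_"
my_summarise_named <- function(data, summary_vars) {
  data %>%
    summarise(
      across({{ summary_vars }}, ~ mean(.x, na.rm = TRUE), .names = "mean_{.col}"),
      .groups = "drop"
    )
}

res <- mtcars %>% group_by(cyl) %>% my_summarise_named(c(mpg, hp))
```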
# Same counts as the for loop above, collected in a list with purrr::map()
mtcars %>%
  names() %>%
  purrr::map(~ count(mtcars, .data[[.x]]))
## [[1]]
## mpg n
## 1 10.4 2
## 2 13.3 1
## 3 14.3 1
## 4 14.7 1
## 5 15.0 1
## 6 15.2 2
## 7 15.5 1
## 8 15.8 1
## 9 16.4 1
## 10 17.3 1
## 11 17.8 1
## 12 18.1 1
## 13 18.7 1
## 14 19.2 2
## 15 19.7 1
## 16 21.0 2
## 17 21.4 2
## 18 21.5 1
## 19 22.8 2
## 20 24.4 1
## 21 26.0 1
## 22 27.3 1
## 23 30.4 2
## 24 32.4 1
## 25 33.9 1
##
## [[2]]
## cyl n
## 1 4 11
## 2 6 7
## 3 8 14
##
## [[3]]
## disp n
## 1 71.1 1
## 2 75.7 1
## 3 78.7 1
## 4 79.0 1
## 5 95.1 1
## 6 108.0 1
## 7 120.1 1
## 8 120.3 1
## 9 121.0 1
## 10 140.8 1
## 11 145.0 1
## 12 146.7 1
## 13 160.0 2
## 14 167.6 2
## 15 225.0 1
## 16 258.0 1
## 17 275.8 3
## 18 301.0 1
## 19 304.0 1
## 20 318.0 1
## 21 350.0 1
## 22 351.0 1
## 23 360.0 2
## 24 400.0 1
## 25 440.0 1
## 26 460.0 1
## 27 472.0 1
##
## [[4]]
## hp n
## 1 52 1
## 2 62 1
## 3 65 1
## 4 66 2
## 5 91 1
## 6 93 1
## 7 95 1
## 8 97 1
## 9 105 1
## 10 109 1
## 11 110 3
## 12 113 1
## 13 123 2
## 14 150 2
## 15 175 3
## 16 180 3
## 17 205 1
## 18 215 1
## 19 230 1
## 20 245 2
## 21 264 1
## 22 335 1
##
## [[5]]
## drat n
## 1 2.76 2
## 2 2.93 1
## 3 3.00 1
## 4 3.07 3
## 5 3.08 2
## 6 3.15 2
## 7 3.21 1
## 8 3.23 1
## 9 3.54 1
## 10 3.62 1
## 11 3.69 1
## 12 3.70 1
## 13 3.73 1
## 14 3.77 1
## 15 3.85 1
## 16 3.90 2
## 17 3.92 3
## 18 4.08 2
## 19 4.11 1
## 20 4.22 2
## 21 4.43 1
## 22 4.93 1
##
## [[6]]
## wt n
## 1 1.513 1
## 2 1.615 1
## 3 1.835 1
## 4 1.935 1
## 5 2.140 1
## 6 2.200 1
## 7 2.320 1
## 8 2.465 1
## 9 2.620 1
## 10 2.770 1
## 11 2.780 1
## 12 2.875 1
## 13 3.150 1
## 14 3.170 1
## 15 3.190 1
## 16 3.215 1
## 17 3.435 1
## 18 3.440 3
## 19 3.460 1
## 20 3.520 1
## 21 3.570 2
## 22 3.730 1
## 23 3.780 1
## 24 3.840 1
## 25 3.845 1
## 26 4.070 1
## 27 5.250 1
## 28 5.345 1
## 29 5.424 1
##
## [[7]]
## qsec n
## 1 14.50 1
## 2 14.60 1
## 3 15.41 1
## 4 15.50 1
## 5 15.84 1
## 6 16.46 1
## 7 16.70 1
## 8 16.87 1
## 9 16.90 1
## 10 17.02 2
## 11 17.05 1
## 12 17.30 1
## 13 17.40 1
## 14 17.42 1
## 15 17.60 1
## 16 17.82 1
## 17 17.98 1
## 18 18.00 1
## 19 18.30 1
## 20 18.52 1
## 21 18.60 1
## 22 18.61 1
## 23 18.90 2
## 24 19.44 1
## 25 19.47 1
## 26 19.90 1
## 27 20.00 1
## 28 20.01 1
## 29 20.22 1
## 30 22.90 1
##
## [[8]]
## vs n
## 1 0 18
## 2 1 14
##
## [[9]]
## am n
## 1 0 19
## 2 1 13
##
## [[10]]
## gear n
## 1 3 15
## 2 4 12
## 3 5 5
##
## [[11]]
## carb n
## 1 1 7
## 2 2 10
## 3 3 3
## 4 4 10
## 5 6 1
## 6 8 1
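If purrr is not available, base R's lapply() gives the same list. In this sketch the list is also named after the columns, which is a small addition not in the example above:

```r
library(dplyr)

# The same per-column counts with lapply() instead of purrr::map(),
# named by column so each table can be looked up by name
counts <- lapply(names(mtcars), function(v) count(mtcars, .data[[v]]))
names(counts) <- names(mtcars)
```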
Someone else's explanation
Reference: https://shixiangwang.github.io/home/cn/post/2019-07-08-dplyr-programming/
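dplyr functions use non-standard evaluation (NSE). This is a catch-all term meaning that they don't follow the usual R rules of evaluation. Instead, they capture the expressions you type and evaluate them in a custom way. This gives dplyr code two main benefits:
- Operations on data frames can be expressed succinctly because you don't need to repeat the name of the data frame. For example, you can write filter(df, x == 1, y == 2, z == 3) instead of df[df$x == 1 & df$y == 2 & df$z == 3, ].
- dplyr can choose to compute results differently from base R; for example, with a database backend it translates the code into SQL instead of computing in R.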
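Unfortunately, these benefits do not come for free. There are two main drawbacks:
- Most dplyr arguments are not referentially transparent. This means you can't replace a value with a seemingly equivalent object that you have defined elsewhere.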
df <- tibble(x = 1:3, y = 3:1)
filter(df, x == 1)
## # A tibble: 1 x 2
## x y
## <int> <int>
## 1 1 3
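# filter(df, x == 1) is not equivalent to either of these:
# my_var <- x                # error: object 'x' not found
# filter(df, my_var == 1)
my_var <- "x"
# filter(df, my_var == 1)    # compares the string "x" with 1, so no rows are returned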
- dplyr code is ambiguous. Depending on which variables are defined where, filter(df, x == y) could be equivalent to any of:
df[df$x == df$y, ]
## # A tibble: 1 x 2
## x y
## <int> <int>
## 1 2 2
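# The other three readings only apply when global variables x and/or y exist;
# neither is defined here, so these lines would fail
# df[df$x == y, ]
# df[x == df$y, ]
# df[x == y, ]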
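Both drawbacks can be handled with the pronouns. A sketch on the same toy `df`; the global `y <- 3` is an assumption added only to create a name clash with the column `y`:

```r
library(dplyr)

df <- tibble(x = 1:3, y = 3:1)

# Drawback 1: a column name stored in a variable needs .data[[ ]]
my_var <- "x"
by_name <- filter(df, .data[[my_var]] == 1)

# Drawback 2: with a global y present, the column y still wins (data masking)
y <- 3
masked <- filter(df, x == y)                # column x vs column y
explicit <- filter(df, .data$x == .env$y)   # column x vs global y
```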
greet <- function(name) {
  "How do you do, name?"
}
greet("Shixiang")
## [1] "How do you do, name?"
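This does not work: inside the quotes, name is just text, so the string "quotes" it instead of evaluating it. We can get the intended result like this: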
greet <- function(name) {
  paste0("How do you do, ", name, "?")
}
greet("Shixiang")
## [1] "How do you do, Shixiang?"
greet <- function(name) {
  glue::glue("How do you do, {name}?")
}
greet("Shixiang")
## How do you do, Shixiang?
# sample() is random, so the means below will vary from run to run
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)
df %>%
  group_by(g1) %>%
  summarise(a = mean(a))
## # A tibble: 2 x 2
## g1 a
## <dbl> <dbl>
## 1 1 3
## 2 2 3
df %>%
  group_by(g2) %>%
  summarise(a = mean(a))
## # A tibble: 2 x 2
## g2 a
## <dbl> <dbl>
## 1 1 2
## 2 2 4.5
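# This version does not work
my_summarise <- function(df, group_var) {
  df %>%
    group_by(group_var) %>%
    summarise(a = mean(a))
}
# my_summarise(df, g1)     # error: group_by() looks for a column literally named group_var
# my_summarise(df, "g2")   # fails for the same reason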
group_by() works like quotation marks: it does not evaluate its input, it captures (quotes) it.
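To make the function work, we need to do two things. First, we capture the input ourselves (so that my_summarise(), like group_by(), can take a bare variable name). Then we tell group_by() to evaluate the input we have already captured instead of quoting it again.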
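A sketch of both steps, using a deterministic version of `df` (fixed values instead of sample(), an assumption so the result can be checked). The first version captures with rlang::enquo() and unquotes with !!; the second uses {{ }}, available from rlang 0.4.0, which does both at once. The function names are hypothetical.

```r
library(dplyr)

df <- tibble(g1 = c(1, 1, 2, 2, 2), a = c(1, 2, 3, 4, 5))

# Step 1: capture the argument with rlang::enquo();
# Step 2: unquote it with !! so group_by() evaluates the captured expression
my_summarise_quo <- function(df, group_var) {
  group_var <- rlang::enquo(group_var)
  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}

# {{ }} performs both steps at once
my_summarise_embrace <- function(df, group_var) {
  df %>%
    group_by({{ group_var }}) %>%
    summarise(a = mean(a))
}

res_quo <- my_summarise_quo(df, g1)
res_embrace <- my_summarise_embrace(df, g1)
```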
Last modified 2020-12-14