Programming with dplyr in R [Practice]

KJY / 2020-12-14


Official documentation

The explanation there is not very clear, and I found it hard to follow.

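dplyr is a data analysis package for R, similar to pandas in Python. It makes it convenient to process and analyse data stored in data frames. At first I was puzzled by the odd name dplyr. One explanation I found is:

- d stands for data frame
- plyr sounds like the English word "pliers"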

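Like most R packages, dplyr follows a functional programming style, which is quite different from the object-oriented style common in Python. The advantage is that beginners tend to find this functional way of thinking easy to pick up. It works a bit like an assembly line: each function is a workshop, and several workshops together complete one production (data analysis) task.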
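
The assembly-line picture can be sketched with a short pipeline. This is only an illustration: the cut-off of 3 for wt and the name heavy_by_cyl are my own choices, not something from the official documentation.

```r
library(dplyr)

# Each verb is one "workshop"; %>% hands the result to the next one.
heavy_by_cyl <- mtcars %>%
  filter(wt > 3) %>%                         # keep cars with wt above 3
  group_by(cyl) %>%                          # split by number of cylinders
  summarise(n = n(), mean_mpg = mean(mpg))   # one summary row per group

heavy_by_cyl
```

Every step takes a data frame and returns a data frame, which is why the steps can be chained in any number.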

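dplyr verbs come in two kinds: those that use data masking and those that use tidy selection.

Note that data masking here is not data anonymization (also called data bleaching, de-identification, or data obfuscation), which is what the Chinese term 数据脱敏 usually refers to. In dplyr, data masking means that the columns of a data frame can be used inside a verb as if they were ordinary variables, for example homeworld instead of starwars$homeworld.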
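
As a rough sketch of the two kinds of verbs (the threshold 200 and the object names tall and cols are only for illustration): filter() uses data masking, so columns appear inside an expression, while select() uses tidy selection, so columns are picked by name or by a helper.

```r
library(dplyr)

# Data masking: height is a column, used like an ordinary variable.
tall <- starwars %>% filter(height > 200)

# Tidy selection: columns are chosen by name or with a selection helper.
cols <- starwars %>% select(name, ends_with("color"))

names(cols)
```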

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# base R
# rows where the condition evaluates to NA come back as all-NA rows
starwars[starwars$homeworld == "Naboo" & starwars$species == "Human", ]
## # A tibble: 13 x 14
##    name    height  mass hair_color skin_color eye_color birth_year sex    gender
##    <chr>    <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>  <chr> 
##  1 Palpat…    170    75 grey       pale       yellow            82 male   mascu…
##  2 <NA>        NA    NA <NA>       <NA>       <NA>              NA <NA>   <NA>  
##  3 <NA>        NA    NA <NA>       <NA>       <NA>              NA <NA>   <NA>  
##  4 <NA>        NA    NA <NA>       <NA>       <NA>              NA <NA>   <NA>  
##  5 <NA>        NA    NA <NA>       <NA>       <NA>              NA <NA>   <NA>  
##  6 Gregar…    185    85 black      dark       brown             NA male   mascu…
##  7 Cordé      157    NA brown      light      brown             NA female femin…
##  8 Dormé      165    NA brown      light      brown             NA female femin…
##  9 <NA>        NA    NA <NA>       <NA>       <NA>              NA <NA>   <NA>  
## 10 <NA>        NA    NA <NA>       <NA>       <NA>              NA <NA>   <NA>  
## 11 <NA>        NA    NA <NA>       <NA>       <NA>              NA <NA>   <NA>  
## 12 <NA>        NA    NA <NA>       <NA>       <NA>              NA <NA>   <NA>  
## 13 Padmé …    165    45 brown      light      brown             46 female femin…
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# tidy

starwars %>% filter(homeworld == "Naboo", species == "Human")
## # A tibble: 5 x 14
##   name     height  mass hair_color skin_color eye_color birth_year sex   gender 
##   <chr>     <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>  
## 1 Palpati…    170    75 grey       pale       yellow            82 male  mascul…
## 2 Gregar …    185    85 black      dark       brown             NA male  mascul…
## 3 Cordé       157    NA brown      light      brown             NA fema… femini…
## 4 Dormé       165    NA brown      light      brown             NA fema… femini…
## 5 Padmé A…    165    45 brown      light      brown             46 fema… femini…
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

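When a function argument stands for a column (a data variable passed in through an env-variable), embrace it with {{ }}.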

var_summary <- function(data, var) {
  data %>%
    summarise(n = n(), min = min({{ var }}), max = max({{ var }}))
}
mtcars %>% 
  group_by(cyl) %>% 
  var_summary(mpg)
## # A tibble: 3 x 4
##     cyl     n   min   max
##   <dbl> <int> <dbl> <dbl>
## 1     4    11  21.4  33.9
## 2     6     7  17.8  21.4
## 3     8    14  10.4  19.2

But if the env-variable holds the column name as a character string, index the .data pronoun with [[ ]] instead.

for (var in names(mtcars)) {
  mtcars %>% count(.data[[var]]) %>% print()
}
##     mpg n
## 1  10.4 2
## 2  13.3 1
## 3  14.3 1
## 4  14.7 1
## 5  15.0 1
## 6  15.2 2
## 7  15.5 1
## 8  15.8 1
## 9  16.4 1
## 10 17.3 1
## 11 17.8 1
## 12 18.1 1
## 13 18.7 1
## 14 19.2 2
## 15 19.7 1
## 16 21.0 2
## 17 21.4 2
## 18 21.5 1
## 19 22.8 2
## 20 24.4 1
## 21 26.0 1
## 22 27.3 1
## 23 30.4 2
## 24 32.4 1
## 25 33.9 1
##   cyl  n
## 1   4 11
## 2   6  7
## 3   8 14
##     disp n
## 1   71.1 1
## 2   75.7 1
## 3   78.7 1
## 4   79.0 1
## 5   95.1 1
## 6  108.0 1
## 7  120.1 1
## 8  120.3 1
## 9  121.0 1
## 10 140.8 1
## 11 145.0 1
## 12 146.7 1
## 13 160.0 2
## 14 167.6 2
## 15 225.0 1
## 16 258.0 1
## 17 275.8 3
## 18 301.0 1
## 19 304.0 1
## 20 318.0 1
## 21 350.0 1
## 22 351.0 1
## 23 360.0 2
## 24 400.0 1
## 25 440.0 1
## 26 460.0 1
## 27 472.0 1
##     hp n
## 1   52 1
## 2   62 1
## 3   65 1
## 4   66 2
## 5   91 1
## 6   93 1
## 7   95 1
## 8   97 1
## 9  105 1
## 10 109 1
## 11 110 3
## 12 113 1
## 13 123 2
## 14 150 2
## 15 175 3
## 16 180 3
## 17 205 1
## 18 215 1
## 19 230 1
## 20 245 2
## 21 264 1
## 22 335 1
##    drat n
## 1  2.76 2
## 2  2.93 1
## 3  3.00 1
## 4  3.07 3
## 5  3.08 2
## 6  3.15 2
## 7  3.21 1
## 8  3.23 1
## 9  3.54 1
## 10 3.62 1
## 11 3.69 1
## 12 3.70 1
## 13 3.73 1
## 14 3.77 1
## 15 3.85 1
## 16 3.90 2
## 17 3.92 3
## 18 4.08 2
## 19 4.11 1
## 20 4.22 2
## 21 4.43 1
## 22 4.93 1
##       wt n
## 1  1.513 1
## 2  1.615 1
## 3  1.835 1
## 4  1.935 1
## 5  2.140 1
## 6  2.200 1
## 7  2.320 1
## 8  2.465 1
## 9  2.620 1
## 10 2.770 1
## 11 2.780 1
## 12 2.875 1
## 13 3.150 1
## 14 3.170 1
## 15 3.190 1
## 16 3.215 1
## 17 3.435 1
## 18 3.440 3
## 19 3.460 1
## 20 3.520 1
## 21 3.570 2
## 22 3.730 1
## 23 3.780 1
## 24 3.840 1
## 25 3.845 1
## 26 4.070 1
## 27 5.250 1
## 28 5.345 1
## 29 5.424 1
##     qsec n
## 1  14.50 1
## 2  14.60 1
## 3  15.41 1
## 4  15.50 1
## 5  15.84 1
## 6  16.46 1
## 7  16.70 1
## 8  16.87 1
## 9  16.90 1
## 10 17.02 2
## 11 17.05 1
## 12 17.30 1
## 13 17.40 1
## 14 17.42 1
## 15 17.60 1
## 16 17.82 1
## 17 17.98 1
## 18 18.00 1
## 19 18.30 1
## 20 18.52 1
## 21 18.60 1
## 22 18.61 1
## 23 18.90 2
## 24 19.44 1
## 25 19.47 1
## 26 19.90 1
## 27 20.00 1
## 28 20.01 1
## 29 20.22 1
## 30 22.90 1
##   vs  n
## 1  0 18
## 2  1 14
##   am  n
## 1  0 19
## 2  1 13
##   gear  n
## 1    3 15
## 2    4 12
## 3    5  5
##   carb  n
## 1    1  7
## 2    2 10
## 3    3  3
## 4    4 10
## 5    6  1
## 6    8  1

For tidy selection, an env-variable that holds a selection is likewise wrapped in {{ }}.

summarise_mean <- function(data, vars) {
  data %>% summarise(n = n(), across({{ vars }}, mean))
}
mtcars %>% 
  group_by(cyl) %>% 
  summarise_mean(where(is.numeric))
## # A tibble: 3 x 12
##     cyl     n   mpg  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     4    11  26.7  105.  82.6  4.07  2.29  19.1 0.909 0.727  4.09  1.55
## 2     6     7  19.7  183. 122.   3.59  3.12  18.0 0.571 0.429  3.86  3.43
## 3     8    14  15.1  353. 209.   3.23  4.00  16.8 0     0.143  3.29  3.5

But when the variable is a character vector of column names, use all_of() or any_of().

vars <- c("mpg", "vs")
mtcars %>% select(all_of(vars))
##                      mpg vs
## Mazda RX4           21.0  0
## Mazda RX4 Wag       21.0  0
## Datsun 710          22.8  1
## Hornet 4 Drive      21.4  1
## Hornet Sportabout   18.7  0
## Valiant             18.1  1
## Duster 360          14.3  0
## Merc 240D           24.4  1
## Merc 230            22.8  1
## Merc 280            19.2  1
## Merc 280C           17.8  1
## Merc 450SE          16.4  0
## Merc 450SL          17.3  0
## Merc 450SLC         15.2  0
## Cadillac Fleetwood  10.4  0
## Lincoln Continental 10.4  0
## Chrysler Imperial   14.7  0
## Fiat 128            32.4  1
## Honda Civic         30.4  1
## Toyota Corolla      33.9  1
## Toyota Corona       21.5  1
## Dodge Challenger    15.5  0
## AMC Javelin         15.2  0
## Camaro Z28          13.3  0
## Pontiac Firebird    19.2  0
## Fiat X1-9           27.3  1
## Porsche 914-2       26.0  0
## Lotus Europa        30.4  1
## Ford Pantera L      15.8  0
## Ferrari Dino        19.7  0
## Maserati Bora       15.0  0
## Volvo 142E          21.4  1
mtcars %>% select(!all_of(vars))
##                     cyl  disp  hp drat    wt  qsec am gear carb
## Mazda RX4             6 160.0 110 3.90 2.620 16.46  1    4    4
## Mazda RX4 Wag         6 160.0 110 3.90 2.875 17.02  1    4    4
## Datsun 710            4 108.0  93 3.85 2.320 18.61  1    4    1
## Hornet 4 Drive        6 258.0 110 3.08 3.215 19.44  0    3    1
## Hornet Sportabout     8 360.0 175 3.15 3.440 17.02  0    3    2
## Valiant               6 225.0 105 2.76 3.460 20.22  0    3    1
## Duster 360            8 360.0 245 3.21 3.570 15.84  0    3    4
## Merc 240D             4 146.7  62 3.69 3.190 20.00  0    4    2
## Merc 230              4 140.8  95 3.92 3.150 22.90  0    4    2
## Merc 280              6 167.6 123 3.92 3.440 18.30  0    4    4
## Merc 280C             6 167.6 123 3.92 3.440 18.90  0    4    4
## Merc 450SE            8 275.8 180 3.07 4.070 17.40  0    3    3
## Merc 450SL            8 275.8 180 3.07 3.730 17.60  0    3    3
## Merc 450SLC           8 275.8 180 3.07 3.780 18.00  0    3    3
## Cadillac Fleetwood    8 472.0 205 2.93 5.250 17.98  0    3    4
## Lincoln Continental   8 460.0 215 3.00 5.424 17.82  0    3    4
## Chrysler Imperial     8 440.0 230 3.23 5.345 17.42  0    3    4
## Fiat 128              4  78.7  66 4.08 2.200 19.47  1    4    1
## Honda Civic           4  75.7  52 4.93 1.615 18.52  1    4    2
## Toyota Corolla        4  71.1  65 4.22 1.835 19.90  1    4    1
## Toyota Corona         4 120.1  97 3.70 2.465 20.01  0    3    1
## Dodge Challenger      8 318.0 150 2.76 3.520 16.87  0    3    2
## AMC Javelin           8 304.0 150 3.15 3.435 17.30  0    3    2
## Camaro Z28            8 350.0 245 3.73 3.840 15.41  0    3    4
## Pontiac Firebird      8 400.0 175 3.08 3.845 17.05  0    3    2
## Fiat X1-9             4  79.0  66 4.08 1.935 18.90  1    4    1
## Porsche 914-2         4 120.3  91 4.43 2.140 16.70  1    5    2
## Lotus Europa          4  95.1 113 3.77 1.513 16.90  1    5    2
## Ford Pantera L        8 351.0 264 4.22 3.170 14.50  1    5    4
## Ferrari Dino          6 145.0 175 3.62 2.770 15.50  1    5    6
## Maserati Bora         8 301.0 335 3.54 3.570 14.60  1    5    8
## Volvo 142E            4 121.0 109 4.11 2.780 18.60  1    4    2
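
The examples above only use all_of(). As a small sketch of how any_of() differs, the vector below contains a name that is not a column of mtcars ("not_a_column" is made up for this illustration): any_of() skips it, while all_of() treats it as an error.

```r
library(dplyr)

vars <- c("mpg", "vs", "not_a_column")  # the last name is hypothetical

# any_of() keeps the names that exist and silently skips the rest.
picked <- mtcars %>% select(any_of(vars))
names(picked)

# all_of() is strict: a missing name is an error, caught here for display.
strict <- tryCatch(
  mtcars %>% select(all_of(vars)),
  error = function(e) "error"
)
strict
```

Which one to use depends on whether a missing column should stop the analysis or be tolerated.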

Written like this inside a package, the function causes trouble: R CMD check reports x, grp, and y as undefined variables.

my_summary_function <- function(data) {
  data %>% 
    filter(x > 0) %>% 
    group_by(grp) %>% 
    summarise(y = mean(y), n = n())
}

We can solve this with the .data pronoun from rlang.

#' @importFrom rlang .data
my_summary_function <- function(data) {
  data %>% 
    filter(.data$x > 0) %>% 
    group_by(.data$grp) %>% 
    summarise(y = mean(.data$y), n = n())
}
my_summarise <- function(.data, ...) {
  .data %>%
    group_by(...) %>%
    summarise(mass = mean(mass, na.rm = TRUE), height = mean(height, na.rm = TRUE))
}

starwars %>% my_summarise(homeworld)
## # A tibble: 49 x 3
##    homeworld       mass height
##    <chr>          <dbl>  <dbl>
##  1 Alderaan        64     176.
##  2 Aleen Minor     15      79 
##  3 Bespin          79     175 
##  4 Bestine IV     110     180 
##  5 Cato Neimoidia  90     191 
##  6 Cerea           82     198 
##  7 Champala       NaN     196 
##  8 Chandrila      NaN     150 
##  9 Concord Dawn    79     183 
## 10 Corellia        78.5   175 
## # … with 39 more rows
starwars %>% my_summarise(sex, gender)
## `summarise()` has grouped output by 'sex'. You can override using the `.groups` argument.
## # A tibble: 6 x 4
## # Groups:   sex [5]
##   sex            gender      mass height
##   <chr>          <chr>      <dbl>  <dbl>
## 1 female         feminine    54.7   169.
## 2 hermaphroditic masculine 1358     175 
## 3 male           masculine   81.0   179.
## 4 none           feminine   NaN      96 
## 5 none           masculine   69.8   140 
## 6 <NA>           <NA>        48     181.
my_summarise <- function(data, summary_vars) {
  data %>%
    summarise(across({{ summary_vars }}, ~ mean(., na.rm = TRUE)))
}
starwars %>% 
  group_by(species) %>% 
  my_summarise(c(mass, height))
## # A tibble: 38 x 3
##    species    mass height
##    <chr>     <dbl>  <dbl>
##  1 Aleena     15      79 
##  2 Besalisk  102     198 
##  3 Cerean     82     198 
##  4 Chagrian  NaN     196 
##  5 Clawdite   55     168 
##  6 Droid      69.8   131.
##  7 Dug        40     112 
##  8 Ewok       20      88 
##  9 Geonosian  80     183 
## 10 Gungan     74     209.
## # … with 28 more rows
mtcars %>% 
  names() %>% 
  purrr::map(~ count(mtcars, .data[[.x]]))
## [[1]]
##     mpg n
## 1  10.4 2
## 2  13.3 1
## 3  14.3 1
## 4  14.7 1
## 5  15.0 1
## 6  15.2 2
## 7  15.5 1
## 8  15.8 1
## 9  16.4 1
## 10 17.3 1
## 11 17.8 1
## 12 18.1 1
## 13 18.7 1
## 14 19.2 2
## 15 19.7 1
## 16 21.0 2
## 17 21.4 2
## 18 21.5 1
## 19 22.8 2
## 20 24.4 1
## 21 26.0 1
## 22 27.3 1
## 23 30.4 2
## 24 32.4 1
## 25 33.9 1
## 
## [[2]]
##   cyl  n
## 1   4 11
## 2   6  7
## 3   8 14
## 
## [[3]]
##     disp n
## 1   71.1 1
## 2   75.7 1
## 3   78.7 1
## 4   79.0 1
## 5   95.1 1
## 6  108.0 1
## 7  120.1 1
## 8  120.3 1
## 9  121.0 1
## 10 140.8 1
## 11 145.0 1
## 12 146.7 1
## 13 160.0 2
## 14 167.6 2
## 15 225.0 1
## 16 258.0 1
## 17 275.8 3
## 18 301.0 1
## 19 304.0 1
## 20 318.0 1
## 21 350.0 1
## 22 351.0 1
## 23 360.0 2
## 24 400.0 1
## 25 440.0 1
## 26 460.0 1
## 27 472.0 1
## 
## [[4]]
##     hp n
## 1   52 1
## 2   62 1
## 3   65 1
## 4   66 2
## 5   91 1
## 6   93 1
## 7   95 1
## 8   97 1
## 9  105 1
## 10 109 1
## 11 110 3
## 12 113 1
## 13 123 2
## 14 150 2
## 15 175 3
## 16 180 3
## 17 205 1
## 18 215 1
## 19 230 1
## 20 245 2
## 21 264 1
## 22 335 1
## 
## [[5]]
##    drat n
## 1  2.76 2
## 2  2.93 1
## 3  3.00 1
## 4  3.07 3
## 5  3.08 2
## 6  3.15 2
## 7  3.21 1
## 8  3.23 1
## 9  3.54 1
## 10 3.62 1
## 11 3.69 1
## 12 3.70 1
## 13 3.73 1
## 14 3.77 1
## 15 3.85 1
## 16 3.90 2
## 17 3.92 3
## 18 4.08 2
## 19 4.11 1
## 20 4.22 2
## 21 4.43 1
## 22 4.93 1
## 
## [[6]]
##       wt n
## 1  1.513 1
## 2  1.615 1
## 3  1.835 1
## 4  1.935 1
## 5  2.140 1
## 6  2.200 1
## 7  2.320 1
## 8  2.465 1
## 9  2.620 1
## 10 2.770 1
## 11 2.780 1
## 12 2.875 1
## 13 3.150 1
## 14 3.170 1
## 15 3.190 1
## 16 3.215 1
## 17 3.435 1
## 18 3.440 3
## 19 3.460 1
## 20 3.520 1
## 21 3.570 2
## 22 3.730 1
## 23 3.780 1
## 24 3.840 1
## 25 3.845 1
## 26 4.070 1
## 27 5.250 1
## 28 5.345 1
## 29 5.424 1
## 
## [[7]]
##     qsec n
## 1  14.50 1
## 2  14.60 1
## 3  15.41 1
## 4  15.50 1
## 5  15.84 1
## 6  16.46 1
## 7  16.70 1
## 8  16.87 1
## 9  16.90 1
## 10 17.02 2
## 11 17.05 1
## 12 17.30 1
## 13 17.40 1
## 14 17.42 1
## 15 17.60 1
## 16 17.82 1
## 17 17.98 1
## 18 18.00 1
## 19 18.30 1
## 20 18.52 1
## 21 18.60 1
## 22 18.61 1
## 23 18.90 2
## 24 19.44 1
## 25 19.47 1
## 26 19.90 1
## 27 20.00 1
## 28 20.01 1
## 29 20.22 1
## 30 22.90 1
## 
## [[8]]
##   vs  n
## 1  0 18
## 2  1 14
## 
## [[9]]
##   am  n
## 1  0 19
## 2  1 13
## 
## [[10]]
##   gear  n
## 1    3 15
## 2    4 12
## 3    5  5
## 
## [[11]]
##   carb  n
## 1    1  7
## 2    2 10
## 3    3  3
## 4    4 10
## 5    6  1
## 6    8  1

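How others explain it

Reference: https://shixiangwang.github.io/home/cn/post/2019-07-08-dplyr-programming/

dplyr functions use non-standard evaluation (NSE). This is a technical term meaning that they do not follow the usual rules of evaluation. Instead, they capture the expression you typed and evaluate it in a custom way. This gives dplyr code two main benefits:

Operations on data frames can be expressed concisely, because you do not need to repeat the name of the data frame. For example, you can write filter(df, x == 1, y == 2, z == 3) instead of df[df$x == 1 & df$y == 2 & df$z == 3, ].

dplyr can choose to compute the result in a different way from base R.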

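Unfortunately, these benefits do not come for free. There are two main drawbacks:

  1. Most dplyr arguments are not referentially transparent. That means you cannot replace a value with a seemingly equivalent object that you defined elsewhere.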
df <- tibble(x = 1:3, y = 3:1)
filter(df, x == 1)
## # A tibble: 1 x 2
##       x     y
##   <int> <int>
## 1     1     3
# is not equivalent to either of the following

# my_var <- x
# filter(df, my_var == 1)


my_var <- "x"
# filter(df, my_var == 1)
  2. dplyr code is ambiguous. Depending on which variables are defined where, filter(df, x == y) could be equivalent to any of the following:
df[df$x == df$y, ]
## # A tibble: 1 x 2
##       x     y
##   <int> <int>
## 1     2     2
# the three below do not actually run here, because x and y are not defined outside df

# df[df$x == y, ] 
# df[x == df$y, ]
# df[x == y, ]
greet <- function(name) {
  "How do you do, name?"
}
greet("Shixiang")
## [1] "How do you do, name?"

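This does not work: inside the quotes, name is just part of the string and is never replaced by the argument. The versions below give the desired result.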

greet <- function(name) {
  paste0("How do you do, ", name, "?")
}
greet("Shixiang")
## [1] "How do you do, Shixiang?"
greet <- function(name) {
  glue::glue("How do you do, {name}?")
}
greet("Shixiang")
## How do you do, Shixiang?
df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a = sample(5),
  b = sample(5)
)

df %>%
  group_by(g1) %>%
  summarise(a = mean(a))
## # A tibble: 2 x 2
##      g1     a
##   <dbl> <dbl>
## 1     1     3
## 2     2     3
df %>%
  group_by(g2) %>%
  summarise(a = mean(a))
## # A tibble: 2 x 2
##      g2     a
##   <dbl> <dbl>
## 1     1   2  
## 2     2   4.5
# written like this, it does not work

my_summarise <- function(df, group_var) {
  df %>%
    group_by(group_var) %>%
    summarise(a = mean(a))
}

# my_summarise(df, g1)

# my_summarise(df, "g2")

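group_by() works like a quotation mark: it does not evaluate (execute) its input, it captures it.

To make the function work, we need to do two things. First, we capture the input ourselves (so that my_summarise(), like group_by(), accepts a bare variable name). Then we tell group_by() to evaluate the input we have captured.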
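
The two steps can be sketched as follows. The name my_summarise_quo is made up here to keep both spellings side by side, and column a uses fixed values instead of sample(5) (my assumption) so that the result is reproducible. enquo() with !! spells out the two steps; {{ }} does both at once.

```r
library(dplyr)

df <- tibble(
  g1 = c(1, 1, 2, 2, 2),
  g2 = c(1, 2, 1, 2, 1),
  a  = c(1, 2, 3, 4, 5)   # fixed values instead of sample(5)
)

# Step by step: capture with enquo(), then unquote with !!
my_summarise_quo <- function(df, group_var) {
  group_var <- enquo(group_var)
  df %>%
    group_by(!!group_var) %>%
    summarise(a = mean(a))
}

# Shorthand: {{ }} captures and unquotes in one step
my_summarise <- function(df, group_var) {
  df %>%
    group_by({{ group_var }}) %>%
    summarise(a = mean(a))
}

my_summarise(df, g1)
my_summarise_quo(df, g2)
```

Both versions accept a bare column name, just like group_by() itself.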

Last modified on 2020-12-14