The problem arises when we have to use time series in a data.frame, or apply time-series operations such as lag and diff to numeric vectors in a data.frame. Let’s look at an example.
First, we restrict tibble printing options to minimize the space occupied.
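The setup code itself is not shown here. A minimal sketch of what it might look like, assuming the three-row output further down was produced with tibble's print_min option (the value 3 is inferred from that output, not taken from the original):

```r
# Assumption: show only the first 3 rows when a tibble has more rows
# than the print_max threshold (20 by default).
old_opts <- options(tibble.print_min = 3)

# Check the setting that is now in effect
getOption("tibble.print_min")
```

Keeping the return value of options() makes it possible to restore the previous settings later with options(old_opts).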
Let’s load the economics dataset from ggplot2 for illustration.
econ <- ggplot2::economics
econ
#> # A tibble: 574 x 6
#> date pce pop psavert uempmed unemploy
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1967-07-01 507. 198712 12.6 4.5 2944
#> 2 1967-08-01 510. 198911 12.6 4.7 2945
#> 3 1967-09-01 516. 199113 11.9 4.6 2958
#> # … with 571 more rows
Then, we are going to use some stats functions:
library(dplyr) # provides mutate() and dplyr::lag()
mutate(econ, pop_lag = stats::lag(as.ts(pop)))
#> # A tibble: 574 x 7
#> date pce pop psavert uempmed unemploy pop_lag
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1967-07-01 507. 198712 12.6 4.5 2944 198712
#> 2 1967-08-01 510. 198911 12.6 4.7 2945 198911
#> 3 1967-09-01 516. 199113 11.9 4.6 2958 199113
#> # … with 571 more rows
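stats::lag only works on ts objects: it shifts the time index rather than the values, so pop_lag above is identical to pop. dplyr, however, has thought about this problem and provides its own lag for plain vectors: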
mutate(econ, pop_lag = dplyr::lag(pop))
#> # A tibble: 574 x 7
#> date pce pop psavert uempmed unemploy pop_lag
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1967-07-01 507. 198712 12.6 4.5 2944 NA
#> 2 1967-08-01 510. 198911 12.6 4.7 2945 198712
#> 3 1967-09-01 516. 199113 11.9 4.6 2958 198911
#> # … with 571 more rows
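The difference between the two lag functions is easier to see on a small vector. The following is a minimal sketch, using a made-up four-element vector, that compares what each function changes:

```r
x <- c(10, 20, 30, 40)

# stats::lag() expects a ts object and only moves the time index
x_ts <- as.ts(x)
x_lagged <- stats::lag(x_ts, k = 1)

as.numeric(x_lagged)  # the values are the same as in x
stats::tsp(x_ts)      # start, end and frequency of the original series
stats::tsp(x_lagged)  # start and end moved back by one period

# dplyr::lag() shifts the values themselves and pads with NA
x_shifted <- dplyr::lag(x)
x_shifted
```

Because a data.frame column carries no time index, only the value-shifting behaviour is useful inside mutate().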
However, the same problem affects any univariate function that returns a vector shorter than its input when it is applied inside a data.frame. For example:
mutate(econ, pop_diff = base::diff(pop))
#> Error: Problem with `mutate()` input `pop_diff`.
#> ✖ Input `pop_diff` can't be recycled to size 574.
#> ℹ Input `pop_diff` is `base::diff(pop)`.
#> ℹ Input `pop_diff` must be size 574 or 1, not 573.
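The size mismatch in the error can be reproduced outside mutate(). A small sketch with a made-up vector, including one possible base R workaround (padding with a leading NA):

```r
x <- c(5, 8, 12, 20)

# diff() returns one element fewer than its input
d <- diff(x)
length(x)
length(d)

# Padding with a leading NA restores the original length,
# which is what a data.frame column requires
d_padded <- c(NA, d)
length(d_padded)
```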
The idea for transx comes from the need to construct wrapper functions that return a vector of the same length as their input.
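# Difference against the lagged values; the result keeps the length of x,
# with NA where no lagged value exists
diffx <- function(x, ...) x - dplyr::lag(x, ...)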
mutate(econ, pop_diff = diffx(pop))
#> # A tibble: 574 x 7
#> date pce pop psavert uempmed unemploy pop_diff
#> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1967-07-01 507. 198712 12.6 4.5 2944 NA
#> 2 1967-08-01 510. 198911 12.6 4.7 2945 199
#> 3 1967-09-01 516. 199113 11.9 4.6 2958 202
#> # … with 571 more rows
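The same padding idea can be written once and reused for other length-reducing functions. The sketch below uses a hypothetical helper, pad_front, which is not part of transx or dplyr and is shown only to illustrate the pattern:

```r
# Hypothetical helper (an assumption for illustration, not a transx function):
# apply f to x, then pad the front with NA so the result is as long as x.
pad_front <- function(x, f, ...) {
  out <- f(x, ...)
  c(rep(NA, length(x) - length(out)), out)
}

x <- c(1, 4, 9, 16)

first_diff <- pad_front(x, diff)            # one leading NA
second_lag <- pad_front(x, diff, lag = 2)   # two leading NAs

first_diff
second_lag
```

This assumes f drops elements from the start of the series, as diff does; a function that trims from the end would need padding on the other side.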