List of functions from tidyverse that I do not use often

I do not use these functions often, but they can be really useful for some tasks.

ggplot2 package:
- coord_cartesian(xlim = , ylim = ) to zoom in a part of a figure, which is different from xlim() or scale_x_continuous(limits = ). The later will simply toss data points.
- cut_width(), cut_interval(), cut_number() to convert a continous variable to groups.
- ggplot by default will drop categories without any value, to avoid this, use ... + geom_bar() + scale_x_discrete(drop = FALSE).
- reorder factor according to an numerical variable: ggplot(data, aes(num_var, forcats::fct_reorder(factor_var, num_var))) + geom_point().
- remove legend: ... + guides(fill = FALSE) or ... + guides(color = FALSE)
- change legend rows: ... + guides(fill = guide_legend(nrow = 1))
- change legend title: ... + labs(fill = "title") or ... + labs(color = "title") or ... + scale_fill_xxx(name = "title")
- change axes tick labels: e.g. ... + scale_x_log10(labels = scales::dollar, labels = scales::wrap_format(10), breaks = ...). Package scales can be useful.
- draw maps: ... + geom_polygon(aes(group = group)) + coord_map(projection = "albers", lat0 = 39, lat1 = 45)
- when write a function for plotting, aes_string() can be useful.
- scale_x_continuous(expand = c(.1, .1)) to expand the plot to avoid cutoff of labels.
- scale_x_discrete(limits = rev(level(grp))) to reverse the order of a factor.
- p + xlab(NULL) to remove x labels and its space.
tidyr package:
- complete() complete a data frame with missing combinations of data. Turns implicit missing values into explicit missing values.
- fill() Fills missing values in using the previous entry. Useful if repeated values are omitted. Last observation carried forward.
- convert = TRUE within gather() and spread() to convert the generated column into correct types.
- extract() with regular expressions to extract part of a column.
dplyr package
- transmute() will only keep generated variables.
- count() count the number of observations.
- left_join(x, y, by = c("a" = "b")) when key variable has different names in x and y.
- bind_rows(list) = plyr::ldply(list): stack a list into a data frame (not always work, e.g. bind_rows(list(1:2, 3:4)) does not work but ldply() works)
stringr package
- str_subset(words, "x$") = words[str_detect(words, "x$")]
- str_count() will count how many matches resulted from str_detect(). str_count("abababa", "aba") will return 2.
- When you use a pattern that’s a string, it’s automatically wrapped into a call to regex(). See more options for regex().
forcats package
- fct_reorder(), fct_reorder2()
- fct_infreq(), fct_rev(), fct_recode(), fct_collapse(), fct_lump()
purrr package
- map(imput, fun), similar as lapply(); when input is a data frame, do something specified by fun to each column and return as a list. If want to return vector, use map_dbl(), map_lgl(), etc.
- when input is a list, same as plyr::l_ply(); e.g. we can use split(mtcars, mtcars$cyl) to get a list from a data frame.
- split(mtcars, mtcars$cyl) %>% map(~lm(mpg ~ wt, data = .)) do a lm to each element of the list; ~ is a shortcut for anonymous function, e.g. split(mtcars, mtcars$cyl) %>% map( function(df) lm(mpg ~ wt, data = df))
- a list of models from the above point named as models, then models %>% map(summary) %>% map_dbl(~.$r.squared) will extract $R^2$ of each model. We can do this by strings too: models %>% map(summary) %>% map_dbl("r.squared"); can even use position sometimes, e.g. map_dbl(list(list(1, 2, 3), list(4, 5, 6), list(7, 8, 9)), 2).