Modifying columns in data.table with lapply

How to modify or update multiple columns in data.table with lapply and .SDcols
Here’s a quick example of needing to make some updates to existing columns in a data.table.
For reasons I don’t want to get into (involving MS Word :/) I had to change a couple of columns from factor to character, and replace the ampersand symbol with ‘and’.
Dplyr aficionados will be excitedly jumping up and down at the thought of using across.
However, this was a large table, and it was already a data.table, so I used lapply and data.table’s .SD functionality to do the job in super quick time.
This is really easy:
- create a character vector of column names
- plug .SD into your lapply function call, alongside the function, and any further arguments
- define what .SD means by assigning .SDcols to your character vector
- bask in the general awesomeness of it all
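The steps above can be sketched on a toy table. This is a minimal illustration rather than the code from the real job: the table, its column names and the choice of round are all made up, picked only to show a function that takes a further argument.

```r
library(data.table)

# Hypothetical toy table; the column names are invented for illustration
DT <- data.table(id = 1:3,
                 height = c(1.234, 5.678, 9.1011),
                 weight = c(10.555, 20.444, 30.333))

# Step 1: a character vector of column names
num_cols <- c("height", "weight")

# Steps 2 and 3: .SD goes into lapply() with the function (round) and a
# further argument (digits); .SDcols restricts .SD to the columns in num_cols
DT[, (num_cols) := lapply(.SD, round, digits = 1), .SDcols = num_cols]

# id is untouched because it is not listed in .SDcols
print(DT)
```

Because := updates by reference, there is no need to assign the result back to DT.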
Here I’m updating the areaname and parent_area columns, using as.character and gsub, like some kind of (base R) monster.
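Key points - notice how char_cols gets wrapped in round brackets and walrussed* through the function. The brackets tell := to use the column names stored in char_cols, rather than creating a new column called char_cols.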
# columns to update / modify:
char_cols <- c("areaname", "parent_area")
# modify
DT[,(char_cols) := lapply(.SD, as.character), .SDcols = char_cols]
DT[,(char_cols) := lapply(.SD, gsub, pattern = ' & ', replacement = ' and '), .SDcols = char_cols]
Dead easy. Or you can chain the two lines together, so you don’t even need a pipe (native or otherwise).
# Or do both at once by chaining:
DT[,(char_cols) := lapply(.SD, as.character), .SDcols = char_cols
][,(char_cols) := lapply(.SD, gsub, pattern = ' & ', replacement = ' and '), .SDcols = char_cols][]
# using [] on the end to force updating the display in the console
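If you want to try this without the original table, here is a self-contained sketch. The toy data is made up, and wrapping both steps in one anonymous function with fixed = TRUE is my own variation for illustration, not what I ran on the real table.

```r
library(data.table)

# Hypothetical toy data standing in for the real table
DT <- data.table(
  areaname    = factor(c("North & East", "Central")),
  parent_area = factor(c("Upper & Lower", "Middle")),
  value       = c(1, 2)
)

char_cols <- c("areaname", "parent_area")

# Convert to character and swap the ampersand in a single pass.
# fixed = TRUE treats the pattern as a literal string, not a regex.
DT[, (char_cols) := lapply(.SD, function(x) {
  gsub(" & ", " and ", as.character(x), fixed = TRUE)
}), .SDcols = char_cols][]

# Check the column types after the update
sapply(DT, class)
```

Doing both steps in one function means .SD is only walked once, which may or may not matter depending on the size of your table.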
Normally I use the functional form when updating multiple columns:
DT[,`:=`(colA = b + c, colB = x*y), by = colC]
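For comparison, here is the functional form run on a toy table. The data is invented; only the column names are borrowed from the line above.

```r
library(data.table)

# Hypothetical toy table reusing the column names from the example above
DT <- data.table(colC = c("g1", "g1", "g2"),
                 b = 1:3, c = 4:6,
                 x = c(2, 3, 4), y = c(10, 10, 10))

# Functional form of := with one name = expression pair per new column;
# the expressions are evaluated within each group of colC
DT[, `:=`(colA = b + c, colB = x * y), by = colC]

print(DT)
```

Each new column gets its own expression here, which suits a handful of different calculations. The lapply version applies one function across many columns instead.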
The lapply approach is a slightly different way of doing things, utilising the functionality of .SDcols.
* I just made that up, no one really says this, do they?