Well Well Well my Excel (Last updated: 2025-06-16)

Posted on datatable, dplyr, purrr, rio, rstats, rbloggers

Ways to combine multiple excel workbooks into one dataframe, using purrr, data.table, rio and base R.

Importing multiple excel files (not sheets, but files) and flattening into one data frame / tibble / data.table should be straightforward - shouldn’t it? How many ways can there be?

What are the pros and cons of each?

I’ve made three copies of an excel file from a PreppinData challenge.

For this, I just want to import the first sheet from each workbook, into one table. Ideally I’d like to identify the sheets - but it’s not a deal breaker.

suppressPackageStartupMessages(library(rio)) suppressPackageStartupMessages(library(readxl)) suppressPackageStartupMessages(library(dplyr)) suppressPackageStartupMessages(library(purrr)) suppressPackageStartupMessages(library(data.table)) files <- dir(pattern = "*.xlsx")  # in case anything else sneaks into folder 

base….how low can you go?

As above, no packages, what can we do with base R?

method_lapply <- lapply(files, read_excel) method_lapply <- do.call(rbind, Map(data.frame, method_lapply)) head(method_lapply) 

##         Date New...Saddles New...Mudguards New...Wheels New...Bags ## 1 2021-01-21            13              42           19         38 ## 2 2021-02-21             1               9           14          6 ## 3 2021-03-21             8              22            6         35 ## 4 2021-04-21             3               9            8         16 ## 5 2021-05-21             2               8            5         34 ## 6 2021-06-21            11               2            6          8 ##   Existing...Saddles Existing...Mudguards Existing...Wheels Existing...Bags ## 1                 17                   48                19              13 ## 2                  2                    4                19              24 ## 3                  0                   48                17              16 ## 4                 18                   50                18              25 ## 5                 17                    3                12              19 ## 6                  2                    8                 3               1 

Is it a data.frame? Yes.
Is it one I’d want to work with? No - those column names are horrible.

Purrr

method_purrr <- map_dfr(files, read_excel, sheet = "Manchester") str(method_purrr) 

## tibble [36 x 9] (S3: tbl_df/tbl/data.frame) ##  $ Date                : POSIXct[1:36], format: "2021-01-21" "2021-02-21" ... ##  $ New - Saddles       : num [1:36] 13 1 8 3 2 11 16 10 15 9 ... ##  $ New - Mudguards     : num [1:36] 42 9 22 9 8 2 5 7 25 11 ... ##  $ New - Wheels        : num [1:36] 19 14 6 8 5 6 15 18 1 11 ... ##  $ New - Bags          : num [1:36] 38 6 35 16 34 8 37 27 38 0 ... ##  $ Existing - Saddles  : num [1:36] 17 2 0 18 17 2 19 10 18 18 ... ##  $ Existing - Mudguards: num [1:36] 48 4 48 50 3 8 1 4 9 10 ... ##  $ Existing - Wheels   : num [1:36] 19 19 17 18 12 3 7 8 0 17 ... ##  $ Existing - Bags     : num [1:36] 13 24 16 25 19 1 28 9 23 7 ... 

Let’s verify the sheet argument is working by pulling in the London data instead:

method_purrr2 <- map_dfr(files, read_excel, sheet = "London") str(method_purrr2) 

## tibble [36 x 9] (S3: tbl_df/tbl/data.frame) ##  $ Date                : POSIXct[1:36], format: "2021-01-21" "2021-02-21" ... ##  $ New - Saddles       : num [1:36] 3 19 16 6 5 13 10 16 1 9 ... ##  $ New - Mudguards     : num [1:36] 44 32 52 56 3 6 8 9 32 8 ... ##  $ New - Wheels        : num [1:36] 18 2 2 1 6 16 7 10 2 17 ... ##  $ New - Bags          : num [1:36] 30 38 1 14 36 22 25 2 23 23 ... ##  $ Existing - Saddles  : num [1:36] 5 3 17 18 16 12 20 20 14 18 ... ##  $ Existing - Mudguards: num [1:36] 35 7 0 15 0 4 9 0 36 15 ... ##  $ Existing - Wheels   : num [1:36] 19 17 18 1 1 18 3 7 17 8 ... ##  $ Existing - Bags     : num [1:36] 19 14 14 9 26 20 12 6 23 36 ... 

If I want to identify the source workbook, how do I do that? Turns out I need to specify the .id argument, and I need to give it a name wrapped in quotes.

This fails, because I didn’t wrap source_wb ( for source workbook) in quotes :

method_purrr2 <- map_dfr(files, read_excel, sheet = "London", .id = source_wb) 

The .id argument needs to be quoted. This now works, and it returns a character vector to identify the source - although these are not the actual names of the workbooks, so still not ideal.

method_purrr2 <- map_dfr(files, read_excel, sheet = "London", .id = "source_wb") method_purrr2 %>% select(source_wb) %>% distinct() 

## # A tibble: 3 x 1 ##   source_wb ##   <chr>     ## 1 1         ## 2 2         ## 3 3 

This shows I have 3 unique source workbook values, and that the structure is as expected.

str(method_purrr2) 

## tibble [36 x 10] (S3: tbl_df/tbl/data.frame) ##  $ source_wb           : chr [1:36] "1" "1" "1" "1" ... ##  $ Date                : POSIXct[1:36], format: "2021-01-21" "2021-02-21" ... ##  $ New - Saddles       : num [1:36] 3 19 16 6 5 13 10 16 1 9 ... ##  $ New - Mudguards     : num [1:36] 44 32 52 56 3 6 8 9 32 8 ... ##  $ New - Wheels        : num [1:36] 18 2 2 1 6 16 7 10 2 17 ... ##  $ New - Bags          : num [1:36] 30 38 1 14 36 22 25 2 23 23 ... ##  $ Existing - Saddles  : num [1:36] 5 3 17 18 16 12 20 20 14 18 ... ##  $ Existing - Mudguards: num [1:36] 35 7 0 15 0 4 9 0 36 15 ... ##  $ Existing - Wheels   : num [1:36] 19 17 18 1 1 18 3 7 17 8 ... ##  $ Existing - Bags     : num [1:36] 19 14 14 9 26 20 12 6 23 36 ... 

How do I get the actual workbook names? I need to pipe the files vector to set_names and then onto map_dfr

method_purrr3 <- files %>%  set_names() %>%  map_dfr(read_excel, sheet = "Manchester",.id = "source_wb") str(method_purrr3) 

## tibble [36 x 10] (S3: tbl_df/tbl/data.frame) ##  $ source_wb           : chr [1:36] "copy2.xlsx" "copy2.xlsx" "copy2.xlsx" "copy2.xlsx" ... ##  $ Date                : POSIXct[1:36], format: "2021-01-21" "2021-02-21" ... ##  $ New - Saddles       : num [1:36] 13 1 8 3 2 11 16 10 15 9 ... ##  $ New - Mudguards     : num [1:36] 42 9 22 9 8 2 5 7 25 11 ... ##  $ New - Wheels        : num [1:36] 19 14 6 8 5 6 15 18 1 11 ... ##  $ New - Bags          : num [1:36] 38 6 35 16 34 8 37 27 38 0 ... ##  $ Existing - Saddles  : num [1:36] 17 2 0 18 17 2 19 10 18 18 ... ##  $ Existing - Mudguards: num [1:36] 48 4 48 50 3 8 1 4 9 10 ... ##  $ Existing - Wheels   : num [1:36] 19 19 17 18 12 3 7 8 0 17 ... ##  $ Existing - Bags     : num [1:36] 13 24 16 25 19 1 28 9 23 7 ... 

And let’s verify that those (admittedly horrible) source workbook names are coming through correctly

method_purrr3 %>% select(source_wb) %>% distinct() 

## # A tibble: 3 x 1 ##   source_wb               ##   <chr>                   ## 1 copy2.xlsx              ## 2 copy3.xlsx              ## 3 PD 2021 Wk 4 Input.xlsx 

for loop and data.table’s rbindlist()

filecount <- as.numeric(length(files))   temp_list <- list() for (i in seq_along(files)) {   filename <- files[i]    df <- read_excel(path = filename, sheet = "Manchester")   df$source_wb <- filename   temp_list[[i]] <- df   rm(df)  method_datatable <- data.table::rbindlist(temp_list, fill = TRUE)  }  rm(temp_list) str(method_datatable) 

## Classes 'data.table' and 'data.frame':   36 obs. of  10 variables: ##  $ Date                : POSIXct, format: "2021-01-21" "2021-02-21" ... ##  $ New - Saddles       : num  13 1 8 3 2 11 16 10 15 9 ... ##  $ New - Mudguards     : num  42 9 22 9 8 2 5 7 25 11 ... ##  $ New - Wheels        : num  19 14 6 8 5 6 15 18 1 11 ... ##  $ New - Bags          : num  38 6 35 16 34 8 37 27 38 0 ... ##  $ Existing - Saddles  : num  17 2 0 18 17 2 19 10 18 18 ... ##  $ Existing - Mudguards: num  48 4 48 50 3 8 1 4 9 10 ... ##  $ Existing - Wheels   : num  19 19 17 18 12 3 7 8 0 17 ... ##  $ Existing - Bags     : num  13 24 16 25 19 1 28 9 23 7 ... ##  $ source_wb           : chr  "copy2.xlsx" "copy2.xlsx" "copy2.xlsx" "copy2.xlsx" ... ##  - attr(*, ".internal.selfref")=<externalptr> 

Ugly, but effective. This is fast, as you’d expect, and we get the source workbook name along with the data.

rio

Not only is rio the misspelled name of my favourite Japanese, Liverpool supporting rstats football guru, but it also excels at - well, importing excel.
Check it out - the source file name comes through with no additional code required from us.

method_rio <-  import_list(files, rbind = TRUE, sheet = "Manchester") str(method_rio) 

## 'data.frame':    36 obs. of  10 variables: ##  $ Date                : POSIXct, format: "2021-01-21" "2021-02-21" ... ##  $ New - Saddles       : num  13 1 8 3 2 11 16 10 15 9 ... ##  $ New - Mudguards     : num  42 9 22 9 8 2 5 7 25 11 ... ##  $ New - Wheels        : num  19 14 6 8 5 6 15 18 1 11 ... ##  $ New - Bags          : num  38 6 35 16 34 8 37 27 38 0 ... ##  $ Existing - Saddles  : num  17 2 0 18 17 2 19 10 18 18 ... ##  $ Existing - Mudguards: num  48 4 48 50 3 8 1 4 9 10 ... ##  $ Existing - Wheels   : num  19 19 17 18 12 3 7 8 0 17 ... ##  $ Existing - Bags     : num  13 24 16 25 19 1 28 9 23 7 ... ##  $ _file               : chr  "copy2.xlsx" "copy2.xlsx" "copy2.xlsx" "copy2.xlsx" ... ##  - attr(*, ".internal.selfref")=<externalptr> 

Yes, it works, it’s quick, it identifies the source, and it’s ridiculously easy. We can also specify a name for the identity column:

method_rio2 <-  import_list(files, rbind = TRUE, rbind_label = "source_wb", sheet = "London") str(method_rio2) 

## 'data.frame':    36 obs. of  10 variables: ##  $ Date                : POSIXct, format: "2021-01-21" "2021-02-21" ... ##  $ New - Saddles       : num  3 19 16 6 5 13 10 16 1 9 ... ##  $ New - Mudguards     : num  44 32 52 56 3 6 8 9 32 8 ... ##  $ New - Wheels        : num  18 2 2 1 6 16 7 10 2 17 ... ##  $ New - Bags          : num  30 38 1 14 36 22 25 2 23 23 ... ##  $ Existing - Saddles  : num  5 3 17 18 16 12 20 20 14 18 ... ##  $ Existing - Mudguards: num  35 7 0 15 0 4 9 0 36 15 ... ##  $ Existing - Wheels   : num  19 17 18 1 1 18 3 7 17 8 ... ##  $ Existing - Bags     : num  19 14 14 9 26 20 12 6 23 36 ... ##  $ source_wb           : chr  "copy2.xlsx" "copy2.xlsx" "copy2.xlsx" "copy2.xlsx" ... ##  - attr(*, ".internal.selfref")=<externalptr> 

So there you are, several methods of importing multiple excel files, take your pick, I have to say, the rio function is rather sweet and might be the one I use in future.

purrrr’s map_dfr is also pretty cool, once you realise that the additional arguments to the read_excel function don’t go inside the brackets of the call to read_excel.

Hope this has been useful / helpful to some of you. If so, let me know.

The source files are here if you want to try this yourself

Hey! 我們剛更新了這篇文章，還特別為日本地區的朋友們增添了一些延伸內容哦！如果你想了解更多，歡迎點擊下面的網址來看完整的文章！ https://www.johnmackintosh.net/jp/article/57/excel-merge-method-guide。

Well Well Well my Excel (Last updated: 2025-06-16)

base….how low can you go?

Purrr

for loop and data.table’s rbindlist()

rio

Data By John

Error

base….how low can you go?

Purrr

for loop and data.table’s rbindlist()

rio

Templates (for web app):

Error