R has no consistent table class

And neither do Python and Rust

The usual case

R, in addition to being array-based, can also be table-based: it has a table class in the base language, data.frame. This is great, because a lot of data comes in table form.

Here are some simple examples:

twocols <- data.frame(
  a = rep(1:3, 4),
  b = rep(1:2, 6)
)
twocols
##    a b
## 1  1 1
## 2  2 2
## 3  3 1
## 4  1 2
## 5  2 1
## 6  3 2
## 7  1 1
## 8  2 2
## 9  3 1
## 10 1 2
## 11 2 1
## 12 3 2
onecol <- data.frame(
  a = rep(1, 5)
)
onecol
##   a
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1

One thing we can do with these tables is to look for, or remove, duplicate rows:

duplicated(twocols)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
duplicated(onecol)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE
unique(twocols)
##   a b
## 1 1 1
## 2 2 2
## 3 3 1
## 4 1 2
## 5 2 1
## 6 3 2
unique(onecol)
##   a
## 1 1

Simple enough, right? Right.

The edge case

Now let’s try this with a data frame with no columns. This is something that R allows, so this should work as expected.

nocol <- onecol[, FALSE, drop = FALSE]
nocol
## data frame with 0 columns and 5 rows

Now, what do we expect to happen when we look for duplicates? Well, every row is the same, so every row after the first is a duplicate, so unique should leave a single row. The fact that the rows contain no data is irrelevant. What actually happens?

duplicated(nocol)
## logical(0)

duplicated returns a zero-length vector, as if there were no rows. This results in R claiming there are no rows after removing duplicates:

unique(nocol)
## data frame with 0 columns and 0 rows
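The zero-row result follows from how row subsetting treats a zero-length logical index: it selects nothing. Here is a minimal base R sketch of that step, where keep and keep_first are illustrative names and logical(0) stands in for the value that !duplicated(nocol) took in the session above:

```r
# Rebuild the 5 x 0 data frame used above.
nocol <- data.frame(a = rep(1, 5))[, FALSE, drop = FALSE]

# `keep` is a hypothetical name; logical(0) is the value that
# !duplicated(nocol) evaluated to in the session shown above.
keep <- logical(0)
kept <- nocol[keep, , drop = FALSE]
nrow(kept)
## [1] 0

# An index of the right length keeps just the first row, which is
# what we wanted from unique() in the first place.
keep_first <- c(TRUE, rep(FALSE, nrow(nocol) - 1))
first_only <- nocol[keep_first, , drop = FALSE]
nrow(first_only)
## [1] 1
```

So the subsetting itself copes with zero columns; it’s the length of the duplicated result that causes the trouble.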

Uh oh. How about if we only have the one row to begin with?

nocol_onerow <- nocol[1, , drop = FALSE]
nocol_onerow
## data frame with 0 columns and 1 row
duplicated(nocol_onerow)
## logical(0)
unique(nocol_onerow)
## data frame with 0 columns and 0 rows

Oh, dear.

Why this matters

In practice, a table with no columns is not going to turn up much, so you could argue that this doesn’t matter. However, it should matter, if nothing else, for reasons of consistency: if we’re working programmatically, we often can’t know in advance what dimensions a table will have.

In fact, I’ve run into this problem multiple times when writing the autodb package for decomposing a data table into a partially-normalised database.

A database is composed of several relations, which are tables with some additional information. One piece of additional information is the relation’s (candidate) keys, which are sets of the columns that, together, uniquely determine the rows. Each row has a unique set of values for the key’s columns; vice versa, knowing the values for the key’s columns determines which row we’re looking at.

When turning a table of real data into a database, you can get a relation with an empty key. This happens when a column has the same value in every row: its value is constant, and determinable with no information. Such a relation can only have 0 or 1 rows, since an empty key can’t distinguish between multiple rows.
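As a small illustration of how an empty key arises, take a hypothetical table where one column is constant (orders, id, and currency are made-up names for this sketch). Projecting onto the constant column and removing duplicates leaves a single row, which is the most that a relation with an empty key can hold:

```r
# Hypothetical data: `currency` has the same value in every row.
orders <- data.frame(
  id = 1:4,
  currency = rep("GBP", 4)
)

# The constant column is determined by no other information, so it
# would be split off into a relation whose key is the empty set.
constant_part <- unique(orders[, "currency", drop = FALSE])
nrow(constant_part)
## [1] 1
```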

There are a few reasons an empty key is a problem in R, given how we saw its data frames deal with this case, but let’s take the example where we’re checking that a given database is valid. One thing we need to check is that the columns in each key of a relation have unique values over its rows.

For example, suppose twocols above has both of its columns as its sole key. Does the key have unique values over its rows? No, because there are duplicates:

twocols_key <- c("a", "b")
anyDuplicated(twocols[, twocols_key, drop = FALSE]) # returns 0 if unique
## [1] 7

However, removing the duplicates makes the key values unique:

anyDuplicated(unique(twocols)[, twocols_key, drop = FALSE])
## [1] 0

Now, let’s try validating a valid table with an empty key, which can only have 0 or 1 rows:

v <- data.frame(a = 1L, b = 2L, c = FALSE)
v
##   a b     c
## 1 1 2 FALSE
v_key <- character()
anyDuplicated(v[, v_key, drop = FALSE]) # the right answer...
## [1] 0
duplicated(v[, v_key, drop = FALSE]) # ... for the wrong reason
## logical(0)

How about if that table invalidly has multiple rows?

u <- data.frame(a = c(1L, 2L), b = c(2L, 3L), c = c(FALSE, TRUE))
u
##   a b     c
## 1 1 2 FALSE
## 2 2 3  TRUE
u_key <- character()
anyDuplicated(u[, u_key, drop = FALSE]) # the wrong answer...
## [1] 0
duplicated(u[, u_key, drop = FALSE]) # ... for the wrong reason
## logical(0)

This shows that we can run into this problem, even when dealing with realistic data. This is clearly a problem when writing a library that models databases! I end up having to write nasty code like this:

dups <- if (length(u_key) == 0) {
  if (nrow(u) == 0)
    logical() # length 0 boolean vector
  else
    c(FALSE, rep(TRUE, nrow(u) - 1))
} else
  duplicated(u[, u_key, drop = FALSE])
dups
## [1] FALSE  TRUE
u[dups, , drop = FALSE]
##   a b    c
## 2 2 3 TRUE
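One way to keep that branching in a single place is a small helper. This is only a sketch: key_duplicated is a hypothetical name, not a function from autodb or base R, and it assumes that key is a character vector of column names in df.

```r
# Hypothetical helper: flags rows whose key values repeat an earlier row.
key_duplicated <- function(df, key) {
  stopifnot(is.data.frame(df), is.character(key), all(key %in% names(df)))
  if (length(key) == 0L) {
    # With an empty key, all rows look alike, so every row after the
    # first is a duplicate. seq_len(0) > 1L gives logical(0).
    return(seq_len(nrow(df)) > 1L)
  }
  duplicated(df[, key, drop = FALSE])
}

u <- data.frame(a = c(1L, 2L), b = c(2L, 3L), c = c(FALSE, TRUE))
key_duplicated(u, character())
## [1] FALSE  TRUE
key_duplicated(u, c("a", "b"))
## [1] FALSE FALSE
key_duplicated(u[0, , drop = FALSE], character())
## logical(0)
```

A key is then valid for a table exactly when !any(key_duplicated(df, key)) holds, whatever the key’s length.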

Tibbles are inconsistent

OK, R’s base data.frame class is inconsistent, but people also like to use the tibble and data.table classes instead, from their eponymous libraries. Do they do any better?

Here’s tibble:

library(tibble)
nocol_tib <- as_tibble(nocol) # should be 5x0
nocol_tib
## # A tibble: 5 × 0
nocol_onerow_tib <- as_tibble(nocol_onerow) # should be 1x0
nocol_onerow_tib
## # A tibble: 1 × 0

The row counts are preserved, as before.

duplicated(nocol_tib)
## logical(0)
try(unique(nocol_tib))
## Error in x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE] : 
##   Can't subset rows with `!duplicated(x, fromLast = fromLast, ...)`.
## ✖ Logical subscript `!duplicated(x, fromLast = fromLast, ...)` must be size 1 or 5, not 0.
duplicated(nocol_onerow_tib)
## logical(0)
try(unique(nocol_onerow_tib))
## Error in x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE] : 
##   Can't subset rows with `!duplicated(x, fromLast = fromLast, ...)`.
## ✖ Logical subscript `!duplicated(x, fromLast = fromLast, ...)` must be size 1 or 1, not 0.

Asking for unique rows, however, returns an error. That’s no good, although it’s probably better than the base data frames silently doing the wrong thing.

Data tables are inconsistent

How about data.table?

library(data.table)
nocol_dt <- as.data.table(nocol) # should be 5x0
nocol_dt
## Null data.table (0 rows and 0 cols)
nocol_onerow_dt <- as.data.table(nocol_onerow) # should be 1x0
nocol_onerow_dt
## Null data.table (0 rows and 0 cols)

As much as I like data.table over tibble, this is even worse: the rows are all dropped on conversion. Creating the table directly as a data.table, instead of converting from a data.frame, makes no difference.

Arrow tables are inconsistent

Another table class comes from the arrow package, which is an interface for Apache’s Arrow C++ library. How does arrow do?

library(arrow)
## 
## Attaching package: 'arrow'
## The following object is masked from 'package:utils':
## 
##     timestamp
nocol_arw <- as_arrow_table(nocol)
nocol_arw
## Table
## 0 rows x 0 columns
## 
## 
## See $metadata for additional Schema metadata
nocol_onerow_arw <- as_arrow_table(nocol_onerow)
nocol_onerow_arw
## Table
## 0 rows x 0 columns
## 
## 
## See $metadata for additional Schema metadata

Not well: all rows are dropped, as they were for data.table.

try(duplicated(nocol_arw))
## Error in duplicated.default(nocol_arw) : 
##   duplicated() applies only to vectors
unique(nocol_arw)
## Table (query)
## 
## 
## See $.data for the source Arrow object
try(duplicated(nocol_onerow_arw))
## Error in duplicated.default(nocol_onerow_arw) : 
##   duplicated() applies only to vectors
unique(nocol_onerow_arw)
## Table (query)
## 
## 
## See $.data for the source Arrow object

Furthermore, duplicated can’t be used at all, because there’s no duplicated method for Arrow tables, only one for unique.

Files and file-driven table classes aren’t consistent either

We’ve looked at table classes within an R session. How well do file formats handle zero columns when we write a table out and read it back in?

We look at four formats here[1]:

  • csv, handled with basic R read/write functions, and the parquetize and vroom packages;
  • feather, handled with the feather and arrow packages;
  • fst, handled with the fst package;
  • parquet, handled with the parquetize and arrow packages.

Since parquetize should be able to convert from several file formats, we check it as we go.[2]

The basic issue is that writing a zero-column data frame to a CSV file results in something that can’t be parsed properly:

tf <- tempfile()
write.csv(nocol, tf, row.names = FALSE)
readLines(tf)
## [1] "\"\"" ""     ""     ""     ""     ""
try(read.csv(tf, row.names = FALSE))
## Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
##   first five rows are empty: giving up
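If the row count has to survive a CSV round trip, one possible workaround is to pad the table with a placeholder column before writing, and drop it again after reading. This is a sketch under stated assumptions rather than a recommendation: write_csv_keep_rows and read_csv_keep_rows are hypothetical helpers, and the approach assumes that no real column is named ".placeholder".

```r
# Hypothetical helpers, not part of base R or any package.
write_csv_keep_rows <- function(df, path) {
  stopifnot(is.data.frame(df), !".placeholder" %in% names(df))
  # An all-NA column gives every row something to be written as.
  df[[".placeholder"]] <- rep(NA, nrow(df))
  write.csv(df, path, row.names = FALSE)
}
read_csv_keep_rows <- function(path) {
  df <- read.csv(path)
  # Drop the padding column, keeping the result a data frame.
  df[, names(df) != ".placeholder", drop = FALSE]
}

nocol <- data.frame(a = rep(1, 5))[, FALSE, drop = FALSE]
tf_pad <- tempfile(fileext = ".csv")
write_csv_keep_rows(nocol, tf_pad)
nocol_back <- read_csv_keep_rows(tf_pad)
dim(nocol_back)
## [1] 5 0
```

The cost is that the file is no longer a plain dump of the table, so anything else reading it has to know about the extra column.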

parquetize and vroom don’t have much better luck writing and reading it:

tf_parquet <- tempfile() # for writing parquet files via parquetize
tf_parquet_arrow <- tempfile() # for writing parquet files via arrow
try(parquetize::csv_to_parquet(tf, path_to_parquet = tf_parquet))
## Reading data...
## Error : Could not guess the delimiter.
## 
## Use `vroom(delim =)` to specify one explicitly.
vroom::vroom(tf, delim = ",")
## New names:
## Rows: 0 Columns: 1
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## # A tibble: 0 × 1
## # ℹ 1 variable: ...1 <chr>

I wasn’t expecting any option to turn a \(5 \times 0\) table into a \(0 \times 1\) table, but there it is.

Writing the row names doesn’t improve matters much:

tf_rn <- tempfile()
write.csv(nocol, tf_rn, row.names = TRUE)
readLines(tf_rn)
## [1] "\"\""   "\"1\"," "\"2\"," "\"3\"," "\"4\"," "\"5\","
try(read.csv(tf_rn, row.names = TRUE))
## Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
##   more columns than column names
try(parquetize::csv_to_parquet(tf_rn, path_to_parquet = tf_parquet))
## Reading data...
## Error : Could not guess the delimiter.
## 
## Use `vroom(delim =)` to specify one explicitly.
vroom_tf_rn <- vroom::vroom(tf_rn, delim = ",")
## New names:
## Rows: 5 Columns: 1
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," num
## (1): ...1
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
subset(vroom::problems(vroom_tf_rn), , -file)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## # A tibble: 5 × 4
##     row   col expected  actual   
##   <int> <int> <chr>     <chr>    
## 1     2     2 1 columns 2 columns
## 2     3     2 1 columns 2 columns
## 3     4     2 1 columns 2 columns
## 4     5     2 1 columns 2 columns
## 5     6     2 1 columns 2 columns
vroom_tf_rn
## # A tibble: 5 × 1
##    ...1
##   <dbl>
## 1     1
## 2     2
## 3     3
## 4     4
## 5     5

vroom now returns a \(5 \times 1\) table, where the row names are misread as the single column’s values.

How about feather?

tf_feather <- tempfile()
tf_feather_arrow <- tempfile()
feather::write_feather(nocol, tf_feather)
feather::read_feather(tf_feather)
## # A tibble: 5 × 0
arrow::write_feather(nocol, tf_feather_arrow)
arrow::read_feather(tf_feather_arrow)
## # A tibble: 0 × 0
try(feather::read_feather(tf_feather_arrow))
## Error in eval(expr, envir, enclos) : Invalid: Not a feather file
arrow::read_feather(tf_feather)
## # A tibble: 5 × 0

We have slightly better luck here, depending on which package we use to handle Feather files.

We’re not so lucky with fst, or with Parquet files written by arrow:

tf_fst <- tempfile()
fst::write_fst(nocol, tf_fst)
fst::read_fst(tf_fst)
## data frame with 0 columns and 0 rows
arrow::write_parquet(nocol, tf_parquet_arrow)
arrow::read_parquet(tf_parquet_arrow)
## # A tibble: 0 × 0

Let’s summarise everything done above for the \(5 \times 0\) table nocol:

format                                  rows  cols
vroom w/o rownames                         0     1
vroom w/ rownames                          5     1
feather (write feather, read feather)      5     0
feather (write arrow, read arrow)          0     0
feather (write feather, read arrow)        5     0
fst                                        0     0
parquet                                    0     0

The only case that gives the correct dimensions is a Feather file written by the original feather package, whichever package reads it back. However, that package hasn’t been updated since 2019, because the format was integrated into Apache Arrow and its R support was folded into the arrow package. So there are no maintained packages that get this right.

Pandas is no better

Come on, we can’t make it look like Python is preferable.

import pandas as pd

A 2x1 table works as expected:

py_onecol = pd.DataFrame(data = {'a': [1, 1]})
py_onecol
##    a
## 0  1
## 1  1
py_onecol.duplicated()
## 0    False
## 1     True
## dtype: bool
py_onecol.drop_duplicates()
##    a
## 0  1

But now let’s remove the only column:

py_nocol = py_onecol.iloc[[0, 1], []]
py_nocol
## Empty DataFrame
## Columns: []
## Index: [0, 1]
py_nocol.duplicated()
## Series([], dtype: bool)
py_nocol.drop_duplicates()
## Empty DataFrame
## Columns: []
## Index: [0, 1]

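Like data.table, this ends up treating the table as empty: the index still lists two rows, but duplicated returns an empty Series, and drop_duplicates returns both rows instead of just one.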

duplicated and drop_duplicates take a subset of columns to check, so we could use this for uniqueness checks by taking the subset as our key. What if the table has a non-zero number of columns, but the key is empty?

py_onecol2 = pd.DataFrame(data = {'a': [1, 2]})
py_onecol2
##    a
## 0  1
## 1  2
try: py_onecol2.duplicated(subset = [])
except Exception as e: print(e)
## not enough values to unpack (expected 2, got 0)
try: py_onecol2.drop_duplicates(subset = [])
except Exception as e: print(e)
## not enough values to unpack (expected 2, got 0)

Well, that’s no good either.

Rust’s polars is no better

As it turns out, we can call Rust’s polars library for data frames from R:

library(polars)
ps <- pl$DataFrame(a = 1:5)
ps
## shape: (5, 1)
## ┌─────┐
## │ a   │
## │ --- │
## │ i32 │
## ╞═════╡
## │ 1   │
## │ 2   │
## │ 3   │
## │ 4   │
## │ 5   │
## └─────┘
rownames(ps)
## [1] "1" "2" "3" "4" "5"
ps$select()
## shape: (0, 0)
## ┌┐
## ╞╡
## └┘
rownames(ps$select())
## character(0)

This is the same behaviour as that of data.table – not surprising, since polars is inspired by pandas – so even the Rustaceans don’t get this right.

Why it’s like this, and possible fixes

I can’t say for the other implementations, but let’s look at base R’s code for duplicated.data.frame:

duplicated.data.frame
## function (x, incomparables = FALSE, fromLast = FALSE, ...) 
## {
##     if (!isFALSE(incomparables)) 
##         .NotYetUsed("incomparables != FALSE")
##     if (length(x) != 1L) {
##         if (any(i <- vapply(x, is.factor, NA))) 
##             x[i] <- lapply(x[i], as.numeric)
##         if (any(i <- (lengths(lapply(x, dim)) == 2L))) 
##             x[i] <- lapply(x[i], split.data.frame, seq_len(nrow(x)))
##         duplicated(do.call(Map, `names<-`(c(list, x), NULL)), 
##             fromLast = fromLast)
##     }
##     else duplicated(x[[1L]], fromLast = fromLast, ...)
## }
## <bytecode: 0x00000220520238c0>
## <environment: namespace:base>

Here we see an approach for looking for duplicate rows that I’ve used directly before: use Map(list, x), after some tidying of x, to return a list of rows, where each row is given as the list of its values. Effectively, we take the column-based data frame format, and turn it inside out to get a row-based format. We then check whether these rows are duplicated, using duplicated’s default method, so we’re comparing list elements instead of several columns at once.

This is a reasonable approach if we have at least one column. What happens if we try this conversion with no columns?

Data frames are stored as a list, with each element giving a column’s values, and the elements having to be the same length. If there are no columns, this list is empty:

unclass(nocol)
## named list()
## attr(,"row.names")
## [1] 1 2 3 4 5

Therefore, Map(list, nocol) returns an empty list, rather than a list of empty row lists:

Map(list, nocol)
## named list()

When we pass this into duplicated, of course, we get a zero-length logical vector.
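To make the row-wise conversion concrete, here is a small sketch on a hypothetical two-row table, next to what a “list of empty row lists” would look like if we built it from the row count rather than from the columns. The names small, rows, and empty_rows are illustrative only.

```r
small <- data.frame(a = 1:2, b = c("x", "y"))

# Same transposition as in duplicated.data.frame: one list per row.
rows <- do.call(Map, unname(c(list, small)))
length(rows)
## [1] 2
rows[[1]]  # the first row, as a list holding 1L and "x"

# With no columns, Map has nothing to iterate over. A list with one
# empty element per row has to come from the row count instead.
nocol <- data.frame(a = rep(1, 5))[, FALSE, drop = FALSE]
empty_rows <- rep(list(list()), nrow(nocol))
length(empty_rows)
## [1] 5
duplicated(empty_rows)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE
```

The default duplicated method is happy to compare empty lists, so the zero-length answer comes from the conversion step, not from the comparison.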

I don’t think there’s much that can be done about this, outside of changing how data frames are stored. It’s a strange situation where only the row names preserve the row count. If we make a copy where they’re removed, as done in data.table, then this information is lost:

a <- nocol
attr(a, "row.names") <- NULL # skips `row.names<-` sanity checks
a
## data frame with 0 columns and 0 rows
unclass(a)
## named list()

In turn, this information is only kept because, when asked for a data frame’s row count, R uses the row names:

nrow
## function (x) 
## dim(x)[1L]
## <bytecode: 0x0000022051a4ae50>
## <environment: namespace:base>
dim.data.frame
## function (x) 
## c(.row_names_info(x, 2L), length(x))
## <bytecode: 0x0000022053099f50>
## <environment: namespace:base>

Effectively, the row names are used like a “header”, but for the rows instead of the columns. This is probably why, if you try to remove them with something like row.names(a) <- NULL, R immediately adds integer row names as replacements: removing row names completely would break the information about the table’s size, in a way that removing the column names can’t. We can see this with tables that have columns, too:

b <- data.frame(a = 1:4, b = 2:3)
dim(b)
## [1] 4 2
attr(b, "row.names") <- NULL # attr lets us treat classes as mere suggestions
dim(b) # R now thinks there are 0 rows...
## [1] 0 2
unclass(b) # ... but the data's still there
## $a
## [1] 1 2 3 4
## 
## $b
## [1] 2 3 2 3
length(b$a)
## [1] 4
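The same point can be seen from the other direction: a zero-column data frame can be built by hand from an empty list plus a row.names attribute, and the row count then comes purely from that attribute. In this sketch, by_hand is an illustrative name, and c(NA, -5L) is the compact form in which R stores automatic row names (it’s what dput shows for nocol).

```r
# A 5 x 0 data frame built directly from its parts.
by_hand <- structure(
  list(),
  names = character(0),
  row.names = c(NA_integer_, -5L),  # compact form of row names 1:5
  class = "data.frame"
)
dim(by_hand)
## [1] 5 0

# The helper used by dim.data.frame reads the count off the row names.
.row_names_info(by_hand, 2L)
## [1] 5
```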

This means that we could fix duplicated.data.frame by having it make use of the row names. What would such an implementation of duplicated for tables look like? Writing it in a way that’s agnostic to the number of columns is easy enough, but might be inefficient:

duplicated2 <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  UseMethod("duplicated2")
}
duplicated2.data.frame <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  if (!isFALSE(incomparables)) 
    .NotYetUsed("incomparables != FALSE")
  if (any(i <- vapply(x, is.factor, NA))) 
    x[i] <- lapply(x[i], as.numeric)
  lst <- lapply(
    seq_along(row.names(x)),
    function(row) `rownames<-`(`names<-`(x[row, , drop = TRUE], NULL), NULL)
  )
  duplicated(lst, fromLast = fromLast, ...)
}
duplicated2(nocol)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE
duplicated2(nocol_onerow)
## [1] FALSE
duplicated2(twocols)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
duplicated2(onecol)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE

Maybe it’s just better to add a second explicit edge case, so we’re not relying on nrow using the row names:

duplicated3 <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  UseMethod("duplicated3")
}
duplicated3.data.frame <- function(x, incomparables = FALSE, fromLast = FALSE, ...) {
  if (!isFALSE(incomparables))
    .NotYetUsed("incomparables != FALSE")
  if (length(x) == 0L) {
    nr <- nrow(x)
    if (nr == 0L)
      return(logical())
    else
      return(c(FALSE, rep_len(TRUE, nr - 1L)))
  }
  if (length(x) != 1L) {
    if (any(i <- vapply(x, is.factor, NA)))
      x[i] <- lapply(x[i], as.numeric)
    duplicated(do.call(Map, `names<-`(c(list, x), NULL)),
               fromLast = fromLast)
  }
  else duplicated(x[[1L]], fromLast = fromLast, ...)
}
duplicated3(nocol)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE
duplicated3(nocol_onerow)
## [1] FALSE
duplicated3(twocols)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
duplicated3(onecol)
## [1] FALSE  TRUE  TRUE  TRUE  TRUE

It looks like it would also be quicker, at least for small tables like these:

microbenchmark::microbenchmark(
  duplicated2(nocol),
  duplicated3(nocol),
  times = 1000,
  check = "identical"
)
## Unit: microseconds
##                expr  min    lq     mean median     uq    max neval cld
##  duplicated2(nocol) 88.4 100.1 144.0731 109.55 185.45 4879.2  1000   b
##  duplicated3(nocol)  5.0   5.8   8.4798   6.20  10.30   67.0  1000  a
microbenchmark::microbenchmark(
  duplicated2(nocol_onerow),
  duplicated3(nocol_onerow),
  times = 1000,
  check = "identical"
)
## Unit: microseconds
##                       expr  min    lq    mean median    uq   max neval cld
##  duplicated2(nocol_onerow) 28.3 30.95 49.5871  40.85 58.65 407.6  1000   b
##  duplicated3(nocol_onerow)  4.9  5.30  8.3830   6.20 10.00 122.2  1000  a
microbenchmark::microbenchmark(
  duplicated(twocols),
  duplicated2(twocols),
  duplicated3(twocols),
  times = 1000,
  check = "identical"
)
## Unit: microseconds
##                  expr   min    lq     mean median     uq    max neval cld
##   duplicated(twocols)  25.8  29.9  42.3624   32.7  43.10  271.5  1000  a 
##  duplicated2(twocols) 184.2 206.9 299.8286  228.2 320.25 5932.2  1000   b
##  duplicated3(twocols)  21.5  24.6  35.2825   27.2  36.60  349.5  1000  a
microbenchmark::microbenchmark(
  duplicated(onecol),
  duplicated2(onecol),
  duplicated3(onecol),
  times = 1000,
  check = "identical"
)
## Unit: microseconds
##                 expr  min   lq    mean median    uq   max neval cld
##   duplicated(onecol)  8.5 10.0 12.7721  10.60 14.65 112.9  1000  a 
##  duplicated2(onecol) 64.3 74.6 93.9061  78.15 97.75 340.2  1000   b
##  duplicated3(onecol)  9.0 10.7 13.2581  11.20 13.30  54.3  1000  a

For autodb classes, I’ll probably be writing something like duplicated3 for internal use, so I don’t have this edge case all over the code any more.

Environment used

R session information:

sessionInfo()
## R version 4.3.2 (2023-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 
## [2] LC_CTYPE=English_United Kingdom.utf8   
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.utf8    
## 
## time zone: Europe/London
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] polars_0.15.1.9000 fstcore_0.9.12     arrow_14.0.0.2     data.table_1.14.2 
## [5] tibble_3.2.1      
## 
## loaded via a namespace (and not attached):
##  [1] xfun_0.35            bslib_0.3.1          lattice_0.21-9      
##  [4] tzdb_0.4.0           vctrs_0.6.4          tools_4.3.2         
##  [7] generics_0.1.2       sandwich_3.0-2       curl_4.3.2          
## [10] parallel_4.3.2       fansi_1.0.2          RSQLite_2.3.4       
## [13] highr_0.9            blob_1.2.2           pkgconfig_2.0.3     
## [16] Matrix_1.6-1.1       assertthat_0.2.1     lifecycle_1.0.3     
## [19] compiler_4.3.2       stringr_1.4.0        microbenchmark_1.4.9
## [22] codetools_0.2-19     fst_0.9.8            htmltools_0.5.2     
## [25] sass_0.4.0           yaml_2.2.1           pillar_1.9.0        
## [28] crayon_1.4.2         jquerylib_0.1.4      MASS_7.3-60         
## [31] ellipsis_0.3.2       cachem_1.0.6         feather_0.3.5       
## [34] multcomp_1.4-19      tidyselect_1.2.0     digest_0.6.33       
## [37] mvtnorm_1.1-3        stringi_1.7.6        dplyr_1.1.3         
## [40] purrr_1.0.2          bookdown_0.24        splines_4.3.2       
## [43] forcats_0.5.1        rprojroot_2.0.2      fastmap_1.1.0       
## [46] grid_4.3.2           here_1.0.1           cli_3.6.1           
## [49] magrittr_2.0.3       survival_3.5-7       utf8_1.2.2          
## [52] TH.data_1.1-1        readr_2.1.1          withr_2.5.0         
## [55] bit64_4.0.5          parquetize_0.5.6.1   rmarkdown_2.18      
## [58] bit_4.0.4            reticulate_1.35.0    blogdown_1.16       
## [61] zoo_1.8-9            png_0.1-7            hms_1.1.1           
## [64] memoise_2.0.1        evaluate_0.23        knitr_1.41          
## [67] haven_2.4.3          rlang_1.1.1          Rcpp_1.0.8          
## [70] glue_1.6.2           DBI_1.1.2            rstudioapi_0.13     
## [73] vroom_1.6.5          jsonlite_1.8.7       R6_2.5.1

Python version (a little old, but installing/updating things in Python is so awful I don’t want to do it again):

import sys
print(sys.version)
## 3.12.2 (tags/v3.12.2:6abddd9, Feb  6 2024, 21:26:36) [MSC v.1937 64 bit (AMD64)]

  1. We don’t consider data.table’s fread and fwrite here, since we know that the data.table format can’t handle zero columns anyway.

  2. Feather and Parquet files can also be read with the arrow package, with the same result, since both are integrated with Apache Arrow.

Mark Webster
Data Scientist

Probability and Statistics, with some programming in R.
