Why UpSetR Ignores order.by When keep.order Is TRUE and How to Fix It

Summary

UpSetR sorts the sets by total size only when keep.order = FALSE (the default). order.by = "freq" sorts the intersection bars, not the sets.
When you pass a custom sets vector and set keep.order = TRUE, the library keeps the supplied order and does not re-sort the sets by size.

Root Cause

  • keep.order = TRUE tells UpSetR to preserve the order of the sets argument.
  • order.by = "freq" applies to the intersection bars, so it has no effect on set order in that mode; the plot shows the sets exactly as you listed them.
  • The decreasing argument of upset() sets the sort direction for order.by (the intersections); it does not change the order of the sets.
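
One way to check these points against the installed package is to inspect the signature of upset() directly. The sketch below assumes UpSetR is installed and skips the check otherwise; upset_args is an illustrative variable name, not part of the library.

```r
# Sketch: list which ordering-related arguments upset() exposes.
# Assumes the UpSetR package is installed; otherwise the check is skipped.
ordering_args <- c("sets", "keep.order", "order.by", "decreasing")

if (requireNamespace("UpSetR", quietly = TRUE)) {
  upset_args <- names(formals(UpSetR::upset))
  # Arguments from ordering_args that the installed version actually accepts
  print(intersect(ordering_args, upset_args))
  # Default value of keep.order as written in the function signature
  print(formals(UpSetR::upset)$keep.order)
} else {
  message("UpSetR is not installed; skipping the signature check.")
}
```

Running ?UpSetR::upset alongside this shows the documented meaning of each argument for the version you have installed.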

Why This Happens in Real Systems

  • Developers often combine “I want a specific subset of sets” (sets = …) with “I want them sorted by size” (order.by = "freq").
  • UpSetR’s API was designed to treat these as mutually exclusive: either you specify the order, or you let the library determine it.

Real-World Impact

  • Misinterpreted visualizations: the biggest sets may appear lower in the set-size bar chart, leading analysts to draw wrong conclusions.
  • Extra debugging time spent toggling order.by and decreasing, which only affect the intersection bars.
  • Scripts become fragile; with a hard-coded sets vector, a future change that adds more sets will silently break the intended ordering.
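
To keep the ordering from going stale when sets are added, the order can be derived from the data instead of being hard-coded. A minimal sketch in base R follows; order_sets_by_size and the toy data frame are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: return the chosen set columns ordered by set size.
# `data` is a 0/1 membership data frame with one column per set.
order_sets_by_size <- function(data, sets = colnames(data), decreasing = TRUE) {
  missing_sets <- setdiff(sets, colnames(data))
  if (length(missing_sets) > 0) {
    stop("Unknown set columns: ", paste(missing_sets, collapse = ", "))
  }
  # Set size = number of rows that belong to each set
  sizes <- colSums(data[, sets, drop = FALSE])
  names(sizes)[order(sizes, decreasing = decreasing)]
}

# Toy membership matrix: A has 3 members, B has 1, C has 2
toy <- data.frame(
  A = c(1, 1, 1, 0),
  B = c(1, 0, 0, 0),
  C = c(1, 1, 0, 0)
)

all_sets_ordered <- order_sets_by_size(toy)               # "A" "C" "B"
subset_ordered   <- order_sets_by_size(toy, c("B", "C"))  # "C" "B"
```

The returned vector can be passed as sets together with keep.order = TRUE, so the order is recomputed whenever the data changes.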

Example Code

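# Simulated 0/1 membership matrix: one row per school, one column per set.
# Note: data.frame() converts these names to syntactic ones
# (for example "AB..10000.") unless check.names = FALSE is passed.
set.seed(123)
modality_matrix_upset <- data.frame(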
  "AB (10000)" = sample(0:1, 200, replace = TRUE),
  "ND (9000)"  = sample(0:1, 200, replace = TRUE),
  "US (8000)"  = sample(0:1, 200, replace = TRUE),
  "CA (7500)"  = sample(0:1, 200, replace = TRUE),
  "RF (7000)"  = sample(0:1, 200, replace = TRUE),
  "XA (6500)"  = sample(0:1, 200, replace = TRUE),
  "NM (5000)"  = sample(0:1, 200, replace = TRUE)
)

# Let UpSetR decide the set order (keep.order = FALSE sorts sets by size;
# order.by = "freq" sorts the intersection bars by frequency)
UpSetR::upset(
  modality_matrix_upset,
  nsets = 7,
  order.by = "freq",
  keep.order = FALSE,
  number.angles = 30,
  mainbar.y.label = "Number of schools",
  sets.x.label = "Number of participant schools"
)

# If you need a specific subset but still want it sorted by size,
# compute the order yourself and pass the reordered vector:
set_names <- colnames(modality_matrix_upset)
set_sizes <- colSums(modality_matrix_upset)
ordered <- set_names[order(set_sizes, decreasing = TRUE)][1:7]

UpSetR::upset(
  modality_matrix_upset,
  nsets = 7,
  sets = ordered,
  keep.order = TRUE,
  number.angles = 30,
  mainbar.y.label = "Number of schools",
  sets.x.label = "Number of participant schools"
)

How Senior Engineers Fix It

  • Do not set keep.order = TRUE when you also want automatic sorting of the sets.
  • Leave keep.order at its default (FALSE) to have the sets sorted by size, or compute the desired order manually (as shown above) and feed it via sets with keep.order = TRUE.
  • Do not rely on decreasing to reorder the sets; it only sets the sort direction for order.by, which orders the intersections.
  • Verify the plot by checking that colSums(modality_matrix_upset) matches the set-size bars.
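
The last check can also be automated before plotting. The sketch below uses a small toy data frame, not the article's data, and tests whether a supplied set order is in descending size order; the variable names are illustrative.

```r
# Sketch: confirm that a supplied set order is sorted by size (largest first).
toy <- data.frame(
  A = c(1, 1, 1, 0),
  B = c(1, 0, 0, 0),
  C = c(1, 1, 0, 0)
)
ordered <- c("A", "C", "B")

# Set sizes arranged in the order that will be passed to `sets`
sizes_in_plot_order <- unname(colSums(toy)[ordered])

# Descending order is the same as the reversed vector being non-decreasing
is_descending <- !is.unsorted(rev(sizes_in_plot_order))
stopifnot(is_descending)
```

A failing stopifnot() here flags a stale or mistyped order before it reaches the plot.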

Why Juniors Miss It

  • They assume that keep.order = TRUE merely keeps the default sort order, not that it locks the order to the supplied vector.
  • They reach for decreasing because many base R functions use it for sorting, overlooking that in UpSetR it only sets the direction for order.by.
  • Lack of familiarity with the library’s documentation leads to conflicting arguments being passed without noticing they cancel each other out.
