MatchIt full matching + marginaleffects: cluster-robust SE by subject ID after matching

Summary

The question is how to obtain cluster-robust standard errors by subject ID after generalized full matching with MatchIt, where a risk ratio is estimated from a weighted Poisson model and summarized with marginaleffects::avg_comparisons(). The key challenge is specifying the clustering in avg_comparisons() so that the standard errors reflect the repeated records per subject while still using the matching weights produced by MatchIt.

Root Cause

The root cause is non-independence: the data contain multiple records per subject, so standard errors that treat every row as independent are not valid. Specifically:

  • Records from the same subject are correlated, not independent
  • Each record belongs to a higher-level unit (the subject)
  • Subject membership is recorded in an ID column (id_cluster)

Why This Happens in Real Systems

This situation is common in real analyses because:

  • Observational datasets often have hierarchical or clustered structures
  • Matching with a package like MatchIt balances covariates between treatment groups, but by default it does not account for clustering by subject
  • Model-based standard errors from a weighted regression assume independent rows, so valid inference requires a cluster-robust variance estimator

Real-World Impact

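The real-world impact of ignoring the clustering includes:

  • Biased standard errors, typically too small when records within a subject are positively correlated
  • Incorrect inference about treatment effects or risk ratios (confidence intervals too narrow, p-values too small)
  • A false sense of precision: the point estimate itself is unchanged, but its uncertainty is misstated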
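
To get a feel for the size of the problem, here is a minimal base R simulation. It is only a sketch under assumed values (200 subjects, 5 records each, equal subject-level and record-level variance; none of these numbers come from the original data). It compares the standard error that ignores clustering with the actual spread of the estimate across repeated samples.

```r
set.seed(42)

n_subjects <- 200    # hypothetical number of subjects
n_records  <- 5      # hypothetical records per subject
n_sims     <- 2000   # number of simulated datasets

sim_once <- function() {
  id <- rep(seq_len(n_subjects), each = n_records)
  u  <- rnorm(n_subjects)[id]       # subject-level effect shared by a subject's records
  y  <- u + rnorm(length(id))       # record-level noise on top
  c(estimate = mean(y),
    naive_se = sd(y) / sqrt(length(y)))  # SE that treats all rows as independent
}

sims <- replicate(n_sims, sim_once())   # 2 x n_sims matrix

true_se  <- sd(sims["estimate", ])      # actual sampling variability of the mean
naive_se <- mean(sims["naive_se", ])    # average SE reported when clustering is ignored

# With an intraclass correlation of 0.5 and 5 records per subject, theory gives
# a ratio near sqrt(1 + (5 - 1) * 0.5) = sqrt(3), about 1.73.
print(true_se / naive_se)
```

In this setup the naive standard error is too small by a factor of roughly 1.7, even though the estimate itself is unbiased. The same mechanism applies to the risk ratio from the weighted Poisson model.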

Example or Code

library(MatchIt)
library(marginaleffects)
library(sandwich)

# Perform generalized full matching (method = "quick" requires the quickmatch package)
m.out <- matchit(treat ~ t_year + cov1 + cov2 + cov3 + cov4 + cov5 + cov6 + cov7 + cov8, 
                 data = df, method = "quick", estimand = "ATE")

# Create matched data
m.data <- match.data(m.out)

# Fit weighted Poisson model
m1.fit <- glm(y ~ treat + t_year + cov1 + cov2 + cov3 + cov4 + cov5 + cov6 + cov7 + cov8, 
              data = m.data, weights = weights, family = poisson(link = "log"))

# Cluster-robust covariance matrix, clustering on subject ID (sandwich::vcovCL)
vcov_cluster <- vcovCL(m1.fit, cluster = m.data$id_cluster)

# Pass the precomputed matrix to avg_comparisons(); hypothesis = 1 tests RR = 1.
# newdata = m.data is supplied explicitly so that wts = "weights" can be found by name.
avg_comparisons(m1.fit, variables = "treat", type = "response", comparison = "ratioavg",
                hypothesis = 1, newdata = m.data, wts = "weights", vcov = vcov_cluster)
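
As an alternative to building the matrix by hand, the vcov argument of avg_comparisons() also accepts a one-sided formula naming the cluster variable, which marginaleffects passes on to sandwich::vcovCL(). The sketch below runs on simulated data so that it is self-contained; the data-generating values, the reduced covariate set (cov1, cov2), and the object names df, fit, and res are illustrative assumptions, not the original data or model.

```r
library(MatchIt)          # method = "quick" also needs the quickmatch package
library(marginaleffects)
library(sandwich)

set.seed(1)

# Hypothetical data: 300 subjects with 3 records each
n_id <- 300
df <- data.frame(id_cluster = rep(seq_len(n_id), each = 3))
df$cov1  <- rnorm(nrow(df))                    # record-level covariate
df$cov2  <- rnorm(n_id)[df$id_cluster]         # subject-level covariate
df$treat <- rbinom(nrow(df), 1, plogis(0.5 * df$cov1 + 0.5 * df$cov2))
df$y     <- rbinom(nrow(df), 1, plogis(-1 + 0.4 * df$treat + 0.3 * df$cov1))

# Generalized full matching for the ATE, then extract the matched data
m.out  <- matchit(treat ~ cov1 + cov2, data = df,
                  method = "quick", estimand = "ATE")
m.data <- match.data(m.out)

# Weighted Poisson model with a log link
fit <- glm(y ~ treat + cov1 + cov2, data = m.data,
           weights = weights, family = poisson(link = "log"))

# Formula shorthand: cluster on subject ID without precomputing a matrix
res <- avg_comparisons(fit, variables = "treat", comparison = "ratioavg",
                       hypothesis = 1, newdata = m.data, wts = "weights",
                       vcov = ~id_cluster)
print(res)
```

The formula form keeps the clustering choice visible in the call itself; a precomputed matrix is the more flexible route when a different estimator or small-sample correction is needed.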

How Senior Engineers Fix It

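Senior engineers fix this issue by:

  • Using cluster-robust standard errors to account for non-independence of observations
  • Specifying the correct clustering variable (in this case, id_cluster), and checking whether the matching strata (the subclass column from match.data()) should be a clustering dimension as well, since the MatchIt documentation recommends clustering on subclass after matching
  • Using external variance estimators such as sandwich or clubSandwich to compute the cluster-robust covariance matrix
  • Passing the precomputed matrix, or a one-sided formula such as ~id_cluster, to the vcov argument of avg_comparisons() so the reported standard errors use it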
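
One way to account for both the matching strata and the subject ID is two-way clustering, which sandwich::vcovCL() supports through a formula with two terms. The sketch below is an illustration on simulated data, not the original analysis: the data, the single covariate cov1, and the helper est() are hypothetical, and whether two-way clustering is the right choice for a given study is a judgment call rather than something the source settles.

```r
library(MatchIt)          # method = "quick" also needs the quickmatch package
library(marginaleffects)
library(sandwich)

set.seed(2)

# Hypothetical data: 300 subjects with 3 records each
df <- data.frame(id_cluster = rep(1:300, each = 3))
df$cov1  <- rnorm(nrow(df))
df$treat <- rbinom(nrow(df), 1, plogis(0.5 * df$cov1))
df$y     <- rbinom(nrow(df), 1, plogis(-1 + 0.4 * df$treat + 0.3 * df$cov1))

m.out  <- matchit(treat ~ cov1, data = df, method = "quick", estimand = "ATE")
m.data <- match.data(m.out)   # adds the weights and subclass columns

fit <- glm(y ~ treat + cov1, data = m.data,
           weights = weights, family = poisson(link = "log"))

# One-way clustering on subject ID vs. two-way clustering on stratum and subject ID
vc_id   <- vcovCL(fit, cluster = ~id_cluster)
vc_both <- vcovCL(fit, cluster = ~subclass + id_cluster)

# Hypothetical helper: same risk-ratio estimate, different covariance matrix
est <- function(vc) {
  avg_comparisons(fit, variables = "treat", comparison = "ratioavg",
                  hypothesis = 1, newdata = m.data, wts = "weights", vcov = vc)
}

res_id   <- est(vc_id)
res_both <- est(vc_both)

# The point estimate is identical; only the standard error can differ
print(c(estimate = res_id$estimate,
        se_id    = res_id$std.error,
        se_both  = res_both$std.error))
```

Comparing the two standard errors side by side is a quick sensitivity check: if they differ noticeably, the choice of clustering structure matters for the conclusions and deserves an explicit justification.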

Why Juniors Miss It

Juniors may miss this issue because of:

  • Lack of experience with observational datasets and matching methods
  • Insufficient understanding of cluster-robust standard errors and their importance
  • Over-reliance on default settings or assumptions in statistical software
  • Failure to consider the hierarchical or clustered structure of the data
