MatchIt full matching + marginaleffects: cluster-robust SE by subject ID after matching

Summary

The question is how to obtain standard errors clustered by subject ID after generalized full matching with MatchIt, when the risk ratio is estimated from a weighted Poisson model and summarized with marginaleffects::avg_comparisons(). The difficulty is specifying the clustering in avg_comparisons() so that the variance estimate respects both the matching weights produced by MatchIt and the repeated records per subject.

Root Cause

The root cause is that observations are not independent: subjects contribute multiple records, so model-based standard errors are invalid and cluster-robust standard errors are needed. Specifically:

Observations are not independent
Each record belongs to a higher-level unit (the subject)
Subjects are identified by an ID column (id_cluster), and one subject can have several records
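
A quick way to confirm the structure described above is to count records per subject. This is a minimal sketch: df_toy and its values are hypothetical stand-ins for df, and only the column name id_cluster comes from the question.

```r
# Hypothetical toy data; in practice this would be the analysis data frame (df).
df_toy <- data.frame(
  id_cluster = c(1, 1, 2, 3, 3, 3),
  y          = c(0, 1, 0, 2, 1, 0)
)

# Number of records contributed by each subject
records_per_subject <- table(df_toy$id_cluster)

n_subjects     <- length(records_per_subject)   # distinct subjects
n_repeated     <- sum(records_per_subject > 1)  # subjects with more than one record
has_clustering <- n_repeated > 0                # TRUE means rows are not independent

records_per_subject
```

If has_clustering is FALSE, every subject contributes exactly one record and clustering on id_cluster has no repeated rows to account for.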

Why This Happens in Real Systems

This issue occurs in real systems because:

Observational datasets often have hierarchical or clustered structures
Matching packages like MatchIt balance covariates between treatment groups, but they match individual records and do not by themselves account for clustering
Model-based standard errors from a weighted regression assume independent rows, so valid inference after matching on clustered data requires cluster-robust standard errors

Real-World Impact

The real-world impact of this issue includes:

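Biased standard errors (typically too small) if clustering is not accounted for
Incorrect inference about treatment effects or risk ratios, such as confidence intervals that are too narrow
Misstated precision: the point estimate of the risk ratio is unchanged, but its uncertainty is over- or under-stated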
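
The effect on the standard errors can be illustrated with simulated data. This is only a sketch under stated assumptions: the data, sample sizes, and effect sizes are invented, treatment is assigned at the subject level, and no matching is performed. It compares model-based standard errors with standard errors clustered on id_cluster for the same Poisson fit.

```r
library(sandwich)

set.seed(1)
n_subj <- 200  # hypothetical number of subjects
n_rec  <- 5    # hypothetical records per subject

# Simulated clustered data: treatment and a random effect shared within subject
sim <- data.frame(id_cluster = rep(seq_len(n_subj), each = n_rec))
sim$treat   <- rep(rbinom(n_subj, 1, 0.5), each = n_rec)
subj_effect <- rep(rnorm(n_subj, sd = 0.8), each = n_rec)
sim$y       <- rpois(nrow(sim), exp(0.2 + 0.5 * sim$treat + subj_effect))

fit <- glm(y ~ treat, data = sim, family = poisson(link = "log"))

# Same coefficients, two different variance estimators
se_model   <- sqrt(diag(vcov(fit)))                                # assumes independent rows
se_cluster <- sqrt(diag(vcovCL(fit, cluster = sim$id_cluster)))    # clustered on subject

log_rr <- coef(fit)["treat"]  # point estimate does not depend on the variance estimator
rbind(model_based = se_model, cluster_robust = se_cluster)
```

In a setup like this one, the clustered standard error for treat should come out larger than the model-based one, while the coefficient itself is identical in both rows.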

Example or Code

library(MatchIt)
library(marginaleffects)
library(sandwich)

# Perform generalized full matching (method = "quick" requires the quickmatch package)
m.out <- matchit(treat ~ t_year + cov1 + cov2 + cov3 + cov4 + cov5 + cov6 + cov7 + cov8, 
                 data = df, method = "quick", estimand = "ATE")

# Create matched data
m.data <- match.data(m.out)

# Fit weighted Poisson model
m1.fit <- glm(y ~ treat + t_year + cov1 + cov2 + cov3 + cov4 + cov5 + cov6 + cov7 + cov8, 
              data = m.data, weights = weights, family = poisson(link = "log"))

# Compute a cluster-robust covariance matrix (clustered on subject ID) with sandwich
vcov_cluster <- vcovCL(m1.fit, cluster = m.data$id_cluster)

# Pass the precomputed matrix to avg_comparisons();
# hypothesis = 1 tests the null of no effect on the ratio scale (risk ratio = 1)
avg_comparisons(m1.fit, variables = "treat", type = "response", comparison = "ratioavg", 
                hypothesis = 1, wts = "weights", vcov = vcov_cluster)
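
As an alternative to a precomputed matrix, the vcov argument of avg_comparisons() also accepts a one-sided formula naming the cluster variable, which, as far as the marginaleffects documentation describes, is forwarded to sandwich::vcovCL(). The sketch below checks that both routes agree. It is not the original analysis: the data are simulated, no matching is run, and the weights column is a random placeholder for MatchIt weights.

```r
library(marginaleffects)
library(sandwich)

set.seed(42)
n_subj <- 150  # hypothetical number of subjects, 3 records each

sim <- data.frame(id_cluster = rep(seq_len(n_subj), each = 3))
sim$treat   <- rep(rbinom(n_subj, 1, 0.5), each = 3)
sim$cov1    <- rnorm(nrow(sim))
subj_effect <- rep(rnorm(n_subj, sd = 0.5), each = 3)
sim$y       <- rbinom(nrow(sim), 1,
                      plogis(-1 + 0.4 * sim$treat + 0.3 * sim$cov1 + subj_effect))
sim$weights <- runif(nrow(sim), 0.5, 2)  # placeholder, NOT real matching weights

fit <- glm(y ~ treat + cov1, data = sim, weights = weights,
           family = poisson(link = "log"))

# Route 1: precomputed cluster-robust matrix
rr_matrix <- avg_comparisons(fit, variables = "treat", comparison = "ratioavg",
                             hypothesis = 1, wts = "weights",
                             vcov = vcovCL(fit, cluster = sim$id_cluster))

# Route 2: one-sided formula naming the cluster variable
rr_formula <- avg_comparisons(fit, variables = "treat", comparison = "ratioavg",
                              hypothesis = 1, wts = "weights",
                              vcov = ~id_cluster)

c(matrix = rr_matrix$std.error, formula = rr_formula$std.error)
```

The formula route keeps the clustering choice visible in the avg_comparisons() call, while the matrix route allows non-default vcovCL() options such as a different small-sample correction.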

How Senior Engineers Fix It

Senior engineers fix this issue by:

Using cluster-robust standard errors to account for non-independence of observations
Specifying the correct clustering variable (in this case, id_cluster)
Using variance estimators from packages such as sandwich or clubSandwich to compute the cluster-robust covariance matrix
Passing the precomputed covariance matrix to avg_comparisons() through its vcov argument, so the reported standard errors, tests, and intervals are based on it
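
One point the list above leaves open is how subject clustering interacts with the matching strata. MatchIt's documentation recommends clustering on subclass after full matching, so one hedged option is two-way clustering on both subclass and id_cluster, which sandwich::vcovCL() supports when given several cluster variables. The sketch below uses simulated data; the subclass and weights columns are random stand-ins for what match.data() would return, and whether two-way clustering is appropriate depends on the study design.

```r
library(sandwich)

set.seed(7)
n_subj <- 120  # hypothetical number of subjects, 3 records each

sim <- data.frame(id_cluster = rep(seq_len(n_subj), each = 3))
n <- nrow(sim)
sim$subclass <- factor(sample(seq_len(40), n, replace = TRUE))  # stand-in for MatchIt subclass
sim$treat    <- rbinom(n, 1, 0.5)
sim$cov1     <- rnorm(n)
sim$weights  <- runif(n, 0.5, 2)                                # stand-in for matching weights
sim$y        <- rpois(n, exp(-0.5 + 0.3 * sim$treat + 0.2 * sim$cov1))

fit <- glm(y ~ treat + cov1, data = sim, weights = weights,
           family = poisson(link = "log"))

# One-way clustering on subject only
vcov_subject <- vcovCL(fit, cluster = sim$id_cluster)

# Two-way clustering on matching stratum and subject
vcov_twoway <- vcovCL(fit, cluster = sim[, c("subclass", "id_cluster")])

sqrt(diag(vcov_subject))
```

Either matrix can then be supplied to avg_comparisons() through vcov in the same way as vcov_cluster in the example above.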

Why Juniors Miss It

Juniors may miss this issue because of:

Lack of experience with observational datasets and matching methods
Insufficient understanding of cluster-robust standard errors and their importance
Over-reliance on default settings or assumptions in statistical software
Failure to consider the hierarchical or clustered structure of the data