Summary
The question is how to obtain standard errors clustered on subject ID after generalized full matching with MatchIt, when a risk ratio is estimated from a weighted Poisson model and summarized with marginaleffects::avg_comparisons(). The key challenge is specifying clustering in avg_comparisons() so that it respects both the matching structure and the weights produced by MatchIt.
Root Cause
The root cause is that observations are not independent: each subject contributes multiple records, so valid inference requires cluster-robust standard errors. Specifically:
- Observations are not independent
- Each record belongs to a higher-level unit (the subject)
- The subject for each record is identified by an ID column (id_cluster)
Why This Happens in Real Systems
This issue occurs in real systems because:
- Observational datasets often have hierarchical or clustered structures
- Matching tools like MatchIt are used to balance covariates, but may not account for clustering
- Weighted models can be sensitive to clustering, requiring cluster-robust standard errors for valid inference
Real-World Impact
The real-world impact of this issue includes:
- Biased standard errors if clustering is not accounted for
- Incorrect inference about treatment effects or risk ratios
- Confidence intervals and p-values that overstate or understate the precision of the estimated effect
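To make the impact concrete, the following is a minimal sketch in base R, on simulated data, that compares model-based standard errors with a hand-computed cluster-robust (sandwich) estimate for a Poisson model. Everything here is hypothetical and only for illustration: the data, the variable names (id, treat, u, y), and the choice of the unadjusted "CR0" form. sandwich::vcovCL() applies finite-sample adjustments by default, so its numbers would differ slightly from this sketch.

```r
set.seed(42)

# Hypothetical data: 300 subjects with 4 records each
n_id <- 300
m <- 4
id <- rep(seq_len(n_id), each = m)
treat <- rep(rbinom(n_id, 1, 0.5), each = m)   # treatment is constant within subject
u <- rep(rnorm(n_id, sd = 0.7), each = m)      # shared subject-level effect
y <- rpois(n_id * m, exp(-0.5 + 0.3 * treat + u))

# Poisson model that ignores the clustering
fit <- glm(y ~ treat, family = poisson(link = "log"))

# Model-based ("naive") standard errors assume independent observations
se_naive <- sqrt(diag(vcov(fit)))

# Cluster-robust sandwich estimate (CR0, no small-sample adjustment):
# bread = inverse Hessian, meat = cross-product of per-subject score sums
X <- model.matrix(fit)
score <- X * residuals(fit, type = "response")  # Poisson log-link score: x_i * (y_i - mu_i)
bread <- vcov(fit)                              # (X' diag(mu) X)^{-1} for Poisson
cluster_scores <- rowsum(score, group = id)     # sum scores within each subject
meat <- crossprod(cluster_scores)
V_cl <- bread %*% meat %*% bread
se_cluster <- sqrt(diag(V_cl))

print(rbind(naive = se_naive, cluster = se_cluster))
```

In this setup the shared subject effect makes records from the same subject move together, so the naive standard error for treat is expected to be too small relative to the cluster-robust one. The point estimate itself is the same under both variance estimators.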
Example or Code
library(MatchIt)
library(marginaleffects)
library(sandwich)
# Generalized full matching (method = "quick") targeting the ATE
m.out <- matchit(treat ~ t_year + cov1 + cov2 + cov3 + cov4 + cov5 + cov6 + cov7 + cov8,
                 data = df, method = "quick", estimand = "ATE")
# Extract the matched data, including the matching weights
m.data <- match.data(m.out)
# Fit the weighted Poisson model (log link, so contrasts are risk ratios)
m1.fit <- glm(y ~ treat + t_year + cov1 + cov2 + cov3 + cov4 + cov5 + cov6 + cov7 + cov8,
              data = m.data, weights = weights, family = poisson(link = "log"))
# Compute the covariance matrix clustered on subject ID with the sandwich package
vcov_cluster <- vcovCL(m1.fit, cluster = m.data$id_cluster)
# Pass the precomputed vcov matrix to avg_comparisons();
# hypothesis = 1 tests the risk ratio against the null value of 1
avg_comparisons(m1.fit, variables = "treat", type = "response", comparison = "ratioavg",
                hypothesis = 1, wts = "weights", vcov = vcov_cluster)
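As a hedged alternative to building the matrix by hand, avg_comparisons() also accepts a one-sided formula in vcov, which it passes to sandwich::vcovCL() as the cluster specification. The sketch below runs on simulated data; the data, the single covariate cov1, and the object names (fit, rr_id, rr_multi) are hypothetical. It also shows two-way clustering on both subject ID and the matching stratum (the subclass column returned by match.data()). Whether two-way clustering is appropriate for a given analysis is an assumption of this sketch, not something the source confirms. Note that method = "quick" needs the quickmatch package installed.

```r
library(MatchIt)
library(marginaleffects)
library(sandwich)

set.seed(1)

# Hypothetical data: 200 subjects with 3 records each
n_id <- 200
df <- data.frame(id_cluster = rep(seq_len(n_id), each = 3))
u <- rnorm(n_id)[df$id_cluster]                  # shared subject-level effect
df$cov1 <- rnorm(nrow(df)) + u
df$treat <- rbinom(nrow(df), 1, plogis(0.5 * df$cov1))
df$y <- rbinom(nrow(df), 1,
               plogis(-1 + 0.4 * df$treat + 0.3 * df$cov1 + 0.5 * u))

# Generalized full matching, then a weighted Poisson model for the risk ratio
m.out <- matchit(treat ~ cov1, data = df, method = "quick", estimand = "ATE")
m.data <- match.data(m.out)
fit <- glm(y ~ treat + cov1, data = m.data, weights = weights,
           family = poisson(link = "log"))

# Option 1: formula interface, clustering on subject ID only
rr_id <- avg_comparisons(fit, variables = "treat", comparison = "ratioavg",
                         hypothesis = 1, newdata = m.data, wts = "weights",
                         vcov = ~id_cluster)

# Option 2: two-way clustering on matching stratum and subject ID,
# computed with sandwich and passed in as a matrix
vcov_multi <- vcovCL(fit, cluster = ~subclass + id_cluster)
rr_multi <- avg_comparisons(fit, variables = "treat", comparison = "ratioavg",
                            hypothesis = 1, newdata = m.data, wts = "weights",
                            vcov = vcov_multi)

print(rr_id)
print(rr_multi)
```

Both calls return the same point estimate, because the vcov argument changes only the standard error, confidence interval, and test against 1.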
How Senior Engineers Fix It
Senior engineers fix this issue by:
- Using cluster-robust standard errors to account for non-independence of observations
- Specifying the correct clustering variable (in this case, id_cluster)
- Using external variance estimators like sandwich or clubSandwich to compute cluster-robust standard errors
- Passing precomputed vcov matrices to avg_comparisons() to ensure valid inference
Why Juniors Miss It
Juniors may miss this issue because of:
- Lack of experience with observational datasets and matching methods
- Insufficient understanding of cluster-robust standard errors and their importance
- Over-reliance on default settings or assumptions in statistical software
- Failure to consider the hierarchical or clustered structure of the data