GEE Quick Reference
Fast lookup for syntax, correlation structures, and troubleshooting. For step-by-step learning, see the Walkthrough. For conceptual background, see the Overview.
Jump to: Syntax · Correlation Structures · Extract Output · Model Comparison · Robust SEs · Diagnostics · Parameters · Time Coding · Errors & Fixes · Troubleshooting · Pitfalls · Formulas · Extensions · Resources
Syntax
geepack::geeglm
library(geepack)
# Binary — exchangeable correlation
fit <- geeglm(y ~ time + x, id = id, data = df,
family = binomial, corstr = "exchangeable")
# Binary — AR(1)
fit <- geeglm(y ~ time + x, id = id, data = df,
family = binomial, corstr = "ar1")
# Binary — independence
fit <- geeglm(y ~ time + x, id = id, data = df,
family = binomial, corstr = "independence")
# Binary — unstructured
fit <- geeglm(y ~ time + x, id = id, data = df,
family = binomial, corstr = "unstructured")
# Count — Poisson with exchangeable
fit <- geeglm(y ~ time + x, id = id, data = df,
family = poisson, corstr = "exchangeable")
geeglm requires rows sorted by the id variable. Use arrange(id, time) before fitting. Unsorted data produce wrong results without warning.
Interaction Terms
# Time × treatment
fit <- geeglm(y ~ time * treatment, id = id, data = df,
family = binomial, corstr = "exchangeable")
# Polynomial time
fit <- geeglm(y ~ time + I(time^2) + treatment, id = id, data = df,
family = binomial, corstr = "exchangeable")
Offset (Exposure)
fit <- geeglm(events ~ time + treatment, id = id, data = df,
family = poisson, corstr = "exchangeable",
offset = log(exposure_time))
Correlation Structures
Structure Comparison
| Structure | corstr | Pattern | Parameters | Best for |
|---|---|---|---|---|
| Independence | "independence" | No correlation | 0 | Baseline; weak correlation |
| Exchangeable | "exchangeable" | All pairs equal | 1 | Cluster-like data; default |
| AR(1) | "ar1" | Decay with lag | 1 | Time series; temporal decay |
| Unstructured | "unstructured" | All pairs free | T(T−1)/2 | Few time points; many clusters |
Correlation Matrices
Independence: Exchangeable: AR(1): Unstructured:
[1 0 0 0] [1 α α α] [1 α α² α³] [1 r₁₂ r₁₃ r₁₄]
[0 1 0 0] [α 1 α α] [α 1 α α²] [r₁₂ 1 r₂₃ r₂₄]
[0 0 1 0] [α α 1 α] [α² α 1 α ] [r₁₃ r₂₃ 1 r₃₄]
[0 0 0 1] [α α α 1] [α³ α² α 1 ] [r₁₄ r₂₄ r₃₄ 1 ]
Parameter Count by Time Points
| T (waves) | Independence | Exchangeable | AR(1) | Unstructured |
|---|---|---|---|---|
| 3 | 0 | 1 | 1 | 3 |
| 5 | 0 | 1 | 1 | 10 |
| 8 | 0 | 1 | 1 | 28 |
| 10 | 0 | 1 | 1 | 45 |
How to Choose
- Start with exchangeable — safe default
- If temporal decay is plausible, try AR(1)
- Compare using QIC (see below)
- Check robust vs naive SE agreement — large discrepancy = poor fit
- Use unstructured only with ≤ 5 time points and ≥ 100 clusters
Decision Tree
Start here
│
┌───────────┴───────────┐
│ Few time points │
│ (≤ 5) and ≥ 100 │
│ clusters? │
└───────┬───────┬───────┘
Yes │ │ No
▼ ▼
┌───────────┐ ┌─────────────────┐
│ Consider │ │ Exchangeable │
│ Unstruc- │ │ (safe default) │
│ tured │ └────────┬─────────┘
└───────────┘ │
┌────────┴────────┐
│ Expect temporal │
│ decay? │
└───┬─────────┬───┘
Yes │ │ No
▼ ▼
┌──────────┐ ┌──────────────┐
│ Try AR(1)│ │ Stay with │
│ + compare│ │ exchangeable │
│ QIC │ └──────────────┘
└──────────┘
In all cases, compare robust vs naive SEs — large discrepancy means the working correlation is far from reality.
Extract Estimated Correlation
# Correlation parameter(s)
fit$geese$alpha
# Full estimated working correlation matrix
# (not directly available; reconstruct from alpha and corstr)
Extract Output
Coefficients
coef(fit)
summary(fit)$coefficients
Odds Ratios / Incidence Rate Ratios
# ORs (binary) or IRRs (count)
exp(coef(fit))
# With confidence intervals
exp(confint(fit))
# Or manually from robust SEs
beta <- coef(fit)
se <- summary(fit)$coefficients[, "Std.err"]
cbind(OR = exp(beta),
lower = exp(beta - 1.96 * se),
upper = exp(beta + 1.96 * se))
Robust Standard Errors (Default)
# Robust (sandwich) — reported by default in geepack
summary(fit)$coefficients[, "Std.err"]
Naive Standard Errors
# Model-based (naive) — only valid if corstr is correct
sqrt(diag(fit$geese$vbeta.naiv))
Predicted Values
# Marginal predictions
predict(fit, type = "response")
# For new data
predict(fit, newdata = nd, type = "response")
# Link-scale predictions
predict(fit, type = "link")
Working Correlation Parameter
fit$geese$alpha
Dispersion (Scale) Parameter
fit$geese$gamma
Model Comparison (QIC)
QIC (Quasi-likelihood under the Independence model Criterion) is the GEE analogue of AIC. Lower = better.
Computing QIC
# geepack provides QIC
QIC(fit_exch)
QIC(fit_ar1)
QIC(fit_indep)
# Compare in a table
data.frame(
corstr = c("Exchangeable", "AR(1)", "Independence"),
QIC = c(QIC(fit_exch)[1], QIC(fit_ar1)[1], QIC(fit_indep)[1])
)
What QIC Compares
| Comparison | Use QIC for |
|---|---|
| Correlation structures | Same mean model, different corstr |
| Mean model terms | Same corstr, different predictors |
| Across families | Not recommended (different quasi-likelihoods) |
Robust vs. Naive SEs
Quick Comparison
robust <- summary(fit)$coefficients[, "Std.err"]
naive <- sqrt(diag(fit$geese$vbeta.naiv))
data.frame(
term = names(coef(fit)),
robust = round(robust, 4),
naive = round(naive, 4),
ratio = round(robust / naive, 3)
)
Interpreting the Ratio
| Ratio | Meaning |
|---|---|
| 0.9–1.1 | Working correlation is approximately correct |
| 1.1–1.3 | Mild misspecification — robust SEs wider |
| > 1.3 | Substantial misspecification — consider different corstr |
| < 0.9 | Unusual — may indicate small-sample instability |
When to Use Which
| SE Type | When |
|---|---|
| Robust (sandwich) | Always — default choice |
| Naive (model-based) | Only if you're confident the working correlation is correct |
| Bias-corrected robust | < 40 clusters |
Diagnostics
Checklist
| Check | When | How |
|---|---|---|
| Robust vs naive SE ratio | Always | Compare SEs; ratio should be 0.9–1.1 |
| Residual plot | Always | Pearson residuals vs fitted |
| Overdispersion | Count models | Dispersion parameter near 1 |
| Influential clusters | Suspect outliers | Leave-one-cluster-out |
Check Correlation Structure Choice
# Compare robust vs naive SEs (large discrepancy = misspecification)
robust <- summary(fit)$coefficients[, "Std.err"]
naive <- sqrt(diag(fit$geese$vbeta.naiv))
max(abs(robust / naive - 1)) # Should be < 0.2
Residuals
# Pearson residuals
res <- residuals(fit, type = "pearson")
# Plot residuals vs fitted
plot(fitted(fit), res, xlab = "Fitted", ylab = "Pearson Residuals")
abline(h = 0, lty = 2)
Overdispersion (Count Models)
# Dispersion parameter (should be near 1 for Poisson)
fit$geese$gamma
Influence Diagnostics
# Cook's distance analogue — identify influential clusters
# (Not built into geepack; use leave-one-cluster-out)
ids <- unique(df$id)
betas <- matrix(NA, length(ids), length(coef(fit)))
for (i in seq_along(ids)) {
fit_i <- update(fit, data = df[df$id != ids[i], ])
betas[i, ] <- coef(fit_i)
}
# Check for outlier clusters
apply(betas, 2, sd)
Parameters
| Parameter | Interpretation |
|---|---|
| β₀ (intercept) | Marginal log-odds (or log-count) at reference levels |
| β₁ (time) | Marginal change per time unit on link scale |
| βₖ (predictor) | Marginal effect of predictor on link scale |
| α (correlation) | Working correlation parameter(s) |
| φ (scale) | Dispersion parameter |
Time Coding
| Coding | Values (5 waves) | Intercept = |
|---|---|---|
| Wave 1 (default) | 0, 1, 2, 3, 4 | Marginal mean at Wave 1 |
| Midpoint | -2, -1, 0, 1, 2 | Marginal mean at Wave 3 |
| Final wave | -4, -3, -2, -1, 0 | Marginal mean at Wave 5 |
| Actual months | 0, 3, 6, 12, 24 | Marginal mean at baseline |
Slope interpretation is unchanged by centering — only the intercept meaning shifts to the zero-point. On the link scale (logit, log), this also affects the scale at which the intercept's OR or IRR is evaluated.
Common Errors & Fixes
| Issue | Symptom | Fix |
|---|---|---|
| Unsorted data | Wrong results, no warning | arrange(id, time) before fitting |
| Non-convergence | Warning or NaN coefficients | Simplify model; check for separation |
| Singular correlation | Error in correlation update | Reduce T; use simpler structure |
| Too few clusters | Unreliable sandwich SEs | Need 40+ clusters; use bias correction |
| Separation (binary) | Huge coefficients | Remove predictor or collapse categories |
Troubleshooting
Data Not Sorted
The most common GEE mistake. Always sort before fitting:
df <- df %>% arrange(id, time)
Unstructured Won't Converge
Too many time points relative to clusters:
# Switch from unstructured to exchangeable or AR(1)
fit <- geeglm(y ~ time + x, id = id, data = df,
family = binomial, corstr = "exchangeable")
Very Small Clusters
With only 2–3 observations per person, AR(1) may be unstable. Use exchangeable or independence.
Missing Data Handling
Standard GEE assumes MCAR. If you suspect MAR dropout:
# Option 1: Use GLMM instead (handles MAR via FIML)
fit_glmm <- glmer(y ~ time + x + (1 | id), data = df, family = binomial)
# Option 2: Multiple imputation + GEE (requires additional packages)
Small-Sample Bias Correction
With 20–40 clusters, use the Mancl-DeRouen correction. geepack does not natively support this; consider the geesmv package or manual calculation.
Interpretation Pitfalls
| Mistake | Reality |
|---|---|
| "GEE OR = 1.5, so each person's odds increase 50%" | GEE gives marginal (population-averaged) ORs, not individual |
| "The exchangeable correlation is the true correlation" | Working correlation is a nuisance parameter, not a reliable estimate |
| "GEE can handle MAR dropout" | Standard GEE assumes MCAR; use weighted GEE or GLMM for MAR |
| "More time points = more power" | Power comes from clusters (people), not observations per cluster |
| "GEE and GLMM should give the same OR" | They target different quantities; marginal OR < conditional OR |
Quick Formulas
Marginal odds ratio:
OR <- exp(coef(fit)["treatment"])
Marginal incidence rate ratio:
IRR <- exp(coef(fit)["time"])
Confidence interval for OR:
beta <- coef(fit)["treatment"]
se <- summary(fit)$coefficients["treatment", "Std.err"]
exp(beta + c(-1.96, 1.96) * se)
Approximate marginal-to-conditional conversion (logistic):
# Given GLMM random intercept variance sigma2
c_star <- 1 / sqrt(1 + (16 * sqrt(3) / (15 * pi))^2 * sigma2)
marginal_beta <- conditional_beta * c_star
Advanced Extensions
Time-Varying Covariates
fit <- geeglm(y ~ time + current_stress + treatment, id = id,
data = df, family = binomial, corstr = "exchangeable")
Interactions
# Time × group interaction (different trends by group)
fit <- geeglm(y ~ time * treatment, id = id, data = df,
family = binomial, corstr = "exchangeable")
Count Outcomes
fit <- geeglm(count ~ time + treatment, id = id, data = df,
family = poisson, corstr = "exchangeable")
exp(coef(fit)) # IRRs
Multinomial / Ordinal
GEE for ordinal outcomes is available via multgee or ordgee (in geepack):
# Ordinal GEE
fit <- ordgee(ordered(y) ~ time + treatment, id = id, data = df,
corstr = "exchangeable")
Marginal vs. Conditional
| Aspect | GEE (Marginal) | GLMM (Conditional) |
|---|---|---|
| Target | Population average | Subject-specific |
| Coefficients | Marginal effects | Conditional effects |
| OR/IRR magnitude | Smaller (attenuated) | Larger |
| Random effects | Not estimated | Estimated |
| Missing data | MCAR | MAR (FIML) |
| Model comparison | QIC | AIC, BIC, LRT |
| Distributional assumptions | Fewer | More |
| For continuous outcomes | Identical to LMM fixed effects — use LMM instead |
Resources
Books
| Book | Focus |
|---|---|
| Diggle et al. (2002). Analysis of Longitudinal Data. 2nd ed. Oxford. | Foundational text; GEE and mixed models |
| Fitzmaurice et al. (2011). Applied Longitudinal Analysis. 2nd ed. Wiley. | Excellent applied coverage of GEE |
| Hardin & Hilbe (2013). Generalized Estimating Equations. 2nd ed. CRC Press. | Dedicated GEE treatment |
Key Articles
| Article | Contribution |
|---|---|
| Liang & Zeger (1986). Biometrika, 73(1), 13–22. | Original GEE paper |
| Zeger et al. (1988). Biometrics, 44(4), 1049–1060. | Marginal vs conditional models |
| Pan (2001). Biometrics, 57(1), 120–125. | QIC for model selection |
| Mancl & DeRouen (2001). Biometrics, 57(1), 126–134. | Small-sample SE correction |
Online
Software
| Package | Use Case |
|---|---|
| geepack | Primary R package; geeglm interface |
| gee | Original R implementation; less flexible than geepack |
| geeM | Matrix-based GEE; handles larger problems |
| geesmv | Bias-corrected sandwich estimators for small samples |
| multgee | Multinomial/ordinal GEE |
Related Tutorials
| Tutorial | Focus |
|---|---|
| GEE Basic | Introduction to GEE with geepack |
| GEE Count Outcomes | Poisson and NB marginal models |
| GEE Time-Varying Covariates | Within-person predictors |
Related Guides
- GEE Overview — When to use, conceptual foundations, marginal vs conditional
- GEE Walkthrough — Step-by-step worked example
- GLMM Overview — Conditional (subject-specific) alternative to GEE
- LMM Overview — Linear mixed models for continuous outcomes