GEE Tutorial

This tutorial walks through a complete GEE analysis with a binary longitudinal outcome. By the end, you will have:

Simulated correlated binary data with known parameters
Fit GEE with different working correlation structures and visualized marginal predictions
Compared robust vs. naive standard errors
Compared GEE (marginal) and GLMM (conditional) results
Fit a brief count outcome model with Poisson GEE
Checked diagnostics (residuals, influential clusters)

Workflow Overview

┌─────────────────────────────────┐
│ 1. Setup                        │
│    Load packages, set seed      │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│ 2. Simulate Data                │
│    Correlated binary outcome    │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│ 3. Visualize                    │
│    Proportions over time        │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│ 4. Fit GEE Models               │
│    Compare structs + visualize  │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│ 5. Robust vs. Naive SEs         │
│    Compare + visualize          │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│ 6. Compare GEE and GLMM         │
│    Marginal vs. conditional     │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│ 7. Count Outcome (Brief)        │
│    Poisson GEE + IRRs           │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│ 8. Diagnostics                  │
│    Residuals, influence         │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│ 9. Interpret Results            │
│    ORs, predicted probabilities │
└─────────────────────────────────┘

Step 1: Setup

# Load packages
library(tidyverse)   # Data manipulation and plotting
library(geepack)     # GEE fitting
library(lme4)        # GLMM for comparison
library(MASS)        # mvrnorm for simulation

# Set seed for reproducibility
set.seed(2024)

Step 2: Simulate Data

We simulate 200 participants measured at 5 waves with a binary outcome. The data-generating process uses a random intercept (creating exchangeable-like correlation) so we know the true marginal and conditional parameters.

Known Population Parameters

Parameter	Conditional (log-odds)	Marginal (approximate)
Intercept	-0.5	≈ -0.42
Time	-0.3 per wave	≈ -0.25
Treatment	-0.8	≈ -0.67
Random intercept SD	1.0	—

Marginal coefficients are attenuated relative to conditional — this is what we'll verify.

n <- 200
waves <- 5
N <- n * waves

person_df <- tibble(
  id = 1:n,
  treatment = rep(c(0, 1), each = n / 2),
  ri = rnorm(n, 0, 1.0)  # Random intercept
)

df <- person_df %>%
  crossing(time = 0:(waves - 1)) %>%
  arrange(id, time) %>%
  mutate(
    eta  = -0.5 + (-0.3 * time) + (-0.8 * treatment) + ri,
    prob = plogis(eta),
    y    = rbinom(N, 1, prob)
  )

head(df)

✅ Checkpoint: You should see columns id, treatment, time, and y (0/1).

Step 3: Visualize

df %>%
  group_by(time, treatment) %>%
  summarise(prop = mean(y), .groups = "drop") %>%
  mutate(group = factor(treatment, labels = c("Control", "Treatment"))) %>%
  ggplot(aes(x = time, y = prop, color = group)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent) +
  scale_x_continuous(breaks = 0:4, labels = paste("Wave", 1:5)) +
  labs(x = "Time", y = "Proportion (y = 1)",
       title = "Observed Prevalence by Group Over Time",
       color = "Group") +
  theme_minimal()

What to look for:

Declining prevalence over time (negative time effect)
Lower prevalence in the treatment group
Group trajectories — roughly parallel on this scale?

✅ Checkpoint: Treatment group should be consistently below the control group, both declining over waves.

Step 4: Fit GEE Models

Exchangeable Correlation

fit_exch <- geeglm(y ~ time + treatment, id = id, data = df,
                    family = binomial, corstr = "exchangeable")
summary(fit_exch)

AR(1) Correlation

fit_ar1 <- geeglm(y ~ time + treatment, id = id, data = df,
                   family = binomial, corstr = "ar1")
summary(fit_ar1)

Independence

fit_indep <- geeglm(y ~ time + treatment, id = id, data = df,
                     family = binomial, corstr = "independence")
summary(fit_indep)

Compare Estimates Across Structures

bind_rows(
  tibble(corstr = "Exchangeable",
         term = names(coef(fit_exch)),
         est = coef(fit_exch),
         se = summary(fit_exch)$coefficients[, "Std.err"]),
  tibble(corstr = "AR(1)",
         term = names(coef(fit_ar1)),
         est = coef(fit_ar1),
         se = summary(fit_ar1)$coefficients[, "Std.err"]),
  tibble(corstr = "Independence",
         term = names(coef(fit_indep)),
         est = coef(fit_indep),
         se = summary(fit_indep)$coefficients[, "Std.err"])
) %>%
  mutate(across(c(est, se), \(x) round(x, 3))) %>%
  pivot_wider(names_from = corstr, values_from = c(est, se))

✅ Checkpoint: Point estimates should be similar across structures (consistency property). Standard errors may differ — this reflects efficiency differences. The exchangeable structure should match the data-generating process best since we used a random intercept.

Examine Working Correlations

# Estimated correlation parameter
fit_exch$geese$alpha   # Exchangeable: single α
fit_ar1$geese$alpha    # AR(1): single α

Marginal Predicted Probabilities

Visualize what GEE actually estimates — population-averaged probability curves:

pred_grid <- expand.grid(time = 0:4, treatment = c(0, 1))
pred_grid$prob <- predict(fit_exch, newdata = pred_grid, type = "response")

pred_grid %>%
  mutate(group = factor(treatment, labels = c("Control", "Treatment"))) %>%
  ggplot(aes(x = time, y = prob, color = group)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent) +
  scale_x_continuous(breaks = 0:4, labels = paste("Wave", 1:5)) +
  labs(x = "Time", y = "Predicted Probability",
       title = "GEE Marginal Predicted Probabilities",
       color = "Group") +
  theme_minimal()

What to look for:

These are population-averaged curves — not individual trajectories
Both groups decline over time (negative time coefficient)
Treatment group is consistently lower (negative treatment coefficient)
Unlike GLMM, there are no person-specific curves — GEE estimates one curve per group

Step 5: Robust vs. Naive SEs

GEE reports robust (sandwich) SEs by default in geepack. Let's compare them to naive (model-based) SEs.

# Extract both SE types
robust_se <- summary(fit_exch)$coefficients[, "Std.err"]

# Naive SEs from the model-based covariance
naive_cov <- fit_exch$geese$vbeta.naiv
naive_se  <- sqrt(diag(naive_cov))

comparison <- tibble(
  term    = names(coef(fit_exch)),
  robust  = round(robust_se, 4),
  naive   = round(naive_se, 4),
  ratio   = round(robust_se / naive_se, 3)
)
comparison

Interpreting the ratio:

Ratio	Meaning
0.9–1.1	Working correlation is approximately correct
1.1–1.3	Mild misspecification — robust SEs slightly wider
> 1.3	Substantial misspecification — consider different corstr
< 0.9	Unusual — may indicate small-sample instability

Visualize SE Comparison

comparison %>%
  pivot_longer(c(robust, naive), names_to = "type", values_to = "se") %>%
  ggplot(aes(x = term, y = se, fill = type)) +
  geom_col(position = "dodge", width = 0.6) +
  scale_fill_manual(values = c(naive = "steelblue", robust = "coral"),
                    labels = c("Naive", "Robust")) +
  labs(x = "Parameter", y = "Standard Error",
       title = "Robust vs. Naive Standard Errors",
       fill = "SE Type") +
  theme_minimal()

When bars are similar height, the working correlation is reasonable. Large discrepancies signal misspecification — the robust SEs are protecting you.

✅ Checkpoint: Since the data were generated with exchangeable-like correlation and we fit exchangeable, the ratio should be close to 1.0 and the bars should be nearly equal.

Step 6: Compare GEE and GLMM

This is the key comparison: marginal (GEE) vs. conditional (GLMM) estimates for the same data.

Fit GLMM

fit_glmm <- glmer(y ~ time + treatment + (1 | id),
                   data = df, family = binomial)

Side-by-Side Comparison

tibble(
  term      = names(coef(fit_exch)),
  GEE_est   = round(coef(fit_exch), 3),
  GEE_se    = round(summary(fit_exch)$coefficients[, "Std.err"], 3),
  GLMM_est  = round(fixef(fit_glmm), 3),
  GLMM_se   = round(summary(fit_glmm)$coefficients[, "Std. Error"], 3),
  ratio     = round(coef(fit_exch) / fixef(fit_glmm), 3)
)

Expected pattern:

GEE coefficients should be smaller in magnitude than GLMM coefficients
The attenuation ratio is approximately c* = 1/√(1 + c²σ²), where σ² is the random intercept variance and c ≈ 16√3/(15π)
With σ² = 1.0: ratio ≈ 0.84 for logistic models

✅ Checkpoint: GEE estimates should be approximately 80–85% of GLMM estimates. This isn't bias — they target different quantities.

Predicted Probabilities

# GEE: marginal predictions
pred_grid <- expand.grid(time = 0:4, treatment = c(0, 1))
pred_grid$gee_prob <- predict(fit_exch, newdata = pred_grid, type = "response")

# GLMM: conditional predictions (at RE = 0)
pred_grid$glmm_prob <- predict(fit_glmm, newdata = pred_grid,
                                type = "response", re.form = NA)

pred_grid %>%
  mutate(group = factor(treatment, labels = c("Control", "Treatment"))) %>%
  pivot_longer(c(gee_prob, glmm_prob), names_to = "model", values_to = "prob") %>%
  mutate(model = ifelse(model == "gee_prob", "GEE (marginal)", "GLMM (conditional)")) %>%
  ggplot(aes(x = time, y = prob, color = group, linetype = model)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_y_continuous(limits = c(0, 1), labels = scales::percent) +
  scale_x_continuous(breaks = 0:4, labels = paste("Wave", 1:5)) +
  labs(x = "Time", y = "Predicted Probability",
       title = "GEE vs. GLMM Predicted Probabilities",
       color = "Group", linetype = "Model") +
  theme_minimal()

✅ Checkpoint: GEE and GLMM predictions at RE = 0 will be similar but not identical. The GEE curves represent population averages; the GLMM curves represent the "average individual."

Step 7: Count Outcome (Brief)

GEE handles count outcomes the same way — change the family to Poisson. Here's a minimal example using the same simulated data with a count outcome added:

# Add a count outcome to existing data
df <- df %>%
  mutate(
    eta_count = 1.0 + (-0.15 * time) + (-0.4 * treatment) + ri,
    count = rpois(n(), exp(eta_count))
  )

# Fit Poisson GEE
fit_count <- geeglm(count ~ time + treatment, id = id, data = df,
                     family = poisson, corstr = "exchangeable")
summary(fit_count)

# Incidence Rate Ratios
exp(coef(fit_count))
exp(confint(fit_count))

Parameter	Scale	Interpretation
Time	IRR = exp(β₁)	Per-wave change in population rate
Treatment	IRR = exp(β₂)	Treatment vs control rate ratio

Key difference from binary: Coefficients are on the log scale; exponentiate for incidence rate ratios (IRRs) instead of odds ratios. Check the dispersion parameter (fit_count$geese$gamma) — values well above 1 suggest overdispersion. GEE's sandwich SEs account for overdispersion automatically, but a negative binomial model may fit the mean structure better.

✅ Checkpoint: IRRs should be close to the true generating values: exp(-0.15) ≈ 0.86 for time, exp(-0.40) ≈ 0.67 for treatment.

Step 8: Diagnostics

Residual Check

tibble(
  fitted = fitted(fit_exch),
  residual = residuals(fit_exch, type = "pearson")
) %>%
  ggplot(aes(x = fitted, y = residual)) +
  geom_point(alpha = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted Values", y = "Pearson Residuals",
       title = "Residuals vs. Fitted (Exchangeable GEE)") +
  theme_minimal()

For binary outcomes, residuals naturally form two bands (one for y = 0, one for y = 1). Focus on whether there are systematic trends rather than expecting random scatter.

Influential Clusters

Check whether any individual disproportionately affects the results:

ids <- unique(df$id)
loo_coefs <- map_dfr(ids, \(drop_id) {
  fit_i <- update(fit_exch, data = filter(df, id != drop_id))
  tibble(dropped = drop_id, term = names(coef(fit_i)), est = coef(fit_i))
})

loo_coefs %>%
  group_by(term) %>%
  summarise(
    mean_est = round(mean(est), 4),
    sd_est   = round(sd(est), 4),
    max_diff = round(max(abs(est - mean(est))), 4),
    .groups = "drop"
  )

If dropping a single cluster substantially changes coefficients, investigate that cluster. With N = 200, no single person should have outsized influence.

✅ Checkpoint: The standard deviation across leave-one-out estimates should be small relative to the coefficient values.

Step 9: Interpret Results

GEE Results Summary

summary(fit_exch)

Parameter	Estimate	Robust SE	OR	Interpretation
Intercept	~-0.42	~0.14	~0.66	Population odds at Wave 1 (control)
Time	~-0.25	~0.04	~0.78	Per-wave change in population odds
Treatment	~-0.67	~0.19	~0.51	Treatment vs control population odds

Values are illustrative; yours will differ with the random seed.

Key interpretations (marginal, population-averaged):

Time OR ≈ 0.78: Across the population, the odds of the outcome decrease by ~22% per wave
Treatment OR ≈ 0.51: The treatment group has ~49% lower odds than controls, on average across the population
These describe population trends, not individual trajectories

Written Summary

We estimated a GEE model with exchangeable working correlation for a binary longitudinal outcome (N = 200, 5 waves). Using robust standard errors, the population-averaged odds of the outcome decreased over time (OR = 0.78, 95% CI [0.72, 0.85], p < .001) and were lower in the treatment group (OR = 0.51, 95% CI [0.35, 0.74], p < .001). The estimated within-person correlation was α = 0.22. Robust and naive SEs were in close agreement, supporting the exchangeable working correlation specification. Marginal estimates were attenuated relative to conditional GLMM estimates by a factor of ~0.84, consistent with the random intercept variance (σ² ≈ 1.0).

Complete Script

# ============================================
# GEE Complete Worked Example
# ============================================

library(tidyverse)
library(geepack)
library(lme4)
library(MASS)
set.seed(2024)

# --- Simulate data ---
n <- 200; waves <- 5; N <- n * waves

person_df <- tibble(
  id = 1:n,
  treatment = rep(c(0, 1), each = n / 2),
  ri = rnorm(n, 0, 1.0)
)

df <- person_df %>%
  crossing(time = 0:(waves - 1)) %>%
  arrange(id, time) %>%
  mutate(
    eta  = -0.5 + (-0.3 * time) + (-0.8 * treatment) + ri,
    prob = plogis(eta),
    y    = rbinom(N, 1, prob)
  )

# --- Fit GEE with different structures ---
fit_exch  <- geeglm(y ~ time + treatment, id = id, data = df,
                     family = binomial, corstr = "exchangeable")
fit_ar1   <- geeglm(y ~ time + treatment, id = id, data = df,
                     family = binomial, corstr = "ar1")
fit_indep <- geeglm(y ~ time + treatment, id = id, data = df,
                     family = binomial, corstr = "independence")

# --- Compare structures ---
summary(fit_exch)
summary(fit_ar1)

# --- Marginal predicted probabilities ---
pred_grid <- expand.grid(time = 0:4, treatment = c(0, 1))
pred_grid$prob <- predict(fit_exch, newdata = pred_grid, type = "response")

# --- Robust vs naive SEs ---
robust_se <- summary(fit_exch)$coefficients[, "Std.err"]
naive_se  <- sqrt(diag(fit_exch$geese$vbeta.naiv))
comparison <- data.frame(term = names(coef(fit_exch)),
                         robust = round(robust_se, 4),
                         naive = round(naive_se, 4),
                         ratio = round(robust_se / naive_se, 3))
comparison

# --- Compare GEE vs GLMM ---
fit_glmm <- glmer(y ~ time + treatment + (1 | id),
                   data = df, family = binomial)

data.frame(term = names(coef(fit_exch)),
           GEE = round(coef(fit_exch), 3),
           GLMM = round(fixef(fit_glmm), 3),
           ratio = round(coef(fit_exch) / fixef(fit_glmm), 3))

# --- Count outcome (Poisson GEE) ---
df <- df %>%
  mutate(
    eta_count = 1.0 + (-0.15 * time) + (-0.4 * treatment) + ri,
    count = rpois(n(), exp(eta_count))
  )

fit_count <- geeglm(count ~ time + treatment, id = id, data = df,
                     family = poisson, corstr = "exchangeable")
exp(coef(fit_count))
exp(confint(fit_count))

# --- Diagnostics ---
# Residuals
tibble(fitted = fitted(fit_exch),
       residual = residuals(fit_exch, type = "pearson"))

# Leave-one-cluster-out influence check
ids <- unique(df$id)
loo_coefs <- map_dfr(ids, \(drop_id) {
  fit_i <- update(fit_exch, data = filter(df, id != drop_id))
  tibble(dropped = drop_id, term = names(coef(fit_i)), est = coef(fit_i))
})

# --- ORs ---
exp(coef(fit_exch))
exp(confint(fit_exch))

Adapt to Your Data

1. Read and Prepare

df <- read_csv("my_data.csv") %>%
  mutate(id = factor(id), time = wave - 1) %>%
  arrange(id, time)  # GEE requires data sorted by cluster

Note

Data must be sorted by cluster (id)

geeglm requires rows to be grouped by the clustering variable. If data aren't sorted, results will be wrong without warning.

2. Choose Family and Correlation

Outcome	Family	Typical Correlation
Binary	`binomial`	Exchangeable or AR(1)
Count	`poisson`	Exchangeable or AR(1)

3. Start Simple, Compare

# Start with exchangeable
fit1 <- geeglm(outcome ~ time + group, id = id, data = df,
               family = binomial, corstr = "exchangeable")

# Compare with AR(1)
fit2 <- geeglm(outcome ~ time + group, id = id, data = df,
               family = binomial, corstr = "ar1")

# Compare QIC
QIC(fit1); QIC(fit2)

# Check robust vs naive SE agreement
robust1 <- summary(fit1)$coefficients[, "Std.err"]
naive1  <- sqrt(diag(fit1$geese$vbeta.naiv))
round(robust1 / naive1, 3)

Lower QIC and robust/naive ratio near 1.0 both favor a structure. If they disagree, trust robust SEs and use the structure with better QIC.

4. Fit and Examine

fit <- geeglm(outcome ~ time + group, id = id, data = df,
              family = binomial, corstr = "exchangeable")
summary(fit)
exp(coef(fit))        # ORs
exp(confint(fit))     # OR CIs

✅ Checkpoint: Always compare robust and naive SEs. Always sort data by cluster before fitting.

Next Steps

Need syntax or thresholds? → Reference
Back to overview → GEE Guide
Want individual-level effects? → GLMM Guide