Generalized Estimating Equations (GEE)

Time-Varying Covariates (GEE)

Add time-varying covariates to generalized estimating equations to examine how evolving exposures influence repeated binary outcomes.

GEE › Binary
geepackabcd-studygeetime-varying-covariate
Work in ProgressExamples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedback—use the "Suggest changes" button to help us improve!

Overview

Generalized Estimating Equations with time-varying covariates extend standard GEE models to examine how predictors that change over time influence longitudinal outcomes, estimating population-averaged effects while accounting for within-subject correlation. This approach captures dynamic relationships by allowing covariate values to vary at each assessment occasion. This tutorial analyzes sufficient sleep patterns (9–11 hours per night) in ABCD youth across multiple assessments, incorporating anxiety scores as a time-varying covariate to examine whether changes in anxiety levels over time predict concurrent fluctuations in sleep duration.

When to Use:
Use when both the outcome and predictors change over time and you want to assess dynamic relationships.
Key Advantage:
Extends GEE to include covariates that vary at each occasion, letting you examine how concurrent predictors modify population-level change.
What You'll Learn:
How to structure data for time-varying covariates, specify formulas with interaction terms, and interpret dynamic effects.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values
Additional NBDCtools Resources

For more details on using NBDCtools:

Data Preparation

NBDCtools Setup and Data Loading
R30 lines
### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(arrow)        # Read Parquet files
library(tidyverse)    # Data wrangling & visualization
library(gtsummary)    # Summary tables
library(rstatix)      # Statistical tests in tidy format
library(geepack)      # Generalized Estimating Equations (GEE) analysis
library(broom)        # Organizing model outputs
library(gt)           # Presentation-ready display tables

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "ph_p_sds__dims_001",
    "mh_p_cbcl__dsm__anx_tscore"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Data Transformation
R24 lines
# Data wrangling: clean, restructure, and recode sleep variable
df_long <- abcd_data %>%
  filter(session_id %in% c("ses-00A", "ses-01A", "ses-02A", "ses-03A")) %>%  # Keep Baseline - Year 3
  arrange(participant_id, session_id) %>%
   mutate(
    participant_id = factor(participant_id), # Convert participant_id to a factor
    session_id = factor(session_id,
                        levels = c("ses-00A", "ses-01A", "ses-02A", "ses-03A"),
                        labels = c("Baseline", "Year_1", "Year_2", "Year_3")),  # Label sessions

    ab_g_dyn__design_site = factor(ab_g_dyn__design_site),  # Convert site to a factor
    ab_g_stc__design_id__fam = factor(ab_g_stc__design_id__fam), # Convert family id to a factor
    sleep_hours = as.numeric(ph_p_sds__dims_001),
    anxiety = as.numeric(mh_p_cbcl__dsm__anx_tscore)  # Convert to numeric
  ) %>%
  filter(sleep_hours != 999, anxiety != 999) %>%  # Remove "Don't know" responses
  mutate(
    sleep_binary = ifelse(sleep_hours == 1, 1, 0)  # Recode: 9-11 hrs = 1, others = 0
) %>%
  rename(  # Rename for simplicity
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
  ) %>%
  drop_na()  # Remove missing values
Descriptive Statistics
R31 lines
# Summarize 'sleep_binary' and 'strength_exercise' across session_ids

descriptives_table <- df_long %>%
    select(session_id, sleep_binary, anxiety) %>%
    tbl_summary(
        by = session_id,
        missing = "no",
        label = list(
          sleep_binary ~ "Sufficient Sleep (9-11 hrs)",
          anxiety ~ "Anxiety"
        ),
        statistic = list(
            all_categorical() ~ "{n} ({p}%)",
            all_continuous() ~ "{mean} ({sd})"
        )
    ) %>%
    modify_header(all_stat_cols() ~ "**{level}**<br>N = {n}") %>%
    modify_spanning_header(all_stat_cols() ~ "**Assessment Wave**") %>%
    bold_labels() %>%
    italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
Characteristic
Assessment Wave
Baseline
N = 11861
1
Year_1
N = 11108
1
Year_2
N = 9612
1
Year_3
N = 8075
1
Sufficient Sleep (9-11 hrs) 5,622 (47%) 4,354 (39%) 3,065 (32%) 2,298 (28%)
Anxiety 53.5 (6.1) 53.6 (6.2) 53.8 (6.1) 54.7 (6.6)
1 n (%); Mean (SD)

Statistical Analysis

Fit GEE Model with Time-Varying Covariate
R33 lines
# Fit GEE model with time-varying anxiety covariate
model <- geeglm(sleep_binary ~ session_id + site + anxiety,
  id = participant_id,
  data = df_long,
  family = binomial(link = "logit"),
  corstr = "exchangeable"
)

# Generate summary table from model coefficients
coefs <- as.data.frame(summary(model)$coefficients)
model_summary <- data.frame(
  Parameter = rownames(coefs),
  Estimate = coefs$Estimate,
  SE = coefs$Std.err,
  Wald = coefs$Wald,
  p_value = coefs[, "Pr(>|W|)"]
) %>%
  gt() %>%
  tab_header(title = "GEE Model: Sleep ~ Time + Site + Anxiety (TVC)") %>%
  fmt_number(columns = c(Estimate, SE, Wald), decimals = 3) %>%
  fmt_number(columns = p_value, decimals = 4) %>%
  cols_label(
    Parameter = "Parameter",
    Estimate = "Estimate",
    SE = "Robust SE",
    Wald = "Wald",
    p_value = "p-value"
  )

model_summary

# Save as standalone HTML
gt::gtsave(model_summary, filename = "model_summary.html")
Create Model Diagnostics Table
R27 lines
# Create GEE diagnostics data
diagnostics_data <- data.frame(
  Characteristic = c(
    "Correlation Structure",
    "Correlation Parameter (α)",
    "Scale Parameter",
    "Number of Participants",
    "Total Observations"
  ),
  Value = c(
    model$corstr,
    round(model$geese$alpha, 3),
    round(summary(model)$dispersion$Estimate, 3),
    length(unique(df_long$participant_id)),
    nrow(df_long)
  )
)

# Format diagnostics table
diagnostics_table <- diagnostics_data %>%
  gt::gt() %>%
  gt::tab_header(title = "GEE Model Diagnostics")

diagnostics_table

# Save diagnostics table
gt::gtsave(diagnostics_table, filename = "model_diagnostics.html")
GEE Model: Sleep ~ Time + Site + Anxiety (TVC)
Parameter Estimate Robust SE Wald p-value
(Intercept) 0.509 0.132 14.837 0.0001
session_idYear_1 −0.368 0.021 318.955 0.0000
session_idYear_2 −0.709 0.024 889.895 0.0000
session_idYear_3 −0.874 0.027 1,014.725 0.0000
site2 0.771 0.104 55.092 0.0000
site3 −0.440 0.107 16.847 0.0000
site4 0.129 0.101 1.632 0.2014
site5 0.090 0.118 0.581 0.4458
site6 0.850 0.101 70.353 0.0000
site7 0.302 0.118 6.546 0.0105
site8 0.747 0.115 41.985 0.0000
site9 0.190 0.112 2.860 0.0908
site10 0.223 0.100 4.977 0.0257
site11 −0.105 0.113 0.861 0.3534
site12 −0.007 0.107 0.004 0.9466
site13 0.291 0.100 8.452 0.0036
site14 0.754 0.101 55.507 0.0000
site15 −0.729 0.122 35.422 0.0000
site16 0.586 0.094 38.921 0.0000
site17 0.811 0.103 62.027 0.0000
site18 0.654 0.111 34.785 0.0000
site19 0.069 0.106 0.425 0.5146
site20 0.004 0.102 0.001 0.9715
site21 0.157 0.104 2.296 0.1297
site22 0.086 0.300 0.083 0.7736
anxiety −0.017 0.002 70.954 0.0000
GEE Model Diagnostics
Characteristic Value
Correlation Structure exchangeable
Correlation Parameter (α) 0.37
Scale Parameter 1.009
Number of Participants 11866
Total Observations 40656
Interpretation
Interpretation

Compared with Baseline, the odds of meeting the 9–11 hour guideline fell by 31% at Year 1 (OR = 0.69, p < .001), 51% at Year 2 (OR = 0.49, p < .001), and 58% at Year 3 (OR = 0.42, p < .001). The monotonic decline points to a cohort-wide erosion of sufficient sleep across adolescence rather than a transient dip. Anxiety exerted an additional, albeit modest, effect: each one-unit uptick corresponded to a 1.6% decrease in the odds of sufficient sleep (OR = 0.984, p < .001), indicating that even within-person fluctuations in stress meaningfully shape nightly outcomes. The working-correlation estimate (α = 0.37) signals moderate within-subject stability—youth differ from one another, but each individual tends to retain their relative position over time. Together, the coefficients portray a population that is steadily drifting away from healthy sleep, with anxiety acting as a persistent drag on the probability of recovery.

Visualization
R35 lines
# Generate predicted probabilities across anxiety range
# Use the anxiety coefficient to show marginal effect at reference levels
coefs <- summary(model)$coefficients
intercept <- coefs["(Intercept)", "Estimate"]
beta_anxiety <- coefs["anxiety", "Estimate"]
se_anxiety <- coefs["anxiety", "Std.err"]

anxiety_range <- seq(min(df_long$anxiety, na.rm = TRUE),
                     max(df_long$anxiety, na.rm = TRUE), length.out = 50)
linpred <- intercept + beta_anxiety * anxiety_range
se_pred <- se_anxiety * abs(anxiety_range - mean(anxiety_range))

preds <- data.frame(
  anxiety = anxiety_range,
  predicted = plogis(linpred),
  conf.low = plogis(linpred - 1.96 * se_pred),
  conf.high = plogis(linpred + 1.96 * se_pred)
)

# Plot predicted probabilities
visualization <- ggplot(preds, aes(x = anxiety, y = predicted)) +
  geom_line(color = "darkblue", linewidth = 1) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2, fill = "darkblue") +
  labs(title = "Effect of Anxiety on Sleep Sufficiency",
       x = "Anxiety Level",
       y = "Predicted Probability of Sufficient Sleep") +
  theme_classic() +
  theme(plot.title = element_text(face = "bold", size = 14),
        axis.title = element_text(size = 12))

  ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)
Visualization
Interpretation
Visualization Notes

The predicted-probability curve slopes downward, translating the small negative logit coefficient for anxiety into an easily interpretable decline in sleep sufficiency as anxiety rises. Confidence bands stay tight across the anxiety range, indicating that the effect is estimated precisely even at the more extreme scores. Although the probability drop looks gradual—roughly five percentage points between the calmest and most anxious youth—the entirely negative band underscores that the relationship is consistently detrimental. Taken together with the model table, the figure shows how incremental increases in anxiety accumulate into meaningfully lower odds of attaining the recommended 9–11 hours of sleep.

Discussion

Participants with higher anxiety scores were less likely to meet sufficient sleep recommendations, even after accounting for site and demographic controls. The logit coefficient for anxiety was negative and significant, translating to a meaningful drop in the probability of sufficient sleep for each within-person uptick in anxiety. Because anxiety was modeled as time-varying, the estimate reflects dynamic departures from an individual’s own typical level rather than static differences between youth.

The exchangeable working correlation captured modest within-person dependence, and robust standard errors remained stable under alternative structures, reinforcing confidence in the inference. Visualizations of predicted probabilities showed nearly parallel declines across time, indicating that anxiety consistently depresses sleep odds at every assessment rather than accelerating deterioration. Incorporating time-varying covariates in GEE therefore provided a population-averaged view of how fluctuations in stress manifest behaviorally, offering a practical template for future analyses that need to link changing predictors with binary outcomes.

Additional Resources

4

geepack Package Documentation

DOCS

Official CRAN documentation for the geepack package, covering geeglm() with time-varying covariates and different working correlation structures for repeated measures data.

Visit Resource

Time-Varying Covariates in GEE (Analysis Factor)

VIGNETTE

Practical tutorial that walks through formatting longitudinal data, specifying time-varying predictors in , and interpreting coefficients and interactions using real-world R examples.

Visit Resource

GEE vs Mixed Models

PAPER

Methodology paper comparing population-averaged (GEE) versus subject-specific (mixed model) approaches for longitudinal data, with guidance on choosing between methods (Hubbard et al., 2010).

Visit Resource

emmeans for GEE Marginal Effects

TOOL

R package for computing estimated marginal means and contrasts from GEE models, useful for interpreting time-varying covariate effects at specific time points.

Visit Resource