Overview

Latent Growth Curve Modeling with time-invariant covariates extends basic growth modeling by explaining why individuals differ in initial levels and rates of change. By incorporating predictors like demographics or socioeconomic factors as covariates of latent intercept and slope factors, this approach reveals how stable characteristics shape developmental trajectories. This tutorial examines emotional suppression in ABCD youth across four annual assessments (Years 3–6), modeling how age, sex, parental education, and household income predict both baseline suppression levels and individual rates of change over time.

When to Use:

Use when baseline demographics or socioeconomic factors may predict both initial levels and individual growth trajectories, and you want to understand what drives developmental heterogeneity.

Key Advantage:

Integrates covariates directly into the latent intercept and slope, revealing how each predictor shifts starting points and trajectories simultaneously while accounting for measurement error in the outcome.

What You'll Learn:

How to add covariates to an LGCM, interpret their effects on intercept and slope factors, and evaluate whether covariates explain meaningful portions of growth heterogeneity.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

Automatic data joining - Merges variables from multiple tables automatically
Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)

Key Parameters

vars - Vector of variable names to load
release - ABCD data release version (e.g., "6.0")
format - File format, typically "parquet" for efficiency
categ_to_factor - Automatically converts categorical variables to factors
value_to_na - Converts ABCD missing value codes to R's NA
add_labels - Adds descriptive labels to variables and values

Additional Resources

For more details on using NBDCtools:

NBDCtools Getting Started Guide - Complete package overview
Joining Data - Advanced data merging strategies
Filtering Events - Selecting specific assessment waves
Data Transformations - Preprocessing and cleaning

Data Preparation

NBDCtools Setup and Data Loading

36 lines

### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)    # Collection of R packages for data science
library(arrow)        # For reading Parquet files
library(gtsummary)    # Creating publication-quality tables
library(OpenMx)       # Matrix-based SEM engine
library(broom)        # For tidying model outputs
library(gt)           # For creating formatted tables

### Set random seed for reproducible family member selection
set.seed(123)

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "ab_g_dyn__visit_age",
    "ab_g_stc__cohort_sex",
    "ab_g_stc__cohort_race__nih",
    "ab_g_dyn__cohort_edu__cgs",
    "ab_g_dyn__cohort_income__hhold__3lvl",
    "mh_y_erq__suppr_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)

Show all 36 linesShow less

Data Transformation

40 lines

### Create longitudinal dataset
# Filter to ERQ assessment waves (Years 3-6)
df_long <- abcd_data %>%
  filter(session_id %in% c("ses-03A", "ses-04A", "ses-05A", "ses-06A")) %>%
  arrange(participant_id, session_id)

### Clean and transform variables
df_long <- df_long %>%
  mutate(
    participant_id = factor(participant_id),
    session_id = factor(session_id,
                       levels = c("ses-03A", "ses-04A", "ses-05A", "ses-06A"),
                       labels = c("Year_3", "Year_4", "Year_5", "Year_6")),
    site = factor(ab_g_dyn__design_site),
    family_id = factor(ab_g_stc__design_id__fam),
    age = as.numeric(ab_g_dyn__visit_age),
    sex = factor(ab_g_stc__cohort_sex,
                 levels = c("1", "2"),
                 labels = c("Male", "Female")),
    race = factor(ab_g_stc__cohort_race__nih,
                  levels = c("2", "3", "4", "5", "6", "7", "8"),
                  labels = c("White", "Black", "Asian", "AI/AN", "NH/PI", "Multi-Race", "Other")),
    education = as.numeric(ab_g_dyn__cohort_edu__cgs),
    income = as.numeric(ab_g_dyn__cohort_income__hhold__3lvl),
    suppression = round(as.numeric(mh_y_erq__suppr_mean), 2)
  ) %>%
  select(participant_id, session_id, site, family_id, age, sex, race, education, income, suppression) %>%
  drop_na()

### Get baseline covariates (Year 3)
baseline_covariates <- df_long %>%
  filter(session_id == "Year_3") %>%
  select(participant_id, age, sex, education, income) %>%
  mutate(
    age_c = age - mean(age, na.rm = TRUE),
    female = ifelse(sex == "Female", 1, 0),
    education_c = education - mean(education, na.rm = TRUE),
    income_c = income - mean(income, na.rm = TRUE)
  ) %>%
  select(participant_id, age_c, female, education_c, income_c)

Show all 40 linesShow less

Reshape to Wide Format

10 lines

### Reshape suppression to wide format
df_wide <- df_long %>%
  select(participant_id, session_id, suppression, site) %>%
  pivot_wider(
    names_from = session_id,
    values_from = suppression,
    names_prefix = "Suppression_"
  ) %>%
  left_join(baseline_covariates, by = "participant_id") %>%
  drop_na()

Descriptive Statistics

26 lines

### Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, suppression) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      suppression ~ "Suppression"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "{level}<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "Assessment Wave") %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table

Show all 26 linesShow less

Characteristic	Assessment Wave
Characteristic	Year_3 N = 8953¹	Year_4 N = 8376¹	Year_5 N = 7685¹	Year_6 N = 4496¹
Suppression	3.09 (0.86)	3.33 (0.86)	3.34 (0.87)	3.42 (0.88)
¹ Mean (SD)

Statistical Analysis

Define and Fit LGCM with Covariates in OpenMx

107 lines

### Prepare data for OpenMx
mx_data <- df_wide %>%
  select(starts_with("Suppression_"), age_c, female, education_c, income_c) %>%
  as.data.frame()

outcome_vars <- c("Suppression_Year_3", "Suppression_Year_4",
                   "Suppression_Year_5", "Suppression_Year_6")
covariate_vars <- c("age_c", "female", "education_c", "income_c")
manifest_vars <- c(outcome_vars, covariate_vars)
latent_vars <- c("i", "s")

### Build the LGCM with time-invariant covariates in OpenMx RAM notation
# The model: outcomes = intercept + slope*time + error
# Covariates predict latent intercept and slope via regression paths
model <- mxModel(
  "LGCM_TIC",
  type = "RAM",
  manifestVars = manifest_vars,
  latentVars = latent_vars,

  # Data
  mxData(observed = mx_data, type = "raw"),

  # --- Factor loadings: intercept (all 1s) ---
  mxPath(from = "i", to = outcome_vars,
         free = FALSE, values = c(1, 1, 1, 1)),

  # --- Factor loadings: slope (0, 1, 2, 3) ---
  mxPath(from = "s", to = outcome_vars,
         free = FALSE, values = c(0, 1, 2, 3)),

  # --- Latent means (conditional on covariates) ---
  mxPath(from = "one", to = c("i", "s"),
         free = TRUE, values = c(3.0, 0.1),
         labels = c("mean_i", "mean_s")),

  # --- Manifest outcome intercepts fixed to zero ---
  mxPath(from = "one", to = outcome_vars,
         free = FALSE, values = 0),

  # --- Covariate means (freely estimated, saturated) ---
  mxPath(from = "one", to = covariate_vars,
         free = TRUE, values = c(0, 0.5, 0, 0),
         labels = c("mean_age_c", "mean_female",
                     "mean_edu_c", "mean_inc_c")),

  # --- Latent residual variances and covariance ---
  mxPath(from = "i", arrows = 2, free = TRUE, values = 0.3,
         labels = "var_i"),
  mxPath(from = "s", arrows = 2, free = TRUE, values = 0.05,
         labels = "var_s"),
  mxPath(from = "i", to = "s", arrows = 2, free = TRUE, values = -0.03,
         labels = "cov_is"),

  # --- Residual variances (constrained equal across waves) ---
  mxPath(from = outcome_vars, arrows = 2,
         free = TRUE, values = 0.3,
         labels = c("resvar", "resvar", "resvar", "resvar")),

  # --- Covariate variances and covariances (saturated) ---
  mxPath(from = covariate_vars, arrows = 2, free = TRUE,
         values = c(1, 0.25, 1, 0.5),
         labels = c("var_age_c", "var_female",
                     "var_edu_c", "var_inc_c")),
  mxPath(from = "age_c", to = "female", arrows = 2, free = TRUE,
         values = 0, labels = "cov_age_fem"),
  mxPath(from = "age_c", to = "education_c", arrows = 2, free = TRUE,
         values = 0, labels = "cov_age_edu"),
  mxPath(from = "age_c", to = "income_c", arrows = 2, free = TRUE,
         values = 0, labels = "cov_age_inc"),
  mxPath(from = "female", to = "education_c", arrows = 2, free = TRUE,
         values = 0, labels = "cov_fem_edu"),
  mxPath(from = "female", to = "income_c", arrows = 2, free = TRUE,
         values = 0, labels = "cov_fem_inc"),
  mxPath(from = "education_c", to = "income_c", arrows = 2, free = TRUE,
         values = 0, labels = "cov_edu_inc"),

  # --- Regression paths: covariates → intercept ---
  mxPath(from = "age_c", to = "i", free = TRUE, values = 0.05,
         labels = "b_age_i"),
  mxPath(from = "female", to = "i", free = TRUE, values = 0,
         labels = "b_fem_i"),
  mxPath(from = "education_c", to = "i", free = TRUE, values = 0,
         labels = "b_edu_i"),
  mxPath(from = "income_c", to = "i", free = TRUE, values = 0,
         labels = "b_inc_i"),

  # --- Regression paths: covariates → slope ---
  mxPath(from = "age_c", to = "s", free = TRUE, values = -0.02,
         labels = "b_age_s"),
  mxPath(from = "female", to = "s", free = TRUE, values = 0,
         labels = "b_fem_s"),
  mxPath(from = "education_c", to = "s", free = TRUE, values = 0,
         labels = "b_edu_s"),
  mxPath(from = "income_c", to = "s", free = TRUE, values = 0,
         labels = "b_inc_s")
)

### Add non-negativity bounds on variance parameters
model <- mxModel(model,
  mxBounds(c("var_i", "var_s", "resvar"), min = 0.001))

### Fit the model
fit <- mxRun(model)

### Display model summary
summary(fit)

Show all 107 linesShow less

Format Model Summary Table

34 lines

### Extract parameter estimates into a tidy table
param_table <- summary(fit)$parameters

### Focus on substantive parameters (exclude saturated covariate moments)
covariate_moment_labels <- c("mean_age_c", "mean_female", "mean_edu_c",
                              "mean_inc_c", "var_age_c", "var_female",
                              "var_edu_c", "var_inc_c", "cov_age_fem",
                              "cov_age_edu", "cov_age_inc", "cov_fem_edu",
                              "cov_fem_inc", "cov_edu_inc")

substantive_params <- param_table %>%
  filter(!(name %in% covariate_moment_labels)) %>%
  select(name, Estimate, Std.Error) %>%
  mutate(
    z_value = Estimate / Std.Error,
    p_value = 2 * pnorm(-abs(z_value))
  ) %>%
  gt() %>%
  tab_header(title = "LGCM with Covariates Results (OpenMx)") %>%
  fmt_number(columns = c(Estimate, Std.Error, z_value, p_value), decimals = 3) %>%
  cols_label(
    name = "Parameter",
    Estimate = "Estimate",
    Std.Error = "Std. Error",
    z_value = "z",
    p_value = "p"
  )

### Save the gt table
gt::gtsave(
  data = substantive_params,
  filename = "model_summary.html",
  inline_css = FALSE
)

Show all 34 linesShow less

Format Model Fit Indices Table

34 lines

### Compute reference models for incremental fit indices
ref_models <- mxRefModels(fit, run = TRUE)
mx_summary <- summary(fit, refModels = ref_models)

### Extract fit indices
fit_data <- data.frame(
  Metric = c("chi-squared", "df", "p-value", "CFI", "TLI", "RMSEA", "AIC", "BIC"),
  Value = c(
    mx_summary$Chi,
    mx_summary$ChiDoF,
    mx_summary$p,
    mx_summary$CFI,
    mx_summary$TLI,
    mx_summary$RMSEA,
    mx_summary$AIC.Mx,
    mx_summary$BIC.Mx
  )
)

fit_indices_table <- fit_data %>%
  gt() %>%
  tab_header(title = "Model Fit Indices (OpenMx)") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

### Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)

Show all 34 linesShow less

Parameter	Estimate	Std. Error	z	p
LGCM with Covariates Results (OpenMx)
b_age_i	0.071	0.021	3.435	0.001
b_age_s	−0.027	0.009	−2.911	0.004
b_fem_i	0.037	0.027	1.394	0.163
b_fem_s	−0.042	0.012	−3.611	0.000
b_edu_i	−0.041	0.015	−2.720	0.007
b_edu_s	−0.007	0.007	−1.018	0.309
b_inc_i	−0.058	0.021	−2.788	0.005
b_inc_s	0.016	0.009	1.758	0.079
resvar	0.384	0.007	58.592	0.000
var_i	0.336	0.015	21.960	0.000
cov_is	−0.041	0.006	−7.254	0.000
var_s	0.040	0.003	12.940	0.000
mean_i	3.074	0.019	165.984	0.000
mean_s	0.134	0.008	16.479	0.000

Fit Measure	Value
Model Fit Indices (OpenMx)
chi-squared	231.094
df	16.000
p-value	0.000
CFI	0.948
TLI	0.909
RMSEA	0.063
AIC	5,661.255
BIC	−162,828.428

Interpretation

Average suppression at Year 3 was 3.074 (SE = 0.019, p < .001) and climbed by 0.134 points per year (SE = 0.008, p < .001). Intercept and slope variances (0.336 and 0.040, both p < .001) confirmed sizable between-person differences, while the negative intercept-slope covariance (-0.041, p < .001) implied that adolescents starting high tended to rise more slowly. Fit indices (CFI = 0.948, TLI = 0.909, SRMR not available, RMSEA = 0.063) indicate the model captures the main longitudinal signal with only minor approximation error.

Time-invariant covariates added interpretable structure. Older youth entered with higher suppression (b = 0.071, p = .001) yet showed flatter growth (b = -0.027, p = .004). Females increased more slowly than males (b = -0.042, p < .001). Lower household income related to higher intercepts (b = -0.058, p = .005) and marginally steeper slopes (b = 0.016, p = .079), whereas higher parental education predicted lower starting suppression (b = -0.041, p = .007). Collectively, the results highlight heterogeneous developmental courses shaped by demographic context as well as idiosyncratic factors.

Visualization

28 lines

### Select a subset of participants for plotting
selected_ids <- sample(unique(df_long$participant_id), 150)
df_long_selected <- df_long %>% filter(participant_id %in% selected_ids)

### Plot suppression growth trajectories
visualization <- ggplot(df_long_selected,
                        aes(x = session_id, y = suppression,
                            group = participant_id)) +
    geom_line(alpha = 0.3, color = "gray") +
    geom_point(size = 1.5, color = "blue") +
    geom_smooth(aes(group = 1), method = "lm", color = "red",
                linewidth = 1.2, se = TRUE, fill = "lightpink") +
    labs(
        title = "Suppression Growth with Confidence Intervals",
        subtitle = "LGCM with Time-Invariant Covariates",
        x = "Assessment Wave",
        y = "Suppression Score"
    ) +
    theme_minimal()

print(visualization)

### Save the plot
ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)

Show all 28 linesShow less

Visualization Notes

The plot displays individual and overall trends in emotional suppression across four annual assessments. Gray lines trace randomly selected individual trajectories, illustrating the heterogeneity in suppression changes. Blue points show observed measurements and the red line with shaded confidence band represents the estimated mean trajectory. The upward trend reflects the general increase in suppression over time, while the spread of individual trajectories underscores the importance of modeling both between-person predictors and residual individual variability.

Discussion

Suppression generally increased across assessments, yet youth followed noticeably different paths. Significant residual variance in both intercepts and slopes — after accounting for covariates — shows that demographic and socioeconomic factors explain only part of the heterogeneity, with substantial individual variability remaining.

Age, sex, and socioeconomic markers each contributed to understanding trajectory differences. Older youth tended to enter with higher suppression but progress more slowly thereafter, suggesting a ceiling effect or developmental plateau. Sex differences appeared primarily in growth rates rather than starting levels. Household income and parental education each predicted baseline suppression in expected directions, with lower socioeconomic resources associated with higher initial levels.

These patterns highlight meaningful stratification in both starting points and growth rates, while the remaining residual variance underscores that unmeasured factors also shape suppression development. Extensions include adding interaction terms to test whether covariate effects differ across subgroups, mediation models where initial status predicts subsequent change, and multiple-group specifications for formal cross-population comparisons.

Additional Resources

OpenMx Growth Curve Tutorial

DOCS

Official OpenMx documentation covering latent growth curve models in the RAM framework, including specifications with covariates and multiple-group extensions.

Visit Resource

McArdle & Nesselroade (2014): Longitudinal Data Analysis

BOOK

Comprehensive textbook on latent variable approaches to longitudinal data, including growth curve models with covariates specified in the Mx/OpenMx tradition.

Visit Resource

Centering in Growth Models

PAPER

Best practices for centering time-invariant predictors in latent growth curve models. Discusses grand-mean centering versus group-mean centering and their interpretational implications (Enders & Tofighi, 2007). Note: access may require institutional or paid subscription.

Visit Resource