Latent Growth Curve Models (LGCM)

Basic LGCM (OpenMx)

Estimate average emotional suppression trajectories, growth rates, and individual variability across repeated ABCD assessments using latent growth curve modeling in OpenMx.

Work in Progress: Examples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedback. Use the "Suggest changes" button to help us improve!

Overview

Latent Growth Curve Modeling (LGCM) analyzes longitudinal change by estimating growth trajectories as latent factors while distinguishing systematic development from measurement error. Using intercept and slope parameters, LGCM captures both population-average patterns and individual differences in developmental processes, providing more accurate estimates than traditional repeated measures approaches. This tutorial applies LGCM to examine emotional suppression in ABCD youth across four annual assessments (Years 3–6), estimating the average trajectory and individual variation in initial levels and rates of change.
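The intercept-and-slope structure described above can be written out in the standard linear LGCM equations (conventional SEM notation, added here for reference). Here y is the observed suppression score for person i at wave t, and the fixed loadings encode linear time:

```latex
y_{ti} = \eta_{0i} + \lambda_t \, \eta_{1i} + \varepsilon_{ti},
\qquad \lambda_t \in \{0, 1, 2, 3\}
```

```latex
\eta_{0i} = \mu_0 + \zeta_{0i},
\qquad \eta_{1i} = \mu_1 + \zeta_{1i}
```

The latent means mu_0 and mu_1 correspond to the mean_intercept and mean_slope parameters estimated below; the variances of zeta_0 and zeta_1 are var_intercept and var_slope; epsilon is the time-specific residual error.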

When to Use:
Ideal when you have repeated measures and want to model the average growth trajectory plus individual deviations.
Key Advantage:
LGCM provides latent intercept and slope factors, so you can quantify both initial status and change over time while separating true developmental change from measurement noise.
What You'll Learn:
How to specify a basic LGCM in OpenMx, interpret intercept and slope estimates (means, variances, and their covariance), and assess overall model fit.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values

Data Preparation

NBDCtools Setup and Data Loading
### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)    # Collection of R packages for data science
library(arrow)        # For reading Parquet files
library(gtsummary)    # Creating publication-quality tables
library(OpenMx)       # Matrix-based SEM engine
library(broom)        # For tidying model outputs
library(gt)           # For creating formatted tables

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "mh_y_erq__suppr_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Data Transformation
### Create a long-form dataset with relevant columns
df_long <- abcd_data %>%
  select(participant_id, session_id, ab_g_dyn__design_site, ab_g_stc__design_id__fam, mh_y_erq__suppr_mean) %>%
  # Filter to Years 3-6 annual assessments using NBDCtools
  filter_events_abcd(conditions = c("annual", ">=3", "<=6")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    session_id = factor(session_id,
                        levels = c("ses-03A", "ses-04A", "ses-05A", "ses-06A"),
                        labels = c("Year_3", "Year_4", "Year_5", "Year_6"))  # Relabel sessions for clarity
  ) %>%
  rename(  # Rename for simplicity
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
    suppression = mh_y_erq__suppr_mean
  ) %>%
  droplevels() %>%                                     # Drop unused factor levels
  drop_na(suppression)                                 # Remove rows with missing outcome data

### Reshape data from long to wide format
df_wide <- df_long %>%
  pivot_wider(
    names_from = session_id,
    values_from = suppression,
    names_prefix = "Suppression_"
  ) %>%
  drop_na(starts_with("Suppression_"))  # Require complete data across all time points
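To see concretely what the long-to-wide step does, here is a minimal self-contained sketch on fabricated toy data (the participant IDs and values are made up; the column naming mirrors the `pivot_wider()` call above). Note that `drop_na()` performs listwise deletion, so a participant missing any wave is removed entirely:

```r
library(dplyr)
library(tidyr)

# Fabricated toy data in the same long format as df_long above
toy_long <- tibble(
  participant_id = rep(c("sub-001", "sub-002"), each = 2),
  session_id     = rep(c("Year_3", "Year_4"), times = 2),
  suppression    = c(3.1, 3.3, 2.8, NA)   # sub-002 is missing Year_4
)

toy_wide <- toy_long %>%
  pivot_wider(
    names_from   = session_id,
    values_from  = suppression,
    names_prefix = "Suppression_"
  ) %>%
  drop_na(starts_with("Suppression_"))    # listwise deletion

# sub-002 is dropped because of the missing Year_4 score
toy_wide
```

This illustrates why the analysis sample shrinks to participants with complete data across all four waves; full information maximum likelihood on the long data is a common alternative when attrition is substantial.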
Descriptive Statistics
### Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, suppression) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      suppression ~ "Suppression"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "{level}<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "Assessment Wave") %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
Descriptive statistics by assessment wave, mean (SD):

Characteristic   Year_3        Year_4        Year_5        Year_6
                 N = 10318     N = 9586      N = 8784      N = 5001
Suppression      3.10 (0.86)   3.35 (0.86)   3.35 (0.87)   3.42 (0.88)

Statistical Analysis

Define and Fit Basic LGCM with OpenMx
### Prepare data for OpenMx
# OpenMx expects a plain data.frame with only the manifest variables
mx_data <- df_wide %>%
  select(starts_with("Suppression_")) %>%
  as.data.frame()

manifest_vars <- c("Suppression_Year_3", "Suppression_Year_4",
                   "Suppression_Year_5", "Suppression_Year_6")
latent_vars <- c("intercept", "slope")

### Build the OpenMx growth model
model <- mxModel(
  "BasicLGCM",
  type = "RAM",
  manifestVars = manifest_vars,
  latentVars = latent_vars,

  # Data
  mxData(observed = mx_data, type = "raw"),

  # Factor loadings: intercept loads 1 on all indicators
  mxPath(from = "intercept", to = manifest_vars,
         free = FALSE, values = c(1, 1, 1, 1)),

  # Factor loadings: slope loads 0, 1, 2, 3 (linear time coding)
  mxPath(from = "slope", to = manifest_vars,
         free = FALSE, values = c(0, 1, 2, 3)),

  # Latent means (intercept and slope means)
  mxPath(from = "one", to = latent_vars,
         free = TRUE, values = c(3.0, 0.1),
         labels = c("mean_intercept", "mean_slope")),

  # Latent variances and covariance
  mxPath(from = latent_vars, arrows = 2,
         connect = "unique.pairs", free = TRUE,
         values = c(0.5, -0.02, 0.05),
         labels = c("var_intercept", "cov_i_s", "var_slope")),

  # Residual variances (freely estimated per time point)
  mxPath(from = manifest_vars, arrows = 2,
         free = TRUE, values = rep(0.3, 4),
         labels = c("resvar_yr3", "resvar_yr4", "resvar_yr5", "resvar_yr6")),

  # Zero manifest means (means captured by latent factors)
  mxPath(from = "one", to = manifest_vars,
         free = FALSE, values = 0)
)

### Fit the model
fit <- mxRun(model)

### Display model summary
summary(fit)
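Before trusting a specification like this on real data, it can help to confirm that it recovers known parameters from simulated data. The sketch below is our own check, not part of the original tutorial: it generates 500 linear trajectories with intercept mean 3.0 and slope mean 0.1, fits the same RAM specification, and compares the estimates against the generating values.

```r
library(OpenMx)
set.seed(2024)

# Simulate linear growth: y_t = intercept + slope * t + error
n     <- 500
int_i <- rnorm(n, mean = 3.0, sd = sqrt(0.30))
slp_i <- rnorm(n, mean = 0.1, sd = sqrt(0.05))
sim   <- as.data.frame(sapply(0:3, function(t) {
  int_i + slp_i * t + rnorm(n, sd = sqrt(0.40))
}))
names(sim) <- paste0("y", 1:4)

sim_model <- mxModel(
  "SimLGCM", type = "RAM",
  manifestVars = names(sim), latentVars = c("intercept", "slope"),
  mxData(sim, type = "raw"),
  mxPath("intercept", names(sim), free = FALSE, values = 1),
  mxPath("slope", names(sim), free = FALSE, values = 0:3),
  mxPath("one", c("intercept", "slope"), free = TRUE,
         values = c(3, 0.1), labels = c("mean_i", "mean_s")),
  mxPath(c("intercept", "slope"), arrows = 2, connect = "unique.pairs",
         free = TRUE, values = c(0.3, 0, 0.05)),
  mxPath(names(sim), arrows = 2, free = TRUE, values = 0.4),
  mxPath("one", names(sim), free = FALSE, values = 0)
)

sim_fit <- mxRun(sim_model)

# Estimated latent means should land near the generating values 3.0 and 0.1
round(coef(sim_fit)[c("mean_i", "mean_s")], 2)
```

If the estimates stray far from the generating values, the time coding or path specification likely has an error.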
Format Model Summary Table
### Extract parameter estimates into a tidy table
param_table <- summary(fit)$parameters

model_summary_table <- param_table %>%
  select(name, Estimate, Std.Error) %>%
  mutate(
    z_value = Estimate / Std.Error,
    p_value = 2 * pnorm(-abs(z_value))
  ) %>%
  gt() %>%
  tab_header(title = "Latent Growth Curve Model Results (OpenMx)") %>%
  fmt_number(columns = c(Estimate, Std.Error, z_value, p_value), decimals = 3) %>%
  cols_label(
    name = "Parameter",
    Estimate = "Estimate",
    Std.Error = "Std. Error",
    z_value = "z",
    p_value = "p"
  )

### Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)
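The z and p columns computed above are Wald statistics: each estimate divided by its standard error, referred to a standard normal distribution. A minimal numeric illustration, using round numbers rather than the fitted values:

```r
est <- 0.10   # example estimate (not a fitted value)
se  <- 0.02   # example standard error

z <- est / se             # Wald z statistic
p <- 2 * pnorm(-abs(z))   # two-sided p-value

c(z = z, p = p)
```

Here z = 5, giving a two-sided p-value well below .001.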
Format Model Fit Indices Table
### Compute reference models for incremental fit indices
# OpenMx requires explicit saturated and independence models to compute
# chi-squared, CFI, TLI, and RMSEA
ref_models <- mxRefModels(fit, run = TRUE)
mx_summary <- summary(fit, refModels = ref_models)

# Extract fit indices
fit_data <- data.frame(
  Metric = c("chi-squared", "df", "p-value", "CFI", "TLI", "RMSEA", "AIC", "BIC"),
  Value = c(
    mx_summary$Chi,
    mx_summary$ChiDoF,
    mx_summary$p,
    mx_summary$CFI,
    mx_summary$TLI,
    mx_summary$RMSEA,
    mx_summary$AIC.Mx,
    mx_summary$BIC.Mx
  )
)

fit_indices_table <- fit_data %>%
  gt() %>%
  tab_header(title = "Model Fit Indices (OpenMx)") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

### Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)
Latent Growth Curve Model Results (OpenMx)
Parameter Estimate Std. Error z p
resvar_yr3 0.439 0.016 26.880 0.000
resvar_yr4 0.424 0.012 36.743 0.000
resvar_yr5 0.376 0.011 35.180 0.000
resvar_yr6 0.292 0.015 20.147 0.000
var_intercept 0.322 0.015 21.066 0.000
cov_i_s -0.039 0.006 -6.492 0.000
var_slope 0.046 0.003 13.426 0.000
mean_intercept 3.109 0.012 253.293 0.000
mean_slope 0.110 0.005 20.502 0.000
Model Fit Indices (OpenMx)
Fit Measure Value
chi-squared 180.816
df 5.000
p-value 0.000
CFI 0.949
TLI 0.938
RMSEA 0.092
AIC 5,930.889
BIC -100,454.265
Interpretation

The LGCM fit well by the incremental indices (CFI = 0.949, TLI = 0.938), although the RMSEA of 0.092 exceeded the conventional .08 cutoff, signaling some residual misfit. Average suppression at Year 3 was 3.109 (SE = 0.012, p < .001) and rose by 0.110 points per year (SE = 0.005, p < .001), indicating a small but reliable increase. Intercept and slope variances (0.322 and 0.046, both p < .001) confirmed that adolescents differed markedly in both starting levels and rates of change. The negative intercept-slope covariance (-0.039, p < .001) implies that youth who began with high suppression tended to grow more slowly, whereas those starting lower closed the gap. Residual variances declined from 0.439 at Year 3 to 0.292 by Year 6, suggesting that measurements became more stable across successive assessments. Overall, the model depicts a cohort-wide rise in suppression layered on top of substantial between-person heterogeneity.
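A quick way to read the growth estimates is to compute the model-implied mean at each wave: mean_intercept + mean_slope multiplied by the time codes 0 through 3. Plugging in the estimates from the table above:

```r
# Latent mean estimates from the fitted model above
mean_intercept <- 3.109
mean_slope     <- 0.110

# Model-implied mean suppression at Years 3-6 (time codes 0:3)
implied_means <- mean_intercept + mean_slope * 0:3
round(implied_means, 3)   # 3.109 3.219 3.329 3.439
```

The implied Year 6 mean (about 3.44) sits close to the observed Year 6 mean of 3.42 from the descriptives table, consistent with the adequate fit of the linear growth form.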

Visualization
### Select a subset of participants
set.seed(123)  # Fix the RNG so the plotted subsample is reproducible
n_sample <- min(150, length(unique(df_long$participant_id)))
selected_ids <- sample(unique(df_long$participant_id), n_sample)
df_long_selected <- df_long %>% filter(participant_id %in% selected_ids)

### Plot Suppression Growth
visualization <- ggplot(df_long_selected, aes(x = session_id, y = suppression, group = participant_id)) +
    geom_line(alpha = 0.3, color = "gray") +
    geom_point(size = 1.5, color = "blue") +
    geom_smooth(aes(group = 1), method = "lm", color = "red", linewidth = 1.2, se = TRUE, fill = "lightpink") +
    labs(
        title = "Emotional Suppression Trajectories Over Time",
        subtitle = "Basic LGCM: Years 3 to 6",
        x = "Assessment Wave",
        y = "Suppression Score"
    ) +
    theme_minimal()

ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)
Emotional Suppression Trajectory Plot
Visualization Notes

Each gray line shows a participant's suppression trajectory across the four assessments, while blue points mark the observed scores and the red line traces the sample-wide mean. The upward tilt of the red line confirms the cohort-level increase in suppression, and the fan of gray lines illustrates the individual heterogeneity that the latent growth curve model is designed to capture.

Discussion

The analysis reveals heterogeneous suppression trajectories, with the overall trend indicating increasing suppression over time while individual trajectories vary substantially. The significant slope variance confirms that adolescents follow meaningfully different developmental paths: some rise steeply, others remain stable or even decline. The negative intercept-slope covariance indicates that youth who begin with higher suppression tend to show slower growth, consistent with a regression-to-the-mean or ceiling-effect pattern.

The inclusion of both random intercepts and slopes provides a flexible framework for understanding variability in initial suppression levels and growth rates. Compared to a model with only fixed effects, the random-slope specification captures the heterogeneity visible in individual trajectory plots and yields more realistic standard errors for the population-average trend.

Extensions of the basic LGCM include conditional models with time-invariant covariates (to explain why individuals differ), piecewise specifications that allow change rates to differ across developmental periods, and multivariate models that examine co-development across multiple constructs. These extensions build directly on the intercept-slope framework established here.

Additional Resources


OpenMx Growth Model Tutorial

DOCS

Official OpenMx documentation for latent growth curve modeling, covering RAM-type specification, path diagrams, and model comparison in the matrix-based SEM framework.


OpenMx User Guide

DOCS

Comprehensive user guide for the OpenMx package, including detailed coverage of RAM models, LISREL specification, data handling, and optimization options.


Neale et al. (2016): OpenMx 2.0

PAPER

The primary citation for OpenMx, describing the structural equation modeling framework, matrix algebra approach, and full information maximum likelihood estimation used in this tutorial.
