Latent Growth Curve Models (LGCM)

Basic LGCM (OpenMx)

Estimate average emotional suppression trajectories, growth rates, and individual variability across repeated ABCD assessments using latent growth curve modeling in OpenMx.

Work in Progress: Examples are a work in progress. Please exercise caution when using code examples, as they may not be fully verified. If you spot gaps, errors, or have suggestions, we'd love your feedback. Use the "Suggest changes" button to help us improve!

Overview

Latent Growth Curve Modeling (LGCM) analyzes longitudinal change by estimating growth trajectories as latent factors while distinguishing systematic development from measurement error. Using intercept and slope parameters, LGCM captures both population-average patterns and individual differences in developmental processes, providing more accurate estimates than traditional repeated measures approaches. This tutorial applies LGCM to examine emotional suppression in ABCD youth across four annual assessments (Years 3–6), estimating the average trajectory and individual variation in initial levels and rates of change.
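The intercept-and-slope structure described above can be written out in the standard linear LGCM equations (conventional SEM notation, added here for reference). Here y is the observed suppression score for person i at wave t, and the fixed loadings encode linear time:

```latex
y_{ti} = \eta_{0i} + \lambda_t \, \eta_{1i} + \varepsilon_{ti},
\qquad \lambda_t \in \{0, 1, 2, 3\}
```

```latex
\eta_{0i} = \mu_0 + \zeta_{0i},
\qquad \eta_{1i} = \mu_1 + \zeta_{1i}
```

The latent means mu_0 and mu_1 correspond to the mean_intercept and mean_slope parameters estimated below; the variances of zeta_0 and zeta_1 are var_intercept and var_slope; epsilon is the time-specific residual error.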

When to Use:
Ideal when you have repeated measures and want to model the average growth trajectory plus individual deviations.
Key Advantage:
LGCM provides latent intercept and slope factors, so you can quantify both initial status and change over time while separating true developmental change from measurement noise.
What You'll Learn:
How to specify a basic LGCM in OpenMx, interpret intercept and slope estimates (means, variances, and their covariance), and assess overall model fit.

Data Access

Data Download

ABCD data can be accessed through the DEAP platform or the NBDC Data Access Platform (LASSO), which provide user-friendly interfaces for creating custom datasets with point-and-click variable selection. For detailed instructions on accessing and downloading ABCD data, see the DEAP documentation.

Loading Data with NBDCtools

Once you have downloaded ABCD data files, the NBDCtools package provides efficient tools for loading and preparing your data for analysis. The package handles common data management tasks including:

  • Automatic data joining - Merges variables from multiple tables automatically
  • Built-in transformations - Converts categorical variables to factors, handles missing data codes, and adds variable labels
  • Event filtering - Easily selects specific assessment waves

For more information, visit the NBDCtools documentation.

Basic Usage

The create_dataset() function is the main tool for loading ABCD data:

library(NBDCtools)

# Define variables needed for this analysis
requested_vars <- c(
  "var_1",   # Variable 1
  "var_2",   # Variable 2
  "var_3"    # Variable 3
)

# Set path to downloaded ABCD data files
data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

# Load data with automatic transformations
abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Key Parameters
  • vars - Vector of variable names to load
  • release - ABCD data release version (e.g., "6.0")
  • format - File format, typically "parquet" for efficiency
  • categ_to_factor - Automatically converts categorical variables to factors
  • value_to_na - Converts ABCD missing value codes to R's NA
  • add_labels - Adds descriptive labels to variables and values

Data Preparation

NBDCtools Setup and Data Loading
### Load necessary libraries
library(NBDCtools)    # ABCD data access helper
library(tidyverse)    # Collection of R packages for data science
library(arrow)        # For reading Parquet files
library(gtsummary)    # Creating publication-quality tables
library(OpenMx)       # Matrix-based SEM engine
library(broom)        # For tidying model outputs
library(gt)           # For creating formatted tables

### Load harmonized ABCD data required for this analysis
requested_vars <- c(
    "ab_g_dyn__design_site",
    "ab_g_stc__design_id__fam",
    "mh_y_erq__suppr_mean"
)

data_dir <- Sys.getenv("ABCD_DATA_PATH", "/path/to/abcd/6_0/phenotype")

abcd_data <- create_dataset(
  dir_data = data_dir,
  study = "abcd",
  vars = requested_vars,
  release = "6.0",
  format = "parquet",
  categ_to_factor = TRUE,   # Convert categorical variables to factors
  value_to_na = TRUE,        # Convert missing codes (222, 333, etc.) to NA
  add_labels = TRUE          # Add variable and value labels
)
Data Transformation
### Create a long-form dataset with relevant columns
df_long <- abcd_data %>%
  select(participant_id, session_id, ab_g_dyn__design_site, ab_g_stc__design_id__fam, mh_y_erq__suppr_mean) %>%
  # Filter to Years 3-6 annual assessments using NBDCtools
  filter_events_abcd(conditions = c("annual", ">=3", "<=6")) %>%
  arrange(participant_id, session_id) %>%
  mutate(
    session_id = factor(session_id,
                        levels = c("ses-03A", "ses-04A", "ses-05A", "ses-06A"),
                        labels = c("Year_3", "Year_4", "Year_5", "Year_6"))  # Relabel sessions for clarity
  ) %>%
  rename(  # Rename for simplicity
    site = ab_g_dyn__design_site,
    family_id = ab_g_stc__design_id__fam,
    suppression = mh_y_erq__suppr_mean
  ) %>%
  droplevels() %>%                                     # Drop unused factor levels
  drop_na(suppression)                                 # Remove rows with missing outcome data

### Reshape data from long to wide format
df_wide <- df_long %>%
  pivot_wider(
    names_from = session_id,
    values_from = suppression,
    names_prefix = "Suppression_"
  ) %>%
  drop_na(starts_with("Suppression_"))  # Require complete data across all time points
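To see concretely what the long-to-wide step does, here is a minimal self-contained sketch on fabricated toy data (the participant IDs and values are made up; the column naming mirrors the `pivot_wider()` call above). Note that `drop_na()` performs listwise deletion, so a participant missing any wave is removed entirely:

```r
library(dplyr)
library(tidyr)

# Fabricated toy data in the same long format as df_long above
toy_long <- tibble(
  participant_id = rep(c("sub-001", "sub-002"), each = 2),
  session_id     = rep(c("Year_3", "Year_4"), times = 2),
  suppression    = c(3.1, 3.3, 2.8, NA)   # sub-002 is missing Year_4
)

toy_wide <- toy_long %>%
  pivot_wider(
    names_from   = session_id,
    values_from  = suppression,
    names_prefix = "Suppression_"
  ) %>%
  drop_na(starts_with("Suppression_"))    # listwise deletion

# sub-002 is dropped because of the missing Year_4 score
toy_wide
```

This illustrates why the analysis sample shrinks to participants with complete data across all four waves; full information maximum likelihood on the long data is a common alternative when attrition is substantial.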
Descriptive Statistics
### Create descriptive summary table
descriptives_table <- df_long %>%
  select(session_id, suppression) %>%
  tbl_summary(
    by = session_id,
    missing = "no",
    label = list(
      suppression ~ "Suppression"
    ),
    statistic = list(all_continuous() ~ "{mean} ({sd})")
  ) %>%
  modify_header(all_stat_cols() ~ "{level}<br>N = {n}") %>%
  modify_spanning_header(all_stat_cols() ~ "Assessment Wave") %>%
  bold_labels() %>%
  italicize_levels()

### Apply compact styling
theme_gtsummary_compact()

descriptives_table <- as_gt(descriptives_table)

### Save the table as HTML
gt::gtsave(descriptives_table, filename = "descriptives_table.html")

### Print the table
descriptives_table
Descriptive statistics by assessment wave, mean (SD):

Characteristic   Year_3        Year_4        Year_5        Year_6
                 N = 10318     N = 9586      N = 8784      N = 5001
Suppression      3.10 (0.86)   3.35 (0.86)   3.35 (0.87)   3.42 (0.88)

Statistical Analysis

Define and Fit Basic LGCM with OpenMx
### Prepare data for OpenMx
# OpenMx expects a plain data.frame with only the manifest variables
mx_data <- df_wide %>%
  select(starts_with("Suppression_")) %>%
  as.data.frame()

manifest_vars <- c("Suppression_Year_3", "Suppression_Year_4",
                   "Suppression_Year_5", "Suppression_Year_6")
latent_vars <- c("intercept", "slope")

### Build the OpenMx growth model
model <- mxModel(
  "BasicLGCM",
  type = "RAM",
  manifestVars = manifest_vars,
  latentVars = latent_vars,

  # Data
  mxData(observed = mx_data, type = "raw"),

  # Factor loadings: intercept loads 1 on all indicators
  mxPath(from = "intercept", to = manifest_vars,
         free = FALSE, values = c(1, 1, 1, 1)),

  # Factor loadings: slope loads 0, 1, 2, 3 (linear time coding)
  mxPath(from = "slope", to = manifest_vars,
         free = FALSE, values = c(0, 1, 2, 3)),

  # Latent means (intercept and slope means)
  mxPath(from = "one", to = latent_vars,
         free = TRUE, values = c(3.0, 0.1),
         labels = c("mean_intercept", "mean_slope")),

  # Latent variances and covariance
  mxPath(from = latent_vars, arrows = 2,
         connect = "unique.pairs", free = TRUE,
         values = c(0.5, -0.02, 0.05),
         labels = c("var_intercept", "cov_i_s", "var_slope")),

  # Residual variances (freely estimated per time point)
  mxPath(from = manifest_vars, arrows = 2,
         free = TRUE, values = rep(0.3, 4),
         labels = c("resvar_yr3", "resvar_yr4", "resvar_yr5", "resvar_yr6")),

  # Zero manifest means (means captured by latent factors)
  mxPath(from = "one", to = manifest_vars,
         free = FALSE, values = 0)
)

### Fit the model
fit <- mxRun(model)

### Display model summary
summary(fit)
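Before trusting a specification like this on real data, it can help to confirm that it recovers known parameters from simulated data. The sketch below is our own check, not part of the original tutorial: it generates 500 linear trajectories with intercept mean 3.0 and slope mean 0.1, fits the same RAM specification, and compares the estimates against the generating values.

```r
library(OpenMx)
set.seed(2024)

# Simulate linear growth: y_t = intercept + slope * t + error
n     <- 500
int_i <- rnorm(n, mean = 3.0, sd = sqrt(0.30))
slp_i <- rnorm(n, mean = 0.1, sd = sqrt(0.05))
sim   <- as.data.frame(sapply(0:3, function(t) {
  int_i + slp_i * t + rnorm(n, sd = sqrt(0.40))
}))
names(sim) <- paste0("y", 1:4)

sim_model <- mxModel(
  "SimLGCM", type = "RAM",
  manifestVars = names(sim), latentVars = c("intercept", "slope"),
  mxData(sim, type = "raw"),
  mxPath("intercept", names(sim), free = FALSE, values = 1),
  mxPath("slope", names(sim), free = FALSE, values = 0:3),
  mxPath("one", c("intercept", "slope"), free = TRUE,
         values = c(3, 0.1), labels = c("mean_i", "mean_s")),
  mxPath(c("intercept", "slope"), arrows = 2, connect = "unique.pairs",
         free = TRUE, values = c(0.3, 0, 0.05)),
  mxPath(names(sim), arrows = 2, free = TRUE, values = 0.4),
  mxPath("one", names(sim), free = FALSE, values = 0)
)

sim_fit <- mxRun(sim_model)

# Estimated latent means should land near the generating values 3.0 and 0.1
round(coef(sim_fit)[c("mean_i", "mean_s")], 2)
```

If the estimates stray far from the generating values, the time coding or path specification likely has an error.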
Format Model Summary Table
### Extract parameter estimates into a tidy table
param_table <- summary(fit)$parameters

model_summary_table <- param_table %>%
  select(name, Estimate, Std.Error) %>%
  mutate(
    z_value = Estimate / Std.Error,
    p_value = 2 * pnorm(-abs(z_value))
  ) %>%
  gt() %>%
  tab_header(title = "Latent Growth Curve Model Results (OpenMx)") %>%
  fmt_number(columns = c(Estimate, Std.Error, z_value, p_value), decimals = 3) %>%
  cols_label(
    name = "Parameter",
    Estimate = "Estimate",
    Std.Error = "Std. Error",
    z_value = "z",
    p_value = "p"
  )

### Save the gt table
gt::gtsave(
  data = model_summary_table,
  filename = "model_summary.html",
  inline_css = FALSE
)
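The z and p columns computed above are Wald statistics: each estimate divided by its standard error, referred to a standard normal distribution. A minimal numeric illustration, using round numbers rather than the fitted values:

```r
est <- 0.10   # example estimate (not a fitted value)
se  <- 0.02   # example standard error

z <- est / se             # Wald z statistic
p <- 2 * pnorm(-abs(z))   # two-sided p-value

c(z = z, p = p)
```

Here z = 5, giving a two-sided p-value well below .001.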
Format Model Fit Indices Table
### Compute reference models for incremental fit indices
# OpenMx requires explicit saturated and independence models to compute
# chi-squared, CFI, TLI, and RMSEA
ref_models <- mxRefModels(fit, run = TRUE)
mx_summary <- summary(fit, refModels = ref_models)

# Extract fit indices
fit_data <- data.frame(
  Metric = c("chi-squared", "df", "p-value", "CFI", "TLI", "RMSEA", "AIC", "BIC"),
  Value = c(
    mx_summary$Chi,
    mx_summary$ChiDoF,
    mx_summary$p,
    mx_summary$CFI,
    mx_summary$TLI,
    mx_summary$RMSEA,
    mx_summary$AIC.Mx,
    mx_summary$BIC.Mx
  )
)

fit_indices_table <- fit_data %>%
  gt() %>%
  tab_header(title = "Model Fit Indices (OpenMx)") %>%
  fmt_number(columns = Value, decimals = 3) %>%
  cols_label(
    Metric = "Fit Measure",
    Value = "Value"
  )

### Save fit indices table
gt::gtsave(
  data = fit_indices_table,
  filename = "model_fit_indices.html",
  inline_css = FALSE
)
Latent Growth Curve Model Results (OpenMx)
Parameter Estimate Std. Error z p
resvar_yr3 0.439 0.016 26.880 0.000
resvar_yr4 0.424 0.012 36.743 0.000
resvar_yr5 0.376 0.011 35.180 0.000
resvar_yr6 0.292 0.015 20.147 0.000
var_intercept 0.322 0.015 21.066 0.000
cov_i_s -0.039 0.006 -6.492 0.000
var_slope 0.046 0.003 13.426 0.000
mean_intercept 3.109 0.012 253.293 0.000
mean_slope 0.110 0.005 20.502 0.000
Model Fit Indices (OpenMx)
Fit Measure Value
chi-squared 180.816
df 5.000
p-value 0.000
CFI 0.949
TLI 0.938
RMSEA 0.092
AIC 5,930.889
BIC -100,454.265
Interpretation

The LGCM fit well by the incremental indices (CFI = 0.949, TLI = 0.938), although the RMSEA of 0.092 exceeded the conventional .08 cutoff, signaling some residual misfit. Average suppression at Year 3 was 3.109 (SE = 0.012, p < .001) and rose by 0.110 points per year (SE = 0.005, p < .001), indicating a small but reliable increase. Intercept and slope variances (0.322 and 0.046, both p < .001) confirmed that adolescents differed markedly in both starting levels and rates of change. The negative intercept-slope covariance (-0.039, p < .001) implies that youth who began with high suppression tended to grow more slowly, whereas those starting lower closed the gap. Residual variances declined from 0.439 at Year 3 to 0.292 by Year 6, suggesting that measurements became more stable across successive assessments. Overall, the model depicts a cohort-wide rise in suppression layered on top of substantial between-person heterogeneity.
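A quick way to read the growth estimates is to compute the model-implied mean at each wave: mean_intercept + mean_slope multiplied by the time codes 0 through 3. Plugging in the estimates from the table above:

```r
# Latent mean estimates from the fitted model above
mean_intercept <- 3.109
mean_slope     <- 0.110

# Model-implied mean suppression at Years 3-6 (time codes 0:3)
implied_means <- mean_intercept + mean_slope * 0:3
round(implied_means, 3)   # 3.109 3.219 3.329 3.439
```

The implied Year 6 mean (about 3.44) sits close to the observed Year 6 mean of 3.42 from the descriptives table, consistent with the adequate fit of the linear growth form.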

Visualization
### Select a subset of participants
set.seed(123)  # Fix the RNG so the plotted subsample is reproducible
n_sample <- min(150, length(unique(df_long$participant_id)))
selected_ids <- sample(unique(df_long$participant_id), n_sample)
df_long_selected <- df_long %>% filter(participant_id %in% selected_ids)

### Plot Suppression Growth
visualization <- ggplot(df_long_selected, aes(x = session_id, y = suppression, group = participant_id)) +
    geom_line(alpha = 0.3, color = "gray") +
    geom_point(size = 1.5, color = "blue") +
    geom_smooth(aes(group = 1), method = "lm", color = "red", linewidth = 1.2, se = TRUE, fill = "lightpink") +
    labs(
        title = "Emotional Suppression Trajectories Over Time",
        subtitle = "Basic LGCM: Years 3 to 6",
        x = "Assessment Wave",
        y = "Suppression Score"
    ) +
    theme_minimal()

ggsave(
  filename = "visualization.png",
  plot = visualization,
  width = 8, height = 6, dpi = 300
)
Emotional Suppression Trajectory Plot
Visualization Notes

Each gray line shows a participant's suppression trajectory across the four assessments, while blue points mark the observed scores and the red line traces the sample-wide mean. The upward tilt of the red line confirms the cohort-level increase in suppression, and the fan of gray lines illustrates the individual heterogeneity that the latent growth curve model is designed to capture.

Discussion

The analysis reveals heterogeneous suppression trajectories, with the overall trend indicating increasing suppression over time while individual trajectories vary substantially. The significant slope variance confirms that adolescents follow meaningfully different developmental paths: some rise steeply, others remain stable or even decline. The negative intercept-slope covariance indicates that youth who begin with higher suppression tend to show slower growth, consistent with a regression-to-the-mean or ceiling-effect pattern.

The inclusion of both random intercepts and slopes provides a flexible framework for understanding variability in initial suppression levels and growth rates. Compared to a model with only fixed effects, the random-slope specification captures the heterogeneity visible in individual trajectory plots and yields more realistic standard errors for the population-average trend.

Extensions of the basic LGCM include conditional models with time-invariant covariates (to explain why individuals differ), piecewise specifications that allow change rates to differ across developmental periods, and multivariate models that examine co-development across multiple constructs. These extensions build directly on the intercept-slope framework established here.

Additional Resources


OpenMx Growth Model Tutorial

DOCS

Official OpenMx documentation for latent growth curve modeling, covering RAM-type specification, path diagrams, and model comparison in the matrix-based SEM framework.


OpenMx User Guide

DOCS

Comprehensive user guide for the OpenMx package, including detailed coverage of RAM models, LISREL specification, data handling, and optimization options.


Neale et al. (2016): OpenMx 2.0

PAPER

The primary citation for OpenMx, describing the structural equation modeling framework, matrix algebra approach, and full information maximum likelihood estimation used in this tutorial.
