Hierarchical ODE State Space Model for Unemployment Dynamics

A Bayesian Approach with Pooled Labor Market Parameters

WORK IN PROGRESS

This report is currently under development. Not all statements, interpretations, and conclusions have been reviewed or validated by a human researcher. Please treat all content as preliminary analysis subject to revision.

Overview

This report presents a hierarchical Bayesian state space model for unemployment dynamics across education levels. The model explicitly represents labor market flows through an ordinary differential equation (ODE) framework, with hierarchical pooling to share information across education groups.

Key Features

Economic Foundation - Explicit labor market dynamics: \(\frac{dU}{dt} = s(1-U) - fU\) - Separation rate (\(s\)): probability of job loss per month - Finding rate (\(f\)): probability of job finding per month - Equilibrium unemployment: \(u^* = \frac{s}{s+f}\)

Hierarchical Structure - Population-level means with education-specific deviations - Pooling strengthens inference for small groups - Hierarchical parameters: equilibrium unemployment, adjustment speed, shock magnitudes, recovery rates, seasonal effects, spline smoothness

Shock Modeling - Explicit shock effects on separation rates (2008, 2020) - Education-specific shock magnitudes with pooling - Decay parameters capture recovery dynamics

Technical Innovations - Spline basis for smooth residual variation (25 basis functions) - Non-centered parameterization for sampling efficiency - Data-informed priors from exploratory model fits - Prior-centered initialization to avoid multimodality

Model Structure: A Comprehensive Mathematical Walkthrough

This section provides a detailed explanation of the Stan model, including the economic theory, mathematical derivations, and implementation details.

1. Economic Foundation: Labor Market Flow Dynamics

The model is built on the stock-flow framework from labor economics. At any point in time, the labor force is divided into two stocks: employed (\(E\)) and unemployed (\(U\)), with \(E + U = L\) (total labor force). Workers flow between these states:

       separation (s)
  E ─────────────────────→ U
    ←────────────────────
       finding (f)

1.1 The Continuous-Time ODE

The rate of change in unemployment follows the ordinary differential equation:

\[\frac{dU}{dt} = s \cdot E - f \cdot U = s \cdot (L - U) - f \cdot U\]

Dividing by \(L\) to get rates (lowercase \(u = U/L\)):

\[\frac{du}{dt} = s \cdot (1-u) - f \cdot u\]

Term-by-term interpretation:

  • \(s \cdot (1-u)\): Inflow to unemployment — employed workers (fraction \(1-u\)) lose jobs at rate \(s\)
  • \(f \cdot u\): Outflow from unemployment — unemployed workers (fraction \(u\)) find jobs at rate \(f\)
  • Net change = inflow − outflow

1.2 Equilibrium Analysis

At equilibrium, \(\frac{du}{dt} = 0\), so:

\[s \cdot (1-u^*) = f \cdot u^*\]

Solving for \(u^*\):

\[s - s \cdot u^* = f \cdot u^*\] \[s = u^* \cdot (s + f)\] \[\boxed{u^* = \frac{s}{s + f}}\]

Economic implications:

  • If \(s = 0.02\) (2% monthly separation) and \(f = 0.30\) (30% monthly finding): \(u^* = 0.02/(0.02+0.30) = 6.25\%\)
  • Higher education → Lower \(s\), higher \(f\) → Lower equilibrium unemployment
  • Shocks temporarily increase \(s\) → Push \(u\) above equilibrium

1.3 Dynamics: Speed of Adjustment

The ODE can be rewritten as:

\[\frac{du}{dt} = s - (s + f) \cdot u = (s + f) \cdot \left(\frac{s}{s+f} - u\right) = (s + f) \cdot (u^* - u)\]

This shows that \(u\) moves toward \(u^*\) at rate \((s + f)\). The half-life of adjustment is:

\[t_{1/2} = \frac{\ln(2)}{s + f}\]

With \(s + f \approx 0.32\) per month, the half-life is about 2 months — unemployment adjusts quickly to equilibrium.

2. Extending the Basic Model

The real world is more complex. We extend the model with:

2.1 Education Heterogeneity

Different education levels have different labor market characteristics:

\[\frac{du_i}{dt} = s_i \cdot (1-u_i) - f_i \cdot u_i\]

where \(i\) indexes education level. We use hierarchical priors:

\[s_i \sim \text{Normal}(\mu_s, \sigma_s) \quad \text{with } \mu_s \sim \text{Normal}(0.02, 0.01)\]

This allows education levels to share information while having distinct rates.

2.2 Economic Shocks

During recessions, separation rates spike. We model this as:

\[s_i^{\text{eff}}(t) = s_i + I_{2008}(t) \cdot \delta_{2008,i} + I_{2020}(t) \cdot \delta_{2020,i}\]

where \(I(t)\) is a shock impulse function and \(\delta_i\) is the education-specific shock effect.

Impulse function design (for 2008):

\[I_{2008}(t) = \begin{cases} 0 & t < t_{\text{onset}} \\ \frac{t - t_{\text{onset}}}{t_{\text{peak}} - t_{\text{onset}}} & t_{\text{onset}} \leq t \leq t_{\text{peak}} \\ e^{-\lambda_i (t - t_{\text{peak}})} & t > t_{\text{peak}} \end{cases}\]

  • Before onset (\(t < 2007.75\)): No effect
  • Rise phase (\(2007.75 \leq t \leq 2009.5\)): Linear rise as crisis develops
  • Decay phase (\(t > 2009.5\)): Exponential recovery with education-specific decay rate \(\lambda_i\)

The half-life of recovery is \(t_{1/2} = \ln(2)/\lambda_i\). Different education levels may recover at different speeds.

2.3 Seasonal Effects

The model includes a direct seasonal effect on unemployment (on the logit scale):

\[\text{logit}(u_{t,i}) = \text{logit}(u_{t-1,i}) + \Delta u_{t,i} + \gamma^u_{m(t),i} + \epsilon_{t,i}\]

This captures observed seasonal patterns such as academic calendar effects for PhDs, summer employment fluctuations, and annual hiring cycles. Prior: \(\gamma^u_{m,i} \sim \text{Normal}(0, 0.05)\) with a sum-to-zero constraint.

Design choice: We use a direct effect on unemployment rather than modulating the finding rate because the two approaches are not separately identifiable from the data. The direct approach is simpler and more flexible.

2.4 State Space Formulation

We observe counts (binomial samples from the latent unemployment rate), not the true rate. The model has two layers:

Process model (latent state evolution): \[\text{logit}(u_{t,i}) = \text{logit}(u_{t-1,i}) + \Delta u_{t,i} + \epsilon_{t,i}\]

where: - \(\Delta u_{t,i}\) is the ODE-predicted change - \(\epsilon_{t,i} \sim \text{Normal}(0, \sigma_{\text{state}})\) is the innovation (captures unexplained dynamics)

Observation model (data likelihood): \[n_{\text{unemployed},t,i} \sim \text{Beta-Binomial}(n_{\text{total},t,i}, u_{t,i} \cdot \phi, (1-u_{t,i}) \cdot \phi)\]

The beta-binomial distribution allows for overdispersion (more variance than binomial) through parameter \(\phi\).

3. Stan Implementation: Block by Block

The Stan model implements this in several blocks:

3.1 Functions Block

functions {
  real shock_impulse(real t, real onset, real peak, real decay_rate) {
    if (t < onset) return 0;
    else if (t <= peak) return (t - onset) / (peak - onset);
    else return exp(-decay_rate * (t - peak));
  }
}

This defines the impulse response function \(I(t)\) for shocks.

3.2 Data Block

data {
  int<lower=1> T;                           // Time points (months)
  int<lower=1> N_edu;                       // Education levels
  array[T, N_edu] int<lower=0> n_unemployed; // Counts
  array[T, N_edu] int<lower=0> n_total;      // Denominators
  array[T] int<lower=1, upper=12> month;     // For seasonality
  array[T] real<lower=0> year_frac;          // Continuous time
  // Shock timing...
}

3.3 Transformed Data Block

Pre-computes shock rise phases and time-since-peak for efficiency:

transformed data {
  array[T] real shock_2008_rise;
  array[T] real time_since_2008_peak;
  // ... (computed from onset/peak timing)
}

3.4 Parameters Block

parameters {
  // Latent states
  vector[N_edu] logit_u_init;              // Initial unemployment
  array[T-1] vector[N_edu] logit_u_innov;  // Innovations

  // Separation rates (hierarchical)
  real<lower=0> mu_separation;
  real<lower=0> sigma_separation;
  vector<lower=0, upper=0.2>[N_edu] separation_rate;

  // Finding rates (hierarchical)
  real<lower=0> mu_finding;
  real<lower=0> sigma_finding;
  vector<lower=0, upper=1>[N_edu] finding_rate;

  // Shock effects (education-specific)
  vector<lower=0>[N_edu] shock_2008_effect;
  vector<lower=0>[N_edu] shock_2020_effect;
  vector<lower=0.1, upper=5>[N_edu] decay_2008;  // Education-specific decay
  vector<lower=0.1, upper=5>[N_edu] decay_2020;

  // Seasonal effects (11 free, 12th constrained)
  matrix[11, N_edu] seasonal_raw;

  // Noise parameters
  real<lower=0> sigma_state;
  real<lower=1> phi;
}

3.5 Transformed Parameters Block

This is where the ODE dynamics are computed:

transformed parameters {
  array[T] vector<lower=0, upper=1>[N_edu] u;
  matrix[12, N_edu] seasonal;
  array[T] vector[N_edu] logit_u;
  array[T] vector[N_edu] shock_2008_intensity;
  array[T] vector[N_edu] shock_2020_intensity;

  // Build sum-to-zero seasonal constraint
  for (i in 1:N_edu) {
    seasonal[1:11, i] = seasonal_raw[, i];
    seasonal[12, i] = -sum(seasonal_raw[, i]);
  }

  // Compute education-specific shock intensities
  for (t in 1:T) {
    for (i in 1:N_edu) {
      shock_2008_intensity[t][i] = shock_2008_rise[t] *
        exp(-decay_2008[i] * time_since_2008_peak[t]);
      shock_2020_intensity[t][i] = shock_2020_rise[t] *
        exp(-decay_2020[i] * time_since_2020_peak[t]);
    }
  }

  // Initialize
  logit_u[1] = logit_u_init;
  u[1] = inv_logit(logit_u[1]);

  // State evolution
  for (t in 2:T) {
    for (i in 1:N_edu) {
      // Effective separation rate
      real s_eff = separation_rate[i]
                   + shock_2008_intensity[t][i] * shock_2008_effect[i]
                   + shock_2020_intensity[t][i] * shock_2020_effect[i];

      // Effective finding rate with seasonal
      real f_eff = finding_rate[i] * (1 + seasonal[month[t], i]);

      // Discretized ODE
      real du_dt = s_eff * (1 - u[t-1][i]) - f_eff * u[t-1][i];

      // Evolution with innovation
      logit_u[t][i] = logit_u[t-1][i] + du_dt + logit_u_innov[t-1][i];
    }
    u[t] = inv_logit(logit_u[t]);
  }
}

3.6 Model Block (Priors and Likelihood)

model {
  // === PRIORS ===

  // Separation rates (1-3% per month from literature)
  mu_separation ~ normal(0.02, 0.01);
  sigma_separation ~ exponential(50);
  separation_rate ~ normal(mu_separation, sigma_separation);

  // Finding rates (20-40% per month)
  mu_finding ~ normal(0.30, 0.10);
  sigma_finding ~ exponential(5);
  finding_rate ~ normal(mu_finding, sigma_finding);

  // Shock effects (positive by constraint)
  shock_2008_effect ~ normal(0.02, 0.01);
  shock_2020_effect ~ normal(0.03, 0.015);

  // Decay rates (education-specific)
  decay_2008 ~ normal(0.5, 0.3);
  decay_2020 ~ normal(1.0, 0.5);

  // Seasonal effects (substantial amplitude allowed)
  to_vector(seasonal_raw) ~ normal(0, 0.15);

  // State innovations
  sigma_state ~ exponential(20);
  for (t in 1:(T-1)) {
    logit_u_innov[t] ~ normal(0, sigma_state);
  }

  // Overdispersion
  phi ~ exponential(0.05);

  // === LIKELIHOOD ===

  for (t in 1:T) {
    for (i in 1:N_edu) {
      if (n_total[t, i] > 0) {
        real u_safe = fmin(fmax(u[t][i], 1e-6), 1 - 1e-6);
        n_unemployed[t, i] ~ beta_binomial(n_total[t, i],
                                            u_safe * phi,
                                            (1 - u_safe) * phi);
      }
    }
  }
}

3.7 Generated Quantities Block

Computes derived quantities and model decomposition:

  • Equilibrium rates: \(u^*_i = s_i / (s_i + f_i)\)
  • Half-lives: \(t_{1/2} = \ln(2) / \lambda_i\) (education-specific)
  • Trend trajectory: Evolution without seasonal effects
  • Pure ODE trajectory: Evolution without innovations or seasonality
  • Seasonal effects: Difference between full model and trend
  • Log-likelihood: For LOO cross-validation
  • Posterior predictive: For model checking

4. Why This Model Succeeds Where GAM Fails

Feature GAM Approach State Space Model
Shock mechanism Indicator variable (absorbed by trend) Explicit ODE term with decay
Temporal structure Smooth function (static) Dynamic state evolution
Prior information Penalty on curvature Informative economic priors
Shock identification Competes with 501-EDF trend Structurally separated from trend
Interpretation Abstract smooth values Separation/finding rates
Recovery dynamics Not modeled Education-specific decay rates
Seasonality Cyclic smooth Finding rate multiplier

The fundamental insight: By building economic structure into the model (separation → unemployment → finding), shock effects have a clear mechanistic role. The explicit ODE dynamics provide interpretable estimates of labor market flows.

Hierarchical Parameter Structure

The model uses hierarchical priors to pool information across education levels, improving estimation efficiency while allowing group-specific variation.

Hierarchical Parameters

1. Equilibrium Unemployment (\(u_{eq}\))

μ_logit_u_eq ~ Normal(-3.3, 0.3)           # Population mean (logit scale)
σ_logit_u_eq ~ Exponential(2)              # Between-education SD
logit_u_eq[i] = μ_logit_u_eq + σ_logit_u_eq * u_eq_raw[i]
  • Centers on ~3.5% equilibrium unemployment
  • Allows education-specific deviations
  • Non-centered for efficient sampling

2. Adjustment Speed (\(s + f\))

μ_log_adj_speed ~ Normal(2.3, 0.5)         # Population mean (log scale)
σ_log_adj_speed ~ Exponential(1)           # Between-education SD
log_adj_speed[i] = μ_log_adj_speed + σ_log_adj_speed * adj_speed_raw[i]
  • Data-informed prior: exp(2.3) ≈ 10
  • Controls speed of adjustment to equilibrium
  • Note: implicitly scaled ~30x by logit dynamics

3. Shock Magnitudes (2008, 2020)

μ_log_shock_2008 ~ Normal(-2, 0.8)         # Population mean
σ_log_shock_2008 ~ Exponential(1)          # Between-education SD
log_shock_2008[i] = μ_log_shock_2008 + σ_log_shock_2008 * shock_2008_raw[i]

μ_log_shock_2020 ~ Normal(-1.5, 0.8)       # Population mean
σ_log_shock_2020 ~ Exponential(1)          # Between-education SD
log_shock_2020[i] = μ_log_shock_2020 + σ_log_shock_2020 * shock_2020_raw[i]
  • Pools shock effects across education levels
  • 2008: exp(-2) ≈ 0.14 (14% increase in separation rate)
  • 2020: exp(-1.5) ≈ 0.22 (22% increase in separation rate)

4. Recovery Rates (Decay)

μ_decay_2008 ~ Normal(0, 0.5)              # Population mean (logit scale)
σ_decay_2008 ~ Exponential(1)              # Between-education SD
decay_2008[i] = 0.1 + 4.9 * inv_logit(μ_decay_2008 + σ_decay_2008 * decay_2008_raw[i])

μ_decay_2020 ~ Normal(0, 0.5)              # Population mean (logit scale)
σ_decay_2020 ~ Exponential(1)              # Between-education SD
decay_2020[i] = 0.1 + 4.9 * inv_logit(μ_decay_2020 + σ_decay_2020 * decay_2020_raw[i])
  • Constrained to [0.1, 5] range
  • Centers on ~2.5 (moderate recovery)
  • Pooling prevents overfitting with limited shock data

5. Seasonal Effects

μ_seasonal ~ Normal(0, 0.03)               # Population mean pattern (11 months)
sum(μ_seasonal) ~ Normal(0, 0.001)         # Soft sum-to-zero constraint
σ_seasonal ~ Exponential(10)               # Between-education SD (mean = 0.1)
seasonal_u[1:11, i] = μ_seasonal + σ_seasonal * seasonal_u_raw[, i]
  • Hierarchical pooling of monthly seasonal patterns
  • Population-level pattern with education-specific deviations
  • 12th month derived: seasonal_u[12, i] = -sum(seasonal_u[1:11, i])
  • Modest effects (~1-3% on logit scale)

6. Spline Smoothness (σ_spline)

μ_log_sigma_spline ~ Normal(-0.22, 0.4)   # Population mean on log scale
σ_log_sigma_spline ~ Exponential(2)       # Between-education SD (mean = 0.5)
sigma_spline[i] = exp(μ_log_sigma_spline + σ_log_sigma_spline * sigma_spline_raw[i])
  • Education-specific residual smoothness on spline coefficients
  • PhDs may have smoother trajectories than less-educated workers
  • Data-informed: exp(-0.22) ≈ 0.8 from exploratory fits
  • Hierarchical pooling prevents overfitting of smoothness parameters

Non-Hierarchical Parameters

Initial States - Starting unemployment levels - Not hierarchical (equilibrium handles long-run behavior) - Prior: Normal(-3.0, 0.5) on logit scale

Overdispersion - Beta-binomial φ parameter - Data-informed: log(φ-1) ~ Normal(8.5, 0.5) → φ ≈ 5000 - Accounts for extra-binomial variation

Benefits of Hierarchical Pooling

  1. Improved Estimates: Small groups (PhDs, professionals) borrow strength from population
  2. Regularization: Prevents overfitting of education-specific parameters
  3. Identifiability: Constrains some_college adjustment speed through population effect
  4. Interpretability: Population means are directly interpretable aggregates

Hierarchical Variance Analysis: Comparing Pooling Strength

A key question for hierarchical models: Which unemployment dynamics show the most education-specific variation?

The hierarchical structure allows us to directly compare the strength of pooling across all parameter groups by examining the between-education standard deviation (\(\sigma\)) for each.

Between-Education Variance Summary

Show code
# Load the variance analysis results
variance_file <- here("results", "hierarchical-variance-summary.rds")

if (!file.exists(variance_file)) {
  cat("Running hierarchical variance analysis...\n")
  system("Rscript scripts/analyze-hierarchical-variance.R")
}

variance_summary <- readRDS(variance_file)

# Display summary table
cat("\n=== Between-Education Variance (σ) ===\n")

=== Between-Education Variance (σ) ===
Show code
cat("Smaller σ = stronger pooling = more similar across education\n\n")
Smaller σ = stronger pooling = more similar across education
Show code
knitr::kable(
  variance_summary[, .(Parameter = parameter,
                       `σ (mean)` = sprintf("%.3f", sigma_mean),
                       `90% CI` = sprintf("[%.3f, %.3f]", sigma_q5, sigma_q95),
                       `Edu CV` = sprintf("%.1f%%", edu_cv * 100),
                       `Pooling Strength` = pooling)],
  align = c("l", "r", "r", "r", "l")
)
Parameter σ (mean) 90% CI Edu CV Pooling Strength
2020 decay rate 5.503 [3.056, 8.756] 1.8% Very weak (large differences)
2008 decay rate 2.245 [0.955, 3.672] 22.0% Very weak (large differences)
Adjustment speed (adj_speed) 0.900 [0.538, 1.440] 69.4% Very weak (large differences)
Equilibrium unemployment (u_eq) 0.660 [0.419, 1.014] 55.5% Very weak (large differences)
2020 shock effect 0.381 [0.185, 0.692] 31.1% Moderate (some education differences)
2008 shock effect 0.353 [0.192, 0.613] 26.8% Moderate (some education differences)
Spline smoothness 0.100 [0.007, 0.268] 3.3% Very tight (nearly identical)
Seasonal effects 0.051 [0.041, 0.062] 44.0% Tight (similar with small variation)

Between-education variance (σ) for all hierarchical parameters

Show code
knitr::include_graphics(here("reports", "figures", "hierarchical-variance-comparison.png"))

Comparing pooling strength across all hierarchical parameters

Interpretation: Universal vs. Education-Specific Dynamics

Tight Pooling (σ < 0.15) - Nearly universal across education:

Show code
tight <- variance_summary[sigma_mean < 0.15, parameter]
if (length(tight) > 0) {
  cat(paste0("  • ", tight, collapse = "\n"), "\n\n")
  cat("→ These dynamics are fundamental to all education levels\n")
  cat("→ Education-specific differences are small relative to uncertainty\n")
  cat("→ The data supports a common mechanism\n")
} else {
  cat("No parameters show tight pooling.\n")
}
  • Spline smoothness
  • Seasonal effects 

→ These dynamics are fundamental to all education levels
→ Education-specific differences are small relative to uncertainty
→ The data supports a common mechanism

Moderate Pooling (0.15 ≤ σ < 0.4) - Some education differences:

Show code
moderate <- variance_summary[sigma_mean >= 0.15 & sigma_mean < 0.4, parameter]
if (length(moderate) > 0) {
  cat(paste0("  • ", moderate, collapse = "\n"), "\n\n")
  cat("→ Modest education-specific variation exists\n")
  cat("→ Pooling provides moderate regularization\n")
  cat("→ Groups are similar but distinguishable\n")
} else {
  cat("No parameters show moderate pooling.\n")
}
  • 2020 shock effect
  • 2008 shock effect 

→ Modest education-specific variation exists
→ Pooling provides moderate regularization
→ Groups are similar but distinguishable

Weak Pooling (σ ≥ 0.4) - Substantial education differences:

Show code
weak <- variance_summary[sigma_mean >= 0.4, parameter]
if (length(weak) > 0) {
  cat(paste0("  • ", weak, collapse = "\n"), "\n\n")
  cat("→ Education levels genuinely differ in these dynamics\n")
  cat("→ Pooling provides modest shrinkage but preserves differences\n")
  cat("→ The data supports education-specific mechanisms\n")
} else {
  cat("No parameters show weak pooling.\n")
}
  • 2020 decay rate
  • 2008 decay rate
  • Adjustment speed (adj_speed)
  • Equilibrium unemployment (u_eq) 

→ Education levels genuinely differ in these dynamics
→ Pooling provides modest shrinkage but preserves differences
→ The data supports education-specific mechanisms

Education-Specific Parameter Values

To visualize the differences, compare selected parameters with different pooling strengths:

Show code
knitr::include_graphics(here("reports", "figures", "education-specific-parameters.png"))

Education-specific parameter values for parameters with different pooling strengths

Key Insights

To meaningfully compare variability across parameters with different units, we use the coefficient of variation (CV = σ/|μ|), which measures relative variability:

  1. Most Variable (by CV): Adjustment speed (adj_speed) (CV = 69.4%)
    • Education levels differ substantially in this aspect of unemployment dynamics
    • Hierarchical prior prevents overfitting while allowing genuine differences
    • Raw σ = 0.900
  2. Least Variable (by CV): 2020 decay rate (CV = 1.8%)
    • All education levels share nearly identical dynamics for this parameter
    • Strong pooling is appropriate and improves estimation efficiency
    • Raw σ = 5.503
  3. Practical Implications:
    • Parameters with high CV require education-specific interpretation
    • Parameters with low CV can be summarized at the population level
    • The hierarchy automatically discovers which dynamics are universal vs. education-specific
    • Note: Comparing raw σ values across parameters is invalid because they have different units

Data Preparation

Show code
# Data and model already loaded in setup chunk for report efficiency
# Display data summary
cat(sprintf("Time series length: %d months\n", stan_data$T))
Time series length: 310 months
Show code
cat(sprintf("Education levels: %d\n", stan_data$N_edu))
Education levels: 7
Show code
cat(sprintf("Education categories: %s\n",
            paste(stan_data$education_levels, collapse = ", ")))
Education categories: bachelors, high_school, less_than_hs, masters, phd, professional, some_college
Show code
cat(sprintf("Date range: %.2f to %.2f\n",
            min(stan_data$year_frac), max(stan_data$year_frac)))
Date range: 2000.04 to 2025.88

Model Fitting

We use the efficient spline-based model which reduces dimensionality from ~2100 to ~105 latent parameters, enabling proper MCMC sampling.

Show code
# Model already loaded in setup chunk for report efficiency
# Display diagnostics
cat("\n=== Convergence Diagnostics ===\n")

=== Convergence Diagnostics ===
Show code
cat(sprintf("Divergent transitions: %d\n", result$diagnostics$num_divergent))
Divergent transitions: 0
Show code
cat(sprintf("Max treedepth exceeded: %d\n", result$diagnostics$max_treedepth_exceeded))
Max treedepth exceeded: 0
Show code
cat(sprintf("E-BFMI: %s\n", paste(round(result$diagnostics$ebfmi, 3), collapse = ", ")))
E-BFMI: 0.685, 0.632, 0.686, 0.668
Show code
# Check all chains completed
if (length(result$diagnostics$ebfmi) < 4) {
  cat(sprintf("WARNING: Only %d of 4 chains completed!\n", length(result$diagnostics$ebfmi)))
}

Economic Parameter Estimates

Separation and Finding Rates

The model estimates education-specific job separation rates (probability of losing a job per month) and job finding rates (probability of finding a job per month while unemployed).

Show code
params <- extract_economic_params(result)

# Separation rates
cat("\n=== Monthly Job Separation Rates ===\n")

=== Monthly Job Separation Rates ===
Show code
sep_summary <- params$separation_rates
sep_summary$education <- stan_data$education_levels
print(sep_summary[, c("education", "mean", "q5", "q95")], digits = 4)
# A tibble: 7 × 4
  education     mean    q5   q95
  <chr>        <dbl> <dbl> <dbl>
1 bachelors    0.182 0.116 0.258
2 high_school  0.373 0.348 0.398
3 less_than_hs 0.302 0.279 0.325
4 masters      0.308 0.285 0.332
5 phd          0.338 0.291 0.393
6 professional 0.398 0.327 0.472
7 some_college 0.347 0.280 0.424
Show code
# Finding rates
cat("\n=== Monthly Job Finding Rates ===\n")

=== Monthly Job Finding Rates ===
Show code
find_summary <- params$finding_rates
find_summary$education <- stan_data$education_levels
print(find_summary[, c("education", "mean", "q5", "q95")], digits = 3)
# A tibble: 7 × 4
  education     mean    q5   q95
  <chr>        <dbl> <dbl> <dbl>
1 bachelors     2.90  1.91  4.05
2 high_school   6.88  6.43  7.34
3 less_than_hs  6.84  6.34  7.35
4 masters      11.4  10.5  12.3 
5 phd          15.1  12.9  17.7 
6 professional 29.8  24.4  35.2 
7 some_college 24.1  19.4  29.5 
Show code
# Equilibrium unemployment
cat("\n=== Equilibrium Unemployment Rates ===\n")

=== Equilibrium Unemployment Rates ===
Show code
cat("(Long-run steady state: u* = s/(s+f))\n\n")
(Long-run steady state: u* = s/(s+f))
Show code
eq_summary <- params$equilibrium_rates
eq_summary$education <- stan_data$education_levels
print(eq_summary[, c("education", "mean", "q5", "q95")], digits = 4)
# A tibble: 7 × 4
  education      mean     q5    q95
  <chr>         <dbl>  <dbl>  <dbl>
1 bachelors    0.0589 0.0547 0.0629
2 high_school  0.0514 0.0509 0.0519
3 less_than_hs 0.0423 0.0418 0.0427
4 masters      0.0263 0.0259 0.0267
5 phd          0.0219 0.0214 0.0223
6 professional 0.0132 0.0126 0.0138
7 some_college 0.0142 0.0136 0.0148
Show code
eq_df <- data.frame(
  education = stan_data$education_levels,
  mean = eq_summary$mean,
  lower = eq_summary$q5,
  upper = eq_summary$q95
)

# Order by mean rate
eq_df$education <- factor(eq_df$education,
                          levels = eq_df$education[order(eq_df$mean)])

ggplot(eq_df, aes(x = education, y = mean * 100)) +
  geom_point(size = 4, color = "steelblue") +
  geom_errorbar(aes(ymin = lower * 100, ymax = upper * 100),
                width = 0.2, linewidth = 1, color = "steelblue") +
  coord_flip() +
  labs(x = "Education Level",
       y = "Equilibrium Unemployment Rate (%)",
       title = "Long-Run Unemployment Rates by Education",
       subtitle = "Based on estimated separation and finding rates") +
  theme(axis.text.y = element_text(size = 11))

Equilibrium unemployment rates by education level (with 90% credible intervals)

Shock Effects

The key advantage of the state space model: shock effects are positive and identifiable. With hierarchical priors, we estimate both population-level means and education-specific effects.

Show code
# Population-level shock parameters (hierarchical)
cat("\n=== Hierarchical Shock Parameters (Population Level) ===\n")

=== Hierarchical Shock Parameters (Population Level) ===
Show code
hier_params <- c("mu_log_shock_2008", "sigma_log_shock_2008",
                 "mu_log_shock_2020", "sigma_log_shock_2020")
hier_summary <- result$fit$summary(hier_params)
print(hier_summary[, c("variable", "mean", "sd", "q5", "q95", "rhat", "ess_bulk")], digits = 3)
# A tibble: 4 × 7
  variable               mean    sd     q5    q95  rhat ess_bulk
  <chr>                 <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>
1 mu_log_shock_2008    -1.47  0.394 -2.20  -0.942  1.00    4292.
2 sigma_log_shock_2008  0.691 0.316  0.324  1.31   1.00    3788.
3 mu_log_shock_2020    -0.798 0.424 -1.59  -0.215  1.00    5160.
4 sigma_log_shock_2020  0.632 0.293  0.290  1.20   1.00    3854.
Show code
cat("\n\nInterpretation on natural scale:\n")


Interpretation on natural scale:
Show code
cat(sprintf("  2008 mean effect: exp(%.2f) = %.1f%% increase in separation rate\n",
            hier_summary$mean[hier_summary$variable == "mu_log_shock_2008"],
            100 * exp(hier_summary$mean[hier_summary$variable == "mu_log_shock_2008"])))
  2008 mean effect: exp(-1.47) = 23.0% increase in separation rate
Show code
cat(sprintf("  2020 mean effect: exp(%.2f) = %.1f%% increase in separation rate\n",
            hier_summary$mean[hier_summary$variable == "mu_log_shock_2020"],
            100 * exp(hier_summary$mean[hier_summary$variable == "mu_log_shock_2020"])))
  2020 mean effect: exp(-0.80) = 45.0% increase in separation rate
Show code
cat("\n=== 2008 Financial Crisis Shock Effects (Education-Specific) ===\n")

=== 2008 Financial Crisis Shock Effects (Education-Specific) ===
Show code
cat("(Additional monthly separation probability during shock)\n\n")
(Additional monthly separation probability during shock)
Show code
shock08 <- params$shock_2008_effects
shock08$education <- stan_data$education_levels
print(shock08[, c("education", "mean", "q5", "q95")], digits = 4)
# A tibble: 7 × 4
  education     mean    q5   q95
  <chr>        <dbl> <dbl> <dbl>
1 bachelors    0.423 0.296 0.564
2 high_school  0.630 0.594 0.668
3 less_than_hs 0.541 0.507 0.576
4 masters      0.372 0.337 0.406
5 phd          0.304 0.254 0.361
6 professional 0.398 0.308 0.495
7 some_college 0.337 0.264 0.414
Show code
cat("\n=== 2020 COVID-19 Shock Effects (Education-Specific) ===\n")

=== 2020 COVID-19 Shock Effects (Education-Specific) ===
Show code
cat("(Additional monthly separation probability during shock)\n\n")
(Additional monthly separation probability during shock)
Show code
shock20 <- params$shock_2020_effects
shock20$education <- stan_data$education_levels
print(shock20[, c("education", "mean", "q5", "q95")], digits = 4)
# A tibble: 7 × 4
  education     mean    q5   q95
  <chr>        <dbl> <dbl> <dbl>
1 bachelors    0.573 0.291 0.879
2 high_school  1.11  1.02  1.20 
3 less_than_hs 1.20  1.11  1.31 
4 masters      1.04  0.936 1.15 
5 phd          0.779 0.643 0.922
6 professional 1.79  1.43  2.12 
7 some_college 0.796 0.573 1.03 
Show code
cat("\n=== 2008 Shock Half-Lives (Education-Specific) ===\n")

=== 2008 Shock Half-Lives (Education-Specific) ===
Show code
cat("(Time in years for shock effect to decay by 50%)\n\n")
(Time in years for shock effect to decay by 50%)
Show code
hl08 <- params$halflife_2008
hl08$education <- stan_data$education_levels
print(hl08[, c("education", "mean", "q5", "q95")], digits = 2)
# A tibble: 7 × 4
  education     mean    q5   q95
  <chr>        <dbl> <dbl> <dbl>
1 bachelors     1.66  1.26  2.09
2 high_school   2.13  2.05  2.22
3 less_than_hs  2.13  2.03  2.23
4 masters       2.00  1.84  2.16
5 phd           2.17  1.82  2.54
6 professional  3.26  2.48  4.18
7 some_college  2.54  1.93  3.23
Show code
cat("\n=== 2020 COVID Half-Lives (Education-Specific) ===\n")

=== 2020 COVID Half-Lives (Education-Specific) ===
Show code
cat("(Time in years for shock effect to decay by 50%)\n\n")
(Time in years for shock effect to decay by 50%)
Show code
hl20 <- params$halflife_2020
hl20$education <- stan_data$education_levels
print(hl20[, c("education", "mean", "q5", "q95")], digits = 2)
# A tibble: 7 × 4
  education     mean    q5   q95
  <chr>        <dbl> <dbl> <dbl>
1 bachelors    0.160 0.139 0.213
2 high_school  0.140 0.139 0.143
3 less_than_hs 0.139 0.139 0.141
4 masters      0.140 0.139 0.146
5 phd          0.140 0.139 0.145
6 professional 0.140 0.139 0.147
7 some_college 0.140 0.139 0.145
Show code
# Combine shock data
shock_df <- rbind(
  data.frame(
    shock = "2008 Financial Crisis",
    education = stan_data$education_levels,
    mean = shock08$mean,
    lower = shock08$q5,
    upper = shock08$q95
  ),
  data.frame(
    shock = "2020 COVID-19",
    education = stan_data$education_levels,
    mean = shock20$mean,
    lower = shock20$q5,
    upper = shock20$q95
  )
)

ggplot(shock_df, aes(x = education, y = mean * 100, color = shock)) +
  geom_point(position = position_dodge(width = 0.5), size = 3) +
  geom_errorbar(aes(ymin = lower * 100, ymax = upper * 100),
                position = position_dodge(width = 0.5), width = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  coord_flip() +
  scale_color_manual(values = c("2008 Financial Crisis" = "#E74C3C",
                                "2020 COVID-19" = "#3498DB")) +
  labs(x = "Education Level",
       y = "Additional Separation Rate (%)",
       color = "Economic Shock",
       title = "Shock Effects on Job Separation by Education",
       subtitle = "Positive values indicate increased job loss during crisis") +
  theme(legend.position = "bottom")

Comparison of shock effects across education levels

Latent Unemployment Trajectories

Show code
# Extract latent unemployment rates
latent <- extract_latent_rates(result, summary = TRUE)

# Create data for plotting
plot_data <- data.frame(
  year_frac = latent$year_frac,
  education = latent$education,
  mean = latent$mean,
  lower = latent$q5,
  upper = latent$q95
)

# Add observed rates for comparison
for (i in seq_along(stan_data$education_levels)) {
  edu <- stan_data$education_levels[i]
  idx <- plot_data$education == edu
  obs_rate <- stan_data$n_unemployed[, i] / stan_data$n_total[, i]
  plot_data$observed[idx] <- obs_rate
}
Show code
ggplot(plot_data, aes(x = year_frac)) +
  geom_ribbon(aes(ymin = lower * 100, ymax = upper * 100, fill = education),
              alpha = 0.3) +
  geom_line(aes(y = mean * 100, color = education), linewidth = 0.8) +
  geom_point(aes(y = observed * 100), alpha = 0.3, size = 0.5) +
  geom_vline(xintercept = c(2008.75, 2020.25), linetype = "dashed",
             color = "red", alpha = 0.5) +
  facet_wrap(~education, scales = "free_y", ncol = 2) +
  scale_x_continuous(breaks = seq(2000, 2025, 5)) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(x = "Year",
       y = "Unemployment Rate",
       title = "Latent Unemployment Trajectories by Education Level",
       subtitle = "Points: observed data; Lines: model estimates; Bands: 90% CI; Red lines: shock onsets") +
  theme(legend.position = "none",
        strip.text = element_text(size = 10))

Latent unemployment rates from state space model (with 90% credible bands)

Non-Seasonal Trend

The state space model allows us to extract the underlying trend by removing seasonal effects. This shows the unemployment dynamics driven only by:

  • Baseline separation and finding rates
  • Economic shock effects (2008, 2020)
  • Stochastic innovations
Show code
# Extract the non-seasonal trend
trend <- extract_trend(result, summary = TRUE)

# Create data for plotting
trend_data <- data.frame(
  year_frac = trend$year_frac,
  education = trend$education,
  mean = trend$mean,
  lower = trend$q5,
  upper = trend$q95
)
Show code
ggplot(trend_data, aes(x = year_frac)) +
  geom_ribbon(aes(ymin = lower * 100, ymax = upper * 100, fill = education),
              alpha = 0.3) +
  geom_line(aes(y = mean * 100, color = education), linewidth = 1) +
  geom_vline(xintercept = c(2008.75, 2020.25), linetype = "dashed",
             color = "red", alpha = 0.7) +
  annotate("text", x = 2009.5, y = Inf, label = "2008 Crisis",
           vjust = 2, hjust = 0, size = 3, color = "red") +
  annotate("text", x = 2021, y = Inf, label = "COVID-19",
           vjust = 2, hjust = 0, size = 3, color = "red") +
  facet_wrap(~education, scales = "free_y", ncol = 2) +
  scale_x_continuous(breaks = seq(2000, 2025, 5)) +
  scale_y_continuous(labels = scales::percent_format(scale = 1)) +
  labs(x = "Year",
       y = "Unemployment Rate (Trend)",
       title = "Non-Seasonal Unemployment Trend by Education Level",
       subtitle = "Seasonal effects removed; shows baseline dynamics + shock impacts") +
  theme(legend.position = "none",
        strip.text = element_text(size = 10))

Non-seasonal unemployment trend (shock + baseline dynamics only)

Seasonal Oscillation: Direct Visualization

The seasonal effect is the difference between the full model (with seasonality) and the trend (without). This directly shows how much unemployment oscillates due to seasonal hiring patterns.

Show code
# Extract seasonal effects directly computed by the model
seasonal_effects <- extract_seasonal_effects(result, summary = TRUE)

# Focus on key education levels
sample_edu <- c("phd", "bachelors", "less_than_hs")
seas_subset <- seasonal_effects[seasonal_effects$education %in% sample_edu, ]

# Plot the seasonal oscillation over time
ggplot(seas_subset, aes(x = year_frac)) +
  geom_ribbon(aes(ymin = q5 * 100, ymax = q95 * 100, fill = education),
              alpha = 0.3) +
  geom_line(aes(y = mean * 100, color = education), linewidth = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray40") +
  facet_wrap(~education, scales = "free_y", ncol = 1) +
  scale_x_continuous(breaks = seq(2000, 2025, 5)) +
  labs(x = "Year",
       y = "Seasonal Effect (percentage points)",
       title = "Seasonal Oscillation in Unemployment by Education",
       subtitle = "Positive = unemployment above trend; Negative = below trend; Band = 90% CI") +
  theme(legend.position = "none",
        strip.text = element_text(size = 12, face = "bold"))

Seasonal oscillation in unemployment rates (Full model minus Trend)

Monthly Seasonal Pattern

The model includes a direct seasonal effect on unemployment (on the logit scale). This captures observed seasonal patterns like academic hiring calendars, summer employment fluctuations, and annual hiring cycles.

Show code
# Extract direct seasonal unemployment parameters
seasonal_u_summary <- result$fit$summary(variables = "seasonal_u")

# Parse indices
seasonal_u_summary$month <- as.integer(
  gsub("seasonal_u\\[(\\d+),\\d+\\]", "\\1", seasonal_u_summary$variable)
)
seasonal_u_summary$edu_index <- as.integer(
  gsub("seasonal_u\\[\\d+,(\\d+)\\]", "\\1", seasonal_u_summary$variable)
)
seasonal_u_summary$education <- stan_data$education_levels[seasonal_u_summary$edu_index]

# Create month labels
month_labels <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
seasonal_u_summary$month_label <- factor(month_labels[seasonal_u_summary$month],
                                          levels = month_labels)

# Plot for selected education levels
seasonal_u_plot <- seasonal_u_summary[seasonal_u_summary$education %in% sample_edu, ]

ggplot(seasonal_u_plot, aes(x = month_label, y = mean, group = education)) +
  geom_ribbon(aes(ymin = q5, ymax = q95, fill = education),
              alpha = 0.3) +
  geom_line(aes(color = education), linewidth = 1.2) +
  geom_point(aes(color = education), size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray40") +
  facet_wrap(~education, ncol = 1, scales = "free_y") +
  labs(x = "Month",
       y = "Seasonal Effect (logit scale)",
       title = "Direct Seasonal Effects on Unemployment",
       subtitle = "Positive = higher unemployment; Negative = lower unemployment; Band = 90% CI") +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1),
        strip.text = element_text(size = 12, face = "bold"))

Direct monthly seasonal effects on unemployment by education level

Model Decomposition: Full vs Trend vs Pure ODE

This visualization compares three trajectories:

  1. Full model (blue): Includes all components - ODE dynamics, shocks, seasonality, and stochastic innovations
  2. Trend (orange): Removes seasonality but keeps innovations (shows underlying dynamics)
  3. Pure ODE (red): Only structural ODE dynamics - no innovations, no seasonality (shows what separation/finding rates alone predict)
Show code
# Extract pure ODE trajectory
pure_ode <- extract_pure_ode(result, summary = TRUE)

# Debug: Check data dimensions
cat("Full model data rows:", nrow(plot_data), "\n")
Full model data rows: 2170 
Show code
cat("Trend data rows:", nrow(trend_data), "\n")
Trend data rows: 2170 
Show code
cat("Pure ODE data rows:", nrow(pure_ode), "\n")
Pure ODE data rows: 2170 
Show code
# Create separate data frames with only needed columns
pure_ode_df <- data.frame(
  year_frac = pure_ode$year_frac,
  education = pure_ode$education,
  rate = pure_ode$mean,
  type = "Pure ODE"
)

full_df <- data.frame(
  year_frac = plot_data$year_frac,
  education = plot_data$education,
  rate = plot_data$mean,
  type = "Full Model"
)

trend_df <- data.frame(
  year_frac = trend_data$year_frac,
  education = trend_data$education,
  rate = trend_data$mean,
  type = "Trend (no seasonal)"
)

# Combine all
decomp_data <- rbind(pure_ode_df, trend_df, full_df)

# Debug: Check combined data
cat("Combined data rows:", nrow(decomp_data), "\n")
Combined data rows: 6510 
Show code
cat("Types present:", unique(decomp_data$type), "\n")
Types present: Pure ODE Trend (no seasonal) Full Model 
Show code
decomp_data$type <- factor(decomp_data$type,
                           levels = c("Pure ODE", "Trend (no seasonal)", "Full Model"))

# Subset to key education levels
decomp_subset <- decomp_data[decomp_data$education %in% sample_edu, ]
cat("Subset rows:", nrow(decomp_subset), "\n")
Subset rows: 2790 
Show code
cat("Types in subset:", unique(as.character(decomp_subset$type)), "\n")
Types in subset: Pure ODE Trend (no seasonal) Full Model 
Show code
# Plot with explicit linewidths (no aesthetic mapping)
ggplot(decomp_subset, aes(x = year_frac, y = rate * 100, color = type)) +
  geom_line(linewidth = 0.8) +
  geom_vline(xintercept = c(2008.75, 2020.25), linetype = "dashed",
             color = "gray40", alpha = 0.5) +
  facet_wrap(~education, scales = "free_y", ncol = 1) +
  scale_color_manual(values = c("Pure ODE" = "#E74C3C",
                                "Trend (no seasonal)" = "#F39C12",
                                "Full Model" = "#3498DB")) +
  scale_x_continuous(breaks = seq(2000, 2025, 5)) +
  labs(x = "Year",
       y = "Unemployment Rate (%)",
       color = "Model Component",
       title = "Decomposition of Unemployment Dynamics",
       subtitle = "Red = pure ODE; Orange = trend + innovations; Blue = full model with seasonality") +
  theme(legend.position = "bottom",
        strip.text = element_text(size = 12, face = "bold"))

Model decomposition showing structural dynamics vs observed trajectory

Zoom: Seasonal Oscillation Detail (2017-2019)

A closer look at a few years to see the seasonal oscillation clearly:

Show code
# Zoom into 2017-2019 to see seasonal pattern clearly
zoom_years <- decomp_subset[decomp_subset$year_frac >= 2017 &
                             decomp_subset$year_frac <= 2020, ]

cat("Zoom rows:", nrow(zoom_years), "\n")
Zoom rows: 324 
Show code
cat("Types in zoom:", unique(as.character(zoom_years$type)), "\n")
Types in zoom: Pure ODE Trend (no seasonal) Full Model 
Show code
ggplot(zoom_years, aes(x = year_frac, y = rate * 100, color = type)) +
  geom_line(linewidth = 0.8) +
  facet_wrap(~education, scales = "free_y", ncol = 1) +
  scale_color_manual(values = c("Pure ODE" = "#E74C3C",
                                "Trend (no seasonal)" = "#F39C12",
                                "Full Model" = "#3498DB")) +
  scale_x_continuous(breaks = seq(2017, 2020, 1),
                     minor_breaks = seq(2017, 2020, 0.25)) +
  labs(x = "Year",
       y = "Unemployment Rate (%)",
       color = "Component",
       title = "Seasonal Pattern Detail (2017-2019)",
       subtitle = "Blue oscillation around orange trend shows seasonal hiring patterns") +
  theme(legend.position = "bottom",
        strip.text = element_text(size = 12, face = "bold"),
        panel.grid.minor.x = element_line(color = "gray90"))