The Federal Labor Data Landscape

The Lesson

When building an analytical warehouse from multiple federal data products, the single most important architectural decision is identifying the stable external key that connects them. For labor data, that key is the Standard Occupational Classification (SOC) code — every design decision flows from treating occupation as the backbone, not as one attribute among many.

Context

The JobClass project ingests four federal data products — SOC taxonomy, OEWS employment/wage surveys, O*NET occupational descriptors, and BLS Employment Projections — into a layered DuckDB warehouse. Each product is published by a different office on a different schedule in a different format, but they all describe the same ~870 occupations using the same SOC code system. The challenge was designing a warehouse that treats these as four views of one reality rather than four independent datasets.

What Happened

  1. The SOC taxonomy was identified as the backbone. The Standard Occupational Classification assigns hierarchical codes (e.g., 11-1011 for Chief Executives, where 11 is the Management major group and 1011 narrows to the specific occupation). SOC 2018 defines roughly 870 detailed occupations and is updated approximately every 10 years.
  2. Four data products were mapped to a single key. The SOC taxonomy supplies the key itself. OEWS provides employment counts and wage statistics by occupation and geography (annual, May reference period). O*NET provides skills, knowledge, abilities, and tasks (continuously versioned). BLS Projections provide a 10-year employment outlook (biennial). These three products join to the SOC taxonomy through occupation codes.
  3. The SOC dimension was loaded first in every pipeline run. Because every other data product joins to dim_occupation through the SOC code, the taxonomy must be present before any facts can be conformed. The orchestrator enforces this ordering.
  4. Unmappable codes were handled by exclusion. Approximately five National Employment Matrix 2024 codes have no corresponding SOC 2018 entry. The projections loader performs an inner join against dim_occupation, which drops these rows without raising an error. The gap is expected and documented.
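
The steps above can be sketched in a few lines. This is a minimal illustration, not the JobClass implementation: it uses Python's built-in sqlite3 module as a stand-in for DuckDB so that it runs without extra dependencies, and the names split_soc, load_projections, stage_projections, and fact_projections are hypothetical (only dim_occupation appears in the project description). The code 99-9999 and the employment figures are made-up placeholders, not real BLS values. Unlike the loader described above, which excludes rows silently, this sketch also returns the excluded codes so the gap can be inspected.

```python
import re
import sqlite3

# A detailed SOC code such as "11-1011": two digits for the major group,
# a hyphen, then four digits that narrow to the specific occupation.
SOC_PATTERN = re.compile(r"^(\d{2})-(\d{4})$")


def split_soc(code: str) -> tuple[str, str]:
    """Return (major_group, detail) for a SOC code such as '11-1011'."""
    match = SOC_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a SOC code: {code!r}")
    return match.group(1), match.group(2)


def load_projections(conn, staged_rows):
    """Conform staged projection rows to dim_occupation (hypothetical loader).

    Returns (rows_loaded, excluded_codes).
    """
    conn.execute(
        "CREATE TEMP TABLE stage_projections "
        "(soc_code TEXT, projected_employment INTEGER)"
    )
    conn.executemany("INSERT INTO stage_projections VALUES (?, ?)", staged_rows)

    # Inner join: staged rows whose code is missing from the dimension drop out.
    conn.execute(
        """
        INSERT INTO fact_projections (soc_code, projected_employment)
        SELECT s.soc_code, s.projected_employment
        FROM stage_projections AS s
        JOIN dim_occupation AS d ON d.soc_code = s.soc_code
        """
    )

    # Anti-join: list exactly the codes the inner join excluded.
    excluded = [
        row[0]
        for row in conn.execute(
            """
            SELECT s.soc_code
            FROM stage_projections AS s
            LEFT JOIN dim_occupation AS d ON d.soc_code = s.soc_code
            WHERE d.soc_code IS NULL
            ORDER BY s.soc_code
            """
        )
    ]
    loaded = conn.execute("SELECT COUNT(*) FROM fact_projections").fetchone()[0]
    conn.execute("DROP TABLE stage_projections")
    return loaded, excluded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_occupation (soc_code TEXT PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE fact_projections (soc_code TEXT, projected_employment INTEGER)"
)

# The dimension is loaded first, so facts have something to conform to.
conn.execute("INSERT INTO dim_occupation VALUES ('11-1011', 'Chief Executives')")

# Facts are loaded second; '99-9999' is a placeholder for an unmappable code.
loaded, excluded = load_projections(conn, [("11-1011", 100), ("99-9999", 5)])
major_group, detail = split_soc("11-1011")
```

Returning the excluded codes is a design choice of this sketch: comparing that list against the documented set of unmappable codes would make it easier to notice if a future release drops more rows than expected.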

Key Insights

Related Lessons