Data Engineering
Data engineering builds the infrastructure that moves and transforms data: pipelines that pull data from sources, transform it into something useful, and serve it to analysts, data scientists, and applications. The work spans ingestion, transformation, quality, orchestration, governance, and observability.
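That pull-transform-serve flow can be sketched in a few lines of plain Python. This is an illustrative toy, not a real tool's API: the in-memory CSV, the function names, and the list standing in for a warehouse are all made up for the example.

```python
# A minimal batch-pipeline sketch: extract, transform, load.
# All names here (RAW_CSV, the "warehouse" list) are illustrative
# placeholders, not any specific framework's API.
import csv
import io

RAW_CSV = "user_id,amount\n1,10.50\n2,not_a_number\n3,7.25\n"

def extract(raw: str) -> list[dict]:
    """Pull rows from a source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    """Cast types and keep only rows with a parseable amount."""
    clean = []
    for row in rows:
        try:
            clean.append({"user_id": int(row["user_id"]),
                          "amount": float(row["amount"])})
        except ValueError:
            pass  # a real pipeline would route this to a dead-letter table
    return clean

def load(rows: list[dict], warehouse: list) -> None:
    """Serve the cleaned rows to a destination."""
    warehouse.extend(rows)

warehouse: list = []
load(transform(extract(RAW_CSV)), warehouse)
print(warehouse)  # the two valid rows survive; the bad one is dropped
```

Real pipelines swap each stage for heavier machinery (an API client, a dbt model, a warehouse table), but the shape stays the same.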
If data science asks "what does the data say?" and analytics asks "what should we do about it?", data engineering asks "how do we make sure the right data gets to the right place, on time, correctly, every time?"
The track
Projects range from a simple batch pipeline to complex streaming architectures with governance and compliance. You'll direct AI to build data pipelines, transformation layers, and quality systems for fictional clients, then verify whether the data arrives correctly, the transformations are right, and the system handles failure gracefully.
The skill you're building isn't writing dbt models or Dagster pipelines from scratch. It's directing AI to build data infrastructure and verifying the result: knowing what a correct pipeline looks like, where data quality breaks down, and when AI's implementation will fail silently in production.
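"Fail silently" usually means the pipeline runs green while quietly losing rows, say, an inner join dropping unmatched records. One way to catch that is a row-count check after loading. A minimal sketch, with thresholds and numbers invented for illustration:

```python
# A sketch of a post-load quality check that catches silent row loss.
# The function name and the 1% default threshold are illustrative,
# not taken from any particular tool.

def check_row_counts(source_rows: int, loaded_rows: int,
                     max_loss_pct: float = 1.0) -> bool:
    """Return False if the pipeline dropped more rows than allowed."""
    if source_rows == 0:
        return loaded_rows == 0
    loss_pct = (source_rows - loaded_rows) / source_rows * 100
    return loss_pct <= max_loss_pct

# A run that loses 0.2% of rows is within tolerance...
print(check_row_counts(1000, 998))  # → True
# ...but a join that silently drops 6% of rows gets surfaced.
print(check_row_counts(1000, 940))  # → False
```

Nothing in the failing run raised an error on its own; the check is what turns a silent loss into a loud one. That's the verification habit the track is about.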
Before you start
- Read the Introduction: what the field is, how the work flows, what tools you'll use
- Complete the Platform Setup: accounts, terminal, Claude Code, Git (same for all tracks)
- Complete the Data Engineering Setup: Python, pipeline tools, and a hands-on demo