Learn by Directing AI

The Brief

Katrine Moller runs performance analytics at VindKraft Analytics in Esbjerg, Denmark. The company monitors 14 onshore wind farms across Denmark and northern Germany -- about 340 turbines sending SCADA data every 10 minutes: power output, wind speed, rotor RPM, nacelle temperature, pitch angle, yaw direction.

The problem is temporal. Turbines change over time. Gearbox replacements, blade upgrades, software updates. When a farm owner who spent 2 million DKK on a blade upgrade asks "did it actually improve power output?", Katrine should be able to answer definitively. She can't, because her data doesn't track what configuration a turbine had when it generated each reading. She's comparing post-upgrade data to a baseline that includes a mix of configurations.

The second problem is definitional. Katrine reports "availability" and "capacity factor" to farm owners, but not everyone agrees on what those terms mean. One client counts scheduled maintenance as downtime, another doesn't. The same turbine gets a different number depending on who's asking.
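The divergence is easy to reproduce. A minimal sketch of two availability definitions applied to the same turbine-month (the hours below are illustrative, not VindKraft's data):

```python
# Two clients, two availability definitions, one turbine-month.
# All numbers are illustrative.
period_hours = 720          # hours in the month
fault_downtime = 30         # unplanned outage hours
scheduled_maintenance = 20  # planned maintenance hours

# Client A: scheduled maintenance counts as downtime.
availability_a = (period_hours - fault_downtime - scheduled_maintenance) / period_hours

# Client B: scheduled maintenance is excluded from the denominator entirely.
availability_b = (period_hours - scheduled_maintenance - fault_downtime) / (period_hours - scheduled_maintenance)

print(f"Client A: {availability_a:.1%}")  # 93.1%
print(f"Client B: {availability_b:.1%}")  # 95.7%
```

Same turbine, same month, a 2.6-point gap -- purely from the definition.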

She has the data. She needs the infrastructure to make it trustworthy.

Your Role

You're building the data infrastructure that tracks turbine configurations over time and standardizes metric calculations across all 14 farms. The schema you design decides what historical analysis is possible -- and what history is permanently lost. The metric definitions you formalize become contracts that every report inherits.

The schemas have a temporal dimension -- tracking not just what a turbine measured, but what configuration it had when it measured it. Data governance here is enforcement, not awareness: actual PII masking and verification that the masking holds across every surface the pipeline touches.

The pipeline spec and CLAUDE.md come as templates this time. You populate them from Katrine's requirements rather than receiving completed versions.

What's New

Last time you built Roberto's quality analysis pipeline -- window functions, Jinja macros, Soda Core, CI/CD quality gates, Dagster freshness policies. The complex transformations were the hard part.

This time the schema design is the hard part. You'll decide how the data warehouse handles change over time -- which dimensions preserve their full history and which simply overwrite. AI will have strong opinions about this. You'll need to evaluate them against what Katrine actually needs.
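The core of that decision is the Slowly Changing Dimension pattern. A Type 2 dimension keeps one row per configuration interval, so each SCADA reading can be joined to the configuration that was live when it was recorded; a Type 1 dimension overwrites and loses that history. A minimal sketch of the point-in-time lookup (table and column names are illustrative, not the project schema):

```python
from datetime import date

# Type 2 style dimension: one row per configuration interval.
# valid_to=None marks the current row. Names are illustrative.
dim_turbine_config = [
    {"turbine_id": "T07", "blade_model": "B45",
     "valid_from": date(2023, 1, 1), "valid_to": date(2023, 9, 15)},
    {"turbine_id": "T07", "blade_model": "B52",
     "valid_from": date(2023, 9, 15), "valid_to": None},
]

def config_at(turbine_id, reading_date):
    """Point-in-time lookup: which configuration was live for this reading?"""
    for row in dim_turbine_config:
        if (row["turbine_id"] == turbine_id
                and row["valid_from"] <= reading_date
                and (row["valid_to"] is None or reading_date < row["valid_to"])):
            return row["blade_model"]
    return None

# A reading before the blade upgrade resolves to the old configuration...
print(config_at("T07", date(2023, 6, 1)))   # B45
# ...and one after resolves to the new one. A Type 1 (overwrite) dimension
# would report B52 for both, silently corrupting the baseline comparison.
print(config_at("T07", date(2023, 10, 1)))  # B52
```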

MetricFlow enters as a semantic layer -- defining metrics once so every consumer gets the same calculation. A valid formula and a correct formula are not the same thing -- the difference shows up when Katrine's clients compare reports.
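The define-once idea can be sketched outside MetricFlow (the function and report names below are illustrative, not MetricFlow's API -- in the project the definition lives in semantic-layer YAML):

```python
# One metric definition, many consumers. If the definition changes,
# every report inherits the change -- there is no per-report drift.
def capacity_factor(energy_mwh, rated_mw, hours):
    """Actual energy produced divided by the theoretical maximum."""
    return energy_mwh / (rated_mw * hours)

def monthly_owner_report(energy_mwh, rated_mw, hours):
    return {"capacity_factor": capacity_factor(energy_mwh, rated_mw, hours)}

def internal_dashboard(energy_mwh, rated_mw, hours):
    return {"cf": capacity_factor(energy_mwh, rated_mw, hours)}

# Both consumers call the same definition, so the numbers always agree.
report = monthly_owner_report(1050.0, 3.0, 720)
dash = internal_dashboard(1050.0, 3.0, 720)
print(report["capacity_factor"] == dash["cf"])  # True
```

Note the formula above is valid Python either way -- correctness (should `hours` exclude curtailed periods?) is a business decision the semantic layer forces you to make once, explicitly.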

Governance moves from the question planted in P5 ("who is this data about?") to enforcement. Technician names in maintenance logs are PII. Masking them in the output tables is necessary but not sufficient. PII leaks through surfaces you won't expect.
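A sketch of the enforcement idea: deterministic masking of technician names, plus a check that the mask holds on a derived surface, not just the output table (names, columns, and the salt here are all illustrative):

```python
import hashlib

SALT = "rotate-me"  # illustrative; a real salt would be managed as a secret

def mask_name(name: str) -> str:
    """Deterministic pseudonym: same technician -> same token, so joins survive."""
    digest = hashlib.sha256((SALT + name).encode()).hexdigest()[:10]
    return f"tech_{digest}"

maintenance_logs = [
    {"turbine_id": "T07", "technician": "Lars Jensen", "duration_h": 4.5},
    {"turbine_id": "T12", "technician": "Lars Jensen", "duration_h": 2.0},
]

masked = [{**row, "technician": mask_name(row["technician"])}
          for row in maintenance_logs]

# Enforcement: verify no raw name survives on any downstream surface,
# including ones beyond the output table -- e.g. a free-text summary.
summary = (f"{masked[0]['technician']} serviced {masked[0]['turbine_id']} "
           f"for {masked[0]['duration_h']}h")
for surface in [str(masked), summary]:
    assert "Lars Jensen" not in surface, "PII leaked through a derived surface"

print(masked[0]["technician"] == masked[1]["technician"])  # True
```

Deterministic masking keeps the analysis intact (you can still count jobs per technician) while the verification loop is the part that makes governance enforcement rather than awareness.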

Tools

  • dbt Core with DuckDB adapter -- transformation framework, plus MetricFlow semantic layer and model contracts (new this project)
  • DuckDB -- local analytical database
  • Soda Core -- quality monitoring
  • Dagster -- orchestration with freshness policies
  • GitHub Actions -- CI/CD quality gates
  • Claude Code -- AI directing tool, with context curation as deliberate practice (new this project)
  • Git / GitHub -- version control

Materials

You'll receive:

  • SCADA data -- a 200-row sample for orientation and a 15,000-row full dataset covering 20 turbines across 6 farms over 6 months
  • Component change log -- 50 records tracking gearbox replacements, blade upgrades, software updates, and generator changes (manually maintained, not always complete)
  • Turbine mapping table -- cross-referencing farm-assigned IDs and manufacturer serial numbers across all farms (some entries out of date)
  • Maintenance logs -- 280 records with technician names, maintenance types, and durations
  • Pipeline spec template -- empty structure for you to fill from Katrine's requirements
  • SCD design template -- explains the design decision you'll make about how the schema handles change
  • CLAUDE.md template -- project governance file for you to populate
  • MetricFlow guide -- how to define metrics in the semantic layer
  • PII classification checklist -- framework for identifying and masking personal data