The Brief
Fatimah Al-Rashidi is VP of Supply Chain at Al-Bina'a Building Materials in Kuwait. The company runs four factories producing concrete blocks, pipes, and precast construction elements. Materials flow from the factories to construction projects across the country and to GCC export customers.
The board needs cost attribution by factory, by product line, by project -- and they need it fast enough for board meetings, not a week after the fact. The on-premises database can't keep up with the data volumes anymore. IT recommended BigQuery. Fatimah wants to know what that will cost, because she's heard cloud queries cost real money and she doesn't want a surprise bill.
Four factories. Different systems. A CFO waiting for numbers.
Your Role
You're migrating Al-Bina'a's supply chain analytics to BigQuery -- designing a cost-conscious schema, building the transformation layers, implementing access controls, and setting up monitoring that tells Fatimah's CFO something useful, not just "a job failed."
Every query in BigQuery costs money based on how much data it scans. The design decisions you've been making for free in DuckDB -- materializing tables, running full scans, rebuilding entire models -- now have a dollar sign attached. Partitioning and clustering aren't just performance optimizations here. They're cost architecture.
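A rough sketch of what "cost architecture" means in numbers. This assumes on-demand pricing of roughly $6.25 per TiB scanned (a commonly cited BigQuery rate -- verify the current price) and a hypothetical 500 GiB shipments table with one partition per day; neither figure comes from the brief.

```python
PRICE_PER_TIB_USD = 6.25  # assumed on-demand rate; check current BigQuery pricing

def scan_cost_usd(bytes_scanned: int) -> float:
    """On-demand pricing charges per byte scanned, not per row returned."""
    return bytes_scanned / 2**40 * PRICE_PER_TIB_USD

table_bytes = 500 * 2**30   # hypothetical 500 GiB shipments table
partitions = 365            # one partition per shipment date

# An unpartitioned dashboard query scans the whole table every refresh.
full_scan = scan_cost_usd(table_bytes)

# The same query against a date-partitioned table, filtered to the last
# 7 days, scans only those partitions.
pruned_scan = scan_cost_usd(table_bytes // partitions * 7)

print(f"full scan:   ${full_scan:.2f} per query")
print(f"pruned scan: ${pruned_scan:.4f} per query")
```

A dashboard refreshing hourly multiplies that per-query difference by thousands of runs a month, which is why the partition key is a financial decision before it is a performance one.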
For the first time, you'll connect AI directly to the database and pipeline via MCP. AI reading the schema directly instead of relying on your descriptions changes what it can do -- and changes what you need to verify.
Templates provide structure. The brief has deliberate ambiguity that you'll need to resolve through conversation with Fatimah.
What's New
Last time you built the AI infrastructure before the pipeline -- project memory files that changed what AI knew at session start. You designed incremental extraction with MERGE for Kyaw Zin Oo's rice mills and experienced the before/after contrast that made "infrastructure determines outcomes" concrete.
This time cost enters as a design concern. The same SQL that was free in DuckDB costs money in BigQuery. You'll design partitioning, clustering, and materialization strategy as financial decisions. Role-based access control (RBAC) enters -- who can see which data, tested by actually querying as each role. Quality testing becomes a deliberate strategy: which tests at which layer, with coverage analysis as judgment about risk. And business-outcome alerting replaces generic "job failed" alerts.
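The incremental-strategy risk above comes down to MERGE semantics: matched rows update, unmatched rows insert, and an append-only strategy that never updates matched rows leaves stale values behind silently. A minimal Python stand-in for that contract, with made-up shipment records (none of these fields come from the brief):

```python
def merge_upsert(target: dict, source_rows: list[dict], key: str) -> dict:
    """The MERGE contract: matched -> update, not matched -> insert.
    An append-only strategy would skip the update half and leave
    S-001 stuck at its old status -- silently stale data."""
    for row in source_rows:
        target[row[key]] = row
    return target

warehouse = {
    "S-001": {"shipment_id": "S-001", "qty": 100, "status": "in_transit"},
}
batch = [
    {"shipment_id": "S-001", "qty": 100, "status": "delivered"},   # update
    {"shipment_id": "S-002", "qty": 40,  "status": "in_transit"},  # insert
]
merge_upsert(warehouse, batch, key="shipment_id")
```

The failure mode to watch for: if the incremental filter uses a watermark column that late-arriving updates don't touch, those updates never enter the batch at all, and no quality test downstream of the stale table will flag it.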
The hard part is that everything is connected. Partitioning affects cost. RBAC affects cost. The wrong incremental strategy produces silently stale data. A quality test at the wrong layer catches the failure after it cascades instead of before. This is the densest project yet, by design.
Tools
- dbt Core with DuckDB adapter (local) and BigQuery adapter patterns (cloud) -- incremental models with strategy selection, partitioning, clustering (new this project)
- BigQuery (emulated via DuckDB with BigQuery patterns) -- cloud warehouse with cost attribution via INFORMATION_SCHEMA.JOBS (new this project)
- DuckDB -- local analytical database
- Soda Core -- quality monitoring, trend-based anomaly detection
- Dagster -- orchestration, business-outcome alerting (new this project)
- Claude Code -- AI directing tool, with MCP connections to DuckDB and Dagster (new this project), context briefs (new this project)
- Git / GitHub -- version control
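The cost-attribution item above (INFORMATION_SCHEMA.JOBS) boils down to grouping each job's `total_bytes_billed` by its query labels. A sketch of that aggregation in Python, using fabricated job rows and the same assumed ~$6.25/TiB on-demand rate -- in BigQuery itself this would be a GROUP BY over the JOBS view:

```python
from collections import defaultdict

PRICE_PER_TIB_USD = 6.25  # assumed on-demand rate

# Fabricated stand-ins for rows from INFORMATION_SCHEMA.JOBS, which
# exposes labels and total_bytes_billed per query job.
jobs = [
    {"labels": {"team": "supply_chain"}, "total_bytes_billed": 120 * 2**30},
    {"labels": {"team": "supply_chain"}, "total_bytes_billed": 30 * 2**30},
    {"labels": {"team": "finance"},      "total_bytes_billed": 512 * 2**30},
]

cost_by_team: dict[str, float] = defaultdict(float)
for job in jobs:
    team = job["labels"].get("team", "unlabeled")  # unlabeled jobs surface too
    cost_by_team[team] += job["total_bytes_billed"] / 2**40 * PRICE_PER_TIB_USD

for team, cost in sorted(cost_by_team.items()):
    print(f"{team}: ${cost:.2f}")
```

Attribution only works if every query carries labels, which is itself a governance decision: an "unlabeled" bucket growing over time is the signal that the convention is slipping.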
Materials
You'll receive:
- Factory data -- export samples from all four factories (CSV from Factories 1-2, JSON from Factories 3-4), each with different column names and material coding schemes
- Material mapping -- master reconciliation table mapping all four coding schemes to standard product codes
- Pipeline spec template -- empty structure for you to fill from your work with Fatimah
- Project governance file -- CLAUDE.md with the project context, schema design reference, and known data issues