P8: BigQuery Cost Architecture + First MCP
Client
Fatimah Al-Rashidi, VP of Supply Chain at Al-Bina'a Building Materials Co. in Kuwait. Four factories producing concrete blocks, pipes, and precast elements. Supplies construction projects across Kuwait and exports to GCC neighbors. ~400 employees.
What you're building
A cloud analytics pipeline migrating Al-Bina'a's supply chain data from an on-premise database to BigQuery. The pipeline ingests data from four factories (different formats, different material coding schemes), transforms it through staging/intermediate/mart layers with cost-conscious design (partitioning, clustering, materialization strategy), implements RBAC for data governance, and provides the CFO with cost attribution by factory, product line, and project.
Tech stack
- dbt Core with DuckDB adapter (local development) and BigQuery adapter patterns (cloud target)
- DuckDB for local warehouse and development
- BigQuery patterns: partitioning, clustering, INFORMATION_SCHEMA.JOBS, RBAC
- Dagster for orchestration, scheduling, asset lineage
- Soda Core for data quality monitoring and trend-based checks
- MCP connections: DuckDB MCP (database access), Dagster MCP (pipeline observability)
- Git/GitHub for version control
Data sources
Four factory exports with different formats and material coding schemes:
| Factory | Format | Material codes | Key fields |
|---|---|---|---|
| F1 | CSV | SAP (MAT-BLK-001) | delivery_date, material_code, quantity_delivered, customer_name, billing_status |
| F2 | CSV | Internal (F2-101) | ship_date, item_code, qty, buyer, billing_type, quality_inspector (PII) |
| F3 | JSON | Precast (PC-A-001) | date, material, amount, client, billing |
| F4 | JSON | Precast (PC-B-001) | date, material, amount, client, billing |
Master mapping: material-mapping.xlsx reconciles all four coding schemes to standard codes. Three entries marked UNVERIFIED.
Schema design
Layer architecture
- Staging:
stg_factory1_deliveries,stg_factory2_deliveries,stg_factory3_production,stg_factory4_production-- source-conform, material codes resolved via mapping table - Intermediate:
int_deliveries_unified,int_production_unified-- cross-factory joins - Mart:
fct_daily_deliveries(partitioned by date, clustered by factory/product),fct_cost_attribution,dim_materials,dim_factories
Naming conventions
- Staging:
stg_[source]_[entity] - Intermediate:
int_[entity]_[qualifier] - Fact:
fct_[entity] - Dimension:
dim_[entity]
Partitioning and clustering
- Fact tables: partitioned by date (delivery_date/ship_date)
- fct_daily_deliveries: clustered by factory_id, product_line
- fct_cost_attribution: clustered by factory_id, project_code
Materialization strategy
- Staging: view (lightweight, always current)
- Intermediate: table (frequently joined)
- Fact: incremental (large, append for events, merge for dimensions)
- Dimension: table (small, full refresh)
Known data issues
- Factory 2 quality_inspector column contains full names (PII) -- must be masked for non-quality roles
- Material coding differs across all four factories -- mapping table required
- Some mapping entries are UNVERIFIED -- flag in quality checks
- Billing method varies (delivery vs milestone) -- affects when revenue appears
- Correction records appear as duplicates with different values (no correction flag)
RBAC roles
| Role | Access |
|---|---|
| data_engineer | Full access (debugging, development) |
| analyst | Mart-layer only, masked PII |
| quality_team | Quality data with inspector names visible |
| cfo_reporting | Cost attribution views only |
Quality testing strategy
- Staging: schema drift detection, column presence, data types, value ranges, null patterns
- Intermediate: join key correctness, material code resolution completeness
- Mart: business logic validation (cost totals match source), freshness constraints
Monitoring
- Business-outcome alerts (report freshness for CFO)
- Cost alerts (daily BigQuery spend by model)
- Trend-based monitoring (row count anomaly detection)
- Backfill annotation mechanism
Verification targets
- All four staging models resolve to standard material codes
- Cost attribution totals match source sums
- RBAC tested by querying as each role
- Incremental models handle schema evolution after full refresh
- Quality tests at all three layers (staging, intermediate, mart)
Commit convention
Commit after each unit of work. Messages describe what changed and why. Include the unit context in the message (e.g., "Add staging models with material code resolution").