Pipeline Specification
Data Sources
List each data source: name, format, location, refresh frequency, known issues.
Schema Design
Staging Layer
For each staging model: source, target name, column mappings, transformations applied.
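A staging entry can be made concrete by writing the column mappings as data and applying them in one pass. The sketch below is illustrative only: the source columns (`OrderID`, `OrderTS`, `Amt`), the target names, and the `stage` helper are hypothetical and not part of any real pipeline.

```python
# Hypothetical mapping: source column -> (target column, transformation).
COLUMN_MAP = {
    "OrderID": ("order_id", int),          # cast text to integer
    "OrderTS": ("ordered_at", str.strip),  # trim stray whitespace
    "Amt": ("amount", float),              # cast text to float
}


def stage(row, column_map):
    """Rename each mapped source column and apply its transformation."""
    return {
        target: transform(row[source])
        for source, (target, transform) in column_map.items()
    }


raw = {"OrderID": "42", "OrderTS": " 2024-01-01T10:00:00 ", "Amt": "19.50"}
staged = stage(raw, COLUMN_MAP)
print(staged)
```

Keeping the mapping as data means the specification table and the staging logic can be reviewed side by side.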
Intermediate Layer
For each intermediate model: inputs, join logic, deduplication approach.
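One common deduplication approach is to keep the most recent row per key. The following is a minimal sketch under that assumption; the `dedupe_latest` helper and the `order_id` and `updated_at` columns are hypothetical, and a real model would usually express the same idea in SQL.

```python
def dedupe_latest(rows, key, order_by):
    """Keep one row per key value: the row with the greatest order_by value."""
    latest = {}
    for row in rows:
        k = row[key]
        # Replace the stored row only when this one is strictly newer.
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


# Hypothetical staging rows; ISO date strings compare correctly as text.
rows = [
    {"order_id": 1, "status": "placed", "updated_at": "2024-01-01"},
    {"order_id": 1, "status": "shipped", "updated_at": "2024-01-03"},
    {"order_id": 2, "status": "placed", "updated_at": "2024-01-02"},
]
deduped = dedupe_latest(rows, key="order_id", order_by="updated_at")
print(deduped)
```

Whatever approach is chosen, the specification should name the key and the tie-breaking column explicitly.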
Mart Layer
For each mart model: business purpose, grain (what one row represents), key metrics.
Partitioning Strategy
For each partitioned table: partition column, partition type (time-based or integer-range), rationale tied to query patterns.
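The rationale for a partition column is usually that common filters let the engine skip most partitions. The toy model below illustrates that effect for a time-based partition; `partition_by`, `scan`, and the `event_date` column are hypothetical names, and real warehouses implement pruning internally.

```python
from collections import defaultdict
from datetime import date


def partition_by(rows, column):
    """Group rows into partitions keyed by the value of the partition column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return partitions


def scan(partitions, start, end):
    """Read only partitions whose key falls in [start, end] (partition pruning)."""
    touched = [k for k in partitions if start <= k <= end]
    rows = [row for k in touched for row in partitions[k]]
    return rows, len(touched)


# Hypothetical event rows spread over three days.
events = [
    {"event_date": date(2024, 1, 1), "user_id": 10},
    {"event_date": date(2024, 1, 2), "user_id": 11},
    {"event_date": date(2024, 1, 2), "user_id": 12},
    {"event_date": date(2024, 1, 3), "user_id": 13},
]
partitions = partition_by(events, "event_date")
rows, partitions_read = scan(partitions, date(2024, 1, 2), date(2024, 1, 2))
print(len(rows), partitions_read, len(partitions))
```

A query filtered to one day reads one of the three partitions here; a filter on a non-partition column would have to read all of them.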
Clustering Strategy
For each clustered table: cluster columns, rationale tied to filter patterns.
Materialization Strategy
For each model: materialization type (view, table, incremental), incremental strategy if applicable (append, merge, delete+insert), rationale.
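The three incremental strategies differ in how a new batch is reconciled with existing rows. The sketch below models them on in-memory lists with deliberately simplified semantics; the function names, the `id` key, and the `day` partition column are hypothetical, and actual warehouse behavior depends on the engine and tool in use.

```python
def append(target, batch):
    """Add the batch as-is; existing rows are never touched."""
    return target + batch


def merge(target, batch, key):
    """Upsert: replace rows whose key matches, insert the rest."""
    by_key = {row[key]: row for row in target}
    for row in batch:
        by_key[row[key]] = row
    return list(by_key.values())


def delete_insert(target, batch, partition):
    """Delete every target row in a partition the batch touches, then insert the batch."""
    touched = {row[partition] for row in batch}
    kept = [row for row in target if row[partition] not in touched]
    return kept + batch


target = [
    {"id": 1, "day": "2024-01-01", "amount": 10},
    {"id": 2, "day": "2024-01-02", "amount": 20},
]
batch = [
    {"id": 2, "day": "2024-01-02", "amount": 25},  # corrected version of id 2
    {"id": 3, "day": "2024-01-02", "amount": 30},  # new row
]

appended = append(target, batch)              # id 2 now appears twice
merged = merge(target, batch, "id")           # id 2 updated in place
replaced = delete_insert(target, batch, "day")  # day 2024-01-02 rebuilt
print(len(appended), len(merged), len(replaced))
```

Append is cheapest but duplicates corrected rows, which is why the rationale column should state whether source rows can change after they first arrive.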
RBAC Design
For each role: name, purpose, data access scope, column restrictions.
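A role entry can be checked for completeness by expressing it as a small record and asking what the role would actually see. This is a sketch only: the `Role` fields mirror the items listed above, while the `analyst` role, the schema names, and the `visible_columns` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    purpose: str
    schemas: frozenset             # data access scope
    restricted_columns: frozenset  # columns hidden from this role


def visible_columns(role, schema, columns):
    """Return the columns the role may read, or nothing if the schema is out of scope."""
    if schema not in role.schemas:
        return []
    return [c for c in columns if c not in role.restricted_columns]


analyst = Role(
    name="analyst",
    purpose="Self-service reporting on mart tables",
    schemas=frozenset({"marts"}),
    restricted_columns=frozenset({"email"}),
)
columns = ["customer_id", "email", "lifetime_value"]
print(visible_columns(analyst, "marts", columns))
print(visible_columns(analyst, "staging", columns))
```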
Quality Testing Strategy
Staging Tests
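Tests that validate source data at the ingestion boundary, before any transformation logic runs: for example not-null, uniqueness, accepted-value, and freshness checks.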
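Boundary tests are typically simple predicates over source rows. The sketch below shows three such checks, each returning the failing rows so that an empty result means the test passes; the helper names and sample data are hypothetical.

```python
def not_null(rows, column):
    """Rows where the column is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Rows whose column value was already seen in an earlier row."""
    seen, dupes = set(), []
    for r in rows:
        value = r.get(column)
        if value in seen:
            dupes.append(r)
        seen.add(value)
    return dupes


def accepted_values(rows, column, allowed):
    """Rows whose column value is outside the allowed set."""
    return [r for r in rows if r.get(column) not in allowed]


# Hypothetical source rows with one failure of each kind.
source_rows = [
    {"order_id": 1, "status": "placed"},
    {"order_id": 1, "status": "shipped"},
    {"order_id": None, "status": "unknown"},
]
null_failures = not_null(source_rows, "order_id")
duplicate_failures = unique(source_rows, "order_id")
value_failures = accepted_values(source_rows, "status", {"placed", "shipped", "cancelled"})
print(len(null_failures), len(duplicate_failures), len(value_failures))
```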
Intermediate Tests
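Tests that verify transformation logic and joins: for example that a join neither drops nor multiplies rows unexpectedly, and that every foreign key resolves.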
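Two join checks catch most intermediate-layer defects: fan-out (the join multiplies rows) and unmatched keys (the join finds no partner). The sketch below assumes a left join on a single key; `left_join`, `fans_out`, `unmatched_keys`, and the sample tables are hypothetical.

```python
def left_join(left, right, key):
    """Left join two lists of dicts on a single key column."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined = []
    for l in left:
        matches = index.get(l[key])
        if not matches:
            joined.append(dict(l))
        else:
            joined.extend({**l, **m} for m in matches)
    return joined


def fans_out(left, joined):
    """True when the join produced more rows than the left input had."""
    return len(joined) > len(left)


def unmatched_keys(left, right, key):
    """Key values present on the left with no partner on the right."""
    return {l[key] for l in left} - {r[key] for r in right}


orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "b"},
    {"order_id": 3, "customer_id": "c"},
]
customers = [
    {"customer_id": "a", "region": "EU"},
    {"customer_id": "a", "region": "US"},  # duplicate key causes fan-out
    {"customer_id": "b", "region": "EU"},
]
joined = left_join(orders, customers, "customer_id")
print(len(joined), fans_out(orders, joined), unmatched_keys(orders, customers, "customer_id"))
```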
Mart Tests
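Tests that verify business rules and consumer contracts: for example uniqueness at the declared grain, valid metric ranges, and the columns and types downstream consumers depend on.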
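At the mart layer, the declared grain and the business rules can both be tested directly. The sketch below assumes a mart with one row per `order_date` and `region` and a rule that revenue is never negative; these columns, the rule, and the helper names are hypothetical.

```python
from collections import Counter


def grain_violations(rows, grain):
    """Grain key combinations that appear on more than one row."""
    counts = Counter(tuple(row[c] for c in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def rule_violations(rows, rule):
    """Rows for which the business rule predicate is false."""
    return [row for row in rows if not rule(row)]


mart = [
    {"order_date": "2024-01-01", "region": "EU", "revenue": 120.0},
    {"order_date": "2024-01-01", "region": "US", "revenue": 80.0},
    {"order_date": "2024-01-01", "region": "US", "revenue": -5.0},
]
duplicate_grain = grain_violations(mart, ("order_date", "region"))
negative_revenue = rule_violations(mart, lambda row: row["revenue"] >= 0)
print(duplicate_grain, negative_revenue)
```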
Monitoring Strategy
For each alert: condition, recipients, escalation path, rationale for the threshold.
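An alert definition can be written as a record with two thresholds, one that notifies and one that escalates. The following is a sketch under that assumption; the `Alert` fields, the freshness metric, the threshold values, and the recipients are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    warn_at: float     # value at which recipients are notified
    page_at: float     # value at which the escalation path is used
    recipients: tuple  # first-line recipients
    escalation: tuple  # escalation path, in order


def evaluate(alert, value):
    """Return (severity, who_to_notify) for an observed metric value."""
    if value >= alert.page_at:
        return "page", alert.escalation
    if value >= alert.warn_at:
        return "warn", alert.recipients
    return "ok", ()


# Hypothetical alert: hours since the orders table was last refreshed.
freshness = Alert(
    name="orders_freshness_hours",
    warn_at=6,
    page_at=24,
    recipients=("data-eng@example.com",),
    escalation=("on-call engineer", "data platform lead"),
)
print(evaluate(freshness, 2))
print(evaluate(freshness, 8))
print(evaluate(freshness, 30))
```

The threshold rationale belongs next to the numbers, for instance why six hours of staleness is tolerable for this table and twenty-four is not.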
Cost Targets
For each expected query pattern: description, run frequency, estimated cost. Then the total daily and monthly budget.
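Cost targets can be derived from the query patterns with simple arithmetic. The sketch below assumes on-demand pricing by data scanned; the price per TB, the monthly budget, the 30-day month, and the two query patterns are hypothetical placeholders to be replaced with real figures.

```python
PRICE_PER_TB = 5.0       # hypothetical price per TB scanned
DAYS_PER_MONTH = 30      # simplifying assumption
MONTHLY_BUDGET = 2000.0  # hypothetical budget

# Hypothetical query patterns.
patterns = [
    {"name": "dashboard refresh", "tb_scanned": 0.25, "runs_per_day": 24},
    {"name": "ad hoc analysis", "tb_scanned": 1.5, "runs_per_day": 4},
]


def daily_cost(pattern, price_per_tb=PRICE_PER_TB):
    """Estimated daily cost of one pattern: data scanned x runs x price."""
    return pattern["tb_scanned"] * pattern["runs_per_day"] * price_per_tb


daily_total = sum(daily_cost(p) for p in patterns)
monthly_total = daily_total * DAYS_PER_MONTH
within_budget = monthly_total <= MONTHLY_BUDGET
print(daily_total, monthly_total, within_budget)
```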