project-spec-template.md

Pipeline Specification

Data Sources

List each data source: name, format, location, refresh frequency, known issues.
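
One way to keep these entries consistent is to record each source as a small structured record. The sketch below is only an illustration: the `DataSource` class, the `missing_fields` helper, and the sample values (including the bucket path) are hypothetical and not part of this template.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    """One entry in the Data Sources section (hypothetical structure)."""
    name: str
    format: str
    location: str
    refresh_frequency: str
    known_issues: list[str] = field(default_factory=list)


def missing_fields(source: DataSource) -> list[str]:
    """Return the required fields that were left blank."""
    required = ("name", "format", "location", "refresh_frequency")
    return [f for f in required if not getattr(source, f).strip()]


# Example entry; every value here is made up for illustration.
orders = DataSource(
    name="orders_export",
    format="csv",
    location="gs://example-bucket/orders/",
    refresh_frequency="daily",
    known_issues=["duplicate rows when the export is retried"],
)
```

Running `missing_fields` over every entry before review is a cheap way to catch a source that was listed without a location or refresh frequency.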

Schema Design

Staging Layer

For each staging model: source, target name, column mappings, transformations applied.
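
A column mapping plus its transformations can be written down as a table of source column, target column, and cast. The following is a minimal sketch under that assumption; the column names and the `apply_mapping` helper are hypothetical.

```python
# Hypothetical mapping: source column -> (target column, cast function).
ORDER_MAPPING = {
    "Order ID": ("order_id", int),
    "Amt": ("amount", float),
    "Cust": ("customer_name", str.strip),
}


def apply_mapping(row: dict, mapping: dict) -> dict:
    """Rename and cast the mapped columns; unmapped source columns are dropped."""
    return {
        target: cast(row[source])
        for source, (target, cast) in mapping.items()
    }


raw = {"Order ID": "42", "Amt": "19.90", "Cust": "  Ada ", "ignored": "x"}
staged = apply_mapping(raw, ORDER_MAPPING)
```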

Intermediate Layer

For each intermediate model: inputs, join logic, deduplication approach.
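
As an illustration of what a deduplication approach might specify, the sketch below keeps the most recent row per key. The function name and the `order_id` / `updated_at` columns are assumptions chosen for the example; the template does not prescribe this approach.

```python
def dedupe_latest(rows: list[dict], key: str, order_by: str) -> list[dict]:
    """Keep one row per `key`: the row with the greatest `order_by` value."""
    latest: dict = {}
    for row in rows:
        k = row[key]
        if k not in latest or row[order_by] > latest[k][order_by]:
            latest[k] = row
    return list(latest.values())


rows = [
    {"order_id": 1, "updated_at": "2024-01-01", "status": "new"},
    {"order_id": 2, "updated_at": "2024-01-01", "status": "new"},
    {"order_id": 1, "updated_at": "2024-01-02", "status": "shipped"},
]
deduped = dedupe_latest(rows, key="order_id", order_by="updated_at")
```

Writing down both the key and the ordering column matters: the same input deduplicated on a different ordering column can keep a different row.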

Mart Layer

For each mart model: business purpose, grain (what one row represents), key metrics.
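
A stated grain can be checked mechanically: if one row represents one combination of the grain columns, no combination may appear twice. A minimal sketch, with hypothetical column names:

```python
from collections import Counter


def grain_violations(rows: list[dict], grain_columns: list[str]) -> list[tuple]:
    """Return every grain key that appears on more than one row."""
    counts = Counter(tuple(row[c] for c in grain_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Declared grain for this made-up mart: one row per customer per day.
daily_revenue = [
    {"customer_id": 1, "day": "2024-01-01", "revenue": 10.0},
    {"customer_id": 1, "day": "2024-01-02", "revenue": 12.0},
    {"customer_id": 1, "day": "2024-01-02", "revenue": 3.0},
]
violations = grain_violations(daily_revenue, ["customer_id", "day"])
```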

Partitioning Strategy

For each partitioned table: partition column, partition type (time-based, integer range), rationale tied to query patterns.
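
The rationale usually comes down to how many partitions a typical query has to read. The toy model below groups rows by a date column and lists the partitions a date-range filter would touch. It is a simplified sketch, not a description of any particular warehouse, and the names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def partition_by_day(rows: list[dict], column: str) -> dict:
    """Group rows into one partition per distinct date in `column`."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return dict(parts)


def partitions_scanned(parts: dict, start: date, end: date) -> list[date]:
    """Partitions a filter `start <= column <= end` would need to read."""
    return [day for day in sorted(parts) if start <= day <= end]


events = [
    {"event_date": date(2024, 1, 1), "user": "a"},
    {"event_date": date(2024, 1, 2), "user": "b"},
    {"event_date": date(2024, 1, 2), "user": "c"},
    {"event_date": date(2024, 1, 3), "user": "d"},
]
parts = partition_by_day(events, "event_date")
scanned = partitions_scanned(parts, date(2024, 1, 2), date(2024, 1, 2))
```

If most queries filter on a single day, a daily partition on that column lets them skip every other partition, which is the kind of link between partition column and query pattern the rationale should state.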

Clustering Strategy

For each clustered table: cluster columns, rationale tied to filter patterns.

Materialization Strategy

For each model: materialization type (view, table, incremental), incremental strategy if applicable (append, merge, delete+insert), rationale.
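
The three incremental strategies differ in what happens to rows that already exist in the target. The sketch below models them on in-memory lists to make the difference concrete. It is a deliberately simplified model: real warehouse and tool behavior varies, and the function names and columns are hypothetical.

```python
def append(target: list[dict], batch: list[dict]) -> list[dict]:
    """Append: add the batch as-is; existing rows are never touched."""
    return target + batch


def merge(target: list[dict], batch: list[dict], key: str) -> list[dict]:
    """Merge: update rows whose key matches, insert the rest."""
    by_key = {row[key]: row for row in target}
    for row in batch:
        by_key[row[key]] = row
    return list(by_key.values())


def delete_insert(target: list[dict], batch: list[dict], column: str) -> list[dict]:
    """Delete+insert: drop target rows matching any batch value of `column`, then insert the batch."""
    touched = {row[column] for row in batch}
    kept = [row for row in target if row[column] not in touched]
    return kept + batch


target = [
    {"id": 1, "day": "d1", "v": 1},
    {"id": 2, "day": "d1", "v": 2},
    {"id": 3, "day": "d2", "v": 3},
]
batch = [{"id": 2, "day": "d1", "v": 20}]
```

With this input, `append` leaves two rows for `id` 2, `merge` on `id` replaces only that row, and `delete_insert` on `day` replaces the whole `d1` slice. That difference is what the rationale for each model should address.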

RBAC Design

For each role: name, purpose, data access scope, column restrictions.
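
Role definitions become easier to review when access scope and column restrictions are explicit data rather than prose. A minimal sketch, assuming a hypothetical `Role` record and made-up dataset and column names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """One RBAC role (hypothetical structure)."""
    name: str
    purpose: str
    datasets: frozenset[str]
    restricted_columns: frozenset[str] = frozenset()


def visible_columns(role: Role, dataset: str, columns: list[str]) -> list[str]:
    """Columns the role may read: none outside its datasets, and no restricted columns."""
    if dataset not in role.datasets:
        return []
    return [c for c in columns if c not in role.restricted_columns]


analyst = Role(
    name="analyst",
    purpose="Ad hoc reporting on mart tables",
    datasets=frozenset({"marts"}),
    restricted_columns=frozenset({"email"}),
)
```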

Quality Testing Strategy

Staging Tests

Tests that validate source data at the boundary.
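
Boundary tests are typically simple column-level checks. The sketch below shows two common ones, a not-null check and an accepted-values check, written as plain functions. The function names and sample columns are assumptions for illustration.

```python
def not_null(rows: list[dict], column: str) -> list[int]:
    """Indexes of rows where `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def accepted_values(rows: list[dict], column: str, allowed: set) -> list[int]:
    """Indexes of rows whose `column` value is outside the allowed set."""
    return [i for i, row in enumerate(rows) if row.get(column) not in allowed]


source_rows = [
    {"order_id": 1, "status": "new"},
    {"order_id": None, "status": "shipped"},
    {"order_id": 3, "status": "unknown"},
]
null_failures = not_null(source_rows, "order_id")
status_failures = accepted_values(source_rows, "status", {"new", "shipped"})
```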

Intermediate Tests

Tests that verify transformation logic and joins.
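
A frequent join defect is fan-out: a join that was meant to be one-to-one silently multiplies rows because the right side has duplicate keys. The sketch below shows a row-count test for that case. The `left_join` helper is a toy stand-in for the real transformation, and all names are hypothetical.

```python
def left_join(left: list[dict], right: list[dict], key: str) -> list[dict]:
    """Left join two lists of dicts on `key`; unmatched left rows are kept."""
    index: dict = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        # An unmatched left row joins against one empty dict, so it is kept once.
        for match in index.get(row[key], [{}]):
            joined.append({**row, **match})
    return joined


def fans_out(left: list[dict], joined: list[dict]) -> bool:
    """True if the join produced more rows than the left input had."""
    return len(joined) > len(left)


orders = [{"customer_id": 1, "order_id": "a"}, {"customer_id": 2, "order_id": "b"}]
customers_with_dupe = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 1, "name": "Ada L."},
]
joined = left_join(orders, customers_with_dupe, "customer_id")
```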

Mart Tests

Tests that verify business rules and consumer contracts.
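
One example of a mart-level test is a reconciliation check: a metric total in the mart should equal the total of the same metric upstream. This is a sketch of one possible business rule, not a required test, and the names are hypothetical.

```python
import math


def reconciles(mart_rows: list[dict], upstream_rows: list[dict],
               metric: str, rel_tol: float = 1e-9) -> bool:
    """True if the metric total in the mart matches the upstream total."""
    mart_total = sum(row[metric] for row in mart_rows)
    upstream_total = sum(row[metric] for row in upstream_rows)
    return math.isclose(mart_total, upstream_total, rel_tol=rel_tol)


upstream = [{"revenue": 10.0}, {"revenue": 5.0}, {"revenue": 2.5}]
mart_ok = [{"revenue": 15.0}, {"revenue": 2.5}]
mart_missing_rows = [{"revenue": 15.0}]
```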

Monitoring Strategy

For each alert: condition, recipients, escalation path, threshold rationale.
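
An alert definition can be captured as a rule with explicit thresholds, which makes the threshold rationale easier to review. A minimal sketch, assuming a hypothetical `AlertRule` record with two severity levels; the metric, thresholds, and recipients are made up.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """One alert definition (hypothetical structure)."""
    metric: str
    warn_above: float
    page_above: float
    recipients: list[str]
    escalation: list[str]


def evaluate(rule: AlertRule, value: float) -> str:
    """Return "page", "warn", or "ok" for an observed metric value."""
    if value > rule.page_above:
        return "page"
    if value > rule.warn_above:
        return "warn"
    return "ok"


freshness = AlertRule(
    metric="hours_since_last_load",
    warn_above=6.0,
    page_above=24.0,
    recipients=["data-team"],
    escalation=["on-call engineer", "team lead"],
)
```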

Cost Targets

For each expected query pattern: description, run frequency, estimated cost. Also state the total daily and monthly budget.
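
For warehouses that bill by data scanned, a rough estimate is bytes scanned per run, times runs per day, times the unit price. The sketch below assumes that billing model. The price, query patterns, and budget are placeholders, not real rates.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def daily_cost(bytes_scanned: int, runs_per_day: int, price_per_tib: float) -> float:
    """Estimated daily cost of one query pattern under scan-based billing."""
    return bytes_scanned / TIB * runs_per_day * price_per_tib


# Placeholder values for illustration only.
PRICE_PER_TIB = 5.0
patterns = {
    "dashboard refresh": (TIB // 2, 24),   # (bytes scanned per run, runs per day)
    "nightly rebuild": (2 * TIB, 1),
}

per_pattern = {
    name: daily_cost(scanned, runs, PRICE_PER_TIB)
    for name, (scanned, runs) in patterns.items()
}
total_daily = sum(per_pattern.values())
within_budget = total_daily <= 100.0  # hypothetical daily budget
```

Listing the cost per pattern alongside the total shows which pattern dominates the budget and where partitioning or clustering would pay off.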