Pipeline Specification
Project Overview
Who is the client? What is the business context? What data sources are involved?
Client:
Business:
Core problem:
Data Sources
For each source: name, format, refresh cadence, volume, key columns. What do you know about each source's quality?
| Source | Format | Refresh cadence | Approximate volume | Key columns | Known quality issues |
|---|---|---|---|---|---|
Layer Architecture
What does each layer do? What naming convention does each follow?
Staging (stg_): Source-conform only. What transformations happen here? What does NOT happen here?
Intermediate (int_): Business logic. What transformations happen here?
Mart (dim_, fct_): Final consumer-facing tables. What joins, aggregations, or reshaping happens here?
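The prefix conventions above can be checked mechanically. The following is a minimal sketch, assuming model names are available as plain strings; the `LAYER_PREFIXES` mapping and both function names are hypothetical helpers introduced for illustration, not part of dbt or any other tool.

```python
from typing import List, Optional

# Hypothetical mapping from the prefixes named above to their layer.
LAYER_PREFIXES = {
    "stg_": "staging",
    "int_": "intermediate",
    "dim_": "mart",
    "fct_": "mart",
}


def layer_for(model_name: str) -> Optional[str]:
    """Return the layer implied by a model's prefix, or None if no prefix matches."""
    for prefix, layer in LAYER_PREFIXES.items():
        if model_name.startswith(prefix):
            return layer
    return None


def misnamed(model_names: List[str]) -> List[str]:
    """Return the models whose names follow none of the layer conventions."""
    return [name for name in model_names if layer_for(name) is None]
```

A check like this could run in CI so that a model without a recognized prefix is flagged before it reaches review.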
Transformation Logic
Which key transformations need attention? Consider unit conversions, identity resolution, and temporal logic.
| Transformation | Source | Target | Logic | Risk |
|---|---|---|---|---|
SCD Strategy
For each dimension: which SCD type and why? What does the business lose with each choice?
| Dimension | SCD type | Rationale (tied to client's analytical needs) |
|---|---|---|
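
To make the trade-off concrete, here is a small sketch contrasting SCD Type 1 (overwrite) with SCD Type 2 (versioned rows). The `DimRow` shape, the `customer_id` and `segment` columns, and both function names are assumptions chosen for illustration; a real dimension would carry the client's own keys and attributes.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int                  # hypothetical natural key
    segment: str                      # hypothetical tracked attribute
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_type1(rows: List[DimRow], customer_id: int, new_segment: str) -> List[DimRow]:
    """Type 1: overwrite the attribute in place; the previous value is lost."""
    return [
        replace(r, segment=new_segment) if r.customer_id == customer_id else r
        for r in rows
    ]


def apply_type2(rows: List[DimRow], customer_id: int, new_segment: str,
                change_date: date) -> List[DimRow]:
    """Type 2: close the current version and append a new one; history is kept."""
    out: List[DimRow] = []
    changed = False
    for r in rows:
        is_current = r.customer_id == customer_id and r.valid_to is None
        if is_current and r.segment != new_segment:
            out.append(replace(r, valid_to=change_date))
            changed = True
        else:
            out.append(r)
    if changed:
        out.append(DimRow(customer_id, new_segment, change_date))
    return out
```

With Type 1 the business can no longer ask what segment a customer belonged to at the time of a past order. With Type 2 that question stays answerable, at the cost of more rows and date-range joins.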
Metric Definitions
For each metric: the exact formula, the business owner, how to verify it matches business intent.
| Metric | Formula | Business owner | Verification method |
|---|---|---|---|
Quality Strategy
Which tests catch which failures? Where do dbt tests end and Soda Core checks begin?
dbt tests:
Soda Core checks:
Freshness policies:
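A freshness policy usually reduces to one comparison: is the most recent load older than the allowed age? The sketch below shows that rule in plain Python. It is not the dbt, Soda Core, or Dagster API, and `is_stale` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_loaded_at: datetime, max_age: timedelta, now: datetime) -> bool:
    """Return True when the last load is older than the allowed maximum age."""
    return now - last_loaded_at > max_age


# Example: a source that is expected to refresh at least every 24 hours.
now = datetime(2024, 6, 2, 12, 0, tzinfo=timezone.utc)
fresh_load = datetime(2024, 6, 2, 1, 0, tzinfo=timezone.utc)
old_load = datetime(2024, 5, 31, 1, 0, tzinfo=timezone.utc)

fresh_result = is_stale(fresh_load, timedelta(hours=24), now)
old_result = is_stale(old_load, timedelta(hours=24), now)
```

Passing `now` explicitly keeps the rule deterministic and easy to test. The threshold for each source should come from the refresh cadence recorded under Data Sources.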
Governance
What fields are PII? What masking approach? Which surfaces need verification?
| PII field | Classification | Masking approach | Verification surfaces |
|---|---|---|---|
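
One possible masking approach is keyed hashing, which replaces a PII value with a stable pseudonym so that joins still work while the raw value stays hidden. The sketch below assumes HMAC-SHA256 with a secret held outside the warehouse. The function name, the normalization rule, and the example secret are illustrative assumptions, and other approaches (redaction, tokenization, column-level access policies) may suit a given field better.

```python
import hashlib
import hmac


def mask_value(value: str, secret: bytes) -> str:
    """Return a deterministic pseudonym for a PII value using HMAC-SHA256."""
    # Assumed normalization: trim and lowercase so equivalent inputs match.
    normalized = value.strip().lower()
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative secret only; a real key would come from a secrets manager.
example_secret = b"replace-me"
masked_a = mask_value("Jane.Doe@example.com", example_secret)
masked_b = mask_value(" jane.doe@example.com ", example_secret)
```

Because the output is deterministic for a given secret, verification on each surface can assert that no raw value appears and that the masked column holds only 64-character hexadecimal strings.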
Orchestration
Describe the Dagster configuration, the refresh cadence, and the asset dependencies.
Schedule:
Asset graph:
Freshness policies:
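Before configuring the orchestrator, it can help to write the asset graph down as data and confirm that it has a valid build order. The sketch below uses Python's standard-library `graphlib` rather than the Dagster API. The asset names are hypothetical examples that follow the layer prefixes above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical asset graph: each key depends on the assets in its set.
ASSET_DEPS: Dict[str, Set[str]] = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_orders_enriched": {"stg_orders", "stg_customers"},
    "dim_customer": {"stg_customers"},
    "fct_orders": {"int_orders_enriched", "dim_customer"},
}


def materialization_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid build order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(deps).static_order())


order = materialization_order(ASSET_DEPS)
```

In an orchestrator the same dependencies would be declared on the assets themselves. The point here is only that every upstream asset must materialize before anything that depends on it, and that a cycle is caught early.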