pipeline-spec-template.md

Pipeline Specification

Project Overview

Who is the client? What is the business context? What data sources are involved?

Client:

Business:

Core problem:

Data Sources

For each source: name, format, refresh cadence, volume, key columns. What do you know about each source's quality?

| Source | Format | Refresh cadence | Approximate volume | Key columns | Known quality issues |
| --- | --- | --- | --- | --- | --- |
|  |  |  |  |  |  |

Layer Architecture

What does each layer do? What naming convention does each follow?

Staging (stg_): Source-conform only. What transformations happen here? What does NOT happen here?

Intermediate (int_): Business logic. What transformations happen here?

Mart (dim_, fct_): Final consumer-facing tables. What joins, aggregations, or reshaping happens here?
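The naming convention above can be checked mechanically. The following is a minimal sketch, assuming model names and their intended layers are available as a plain dictionary; `LAYER_PREFIXES`, `layer_of`, and `misnamed` are hypothetical names introduced for illustration and are not part of dbt.

```python
from __future__ import annotations

from typing import Optional

# Prefixes taken from the layer descriptions in this template.
LAYER_PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "mart": ("dim_", "fct_"),
}


def layer_of(model_name: str) -> Optional[str]:
    """Return the layer implied by a model's prefix, or None if no prefix matches."""
    for layer, prefixes in LAYER_PREFIXES.items():
        if model_name.startswith(prefixes):
            return layer
    return None


def misnamed(models: dict[str, str]) -> list[str]:
    """List models whose prefix disagrees with their declared layer."""
    return sorted(name for name, layer in models.items() if layer_of(name) != layer)
```

For example, `misnamed({"stg_orders": "staging", "customers": "mart"})` would flag `customers`, because it carries neither the `dim_` nor the `fct_` prefix.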

Transformation Logic

Key transformations that need attention, such as unit conversions, identity resolution, and temporal logic.

| Transformation | Source | Target | Logic | Risk |
| --- | --- | --- | --- | --- |
|  |  |  |  |  |

SCD Strategy

For each dimension: which SCD type and why? What does the business lose with each choice?

| Dimension | SCD type | Rationale (tied to client's analytical needs) |
| --- | --- | --- |
|  |  |  |
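To make the trade-off concrete, here is a minimal sketch contrasting SCD Type 1 (overwrite) with SCD Type 2 (versioned rows) in plain Python. The `CustomerRow` shape, the `region` attribute, and both function names are assumptions chosen for illustration; a real pipeline would more likely implement this in SQL or with dbt snapshots.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class CustomerRow:
    customer_id: int
    region: str
    valid_from: date
    valid_to: date | None = None  # None marks the current version


def apply_type1(rows, customer_id, new_region):
    """Type 1: overwrite the attribute in place. The previous region is lost."""
    return [
        replace(r, region=new_region) if r.customer_id == customer_id else r
        for r in rows
    ]


def apply_type2(rows, customer_id, new_region, change_date):
    """Type 2: close the current version and append a new one. History is kept."""
    out = []
    for r in rows:
        is_current = r.customer_id == customer_id and r.valid_to is None
        if is_current and r.region != new_region:
            out.append(replace(r, valid_to=change_date))
            out.append(CustomerRow(customer_id, new_region, change_date))
        else:
            out.append(r)
    return out
```

With Type 1 the business can no longer ask which region a customer belonged to at the time of a past order. Type 2 keeps that answer at the cost of more rows and date-range joins.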

Metric Definitions

For each metric: the exact formula, the business owner, how to verify it matches business intent.

| Metric | Formula | Business owner | Verification method |
| --- | --- | --- | --- |
|  |  |  |  |
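One possible verification method is to recompute the metric from row-level data and compare it with a figure the business owner already trusts. The sketch below assumes a hypothetical "average order value" metric and an `orders` list of dictionaries with `status` and `amount` keys. None of these names come from the template.

```python
import math


def average_order_value(orders):
    """Hypothetical metric: revenue from completed orders / count of completed orders."""
    completed = [o for o in orders if o["status"] == "completed"]
    if not completed:
        return 0.0
    return sum(o["amount"] for o in completed) / len(completed)


def matches_reference(computed, reference, rel_tol=1e-6):
    """Compare a computed metric with a business-supplied reference value."""
    return math.isclose(computed, reference, rel_tol=rel_tol)
```

Writing the formula as code forces the ambiguous choices into the open, here whether cancelled orders count toward the denominator.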

Quality Strategy

Which tests catch which failures? Where do dbt tests end and Soda Core checks begin?

dbt tests:

Soda Core checks:

Freshness policies:
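A freshness policy reduces to a comparison between the last load time and a maximum allowed lag. The following is a minimal sketch in plain Python, not the dbt, Soda Core, or Dagster API; `is_fresh` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_loaded_at, max_lag, now=None):
    """Return True if the data was loaded within max_lag of the current time."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_lag
```

Passing `now` explicitly keeps the check deterministic in tests. A daily source might use `max_lag=timedelta(hours=26)` to allow some slack, though the right threshold depends on the client's refresh cadence.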

Governance

What fields are PII? What masking approach? Which surfaces need verification?

| PII field | Classification | Masking approach | Verification surfaces |
| --- | --- | --- | --- |
|  |  |  |  |
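As one example of a masking approach and its verification, the sketch below replaces PII values with a keyed SHA-256 digest (HMAC) and then checks that no raw value survives in a downstream surface. It assumes rows are plain dictionaries. `mask_value`, `mask_rows`, and `leaked_values` are hypothetical helpers, and keyed hashing is only one option alongside redaction, tokenization, or column-level access controls.

```python
import hashlib
import hmac


def mask_value(value: str, secret_key: bytes) -> str:
    """Replace a PII value with a keyed digest. Deterministic, so joins still work."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_rows(rows, pii_fields, secret_key):
    """Mask only the listed PII fields and leave every other column untouched."""
    return [
        {k: mask_value(v, secret_key) if k in pii_fields else v for k, v in row.items()}
        for row in rows
    ]


def leaked_values(raw_rows, surface_rows, pii_fields):
    """Return raw PII values that still appear anywhere in a downstream surface."""
    raw = {row[f] for row in raw_rows for f in pii_fields}
    seen = {str(v) for row in surface_rows for v in row.values()}
    return sorted(raw & seen)
```

Running `leaked_values` against each verification surface (mart tables, logs, exports) turns "is the field masked everywhere?" into a check that returns an empty list when nothing leaks.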

Orchestration

Dagster configuration. Refresh cadence. Asset dependencies.

Schedule:

Asset graph:

Freshness policies:
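When filling in the asset graph, it can help to write the dependencies down as data and confirm they produce a valid run order. The sketch below uses the standard library's `graphlib` rather than Dagster itself. The asset names in `ASSET_DEPS` are hypothetical, and in Dagster the same relationships would be declared on the assets themselves.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph: each asset maps to the assets it depends on.
ASSET_DEPS = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_orders_enriched": {"stg_orders", "stg_customers"},
    "dim_customer": {"stg_customers"},
    "fct_orders": {"int_orders_enriched", "dim_customer"},
}


def run_order(deps):
    """Return an execution order in which every asset follows its upstream assets."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())
```

A cycle in the dependency map raises an error here, which is a cheap way to catch a mart that accidentally feeds back into staging before the graph is wired into the orchestrator.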