Pipeline Specification
Project overview
[Describe the project: who is the client, what do they need, what does the pipeline produce]
Sources
| Source name | Format | Location | Refresh strategy | Watermark column | Notes |
|---|---|---|---|---|---|
| [Source 1] | [CSV/JSON/API] | [File path or endpoint] | [Full / Incremental] | [Column name or N/A] | [Any special considerations] |
| [Source 2] | | | | | |
Target schema
Staging layer
[Define the staging models. What does each one do? Source-conform only -- no business logic.]
| Model name | Source | Key fields | Materialization |
|---|---|---|---|
| [stg_...] | [Source name] | [List key fields] | [table / incremental] |
Intermediate layer
[Define intermediate models that combine or reshape staging data.]
| Model name | Sources | Purpose | Materialization |
|---|---|---|---|
| [int_...] | [Which staging models] | [What this model does] | [table / incremental] |
Mart layer
[Define the mart models that serve business users.]
| Model name | Purpose | Grain | Key metrics / columns |
|---|---|---|---|
| [fct_...] | [What business question it answers] | [One row per...] | [Key columns] |
| [dim_...] | [What entity it tracks] | [One row per...] | [Key columns] |
Extraction pattern
Full vs incremental decision
[For each source, decide: full refresh or incremental? Document your reasoning.]
| Source | Strategy | Rationale |
|---|---|---|
| [Source 1] | [Full / Incremental] | [Why this strategy for this source] |
| [Source 2] | | |
Watermark design
[For incremental sources: which column is the watermark? How trustworthy is it? What are the edge cases?]
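When filling in this section, it can help to see the watermark logic written out. The following is a minimal Python sketch, not a prescribed implementation: the function name `select_incremental`, the `updated_at` column, and the `lookback` parameter are all hypothetical. It assumes rows are dictionaries carrying a comparable timestamp, and it re-reads a small window behind the stored watermark to pick up late-arriving rows, which is one common way to handle a watermark that is not fully trustworthy.

```python
from datetime import datetime, timedelta


def select_incremental(rows, last_watermark, watermark_col="updated_at",
                       lookback=timedelta(0)):
    """Return (rows changed since the last run, new watermark).

    `watermark_col` and `lookback` are illustrative assumptions; substitute
    the column and tolerance chosen for the actual source.
    """
    # Re-read a window behind the stored watermark to catch late arrivals.
    cutoff = last_watermark - lookback
    fresh = [r for r in rows if r[watermark_col] > cutoff]

    # Advance to the newest value seen, but never move the watermark backward.
    newest = max((r[watermark_col] for r in fresh), default=last_watermark)
    return fresh, max(newest, last_watermark)
```

Because a non-zero `lookback` re-selects rows that were already loaded, this pattern only works when the load step is idempotent, for example a MERGE on a stable key.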
MERGE key design
[Define the natural key for MERGE (upsert) operations. What uniquely identifies a record?]
| Model | MERGE key columns | Rationale |
|---|---|---|
| [stg_...] | [Column list] | [Why these columns uniquely identify a record] |
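To sanity-check a candidate MERGE key, the upsert behavior can be simulated in plain Python before writing any SQL. The sketch below is illustrative only: `merge_rows` and the example columns `order_id` and `line_no` are hypothetical names, and the in-memory dictionary stands in for the target table.

```python
def merge_rows(target, incoming, key_cols):
    """Upsert `incoming` rows into `target`, keyed on `key_cols`.

    `target` maps key tuples to rows. A new dict is returned so the
    original is left untouched.
    """
    merged = dict(target)
    for row in incoming:
        key = tuple(row[c] for c in key_cols)
        # A NULL in the key usually means the key does not identify the record.
        if any(part is None for part in key):
            raise ValueError(f"NULL in MERGE key {key_cols}: {row}")
        # Matching key: update in place. New key: insert.
        merged[key] = row
    return merged


# Hypothetical usage: the second batch updates one row and inserts another.
table = merge_rows({}, [{"order_id": 1, "line_no": 1, "qty": 2}],
                   ("order_id", "line_no"))
table = merge_rows(table,
                   [{"order_id": 1, "line_no": 1, "qty": 5},
                    {"order_id": 1, "line_no": 2, "qty": 1}],
                   ("order_id", "line_no"))
```

If replaying the same batch twice changes the row count, the chosen columns do not uniquely identify a record and the key needs another column.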
Quality checks
dbt tests
[List the dbt tests: unique, not_null, accepted_values, custom business logic tests]
Soda Core checks
[List the Soda Core trend checks: row count ranges, statistical bounds, freshness]
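As a way to reason about what a statistical bound on row counts means before configuring it, here is a small Python sketch. It is not Soda Core syntax; the function name `row_count_in_bounds` and the multiplier `k` are assumptions made for illustration. It flags a run whose row count falls outside the trailing mean plus or minus `k` sample standard deviations.

```python
import statistics


def row_count_in_bounds(history, today, k=3.0):
    """Return True if today's row count is within mean +/- k * stdev.

    `history` is a list of row counts from previous runs; at least two
    values are needed to compute a sample standard deviation.
    """
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    return mean - k * spread <= today <= mean + k * spread
```

A fixed range (for example, a minimum and maximum row count) is simpler and may be enough for small, stable sources; a statistical bound suits sources whose volume drifts over time.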
Monitoring
Watermark progression
[How will you monitor that the watermark advances on each run? What's the alert threshold?]
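One possible shape for this check, sketched in Python under stated assumptions: the function name `watermark_status`, the status labels, and the `max_lag` threshold are hypothetical, and the previous and current watermark values are assumed to be read from wherever the pipeline stores its run state.

```python
from datetime import datetime, timedelta


def watermark_status(previous, current, now, max_lag):
    """Classify the watermark after a run.

    Returns one of "regressed", "stale", "unchanged", or "ok".
    """
    if current < previous:
        # The watermark moved backward: likely a bug or a source restatement.
        return "regressed"
    if now - current > max_lag:
        # The newest loaded data is older than the alert threshold.
        return "stale"
    if current == previous:
        # No new data this run; acceptable for quiet sources within max_lag.
        return "unchanged"
    return "ok"
```

Under this sketch, "regressed" and "stale" would raise an alert, while "unchanged" is tolerated until the lag exceeds `max_lag`.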
Schedule
[How often does the pipeline run? What triggers it? Schedule-based or sensor-based?]