Learn by Directing AI
All materials

scd-design-template.md

Slowly-Changing Dimension Design

What are slowly-changing dimensions

Dimension tables describe the entities in your warehouse -- customers, products, locations, equipment. These entities change over time. A customer moves to a new address. A product's price changes. A turbine gets a new gearbox. How you handle these changes in the schema determines what historical analysis is possible and what complexity every downstream query inherits.

Type 1: Overwrite

The current value replaces the old value. No history is preserved.

How it works: When a turbine's gearbox model changes, the dimension row is updated in place. The old gearbox model is gone.

When to use it: When the business does not need the historical value. If nobody will ever ask "what gearbox did this turbine have last year?" then preserving that history adds complexity without analytical value.

What you lose: All history. You cannot compare performance under the old configuration to performance under the new one, because the old configuration no longer exists in the data.

What you gain: Simplicity. Every query returns the current state. No date filtering. No duplicate rows. No effective date logic.

Type 2: History tracking

Every change creates a new row with effective date ranges. The full history is preserved.

How it works: When a turbine's blade model changes from LM 40.3 to LM 43.2 on March 15, the old row gets an effective_to date of 2024-03-15 and a new row begins with effective_from of 2024-03-15 and effective_to of NULL (or a sentinel date like 9999-12-31). Both rows coexist in the dimension table.

When to use it: When the business needs to answer questions that span configuration changes. "Did the blade upgrade improve performance?" requires linking each data point to the configuration that was active at the time.

What you gain: Full history. Every data point links to the right configuration. Before-and-after comparisons become possible.

What you pay: Query complexity. Every join to the dimension must include a date range condition (timestamp BETWEEN effective_from AND effective_to). Missing this filter produces a Cartesian product -- every data point matches every configuration, silently inflating your row count. Table growth -- each change adds a row, so the dimension table grows over time.

Decision criteria

If the business needs... Use Type... Because...
Historical analysis across changes Type 2 Only Type 2 preserves the before/after boundary
Only the current state Type 1 No analytical need for historical values justifies the query complexity
To compare performance under different configurations Type 2 The comparison requires linking data to the right configuration via effective dates
A simple, always-current lookup Type 1 Simplicity serves downstream consumers who only care about "what is it now?"

AI defaults to Type 2 for everything. Type 2 is more complete, but completeness nobody needs is complexity everybody pays for. Evaluate each dimension against what Katrine actually asks about. If she never queries the historical value of a field, Type 1 is the right choice -- simpler schema, simpler queries, no risk of Cartesian products from missing date filters.

Your SCD strategy

For each dimension attribute, choose Type 1 or Type 2. Tie the rationale to Katrine's actual analytical needs -- what questions does she ask that require history?

Dimension attribute SCD type Rationale (what does Katrine need?)