Learn by Directing AI
Unit 4

Build the SCD Type 2 dimension and link to SCADA data

Step 1: Build the SCD Type 2 dimension

Now build the dimension model that implements your SCD strategy. The dim_turbine_configuration model represents each turbine in a specific configuration, with effective_from and effective_to date columns.

Direct Claude to build the model. Feed it the SCD design decision from your CLAUDE.md and the component change staging model. Each row in the dimension represents a turbine during a specific configuration period. When a component change occurs, the old record's effective_to is set to the change date and a new record begins with effective_from equal to the change date.

For the current configuration (no subsequent change), use a sentinel date for effective_to -- 9999-12-31 rather than NULL. This simplifies range queries: every temporal join uses BETWEEN effective_from AND effective_to without needing a COALESCE for null handling.

AI commonly gets edge case handling wrong for temporal dimension models -- boundary conditions at the exact change date, overlapping records, and sentinel values. Run verification queries before trusting the output.

Step 2: Verify the SCD output

Query the dimension to check the structure:

SELECT turbine_id, COUNT(*) as config_records
FROM dim_turbine_configuration
GROUP BY turbine_id
ORDER BY config_records DESC;

Turbines with component changes should have multiple records. Turbines with no changes should have exactly one record. If a turbine with two gearbox replacements and a blade upgrade has fewer than four records, something was lost in the SCD logic.

Check for overlapping date ranges -- a critical SCD failure:

SELECT a.turbine_id, a.effective_from, a.effective_to, b.effective_from as overlap_from
FROM dim_turbine_configuration a
JOIN dim_turbine_configuration b 
  ON a.turbine_id = b.turbine_id 
  AND a.effective_from < b.effective_from 
  AND a.effective_to > b.effective_from;

This query should return zero rows. If it returns anything, two configuration periods overlap for the same turbine, which means a single SCADA reading could match two configurations -- doubling that reading in any downstream join.

Step 3: Build the fact table with temporal joins

Build fct_turbine_performance. This model joins SCADA data to the configuration dimension -- but not with a simple key join. The join condition uses a date range:

scada.timestamp BETWEEN config.effective_from AND config.effective_to

This is a temporal join. Every SCADA reading links to the configuration that was active at the time it was recorded. The blade upgrade on March 15 means pre-March readings link to the old blade model and post-March readings link to the new one.

Direct Claude to build this model. Be explicit about the temporal join condition in your prompt.

Step 4: Test the temporal join

Query a specific turbine that had a blade upgrade. DK01-T02 received a blade upgrade from LM 46.8 to LM 43.2 on March 15, 2024.

SELECT turbine_id, 
       DATE_TRUNC('month', timestamp) as month,
       blade_model,
       ROUND(AVG(power_output_kw), 1) as avg_power,
       COUNT(*) as readings
FROM fct_turbine_performance
WHERE turbine_id = 'DK01-T02'
GROUP BY turbine_id, month, blade_model
ORDER BY month;

January and February readings should link to blade_model LM 46.8. April onward should link to LM 43.2. March might split between the two depending on when readings fall relative to the change date.

Now test what happens if the temporal join is broken. Run the same query but remove the date range filter -- join only on turbine_id:

SELECT COUNT(*) as with_temporal_join FROM fct_turbine_performance WHERE turbine_id = 'DK01-T02';

Compare this row count to what you'd get with a simple key join (without the effective date condition). If the turbine has two configuration records, a key-only join produces a Cartesian product -- every SCADA reading matches both configurations, doubling the row count. The numbers look plausible. No error is thrown. The average power output changes because each reading is counted twice. This is the SCD version of a silent failure.

Step 5: Cross-check the SCD implementation

Direct a second Claude session (or start a fresh session with consolidated context) to review the SCD implementation. Feed the new session the SCD dimension model and the fact table model, without the design decisions that led to them.

Ask the reviewer to look for:

  • Edge cases in the effective date logic. What happens to a SCADA reading timestamped on the exact date of a component change? Is it assigned to the old configuration, the new one, or both?
  • Whether the sentinel date (9999-12-31) is handled correctly in all queries
  • Whether the temporal join could produce duplicates under any condition

Fresh context catches problems that established context normalizes. The first session built these models and sees them as correct because it designed them. The second session has no history of why these choices were made -- it evaluates the models on their structural merits.

✓ Check

Check: Query a turbine that had a blade upgrade. Does the fact table correctly separate pre-upgrade and post-upgrade SCADA records? What happens if you remove the date filter from the join -- does the row count change?