## Step 1: Add SCD-specific dbt tests
SCD models have failure modes that standard dbt tests don't cover. You need tests designed for temporal data.
Direct Claude to add these dbt tests:
- **No overlapping effective date ranges.** For the same turbine, no two configuration records should have overlapping periods. If turbine DK01-T02 has one record ending on March 15 and another starting on March 14, those records overlap -- and any SCADA reading between March 14 and 15 would match both configurations, doubling the rows in the fact table.
- **No orphaned SCADA records.** Every SCADA reading in the fact table should join to exactly one configuration. Test: count SCADA records that match zero configurations (orphans) and records that match more than one (duplicates from Cartesian products).
- **Current configuration identification.** Every turbine should have exactly one record with `effective_to` equal to the sentinel date (9999-12-31). Zero means the turbine's history was truncated. More than one means conflicting "current" records.
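The overlap test can be written as a dbt singular test -- a SQL file in `tests/` that returns the offending rows. This is a sketch; the model and column names (`dim_turbine_configuration`, `turbine_id`, `config_key`, `effective_from`, `effective_to`) are assumptions to adapt to your schema:

```sql
-- tests/assert_no_overlapping_configurations.sql
-- Fails if any two configuration records for the same turbine have
-- overlapping [effective_from, effective_to) ranges.
select
    a.turbine_id,
    a.effective_from,
    a.effective_to,
    b.effective_from as overlapping_from,
    b.effective_to   as overlapping_to
from {{ ref('dim_turbine_configuration') }} a
join {{ ref('dim_turbine_configuration') }} b
  on  a.turbine_id = b.turbine_id
  and a.config_key <> b.config_key         -- skip self-comparison
  and a.effective_from < b.effective_to    -- a starts before b ends
  and b.effective_from < a.effective_to    -- b starts before a ends
```

The current-configuration check is simpler: group by `turbine_id`, filter to `effective_to = '9999-12-31'`, and fail on any turbine where the count is not exactly one.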
These tests catch problems that don't produce errors during normal pipeline execution. A Cartesian product from overlapping dates doesn't fail -- it returns more rows than expected, silently inflating every downstream metric.
## Step 2: Add Soda Core checks for SCD patterns
Soda Core monitors trends and batch-level patterns that row-level dbt tests miss.
Add a trend check on the SCD dimension: the number of new configuration records per day should be low. Component changes are infrequent events -- a turbine might go months without a change. An unexpected spike in new configuration records suggests data quality issues in the change log, not a sudden wave of turbine upgrades.
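In SodaCL, a change-over-time check along these lines might look like the following. The dataset name, file path, and threshold are assumptions -- tune the threshold to your farm's actual change frequency, and verify the change-over-time syntax against the Soda documentation for your version:

```yaml
# checks/dim_turbine_configuration.yml (sketch)
checks for dim_turbine_configuration:
  # Component changes are rare; alert if a batch adds far more
  # configuration records than the recent daily average.
  - change avg last 7 for row_count < 10
```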
## Step 3: Configure Dagster differential freshness policies
The SCADA fact table receives new data every 10 minutes. The SCD dimension table updates only when a component change is logged -- which might not happen for weeks.
These tables need different freshness policies. A 1-hour freshness threshold on the fact table is reasonable -- if no new SCADA data arrives for an hour, something is wrong with the data feed. But a 1-hour threshold on the SCD dimension would trigger constant violations because the dimension legitimately goes weeks without updates.
Direct Claude to configure Dagster freshness policies. AI commonly sets the same threshold for both tables. Evaluate the suggested thresholds against the data's actual update cadence. The fact table needs tight monitoring. The dimension needs loose monitoring -- a 7-day or even 14-day freshness window that only alerts when the dimension table hasn't been touched for an unusually long period.
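A sketch of what differential policies might look like, using Dagster's `FreshnessPolicy` on asset definitions. The asset names mirror this pipeline, but treat this as illustrative: newer Dagster versions favor freshness *checks* (e.g. `build_last_update_freshness_checks`) over the older policy API, so match the construct to the Dagster version in use:

```python
# Sketch: tight freshness on the fact asset, loose on the SCD dimension.
from dagster import FreshnessPolicy, asset


@asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=60))
def fct_turbine_performance():
    """SCADA facts arrive every 10 minutes; stale after 1 hour."""
    ...


@asset(freshness_policy=FreshnessPolicy(maximum_lag_minutes=7 * 24 * 60))
def dim_turbine_configuration():
    """SCD dimension legitimately idles for weeks; alert after 7 days."""
    ...
```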
```mermaid
graph TD
    S1[scada_source<br/>every 10 min] --> STG1[stg_scada]
    S2[component_changes<br/>infrequent] --> STG2[stg_component_changes]
    STG1 --> FCT[fct_turbine_performance<br/>Freshness: 1 hour]
    STG2 --> DIM[dim_turbine_configuration<br/>Freshness: 7 days]
    DIM --> FCT
```
A freshness policy is a contract with Katrine and the farm owners. A pipeline that runs successfully but whose source hasn't been updated produces fresh runs with stale data. The freshness policy catches the case where everything looks green but nothing new has arrived.
## Step 4: Update the CI/CD pipeline
Update the GitHub Actions CI/CD workflow to include the SCD-specific tests and Soda Core checks. The quality gate should run all dbt tests (including the new temporal tests) and the Soda Core checks on every push.
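A quality-gate job along these lines could be added to the workflow. This is a fragment, not a complete workflow: the adapter package, Soda package, profiles, and file paths are all assumptions to replace with your project's actual setup:

```yaml
# .github/workflows/quality-gate.yml (fragment, paths assumed)
jobs:
  quality-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        # Adapter and Soda packages depend on your warehouse
        run: pip install dbt-core soda-core
      - name: Run dbt tests (including the SCD temporal tests)
        run: dbt test
      - name: Run Soda Core checks
        run: soda scan -d warehouse -c soda/configuration.yml soda/checks
```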
## Step 5: Run the full test suite
Run the complete test suite -- dbt tests and Soda Core checks -- and verify that the CI/CD pipeline triggers correctly.

```shell
dbt test
```
Fix any failures. If the overlapping date test catches an issue, trace it back to the SCD dimension build logic. If the orphan test catches unmatched SCADA records, decide whether the fix is in the dimension build or the staging layer.
After all tests pass, commit and push to verify the CI/CD pipeline runs the quality gate.
Check: If someone added a component change with overlapping dates (effective_from before the previous record's effective_to), would your tests catch it?
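To reason through that check, here is a minimal standalone sketch of the overlap logic in plain Python, with hypothetical records reproducing the DK01-T02 scenario. It assumes half-open `[effective_from, effective_to)` ranges, so a record ending on a date and another starting on the same date do *not* overlap:

```python
from datetime import date


def overlaps(a_from, a_to, b_from, b_to):
    """Two half-open [from, to) ranges overlap iff each starts before the other ends."""
    return a_from < b_to and b_from < a_to


def find_overlaps(records):
    """Return pairs of records for the same turbine whose date ranges overlap."""
    bad = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if a["turbine_id"] == b["turbine_id"] and overlaps(
                a["effective_from"], a["effective_to"],
                b["effective_from"], b["effective_to"],
            ):
                bad.append((a, b))
    return bad


# Hypothetical failure case: a new record starts on March 14 while the
# previous record runs through March 15.
records = [
    {"turbine_id": "DK01-T02",
     "effective_from": date(2024, 1, 1), "effective_to": date(2024, 3, 15)},
    {"turbine_id": "DK01-T02",
     "effective_from": date(2024, 3, 14), "effective_to": date(9999, 12, 31)},
]
```

A test that asserts `find_overlaps` returns zero rows for a clean dimension and flags this pair is the Python analogue of the SQL self-join: if your dbt test encodes the same two inequalities, it catches the overlapping insert.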