Learn by Directing AI
Unit 5

Validate and interpret

Step 1: Design the validation strategy

You have results from the interrupted time series analysis. Before trusting them, design the validation strategy. What needs checking for this kind of analysis?

Time series data has specific concerns that cross-sectional data does not. Use meta-prompting: "I have run an interrupted time series analysis on air quality data -- help me figure out what could go wrong with this model and what checks would catch each problem."

The key concerns for interrupted time series:

  • Autocorrelation in the residuals. Time series observations are not independent -- today's PM2.5 is correlated with yesterday's. If the model has not captured this structure, the standard errors are wrong and the significance test is unreliable.
  • Seasonal model adequacy. If the seasonal controls do not fully capture the real seasonal pattern, the residual seasonal variation could be confused with the regulation's effect.
  • Sensitivity to time resolution. Does the estimated effect change substantially if you use weekly averages instead of monthly? If it does, the result is fragile.
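The model structure these concerns target can be made concrete with a small sketch. This is a minimal segmented-regression setup on simulated monthly data -- the series, the intervention month, and the seasonal controls are all illustrative assumptions, not the course dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated monthly PM2.5 series (illustrative numbers only) ---
n_months = 72                        # six years of monthly averages
t = np.arange(n_months)
intervention = 36                    # regulation assumed to take effect at month 36
post = (t >= intervention).astype(float)

pm25 = (
    20.0                             # baseline level
    - 0.02 * t                       # slow pre-existing downward trend
    + 3.0 * np.sin(2 * np.pi * t / 12)   # seasonal cycle
    - 2.5 * post                     # true step change at the regulation
    + rng.normal(0, 1.0, n_months)   # noise
)

# Segmented-regression design: intercept, trend, level change, slope change,
# plus sine/cosine terms as a simple seasonal control
X = np.column_stack([
    np.ones(n_months),
    t,
    post,
    post * (t - intervention),
    np.sin(2 * np.pi * t / 12),
    np.cos(2 * np.pi * t / 12),
])

beta, *_ = np.linalg.lstsq(X, pm25, rcond=None)
print(f"estimated level change at the regulation: {beta[2]:.2f} ug/m3")
```

Each concern in the list maps onto a piece of this design: autocorrelation lives in the residuals, seasonal adequacy in the sine/cosine terms, and the resolution check in re-fitting the same design on weekly averages.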

Document your validation strategy in the methodology memo. What you are checking, why each check matters for this specific analysis, and what a failing result would mean.

Step 2: Run the validation checks

Run the checks you designed, using assumption checks appropriate for time series data. Plot the residuals. Look at the autocorrelation function -- are the residuals independent after accounting for seasonal and weather effects?
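A quick sketch of what that autocorrelation check looks for, on simulated residuals (the AR coefficient and series length here are arbitrary illustrations): white-noise residuals stay inside the rule-of-thumb band, while residuals with leftover AR(1) structure do not.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
n = 500

# White-noise "residuals" vs. residuals with AR(1) structure the model missed
white = rng.normal(size=n)
ar1 = np.empty(n)
ar1[0] = rng.normal()
for i in range(1, n):
    ar1[i] = 0.6 * ar1[i - 1] + rng.normal()

# Rule of thumb: lags outside +/- 2/sqrt(n) suggest leftover structure
threshold = 2 / np.sqrt(n)
print("rule-of-thumb band: +/-", round(threshold, 3))
print("white-noise lag-1 autocorrelation:", round(acf(white, 1)[0], 3))
print("AR(1) residual lag-1 autocorrelation:", round(acf(ar1, 1)[0], 3))
```

In practice a formal test (e.g. Ljung-Box) makes the same comparison across many lags at once; the band here is only the visual version you read off an ACF plot.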

Compute effect sizes and confidence intervals for the regulation's impact in each city. Not just "significant at p < 0.05" but "PM2.5 decreased by X micrograms per cubic meter (95% CI: Y to Z)." The confidence interval is what Astrid's report needs -- it tells the agency the range of plausible effects.
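A minimal sketch of turning a fitted coefficient into that sentence, using simulated data and plain OLS standard errors -- if the residuals are autocorrelated, you would substitute a HAC (Newey-West) standard error for the OLS one:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design: intercept, trend, and a post-regulation step term
n = 72
t = np.arange(n)
post = (t >= 36).astype(float)
X = np.column_stack([np.ones(n), t, post])
y = 20 - 0.02 * t - 2.5 * post + rng.normal(0, 1.0, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# 95% CI for the regulation (step) coefficient; 1.96 is the normal
# approximation -- with short series, use the t critical value instead
effect, effect_se = beta[2], se[2]
lo, hi = effect - 1.96 * effect_se, effect + 1.96 * effect_se
print(f"effect: {effect:.2f} ug/m3, 95% CI: ({lo:.2f}, {hi:.2f})")
```

The printed line is exactly the form the report needs: a point estimate with the range of plausible effects, not a bare p-value.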

Step 3: Check for data anomalies

Look at the station-level results. Do the effect direction and magnitude make sense across all 40 stations? Are there any stations that behave very differently from the others?

If you have not already asked Astrid about data quality at specific stations, this is where anomalies in the data might lead you to ask. Some stations may have readings that shifted for reasons unrelated to the regulation. The data can tell you something is unusual; only Astrid can tell you why.
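One way to surface those stations is a robust outlier screen on the per-station effect estimates. This sketch uses simulated effects with one injected anomaly; the threshold of 3.5 is a common convention, not a rule from the course:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-station effect estimates (ug/m3) for 40 stations:
# most cluster around -2.5; one station is injected as an anomaly
effects = rng.normal(-2.5, 0.5, 40)
effects[7] = 3.0   # injected anomaly for illustration

# Robust z-score built from the median and MAD, so the anomaly itself
# does not distort the reference point the way a mean and SD would
median = np.median(effects)
mad = np.median(np.abs(effects - median))
robust_z = 0.6745 * (effects - median) / mad

flagged = np.flatnonzero(np.abs(robust_z) > 3.5)
print("stations to ask Astrid about:", flagged.tolist())
```

The screen only finds the unusual stations; as the text says, it cannot explain them -- that question still goes to Astrid.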

Step 4: Interpret honestly

What does the analysis support? What does it not support?

If the regulation's effect is statistically significant with a meaningful effect size -- say, PM2.5 decreased by 2-4 micrograms per cubic meter with a confidence interval that does not include zero -- then the evidence supports a real effect. But "supports" is not "proves." The analysis controlled for weather, season, and trend. It did not control for every possible confounder. The honest interpretation acknowledges what was controlled for and what was not.

If the effect is uncertain -- a wide confidence interval, or significant in one city but not another -- that is a finding too. Astrid was explicit about this: if the data does not support a clear conclusion, the report needs to say so.

Step 5: Cross-model review

Direct a second AI model to review your methodology and interpretation. Give it the model specification, the results, and the assumption checks. Ask: does this analysis support the conclusions drawn? Are there methodological concerns?

Compare its assessment with your own. Did it flag anything you missed? Does it agree with your interpretation of the confidence intervals?

✓ Check

  • Validation strategy designed and documented.
  • Autocorrelation checked.
  • Seasonal model assessed.
  • Effect sizes and confidence intervals computed.
  • Malmo station anomaly addressed.
  • Cross-model review completed.