Learn by Directing AI

Step 1: Analyze correlations on clean data

With the calibration drift handled and the data path verified, run the analysis on the clean joined dataset. Direct AI to compute correlations between water quality aggregates and production outcomes for the three sensor-equipped ponds.

Look at the results. Which water quality parameters show the strongest associations with survival rate? With average weight? With total yield?

Step 2: Compare sensor vs non-sensor ponds

Direct AI to compare production outcomes across all 8 ponds: the 3 with sensors versus the 5 without. Is there a visible difference in survival rates, weights, or yields?

The sensor-equipped ponds likely show better performance. The question is why.
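The comparison itself is just a difference in group means, computed for each outcome. The sketch below assumes a hypothetical per-pond summary with a `has_sensor` flag. The pond labels and all numbers are invented placeholders, not Budi's records.

```python
from statistics import mean

# Hypothetical per-pond summaries for all 8 ponds (placeholder values).
ponds = [
    {"pond": "P1", "has_sensor": True,  "survival": 0.78, "avg_weight": 18.4, "yield_kg": 980},
    {"pond": "P2", "has_sensor": True,  "survival": 0.81, "avg_weight": 18.9, "yield_kg": 1020},
    {"pond": "P3", "has_sensor": True,  "survival": 0.76, "avg_weight": 18.1, "yield_kg": 950},
    {"pond": "P4", "has_sensor": False, "survival": 0.70, "avg_weight": 17.6, "yield_kg": 870},
    {"pond": "P5", "has_sensor": False, "survival": 0.68, "avg_weight": 17.2, "yield_kg": 840},
    {"pond": "P6", "has_sensor": False, "survival": 0.73, "avg_weight": 17.9, "yield_kg": 900},
    {"pond": "P7", "has_sensor": False, "survival": 0.66, "avg_weight": 17.0, "yield_kg": 810},
    {"pond": "P8", "has_sensor": False, "survival": 0.71, "avg_weight": 17.5, "yield_kg": 880},
]


def compare_groups(ponds, outcome):
    """Mean of one outcome for sensor vs non-sensor ponds, plus the gap."""
    with_sensor = mean(p[outcome] for p in ponds if p["has_sensor"])
    without_sensor = mean(p[outcome] for p in ponds if not p["has_sensor"])
    return with_sensor, without_sensor, with_sensor - without_sensor


for outcome in ("survival", "avg_weight", "yield_kg"):
    s, o, gap = compare_groups(ponds, outcome)
    print(f"{outcome:<10} sensor={s:.2f} no-sensor={o:.2f} gap={gap:+.2f}")
```

A gap here only describes the two groups. It says nothing yet about what causes the gap.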

Step 3: Discover the confound

Budi put sensors in the newer ponds. The newer ponds may perform better for reasons that have nothing to do with sensors or water quality monitoring -- better construction, better location, newer pond liners that maintain water quality more consistently.

This is a selection confound. The ponds with sensors were not randomly chosen. Budi chose them because they were his newer, presumably better ponds. Any difference between sensor and non-sensor ponds conflates sensor presence with pond quality, age, location, and every other way the newer ponds differ from the older ones.

If this point does not surface naturally from the data, ask Budi: "Why did you install sensors in those specific ponds?" His answer -- "I put them in the newer ponds" -- makes the confound concrete.
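A tiny simulation can make the confound concrete. The model below is purely an illustrative assumption. Survival depends only on pond age, the sensors have no effect at all, and sensors go into the three newest ponds, mirroring how Budi chose them. The ages and the survival formula are hypothetical.

```python
from statistics import mean

# Hypothetical pond ages in years (placeholder values).
pond_ages = {"P1": 1, "P2": 2, "P3": 2, "P4": 5, "P5": 6, "P6": 7, "P7": 8, "P8": 9}


def survival_from_age(age_years):
    # Assumed toy model: newer ponds survive better. There is no sensor term.
    return 0.85 - 0.02 * age_years


# Sensors placed in the three newest ponds (selection, not randomization).
newest = sorted(pond_ages, key=pond_ages.get)[:3]
sensor_group = [survival_from_age(pond_ages[p]) for p in newest]
other_group = [survival_from_age(pond_ages[p]) for p in pond_ages if p not in newest]

gap = mean(sensor_group) - mean(other_group)
print(f"Sensor ponds look better by {gap:.3f} survival, yet sensors do nothing here")
```

Under these assumptions, the sensor ponds still show a survival advantage of roughly 0.1. That gap comes entirely from pond age, which is the pattern a selection confound produces.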

Step 4: Assess what three ponds can tell us

Three sensor-equipped ponds over four harvest cycles give twelve pond-cycle observations. Because those observations come from only three ponds, they are not fully independent. That is enough to describe patterns (which parameters correlate with better outcomes) but not enough to prove causation or make robust statistical claims.

The validation must match the question type. This is descriptive and exploratory analysis on a small sample, not a controlled experiment. Reporting correlations with effect sizes is appropriate. Claiming that dissolved oxygen causes better survival is not -- the sample is too small and the design is observational.
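To see why twelve observations limit what can be claimed, look at how wide the uncertainty around a correlation is at that sample size. The sketch below uses the standard Fisher z-transform approximation for a confidence interval on Pearson r. The value r = 0.6 is a hypothetical example, not a result from Budi's data. The method assumes independent observations, so for pond-cycles that share ponds the true interval would be wider still.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for Pearson r via Fisher's z-transform.

    Assumes n > 3 independent observations.
    """
    z = math.atanh(r)                            # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)                    # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical r = 0.6 at different sample sizes.
for n in (12, 50, 200):
    lo, hi = correlation_ci(0.6, n)
    print(f"n = {n:>3}: 95% CI for r = 0.6 is ({lo:.2f}, {hi:.2f})")
```

At n = 12, the interval runs from roughly 0.04 to 0.87. A correlation that looks strong could plausibly be almost nothing, so reporting it as a description rather than a proof is the honest framing.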

Direct AI to summarize what the data supports and what it does not.

Step 5: Identify the strongest signal

Minimum dissolved oxygen per cycle likely emerges as the strongest predictor of survival rate. This makes biological sense -- one episode of low oxygen can cause a mortality spike that average oxygen over the cycle would obscure.

This validates the aggregation decision from Unit 3. Choosing minimum dissolved oxygen (DO) instead of mean DO was a design decision, and it is what captured the signal. A different aggregation choice would have hidden it.
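The aggregation point can be shown with two cycles that have the same average oxygen but very different worst moments. The readings below are invented values in mg/L, chosen only to illustrate the idea. They are not sensor data.

```python
from statistics import mean

# Hypothetical DO readings (mg/L) for two harvest cycles.
steady_cycle = [5.0] * 8
dip_cycle = [5.5, 5.5, 5.5, 1.5, 5.5, 5.5, 5.5, 5.5]  # one low-oxygen episode


def aggregate_do(readings):
    """Summarize one cycle's DO readings two ways: mean and minimum."""
    return {"mean_do": mean(readings), "min_do": min(readings)}


for name, cycle in [("steady", steady_cycle), ("dip", dip_cycle)]:
    print(name, aggregate_do(cycle))
```

Both cycles average 5.0, so mean DO treats them as identical. Minimum DO separates them (5.0 vs 1.5), and that separation is exactly where a mortality spike would show up.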

Step 6: Frame the sensor recommendation

Budi wants to know: should he install sensors in the remaining five ponds?

The honest answer is: the data suggests water quality monitoring is valuable (the patterns are real and biologically plausible), but the data cannot prove that sensors caused better outcomes in the three ponds that have them. The recommendation is to install sensors, with the caveat that the evidence is suggestive, not definitive.

Frame this in terms Budi understands -- practical farming decisions, not statistical confidence levels.

✓ Check