Learn by Directing AI
Unit 3

The connected analysis

Step 1: Profile sensors via DuckDB

Direct AI to profile the sensor readings through the DuckDB connection. Summary statistics, temporal coverage, missing value patterns per sensor.

Watch how AI accesses the data. It may run broad queries -- scanning the full table when you only need summary statistics. AI connected to a database tends to query eagerly. If you notice AI running large scans, scope the request: "Just give me the count, min, max, and mean for each sensor -- don't pull all the rows."

Step 2: Profile production via DuckDB

Direct AI to profile the production records through DuckDB. Harvest outcomes by pond, seasonal patterns, yield distributions across the four cycles.

The production table is small (32 rows), so eager querying is not a concern here. Look at the patterns: do some ponds consistently outperform others? Is there a seasonal difference between C1 (wet season) and C2 (dry season)?

Step 3: Create the join

Direct AI to join the two tables using Budi's mapping: SID-001 = Pond C, SID-003 = Pond E, SID-006 = Pond G. The sensor data is hourly; the production records are per-cycle. Before the tables can join meaningfully, the sensor readings need to be aggregated to the cycle level.

This is a design decision. Which aggregation matters? The mean pH over a 90-day cycle tells you the average conditions. The minimum dissolved oxygen tells you whether there were critical low-oxygen events. For shrimp farming, the minimum matters more than the mean -- one bad day of low DO can cause a mortality spike that the average obscures.

Direct AI to compute both: mean and minimum for each parameter, per pond, per cycle. Then join with the production records.

Step 4: Verify the join logic

Check what AI produced. Look at the SQL query or the resulting data:

  • Does the join include only the 3 sensor-equipped ponds (Pond C, Pond E, Pond G)? Or did AI try to join all 8?
  • Does the temporal alignment use the cycle date ranges? Or did AI aggregate over the entire 6-month period?
  • Is the pond mapping correct? SID-001 to Pond C, not SID-001 to Pond A?

If the join logic is wrong, the downstream analysis is wrong regardless of how sophisticated the statistics are. Verifying the data path -- the queries, joins, and filters AI used -- is a different verification task from checking the statistical output.

Step 5: Compute water quality aggregates

Direct AI to produce a clean joined table: each row is one pond in one cycle, with water quality aggregates (mean pH, min DO, mean temperature, mean salinity, plus max temperature as a stress indicator) alongside production metrics (survival rate, average weight, FCR, total yield).

Step 6: Analyze correlations

Direct AI to compute correlations between the water quality aggregates and the production outcomes for the three sensor-equipped ponds.

With only three ponds and four cycles each, the sample is small. Correlations are descriptive, not definitive. Look for which parameters show the strongest associations. Minimum dissolved oxygen and survival rate is the relationship to watch.

✓ Check

Check: The joined dataset should have production records for the 3 sensor-equipped ponds with water quality aggregates attached. The join produced rows only for the 3 mapped ponds (not all 8), and the temporal aggregation used per-cycle periods (not the full 6-month average).