Hartono Aquaculture -- Shrimp Production Analysis
Client
Budi Hartono, Owner and Operations Manager at Hartono Aquaculture in Sidoarjo, East Java, Indonesia. Small shrimp farm with 8 circular ponds producing vannamei shrimp for export markets.
What you are building
A multi-source analysis connecting IoT sensor readings (water quality) with production records (harvest outcomes) to determine whether water quality parameters explain differences in harvest performance across Budi's ponds. The analysis will help Budi decide whether to install sensors in his remaining 5 ponds.
Tech stack
- Python 3.11+ (conda "ds" environment)
- DuckDB (analytical database -- data loaded from CSVs)
- DuckDB MCP server (connects Claude Code to the database)
- pandas, statsmodels, scipy, matplotlib/seaborn
- Git / GitHub
Data sources
Two CSV files loaded into DuckDB:
- sensor-readings.csv -- hourly water quality readings from 3 sensor-equipped ponds (SID-001, SID-003, SID-006) over 6 months. Columns: sensor_id, timestamp, ph, dissolved_oxygen_mg_l, temperature_c, salinity_ppt.
- production-records.csv -- per-cycle harvest data for all 8 ponds (Pond A through Pond H) over 2 years (4 cycles). Columns: pond_name, cycle_id, cycle_start_date, cycle_end_date, stocking_density_per_m2, survival_rate_pct, avg_weight_g, feed_conversion_ratio, total_yield_kg.
Key data notes
- Sensor ponds use IDs (SID-001, SID-003, SID-006). Production records use names (Pond A-H). Mapping: SID-001=Pond C, SID-003=Pond E, SID-006=Pond G.
- Sensor data has gaps from power outages (~2% missing per sensor).
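- Temporal aggregation required: hourly sensor data must be aggregated to per-cycle level before joining to production records. Sensor data covers 6 months while production records cover 2 years, so only cycles that overlap the sensor window can be joined.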
Analytical approach
- Descriptive and inferential analysis of water quality vs harvest outcomes
- Focus on which water quality parameters (pH, DO, temperature, salinity) correlate with production metrics (survival rate, weight, yield)
- Assess whether sensor data supports the investment in more sensors
- Account for confounds: the sensor-equipped ponds are also the newer ponds, so pond age may explain some of the difference in harvest outcomes
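As a minimal sketch of the correlation step, the function below computes Spearman rank correlations between per-cycle water quality summaries and production metrics. Spearman is a choice made for this illustration, not a method the brief prescribes. The water quality column names (mean_ph and so on) and the min_n threshold are hypothetical, and the function assumes a DataFrame with one row per pond-cycle.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical names for per-cycle water quality aggregates.
WATER_COLS = ["mean_ph", "mean_do_mg_l", "mean_temperature_c", "mean_salinity_ppt"]
# Production metrics named in the brief.
OUTCOME_COLS = ["survival_rate_pct", "avg_weight_g", "total_yield_kg"]


def rank_correlations(cycles, water_cols=WATER_COLS, outcome_cols=OUTCOME_COLS, min_n=4):
    """Spearman rho for every (water parameter, outcome) pair.

    Pairs with fewer than min_n complete rows get NaN instead of a
    coefficient, since a correlation on so few points says very little.
    """
    rows = []
    for water in water_cols:
        for outcome in outcome_cols:
            pair = cycles[[water, outcome]].dropna()
            n = len(pair)
            if n >= min_n:
                rho, p_value = spearmanr(pair[water], pair[outcome])
            else:
                rho, p_value = float("nan"), float("nan")
            rows.append(
                {
                    "water_param": water,
                    "outcome": outcome,
                    "n": n,
                    "spearman_rho": rho,
                    "p_value": p_value,
                }
            )
    return pd.DataFrame(rows)
```

With only 3 sensor-equipped ponds, n stays very small, so the coefficients are better read as descriptive effect sizes than as tests. A correlation also cannot separate water quality from pond age, so the confound above still needs to be stated alongside any result.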
Verification targets
- Data path verification: check SQL join logic, filter correctness, temporal aggregation level
- Statistical verification: report effect sizes alongside correlations, and use methods appropriate for the small sample (3 sensor-equipped ponds)
- Communication verification: findings use language appropriate for the client (no jargon)
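The data path check could be partly automated with a small helper like the one below. It is an illustrative sketch only: the function name check_joined_cycles is hypothetical, and it assumes the joined result is a pandas DataFrame with pond_name and cycle_id columns at one row per pond-cycle.

```python
import pandas as pd

# Sensor-equipped ponds, per the mapping in the key data notes.
EXPECTED_SENSOR_PONDS = {"Pond C", "Pond E", "Pond G"}


def check_joined_cycles(joined: pd.DataFrame) -> list:
    """Return a list of human-readable problems found in the joined table."""
    problems = []
    ponds = set(joined["pond_name"])

    # Ponds without sensors should never survive the join.
    unexpected = sorted(ponds - EXPECTED_SENSOR_PONDS)
    if unexpected:
        problems.append(f"ponds without sensors in join: {unexpected}")

    # A sensor pond with no rows may mean a bad mapping or date filter.
    missing = sorted(EXPECTED_SENSOR_PONDS - ponds)
    if missing:
        problems.append(f"sensor ponds missing from join: {missing}")

    # More than one row per (pond, cycle) means aggregation is at the wrong level.
    n_dupes = int(joined.duplicated(subset=["pond_name", "cycle_id"]).sum())
    if n_dupes:
        problems.append(f"{n_dupes} duplicate (pond_name, cycle_id) rows")

    return problems
```

An empty list means the three structural checks passed. It does not confirm that the date filter picked the right readings, which still needs a manual spot check on one cycle.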
Commit convention
Commit after each major analytical milestone with descriptive messages.