Astana Grain Terminal -- Storage + Weather Pipeline
You are building a data pipeline for Assel Nurzhanova at Astana Grain Terminal, a grain storage company in Kostanay, Kazakhstan. The pipeline combines grain storage data from two elevators with weather data from the Open-Meteo API to correlate spoilage events with weather patterns.
Client
Assel Nurzhanova, Operations Director, Astana Grain Terminal. Direct and professional. Cares about reducing spoilage (800 tonnes lost over two years), correlating weather with storage conditions, and daily automated updates.
Tech stack
- Python (Miniconda `de` environment)
- DuckDB (via Python `duckdb` package)
- SQL (via DuckDB)
- `requests` library (for API extraction)
- Git / GitHub (version control)
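The `requests` piece of this stack (tickets T3 and T5) comes down to one well-behaved HTTP call. A minimal sketch, with the HTTP client passed in so it can be stubbed in tests; the archive endpoint URL and the daily variable names are assumptions based on Open-Meteo's public docs, and the coordinates and variables that actually apply are specified in materials/pipeline-spec.md.

```python
import logging

# Assumed endpoint; confirm against materials/pipeline-spec.md
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def fetch_daily_weather(session, latitude, longitude, start_date, end_date):
    """Fetch daily weather for one date range.

    `session` is any object with a requests-style .get() (e.g. a
    requests.Session), so the network call can be stubbed in tests.
    """
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        # Hypothetical variable list; the spec names the real ones
        "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
        "timezone": "UTC",
    }
    resp = session.get(ARCHIVE_URL, params=params, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of silently ingesting bad data
    payload = resp.json()
    logging.info("fetched %d days of weather", len(payload["daily"]["time"]))
    return payload
```

In production this would be called with `requests.Session()`; injecting the session is also what makes the T7 idempotency test cheap to write, since a fake session can replay a canned response.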
File structure
materials/                     Provided inputs
    storage-data/              Elevator CSV exports (6 months each, both elevators)
    pipeline-spec.md           What to build and verification targets
    verification-checklist.md  Expected values for all checks
CLAUDE.md                      This file
pipeline/                      Student-created Python scripts and SQL
astana_grain.duckdb            Database file (created during the project)
Key material references
- materials/pipeline-spec.md -- what to build, data sources, schema requirements, verification targets
- materials/verification-checklist.md -- exact row counts, staging counts, spoilage-weather spot-check values
- materials/storage-data/ -- monthly CSV exports from Elevator A and Elevator B
Tickets
- T1: Load storage data from both elevators into DuckDB with elevator identification
- T2: Profile storage data (column types, nulls, value distributions, bin ID formats)
- T3: Test Open-Meteo API request and examine response structure
- T4: Design schema -- staging tables, fact table, grain decision
- T5: Build weather extraction script with pagination, error handling, logging
- T6: Verify extraction row count against API total
- T7: Test extraction idempotency (re-run produces no duplicates)
- T8: Build storage staging table (stg_storage) with bin ID standardization
- T9: Build weather staging table (stg_weather) with UTC to UTC+6 conversion
- T10: Build mart table joining storage and weather on date
- T11: Verify mart row count and spot-check spoilage correlation
- T12: AI self-review of staging and mart SQL
- T13: Present results to Assel and manage scope request
- T14: Commit and push to GitHub
Verification targets
- Storage row count per verification checklist
- Extracted weather row count matches API total
- Staging counts match raw counts exactly
- Mart count matches expected value
- Spoilage event correlation spot-check passes
Reference materials/verification-checklist.md for exact values.
Commit convention
One commit per ticket. Message format: T[N]: [what was done and why]
Examples:
T1: Load storage data -- both elevators with elevator column derived from filename
T5: Build weather extraction -- pagination, error handling, logging per page
T8: Build stg_storage -- standardized bin IDs, type casting, elevator identification