Astana Grain Terminal -- Storage + Weather Pipeline
You are building a data pipeline for Assel Nurzhanova at Astana Grain Terminal, a grain storage company in Kostanay, Kazakhstan. The pipeline combines grain storage data from two elevators with weather data from the Open-Meteo API to correlate spoilage events with weather patterns.
Client
Assel Nurzhanova, Operations Director, Astana Grain Terminal. Direct and professional. Cares about reducing spoilage (800 tonnes lost over two years), correlating weather with storage conditions, and daily automated updates.
Tech stack
- Python (Miniconda `de` environment)
- DuckDB (via Python `duckdb` package)
- SQL (via DuckDB)
- `requests` library (for API extraction)
- Git / GitHub (version control)
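The `requests` piece of this stack (tickets T3 and T5) comes down to one well-behaved HTTP call. A minimal sketch, with the HTTP client passed in so it can be stubbed in tests; the archive endpoint URL and the daily variable names are assumptions based on Open-Meteo's public docs, and the coordinates and variables that actually apply are specified in materials/pipeline-spec.md.

```python
import logging

# Assumed endpoint; confirm against materials/pipeline-spec.md
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"


def fetch_daily_weather(session, latitude, longitude, start_date, end_date):
    """Fetch daily weather for one date range.

    `session` is any object with a requests-style .get() (e.g. a
    requests.Session), so the network call can be stubbed in tests.
    """
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,
        "end_date": end_date,
        # Hypothetical variable list; the spec names the real ones
        "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
        "timezone": "UTC",
    }
    resp = session.get(ARCHIVE_URL, params=params, timeout=30)
    resp.raise_for_status()  # surface HTTP errors instead of silently ingesting bad data
    payload = resp.json()
    logging.info("fetched %d days of weather", len(payload["daily"]["time"]))
    return payload
```

In production this would be called with `requests.Session()`; injecting the session is also what makes the T7 idempotency test cheap to write, since a fake session can replay a canned response.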
File structure
materials/                     Provided inputs
    storage-data/              Elevator CSV exports (6 months each, both elevators)
    pipeline-spec.md           What to build and verification targets
    verification-checklist.md  Expected values for all checks
CLAUDE.md                      This file
pipeline/                      Student-created Python scripts and SQL
astana_grain.duckdb            Database file (created during the project)
Key material references
- materials/pipeline-spec.md -- what to build, data sources, schema requirements, verification targets
- materials/verification-checklist.md -- exact row counts, staging counts, spoilage-weather spot-check values
- materials/storage-data/ -- monthly CSV exports from Elevator A and Elevator B
Tickets
- T1: Load storage data from both elevators into DuckDB with elevator identification
- T2: Profile storage data (column types, nulls, value distributions, bin ID formats)
- T3: Test Open-Meteo API request and examine response structure
- T4: Design schema -- staging tables, fact table, grain decision
- T5: Build weather extraction script with pagination, error handling, logging
- T6: Verify extraction row count against API total
- T7: Test extraction idempotency (re-run produces no duplicates)
- T8: Build storage staging table (stg_storage) with bin ID standardization
- T9: Build weather staging table (stg_weather) with UTC to UTC+6 conversion
- T10: Build mart table joining storage and weather on date
- T11: Verify mart row count and spot-check spoilage correlation
- T12: AI self-review of staging and mart SQL
- T13: Present results to Assel and manage scope request
- T14: Commit and push to GitHub
Verification targets
- Storage row count per verification checklist
- Extracted weather row count matches API total
- Staging counts match raw counts exactly
- Mart count matches expected value
- Spoilage event correlation spot-check passes
Reference materials/verification-checklist.md for exact values.
Commit convention
One commit per ticket. Message format: T[N]: [what was done and why]
Examples:
T1: Load storage data -- both elevators with elevator column derived from filename
T5: Build weather extraction -- pagination, error handling, logging per page
T8: Build stg_storage -- standardized bin IDs, type casting, elevator identification