
CLAUDE.md

Astana Grain Terminal -- Storage + Weather Pipeline

You are building a data pipeline for Assel Nurzhanova at Astana Grain Terminal, a grain storage company in Kostanay, Kazakhstan. The pipeline combines grain storage data from two elevators with weather data from the Open-Meteo API to correlate spoilage events with weather patterns.

Client

Assel Nurzhanova, Operations Director, Astana Grain Terminal. Direct and professional. Cares about reducing spoilage (800 tonnes lost over two years), correlating weather with storage conditions, and getting daily automated updates.

Tech stack

  • Python (Miniconda environment)
  • DuckDB (via Python duckdb package)
  • SQL (via DuckDB)
  • requests library (for API extraction)
  • Git / GitHub (version control)
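
The requests-based extraction (T3/T5) can be sketched as below. This is a minimal sketch, not the final script: the Kostanay coordinates, the archive endpoint, and the daily variable list are assumptions to confirm against materials/pipeline-spec.md.

```python
import logging
from datetime import date

log = logging.getLogger("weather_extract")

# Assumed endpoint and coordinates -- verify both against pipeline-spec.md.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"
KOSTANAY = {"latitude": 53.21, "longitude": 63.62}

def fetch_daily(start: date, end: date) -> dict:
    """Fetch one date-range chunk of daily weather from Open-Meteo."""
    import requests  # imported here so parse_daily stays importable without it
    params = {
        **KOSTANAY,
        "start_date": start.isoformat(),
        "end_date": end.isoformat(),
        "daily": "temperature_2m_max,temperature_2m_min,precipitation_sum",
        "timezone": "UTC",
    }
    resp = requests.get(ARCHIVE_URL, params=params, timeout=30)
    resp.raise_for_status()
    log.info("fetched %s..%s", start, end)
    return resp.json()

def parse_daily(payload: dict) -> list[dict]:
    """Pivot Open-Meteo's column-oriented 'daily' arrays into row dicts."""
    daily = payload["daily"]
    keys = [k for k in daily if k != "time"]
    return [
        {"date": d, **{k: daily[k][i] for k in keys}}
        for i, d in enumerate(daily["time"])
    ]
```

Keeping `parse_daily` pure makes T6/T7 easier: the row count and idempotency checks can be tested against a saved JSON payload without hitting the API.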

File structure

materials/           Provided inputs
  storage-data/      Elevator CSV exports (6 months each, both elevators)
  pipeline-spec.md   What to build and verification targets
  verification-checklist.md  Expected values for all checks
  CLAUDE.md          This file
pipeline/            Student-created Python scripts and SQL
astana_grain.duckdb  Database file (created during the project)

Key material references

  • materials/pipeline-spec.md -- what to build and the verification targets
  • materials/verification-checklist.md -- expected values for every check

Tickets

  • T1: Load storage data from both elevators into DuckDB with elevator identification
  • T2: Profile storage data (column types, nulls, value distributions, bin ID formats)
  • T3: Test Open-Meteo API request and examine response structure
  • T4: Design schema -- staging tables, fact table, grain decision
  • T5: Build weather extraction script with pagination, error handling, logging
  • T6: Verify extraction row count against API total
  • T7: Test extraction idempotency (re-run produces no duplicates)
  • T8: Build storage staging table (stg_storage) with bin ID standardization
  • T9: Build weather staging table (stg_weather) with UTC to UTC+6 conversion
  • T10: Build mart table joining storage and weather on date
  • T11: Verify mart row count and spot-check spoilage correlation
  • T12: AI self-review of staging and mart SQL
  • T13: Present results to Assel and manage scope request
  • T14: Commit and push to GitHub
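
Two of the transformations above (T8 bin ID standardization, T9 UTC to UTC+6 conversion) can be prototyped as plain Python helpers before being ported to SQL. The bin ID input formats shown here are hypothetical; substitute whatever T2 profiling actually finds.

```python
from datetime import datetime, timedelta, timezone

# Kostanay is UTC+6 year-round (no DST), so a fixed offset is sufficient.
LOCAL_TZ = timezone(timedelta(hours=6))

def standardize_bin_id(raw: str) -> str:
    """Normalize bin IDs to a single 'B-NN' form.

    Inputs like 'b7', 'B07', 'BIN-7' are assumed examples only --
    replace with the formats T2 profiling reveals.
    """
    digits = "".join(ch for ch in raw if ch.isdigit())
    return f"B-{int(digits):02d}"

def to_local(utc_iso: str) -> str:
    """Convert an ISO-8601 UTC timestamp string to UTC+6 local time."""
    dt = datetime.fromisoformat(utc_iso).replace(tzinfo=timezone.utc)
    return dt.astimezone(LOCAL_TZ).isoformat()
```

Note the day boundary: an evening UTC reading lands on the next calendar date in UTC+6, which matters for the T10 join on date.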

Verification targets

  • Storage row count per verification checklist
  • Extracted weather row count matches API total
  • Staging counts match raw counts exactly
  • Mart count matches expected value
  • Spoilage event correlation spot-check passes

Reference materials/verification-checklist.md for exact values.
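
A minimal verification harness for these targets might look like the following sketch; the table names and expected counts are placeholders to replace with the checklist's real values.

```python
def check(name: str, actual: int, expected: int) -> bool:
    """Compare one count against the verification checklist and report it."""
    status = "PASS" if actual == expected else "FAIL"
    print(f"{status}  {name}: actual={actual} expected={expected}")
    return actual == expected

def run_checks(con, expected: dict[str, int]) -> bool:
    """Count rows in each listed table and compare against `expected`,
    whose values should come from materials/verification-checklist.md."""
    ok = True
    for table, want in expected.items():
        (got,) = con.execute(f"SELECT count(*) FROM {table}").fetchone()
        ok = check(table, got, want) and ok
    return ok

if __name__ == "__main__":
    import duckdb  # imported here so the helpers above stay stdlib-only
    con = duckdb.connect("astana_grain.duckdb")
    # Placeholder expectations -- take the real numbers from the checklist.
    run_checks(con, {"stg_storage": 0, "stg_weather": 0, "mart_storage_weather": 0})
```

Running this after each ticket (rather than only at the end) catches count drift as soon as it is introduced.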

Commit convention

One commit per ticket. Message format: T[N]: [what was done and why]

Examples:

  • T1: Load storage data -- both elevators with elevator column derived from filename
  • T5: Build weather extraction -- pagination, error handling, logging per page
  • T8: Build stg_storage -- standardized bin IDs, type casting, elevator identification