Learn by Directing AI
All materials

CLAUDE.md

Bois du Littoral Chain of Custody Pipeline

Client

Francoise Mbeki, Head of Operations at Bois du Littoral SARL (Douala, Cameroon). Timber export company managing four forest concessions, a sawmill, and port logistics. Exports sawn timber to 30+ countries. EU FLEGT compliance requires chain of custody documentation.

What you're building

A multi-source data pipeline that connects three separate systems -- forestry inventory, sawmill production, and customs/export -- to produce automated chain of custody documentation. The pipeline resolves identity across systems (forestry log tags to sawmill batch numbers via a mapping logbook), quarantines unresolvable records, and produces a mart showing inventory, chain of custody status, and yield by species.

Tech stack

  • Python (Miniconda de environment)
  • DuckDB (local analytical database)
  • dbt Core + dbt-duckdb adapter (transformation framework)
  • Dagster (pipeline orchestration)
  • Git / GitHub (version control)

File structure

bois-du-littoral/
  materials/
    CLAUDE.md                    (this file)
    pipeline-spec-template.md    (requirements + schema design template)
    verification-checklist.md    (expected values for verification)
    forestry-sample.csv          (12-row sample)
    sawmill-sample.csv           (12-row sample)
    customs-sample.csv           (12-row sample)
    tag-batch-mapping-sample.csv (12-row sample with inconsistencies)
    forestry-data.csv            (500 rows -- full dataset)
    sawmill-data.csv             (320 rows -- full dataset)
    customs-data.csv             (180 rows -- full dataset)
    tag-batch-mapping.csv        (480 rows -- full mapping logbook)
    dbt-scaffold/                (pre-configured dbt project structure)
    dagster-scaffold/            (pre-configured Dagster project structure)
  dbt_project/                   (student's dbt project -- built during work)
  dagster_project/               (student's Dagster project -- built during work)

Data dictionary

Forestry system

  • concession_id: C1-C4 (C3, C4 are partnership concessions)
  • log_tag: Unique within concession. Format "0NNN" for C1/C2, "PC3-NNN"/"PC4-NNN" for C3/C4
  • species: sapelli, ayous, azobe, iroko, moabi
  • volume_m3: Log volume in cubic meters

Sawmill system

  • batch_number: Unique. Format "SB-YYYY-NNN"
  • log_volume_in_m3: Total log volume input
  • sawn_timber_out_m3: Sawn timber output (35-65% of input)
  • waste_percentage: Calculated from input/output

Customs/export system

  • export_permit_number: Unique. Format "EXP-YYYY-NNN"
  • batch_numbers: Comma-separated list of sawmill batch numbers
  • flegt_status: "complete", "pending", or "incomplete"

Tag-to-batch mapping

  • forestry_log_tag: Should match forestry log_tag (has inconsistencies: missing leading zeros, whitespace)
  • sawmill_batch_number: References sawmill batch_number

Naming conventions

  • Staging: stg_{source}__{entity} (e.g., stg_forestry__logs, stg_sawmill__batches)
  • Intermediate: int_{concept} (e.g., int_chain_of_custody, int_yield)
  • Mart: fct_{entity} or dim_{entity}

Ticket backlog

  • T1: Profile and extract all data sources into DuckDB with extraction metadata and error quarantine
  • T2: Build staging layer with identity resolution (tag-to-batch mapping with cleaning)
  • T3: Build intermediate layer (chain of custody assembly, yield calculations) and mart
  • T4: Orchestrate pipeline with Dagster (asset definitions, dependency declarations, lineage verification)
  • T5: Present results to Francoise, manage scope, and close project

Verification targets

See verification-checklist.md for specific expected values:

  • Per-source row counts
  • Tag-to-batch match rate (~92%)
  • Chain of custody completion rate
  • Yield ranges by species

Commit convention

  • feat: -- new extraction, model, or asset
  • fix: -- corrections to existing models or logic
  • verify: -- verification findings and corrections
  • docs: -- README, decision records, documentation
  • Commit after each ticket completion