Bois du Littoral Chain of Custody Pipeline
Client
Francoise Mbeki, Head of Operations at Bois du Littoral SARL (Douala, Cameroon). Timber export company managing four forest concessions, a sawmill, and port logistics. Exports sawn timber to 30+ countries. EU FLEGT compliance requires chain of custody documentation.
What you're building
A multi-source data pipeline that connects three separate systems -- forestry inventory, sawmill production, and customs/export -- to produce automated chain of custody documentation. The pipeline resolves identity across systems (forestry log tags to sawmill batch numbers via a mapping logbook), quarantines unresolvable records, and produces a mart showing inventory, chain of custody status, and yield by species.
Tech stack
- Python (Miniconda
deenvironment) - DuckDB (local analytical database)
- dbt Core + dbt-duckdb adapter (transformation framework)
- Dagster (pipeline orchestration)
- Git / GitHub (version control)
File structure
bois-du-littoral/
materials/
CLAUDE.md (this file)
pipeline-spec-template.md (requirements + schema design template)
verification-checklist.md (expected values for verification)
forestry-sample.csv (12-row sample)
sawmill-sample.csv (12-row sample)
customs-sample.csv (12-row sample)
tag-batch-mapping-sample.csv (12-row sample with inconsistencies)
forestry-data.csv (500 rows -- full dataset)
sawmill-data.csv (320 rows -- full dataset)
customs-data.csv (180 rows -- full dataset)
tag-batch-mapping.csv (480 rows -- full mapping logbook)
dbt-scaffold/ (pre-configured dbt project structure)
dagster-scaffold/ (pre-configured Dagster project structure)
dbt_project/ (student's dbt project -- built during work)
dagster_project/ (student's Dagster project -- built during work)
Data dictionary
Forestry system
concession_id: C1-C4 (C3, C4 are partnership concessions)log_tag: Unique within concession. Format "0NNN" for C1/C2, "PC3-NNN"/"PC4-NNN" for C3/C4species: sapelli, ayous, azobe, iroko, moabivolume_m3: Log volume in cubic meters
Sawmill system
batch_number: Unique. Format "SB-YYYY-NNN"log_volume_in_m3: Total log volume inputsawn_timber_out_m3: Sawn timber output (35-65% of input)waste_percentage: Calculated from input/output
Customs/export system
export_permit_number: Unique. Format "EXP-YYYY-NNN"batch_numbers: Comma-separated list of sawmill batch numbersflegt_status: "complete", "pending", or "incomplete"
Tag-to-batch mapping
forestry_log_tag: Should match forestrylog_tag(has inconsistencies: missing leading zeros, whitespace)sawmill_batch_number: References sawmillbatch_number
Naming conventions
- Staging:
stg_{source}__{entity}(e.g.,stg_forestry__logs,stg_sawmill__batches) - Intermediate:
int_{concept}(e.g.,int_chain_of_custody,int_yield) - Mart:
fct_{entity}ordim_{entity}
Ticket backlog
- T1: Profile and extract all data sources into DuckDB with extraction metadata and error quarantine
- T2: Build staging layer with identity resolution (tag-to-batch mapping with cleaning)
- T3: Build intermediate layer (chain of custody assembly, yield calculations) and mart
- T4: Orchestrate pipeline with Dagster (asset definitions, dependency declarations, lineage verification)
- T5: Present results to Francoise, manage scope, and close project
Verification targets
See verification-checklist.md for specific expected values:
- Per-source row counts
- Tag-to-batch match rate (~92%)
- Chain of custody completion rate
- Yield ranges by species
Commit convention
feat:-- new extraction, model, or assetfix:-- corrections to existing models or logicverify:-- verification findings and correctionsdocs:-- README, decision records, documentation- Commit after each ticket completion