
pipeline-spec-template.md

Pipeline Specification -- Bois du Littoral Chain of Custody

1. Overview

Bois du Littoral SARL is a timber export company based in Douala, Cameroon. The company manages four forest concessions, operates a sawmill, and handles export logistics out of Douala port. They export sawn timber to approximately 30 countries across Europe, Asia, and West Africa.

EU buyers require FLEGT (Forest Law Enforcement, Governance and Trade) chain of custody documentation: proof that every piece of timber in a shipment can be traced back to a legally managed forest concession. The current manual process takes one week per shipment. The company exports approximately 40 times per year.

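The pipeline connects three separate data systems (forestry inventory, sawmill production, and customs/export), bridged by a digitized tag-to-batch logbook, to produce automated chain of custody documentation and operational reporting.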

2. Data Sources

Forestry Inventory System

  • Format: CSV export
  • Records: ~500 rows (Jan 2023 - Dec 2024)
  • Key columns: concession_id, concession_name, log_tag, species, gps_lat, gps_lon, harvest_date, harvest_permit_number, volume_m3, harvest_team
  • Identification: Log tags (format "0NNN") painted on the log end at the forest concession. Unique within each concession but not globally unique across concessions.
  • Notes: Four concessions (C1-C4). C3 and C4 are partnership concessions whose data arrives in a different format. GPS coordinates have a ~5% null rate (some entries were logged without GPS).
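
Because log tags are unique only within a concession, a tag on its own cannot identify a log. One possible approach, sketched below, is a composite key built from concession_id and log_tag. The names LogKey and make_log_key are hypothetical, and the sample rows are made up for illustration.

```python
from typing import NamedTuple


class LogKey(NamedTuple):
    """Hypothetical composite key: a tag is only unique inside its concession."""
    concession_id: str
    log_tag: str


def make_log_key(concession_id: str, log_tag: str) -> LogKey:
    # Strip stray whitespace so "C1 " and "C1" produce the same key.
    return LogKey(concession_id.strip().upper(), log_tag.strip())


# Made-up sample rows: the same tag appears in two different concessions.
rows = [
    {"concession_id": "C1", "log_tag": "0247"},
    {"concession_id": "C2", "log_tag": "0247"},
]
keys = {make_log_key(r["concession_id"], r["log_tag"]) for r in rows}
```

Here the two rows stay distinct even though they share a tag. A surrogate key or a concatenated string column would serve the same purpose; the tuple just keeps the two parts inspectable.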

Sawmill Production Database

  • Format: CSV export from production database
  • Records: ~320 rows (Jan 2023 - Dec 2024)
  • Key columns: batch_number, processing_date, log_intake_count, log_volume_in_m3, sawn_timber_out_m3, waste_percentage, species, grade
  • Identification: Batch numbers (format "SB-YYYY-NNN") assigned at the sawmill when logs are processed. No reference to forestry log tags.
  • Notes: Yield (sawn_timber_out_m3 / log_volume_in_m3) ranges from 35% to 65% depending on species and grade. Hardwoods typically yield lower than softwoods.
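
The yield formula in the notes can be sketched as a per-species aggregate. This is a minimal illustration that assumes a volume-weighted yield (total output divided by total input) rather than an average of per-batch ratios; the spec does not say which is intended. The function name yield_by_species and the sample values are hypothetical.

```python
from collections import defaultdict


def yield_by_species(batches):
    """Volume-weighted yield per species: sum(sawn out) / sum(log in)."""
    vol_in = defaultdict(float)
    vol_out = defaultdict(float)
    for b in batches:
        # CSV values arrive as strings, so convert before summing.
        vol_in[b["species"]] += float(b["log_volume_in_m3"])
        vol_out[b["species"]] += float(b["sawn_timber_out_m3"])
    # Skip species with no input volume to avoid dividing by zero.
    return {s: vol_out[s] / vol_in[s] for s in vol_in if vol_in[s] > 0}


# Made-up sample batches, not taken from the real export.
sample = [
    {"species": "SPECIES_A", "log_volume_in_m3": "10.0", "sawn_timber_out_m3": "5.0"},
    {"species": "SPECIES_A", "log_volume_in_m3": "30.0", "sawn_timber_out_m3": "15.0"},
    {"species": "SPECIES_B", "log_volume_in_m3": "20.0", "sawn_timber_out_m3": "8.0"},
]
yields = yield_by_species(sample)
```

A result outside the 35% to 65% range noted above is a reasonable trigger for a closer look at the underlying batches.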

Customs/Export System

  • Format: CSV export
  • Records: ~180 rows (Jan 2023 - Dec 2024)
  • Key columns: export_permit_number, container_id, destination_port, destination_country, shipment_date, batch_numbers (comma-separated), total_weight_kg, flegt_status
  • Identification: Export permit numbers (format "EXP-YYYY-NNN"). References sawmill batch numbers directly.
  • Notes: FLEGT status is "complete" (~85%), "pending" (~10%), or "incomplete" (~5%). Each batch_numbers value must be split on commas before it can be joined to the sawmill data.
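
Since batch_numbers packs several references into one field, it has to be split before joining to sawmill batches. The sketch below assumes "SB-YYYY-NNN" means a four-digit year and a three-digit sequence number; split_batch_numbers and BATCH_PATTERN are hypothetical names.

```python
import re

# Assumed reading of "SB-YYYY-NNN": four-digit year, three-digit sequence.
BATCH_PATTERN = re.compile(r"SB-\d{4}-\d{3}")


def split_batch_numbers(field):
    """Split a comma-separated field into (valid, invalid) batch references."""
    valid, invalid = [], []
    for part in field.split(","):
        token = part.strip()
        if not token:
            continue  # ignore empty pieces such as a trailing comma
        if BATCH_PATTERN.fullmatch(token):
            valid.append(token)
        else:
            invalid.append(token)
    return valid, invalid


# Made-up example value for the batch_numbers column.
valid, invalid = split_batch_numbers("SB-2023-014, SB-2023-015,sb-23-1")
```

Keeping the rejected tokens, rather than dropping them silently, leaves a record that gap detection can report on.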

Tag-to-Batch Mapping Logbook

  • Format: CSV (digitized paper logbook from sawmill gate)
  • Records: ~480 rows
  • Key columns: forestry_log_tag, sawmill_batch_number, entry_date, recorded_by
  • Notes: This is the bridge between forestry and sawmill systems. The logbook was digitized manually and contains inconsistencies: some tags recorded without leading zeros ("247" instead of "0247"), some with trailing whitespace, some data entry errors producing unmatchable entries. Partnership concession tags use format "PC3-NNN"/"PC4-NNN".
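
The inconsistencies listed above suggest a normalization step before any join. The following is a minimal sketch, not a definitive cleaning rule: it assumes standard tags are four characters with a leading zero ("0NNN") and that partnership tags carry exactly three digits. normalize_tag is a hypothetical helper, and anything it cannot interpret is returned as None so it can be counted as unmatchable.

```python
import re
from typing import Optional

# Assumed formats, based on the "0NNN" and "PC3-NNN"/"PC4-NNN" descriptions.
DIGITS = re.compile(r"\d{1,4}")
STANDARD = re.compile(r"0\d{3}")
PARTNER = re.compile(r"PC[34]-\d{3}")


def normalize_tag(raw: str) -> Optional[str]:
    """Return a canonical tag, or None when the entry cannot be interpreted."""
    tag = raw.strip().upper()  # removes trailing whitespace from the logbook
    if PARTNER.fullmatch(tag):
        return tag
    if DIGITS.fullmatch(tag):
        padded = tag.zfill(4)  # restores dropped leading zeros: "247" -> "0247"
        if STANDARD.fullmatch(padded):
            return padded
    return None


# Made-up logbook entries covering each case described in the notes.
cleaned = [normalize_tag(t) for t in ["247", "0247  ", "pc3-012", "O247", "1247"]]
```

Returning None instead of guessing keeps data entry errors visible. How many entries end up unmatchable is then a number to check rather than something hidden inside the join.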

3. Requirements

  1. Chain of custody: Connect forest inventory to sawmill production to export documentation. Every shipment must trace back to specific forestry log tags from specific concessions.
  2. FLEGT documentation: Automatically identify which shipments have complete chain of custody and which have gaps.
  3. Inventory view: Total inventory across all four concessions showing species, volume, and stage in the pipeline (forest, sawmill, exported).
  4. Yield tracking: Sawmill yield (sawn timber output / log volume input) by species.
  5. Gap detection: Flag any shipment where the chain of custody has a break -- a sawmill batch that can't be traced to forestry tags, or forestry tags that can't be matched to a sawmill batch.
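
Requirement 5 can be sketched with plain set operations. This is an illustration under stated assumptions, not the required design: tags are assumed to be already normalized and globally unique, the mapping is a list of (tag, batch) pairs, and shipments map each export permit to its list of batch numbers. find_gaps and the sample values are hypothetical.

```python
def find_gaps(forestry_tags, sawmill_batches, mapping, shipments):
    """Report breaks in the chain of custody across the three systems."""
    known_tags = set(forestry_tags)
    known_batches = set(sawmill_batches)

    # Keep only mapping rows where both ends exist in their source system.
    batch_to_tags = {}
    mapped_tags = set()
    for tag, batch in mapping:
        if tag in known_tags and batch in known_batches:
            batch_to_tags.setdefault(batch, set()).add(tag)
            mapped_tags.add(tag)

    # A shipment is flagged if any of its batches has no traceable tags.
    flagged = {}
    for permit, batches in shipments.items():
        broken = sorted(b for b in batches if b not in batch_to_tags)
        if broken:
            flagged[permit] = broken

    return {
        "untraced_batches": known_batches - set(batch_to_tags),
        "unmatched_tags": known_tags - mapped_tags,
        "flagged_shipments": flagged,
    }


# Made-up sample data for illustration only.
gaps = find_gaps(
    forestry_tags={"0001", "0002", "0003"},
    sawmill_batches={"SB-2023-001", "SB-2023-002"},
    mapping=[("0001", "SB-2023-001"), ("0002", "SB-2023-001")],
    shipments={
        "EXP-2023-001": ["SB-2023-001"],
        "EXP-2023-002": ["SB-2023-001", "SB-2023-002"],
    },
)
```

In this sample, batch SB-2023-002 has no logbook entry, so the shipment that includes it is flagged, and tag 0003 is reported as unmatched. An unmatched tag may simply belong to a log that has not reached the sawmill yet, so it is worth checking against the inventory stage before treating it as an error.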

4. Schema Design

5. Layer Architecture

6. Verification Targets

See verification-checklist.md for specific expected values to check your pipeline against.