
field-mapping.md

Field Mapping: Mill 1 (CSV) to Mill 2 (JSON)

Both mills record the same business data -- paddy intake from farmers -- but use different systems with different field names and formats.

Field mapping

| Mill 1 field (CSV) | Mill 2 field (JSON) | Unified name | Notes |
|---|---|---|---|
| record_id | id | record_id | Sequential integer. Not a natural key -- assigned by each mill's system independently. |
| farmer_name | supplier_name | farmer_name | Same concept: the person delivering paddy. Mill 2's newer system uses "supplier" terminology. |
| paddy_weight_kg | weight_kg | paddy_weight_kg | Weight of paddy delivered, in kilograms. Null in Mill 2 for advance payment records. |
| moisture_pct | moisture_percent | moisture_pct | Moisture content as a percentage. Null in Mill 2 for advance payment records. |
| grade | harvest_quality | grade | Mill 1 uses text (premium/standard/low). Mill 2 uses letter codes (A/B/C). Map A->premium, B->standard, C->low. Null in Mill 2 for advance payments. |
| price_mmk | payment_amount | price_mmk | Amount paid in Myanmar Kyat. Present on all records, including advance payments. |
| mill_date | processing_date | mill_date | Date of the milling operation. Mill 2 uses ISO format (YYYY-MM-DD). |
| intake_time | timestamp | intake_time | Exact time of the transaction. Mill 2 uses ISO timestamp format. |
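The unified names are Mill 1's own field names, so applying the table to the data only requires renaming Mill 2 records and translating the grade codes. Below is a minimal Python sketch, assuming each Mill 2 record has already been parsed from JSON into a dict. The names MILL2_TO_UNIFIED, GRADE_CODES, map_grade and normalize_mill2 are hypothetical, and raising an error on an unknown grade code is a design choice, not part of the mapping above. Date and timestamp values are passed through unchanged, because Mill 1's date format is not documented here.

```python
import json
from typing import Any, Optional

# Mill 2 (JSON) field name -> unified name, taken from the field mapping table.
# Mill 1 field names already equal the unified names, so Mill 1 needs no renaming.
MILL2_TO_UNIFIED = {
    "id": "record_id",
    "supplier_name": "farmer_name",
    "weight_kg": "paddy_weight_kg",
    "moisture_percent": "moisture_pct",
    "harvest_quality": "grade",
    "payment_amount": "price_mmk",
    "processing_date": "mill_date",
    "timestamp": "intake_time",
}

# Explicit grade mapping (hypothetical constant name): Mill 2 codes -> Mill 1 text.
GRADE_CODES = {"A": "premium", "B": "standard", "C": "low"}


def map_grade(code: Optional[str]) -> Optional[str]:
    """Translate a Mill 2 grade code; None (advance payment) stays None."""
    if code is None:
        return None
    try:
        return GRADE_CODES[code]
    except KeyError:
        # Fail loudly instead of guessing what an unknown code means.
        raise ValueError(f"unknown harvest_quality code: {code!r}") from None


def normalize_mill2(record: dict[str, Any]) -> dict[str, Any]:
    """Rename Mill 2 fields to unified names and map the grade code.

    Dates and timestamps are copied as-is; format conversion is a separate step.
    """
    unified = {MILL2_TO_UNIFIED[k]: v for k, v in record.items() if k in MILL2_TO_UNIFIED}
    unified["grade"] = map_grade(unified.get("grade"))
    return unified


if __name__ == "__main__":
    # Hypothetical Mill 2 record, for illustration only.
    raw = (
        '{"id": 17, "supplier_name": "Example Farmer", "weight_kg": 1200.0,'
        ' "moisture_percent": 14.0, "harvest_quality": "B", "payment_amount": 540000,'
        ' "processing_date": "2024-11-03", "timestamp": "2024-11-03T08:15:00"}'
    )
    print(normalize_mill2(json.loads(raw)))
```

Keeping the grade translation in a literal dictionary means an unexpected code such as "D" stops the load instead of being silently mapped.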

Advance payment records (Mill 2 only)

Some Mill 2 records represent advance payments to farmers for future paddy delivery. These records have:

  • supplier_name and payment_amount populated (who was paid, how much)
  • weight_kg, moisture_percent, and harvest_quality set to null (no paddy delivered yet)

These are normal business operations, not data errors. Kyaw Zin Oo pays farmers in advance to secure future supply, so the payment appears in the data before the corresponding paddy delivery. When the paddy is eventually delivered, it appears as a separate record with all fields populated.
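Because advance payments follow a fixed null pattern, a Mill 2 record can be classified from its fields alone. The sketch below is hypothetical: classify_mill2 and its labels are illustrative names, it assumes JSON null arrives as Python None (which is what json.loads produces), and it treats a missing key the same as null. The "incomplete" and "invalid" outcomes are assumptions for records that fit neither pattern described above.

```python
import json
from typing import Any

# Mill 2 fields that are null on advance payment records (per the list above).
DELIVERY_FIELDS = ("weight_kg", "moisture_percent", "harvest_quality")


def classify_mill2(record: dict[str, Any]) -> str:
    """Label a parsed Mill 2 record. A missing key is treated the same as null."""
    if record.get("payment_amount") is None:
        # payment_amount is expected on every record, advance payments included.
        return "invalid"
    present = [record.get(field) is not None for field in DELIVERY_FIELDS]
    if not any(present):
        return "advance_payment"  # who was paid and how much, no paddy yet
    if all(present):
        return "delivery"
    return "incomplete"  # hypothetical label: some but not all delivery fields set


if __name__ == "__main__":
    # Hypothetical advance payment record; JSON null becomes Python None.
    raw = (
        '{"id": 42, "supplier_name": "Example Farmer", "payment_amount": 100000,'
        ' "weight_kg": null, "moisture_percent": null, "harvest_quality": null}'
    )
    print(classify_mill2(json.loads(raw)))  # prints advance_payment
```

Labeling these rows instead of filtering them out keeps the payment history intact while the matching delivery is still pending.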

Key considerations

  • record_id / id is NOT a natural key for MERGE. Each mill assigns IDs independently. The same ID number means different records at different mills.
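  • The natural key for deduplication should combine a mill identifier + farmer/supplier name + mill_date (and potentially grade) to uniquely identify a paddy intake event. If grade is part of the key, remember that it is null on advance payment records, and in SQL NULL never equals NULL, so a MERGE on that key will not match those rows unless the null is handled explicitly (for example with COALESCE).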
  • Grade mapping must be explicit -- do not rely on AI to infer that A=premium. Define the mapping in the staging model.
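These considerations translate into a composite key that leaves record_id out entirely. Below is a minimal in-memory sketch, assuming records already use the unified field names and that mill_date has been normalized to one format across both mills. The mill identifiers "mill1"/"mill2", the function names, and the keep-first policy are hypothetical. If a farmer can deliver more than once on the same mill_date, this key is too coarse and would merge distinct intakes.

```python
from typing import Any

Row = tuple[str, dict[str, Any]]  # (hypothetical mill identifier, unified record)


def natural_key(mill: str, record: dict[str, Any], include_grade: bool = False) -> tuple[Any, ...]:
    """Composite natural key; record_id is deliberately left out."""
    key: tuple[Any, ...] = (mill, record["farmer_name"], record["mill_date"])
    if include_grade:
        # grade is None on advance payments; in a Python tuple None == None,
        # so advance payments still compare equal to each other here.
        key += (record.get("grade"),)
    return key


def dedupe(rows: list[Row], include_grade: bool = False) -> list[Row]:
    """Keep the first row seen for each natural key (keep-first is an assumption)."""
    seen: set[tuple[Any, ...]] = set()
    kept: list[Row] = []
    for mill, record in rows:
        key = natural_key(mill, record, include_grade)
        if key not in seen:
            seen.add(key)
            kept.append((mill, record))
    return kept


if __name__ == "__main__":
    # Hypothetical rows: both mills issued record_id 1 for unrelated intakes.
    rows = [
        ("mill1", {"record_id": 1, "farmer_name": "Farmer A", "mill_date": "2024-11-03"}),
        ("mill2", {"record_id": 1, "farmer_name": "Farmer B", "mill_date": "2024-11-03"}),
        ("mill1", {"record_id": 2, "farmer_name": "Farmer A", "mill_date": "2024-11-03"}),
    ]
    print(len(dedupe(rows)))  # prints 2: the third row repeats the first row's natural key
```

Passing include_grade=True keeps an advance payment (grade None) and a same-day delivery (grade set) for the same farmer apart, which is one reason grade may belong in the key.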