Learn by Directing AI
All materials

data-dictionary.md

Data Dictionary — Khumalo & Grain Sales Data

Source: Exported from Xero, 12 months ending December 2024. File: khumalo-sales.csv

Columns

Column Name Type Description Constraints / Notes
order_id string Unique order identifier Format: KG-XXXX (sequential)
order_date date Date the order was placed YYYY-MM-DD. Spans January 2024 through December 2024 (12 months).
product_type string Category of furniture piece Values: dining_table, shelving, desk, custom, side_table, bench (exactly six categories)
product_name string Specific piece name Free text. Each piece has a descriptive name (e.g., "Karoo Farmhouse Table", "Woodstock Wall Unit").
channel string Sales channel Values: retail, commission, online (exactly three channels)
retailer_name string Retail partner name Populated when channel = retail. Values: "Weylandts", "@home", "Stellenbosch Gallery". NULL for commission and online orders.
sale_amount decimal Sale price in South African Rand (ZAR) Negative values represent refunds. Refund amounts match the original sale price for that product type. Positive values are the sale price charged to the customer.
retailer_commission_rate decimal Commission percentage taken by retailer Values: 0.35, 0.40, or 0.45 when channel = retail. NULL for commission and online channels.
material_cost decimal Cost of materials in ZAR Always positive. Represents raw material cost for the piece (timber, hardware, finish).
production_days integer Workshop days to produce the piece Dining tables: 16-20 days. Shelving: 5-6 days. Desks: 8-12 days. Custom: 12-25 days. Side tables: 3-5 days. Benches: 4-6 days.
delivery_cost decimal Delivery cost in ZAR R0 for in-store pickup (some retail orders). Otherwise varies by region: Cape Town R500-R1,200; Western Cape R1,500-R2,200; Gauteng R2,500-R3,200; KZN R2,000-R2,800; Online-national R1,800-R3,500. Never NULL.
delivery_region string Delivery destination region Values: "Cape Town", "Western Cape", "Gauteng", "KZN", "Online-national". Retail skews Cape Town and Western Cape. Online skews Gauteng and national.
timber_source string Source of timber used Values: "reclaimed railway sleeper", "sustainable plantation oak", "kiaat", "yellowwood", "imported walnut". For Naledi's own tracking — not part of this analysis.

Notes

  • Row count: 845 rows.
  • Refunds are coded as negative sale_amount values — 25 rows (approximately 3% of the dataset). Refunds are distributed across channels proportionally. Revenue metrics exclude refund rows per the metric definitions.
  • NULL patterns: retailer_name and retailer_commission_rate are NULL for non-retail orders. All other columns are always populated. delivery_cost can be 0 (in-store pickup) but is never NULL.
  • Date distribution: Roughly 55-95 rows per month across the twelve-month period.