data-dictionary-v2.md

Data Dictionary -- subscribers-v2.csv

Updated dataset with three additional months of subscriber data and a segment column.

| Column | Type | Description | Expected Range |
|---|---|---|---|
| subscriber_id | int | Unique identifier for each subscriber | 1 to ~9,000 |
| tenure_months | int | Number of months the subscriber has been active | 1 to 72 |
| monthly_minutes | float | Average monthly voice call minutes | 0 to 2,000 |
| data_usage_gb | float | Average monthly data consumption in gigabytes | 0 to 50 |
| complaints_count | int | Number of complaints logged in the observation period | 0 to 15 |
| plan_type | categorical | Subscriber plan tier | Basic, Standard, Premium |
| payment_method | categorical | How the subscriber pays | Bank Transfer, Credit Card, Electronic Check, Mailed Check |
| contract_type | categorical | Contract commitment length | Month-to-month, One year, Two year |
| monthly_charges | float | Monthly subscription charge in NGN | 18 to 120 |
| total_charges | float | Cumulative charges over the subscriber's tenure in NGN | 18 to 8,600 |
| churn | binary (0/1) | Whether the subscriber left during the observation period | 0 (stayed) or 1 (churned) |
| segment | categorical | Customer segment based on billing type | prepaid, postpaid |
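
The expected ranges above can be turned into a quick sanity check before any modelling. The following is a minimal sketch, not part of the original materials: the helper name `validate_row` is hypothetical, and it assumes each row arrives as a dict of strings (for example from `csv.DictReader`) with blank cells for missing values. The `subscriber_id` range is approximate in the dictionary, so it is left unchecked here.

```python
# Expected ranges and categories transcribed from the data dictionary.
NUMERIC_RANGES = {
    "tenure_months": (1, 72),
    "monthly_minutes": (0, 2000),
    "data_usage_gb": (0, 50),
    "complaints_count": (0, 15),
    "monthly_charges": (18, 120),
    "total_charges": (18, 8600),
}
ALLOWED_VALUES = {
    "plan_type": {"Basic", "Standard", "Premium"},
    "payment_method": {"Bank Transfer", "Credit Card", "Electronic Check", "Mailed Check"},
    "contract_type": {"Month-to-month", "One year", "Two year"},
    "churn": {"0", "1"},
    "segment": {"prepaid", "postpaid"},
}


def validate_row(row):
    """Return a list of problems found in one row (hypothetical helper).

    Assumption: `row` maps column names to strings, and a blank string
    means the value is missing.
    """
    problems = []
    for column, (low, high) in NUMERIC_RANGES.items():
        raw = (row.get(column) or "").strip()
        if raw == "":
            continue  # blank numeric cells are skipped, not flagged
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{column}: not numeric ({raw!r})")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside {low} to {high}")
    for column, allowed in ALLOWED_VALUES.items():
        raw = (row.get(column) or "").strip()
        if raw not in allowed:
            problems.append(f"{column}: unexpected value {raw!r}")
    return problems
```

One way to use it is to loop over `csv.DictReader(open("subscribers-v2.csv", newline=""))` and collect the rows where `validate_row` returns a non-empty list. Blank numeric cells are skipped on purpose, because a few columns are expected to contain missing values.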

Notes

  • The dataset contains approximately 9,000 rows (original ~7,000 plus three months of additional data).
  • The churn rate is approximately 8% overall. Prepaid customers churn at approximately 12%; postpaid customers at approximately 4%.
  • The segment split is approximately 55% prepaid / 45% postpaid.
  • Some columns have missing values: monthly_charges (~2%), total_charges (~3%), complaints_count (~1%).
  • The segment column is new compared to the P1 dataset -- Emeka's data team added it after noticing the prepaid gap.
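
As a rough consistency check on the figures in these notes, the overall churn rate should be close to the segment-weighted average of the prepaid and postpaid rates. The sketch below uses only the approximate percentages quoted above; the variable names are illustrative.

```python
# Approximate figures taken from the notes (all are rounded estimates).
prepaid_share, postpaid_share = 0.55, 0.45
prepaid_churn, postpaid_churn = 0.12, 0.04

# Overall churn is the share-weighted average of the segment churn rates.
overall_churn = prepaid_share * prepaid_churn + postpaid_share * postpaid_churn

print(f"Blended churn rate: {overall_churn:.1%}")  # prints "Blended churn rate: 8.4%"
```

The blended figure of about 8.4% agrees with the stated overall rate of approximately 8%, given that every input is rounded.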