Data Dictionary -- subscribers-v2.csv
Updated dataset with three additional months of subscriber data and a segment column.
| Column | Type | Description | Expected Range |
|---|---|---|---|
| subscriber_id | int | Unique identifier for each subscriber | 1 to ~9,000 |
| tenure_months | int | Number of months the subscriber has been active | 1 to 72 |
| monthly_minutes | float | Average monthly voice call minutes | 0 to 2,000 |
| data_usage_gb | float | Average monthly data consumption in gigabytes | 0 to 50 |
| complaints_count | int | Number of complaints logged in the observation period | 0 to 15 |
| plan_type | categorical | Subscriber plan tier | Basic, Standard, Premium |
| payment_method | categorical | How the subscriber pays | Bank Transfer, Credit Card, Electronic Check, Mailed Check |
| contract_type | categorical | Contract commitment length | Month-to-month, One year, Two year |
| monthly_charges | float | Monthly subscription charge in NGN | 18 to 120 |
| total_charges | float | Cumulative charges over the subscriber's tenure in NGN | 18 to 8,600 |
| churn | binary (0/1) | Whether the subscriber left during the observation period | 0 (stayed) or 1 (churned) |
| segment | categorical | Customer segment based on billing type | prepaid, postpaid |
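
The expected ranges and allowed values in the table can be turned into a row-level check. The following is a minimal sketch using only the Python standard library, not the data team's actual validation. The names `NUMERIC_RANGES`, `ALLOWED_VALUES`, `validate_row`, and `validate_csv` are hypothetical, and the sketch assumes that missing values appear as empty cells and that `churn` is stored as the text `0` or `1`. It does not check `subscriber_id`, whose upper bound is only approximate.

```python
import csv

# Numeric columns and their expected (min, max) ranges, copied from the table.
NUMERIC_RANGES = {
    "tenure_months": (1, 72),
    "monthly_minutes": (0, 2000),
    "data_usage_gb": (0, 50),
    "complaints_count": (0, 15),
    "monthly_charges": (18, 120),
    "total_charges": (18, 8600),
}

# Categorical columns and their allowed values, copied from the table.
ALLOWED_VALUES = {
    "plan_type": {"Basic", "Standard", "Premium"},
    "payment_method": {"Bank Transfer", "Credit Card", "Electronic Check", "Mailed Check"},
    "contract_type": {"Month-to-month", "One year", "Two year"},
    "churn": {"0", "1"},  # assumption: stored as text "0"/"1" in the CSV
    "segment": {"prepaid", "postpaid"},
}


def validate_row(row):
    """Return a list of problems found in one CSV row (a dict of strings)."""
    problems = []
    for column, (low, high) in NUMERIC_RANGES.items():
        raw = (row.get(column) or "").strip()
        if raw == "":
            # Assumption: a missing value is an empty cell.
            problems.append(f"{column}: missing")
            continue
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{column}: not numeric ({raw!r})")
            continue
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    for column, allowed in ALLOWED_VALUES.items():
        raw = (row.get(column) or "").strip()
        if raw not in allowed:
            problems.append(f"{column}: unexpected value {raw!r}")
    return problems


def validate_csv(lines):
    """Yield (line_number, problems) for every row with at least one problem.

    `lines` is any iterable of CSV text lines, such as an open file.
    Line numbers start at 2 because line 1 is the header; this assumes
    no field spans multiple lines.
    """
    for line_number, row in enumerate(csv.DictReader(lines), start=2):
        problems = validate_row(row)
        if problems:
            yield line_number, problems
```

To run it against the file, open `subscribers-v2.csv` with `newline=""` and pass the file object to `validate_csv`. Rows with missing `monthly_charges`, `total_charges`, or `complaints_count` will be reported, which is expected for this dataset.
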
Notes
- The dataset contains approximately 9,000 rows: the original ~7,000 plus roughly 2,000 from the three additional months of data.
- The churn rate is approximately 8% overall. Prepaid customers churn at approximately 12%; postpaid customers at approximately 4%.
- The segment split is approximately 55% prepaid / 45% postpaid.
- Three columns have missing values: monthly_charges (~2%), total_charges (~3%), complaints_count (~1%).
- The segment column is new compared to the P1 dataset -- Emeka's data team added it after noticing the prepaid gap.
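
The segment-level churn rates and the segment split quoted in the notes can be cross-checked against the overall rate with a weighted average. The sketch below does that arithmetic and adds a small helper for recomputing churn per segment from the rows themselves. The names `SEGMENT_SHARE`, `SEGMENT_CHURN`, `implied_overall_churn`, and `churn_by_segment` are hypothetical, the inputs are the approximate figures from the notes, and `churn` is assumed to be readable as `0` or `1`.

```python
# Approximate figures quoted in the notes.
SEGMENT_SHARE = {"prepaid": 0.55, "postpaid": 0.45}
SEGMENT_CHURN = {"prepaid": 0.12, "postpaid": 0.04}


def implied_overall_churn(shares, churn_rates):
    """Weighted average of per-segment churn rates, weighted by segment share."""
    return sum(shares[segment] * churn_rates[segment] for segment in shares)


def churn_by_segment(rows):
    """Compute the observed churn rate per segment from an iterable of row dicts."""
    counts = {}
    for row in rows:
        total, churned = counts.get(row["segment"], (0, 0))
        counts[row["segment"]] = (total + 1, churned + int(row["churn"]))
    return {segment: churned / total for segment, (total, churned) in counts.items()}


overall = implied_overall_churn(SEGMENT_SHARE, SEGMENT_CHURN)
print(f"Implied overall churn: {overall:.1%}")  # prints "Implied overall churn: 8.4%"
```

The implied figure of 8.4% (0.55 × 12% + 0.45 × 4%) agrees with the stated overall rate of approximately 8%. Passing the rows from `csv.DictReader` to `churn_by_segment` gives the observed per-segment rates for comparison with the approximate values in the notes.
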