Learn by Directing AI
Unit 2

Data quality assessment

Step 1: The quality assessment template

Open materials/quality-assessment-template.md. The template has four sections:

  1. Data overview -- source, date range, row count, column summary
  2. Quality issue classification -- each issue categorized as fixable, flaggable, or blocking
  3. Business impact summary -- what the issues mean for Diego's dashboard
  4. Recommendations -- prioritized actions

A quality assessment is a professional deliverable, not a checklist you run through and forget. The classification matters. A fixable issue (inconsistent casing in class names) is different from a flaggable issue (members with zero revenue that need investigation) is different from a blocking issue (primary key violations that make the data unreliable). AI will generate statistics. You provide the judgment about what those statistics mean for Diego's questions.
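One way to keep that judgment honest is to record each finding as a small structured record rather than free text. A minimal sketch, assuming nothing about the template's actual layout -- the `QualityIssue` class and its fields are illustrative, not part of the deliverable:

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    FIXABLE = "fixable"      # correct it and move on
    FLAGGABLE = "flaggable"  # needs business context before classifying
    BLOCKING = "blocking"    # stop and report; the data is unreliable

@dataclass
class QualityIssue:
    description: str
    severity: Severity
    business_impact: str  # what it means for Diego's dashboard

# Example entries drawn from the findings discussed in this unit
issues = [
    QualityIssue("Inconsistent casing in class_type values",
                 Severity.FIXABLE, "None once standardized"),
    QualityIssue("Some new members show zero first-month revenue",
                 Severity.FLAGGABLE, "Deflates average revenue per member if included"),
]

blocking = [i for i in issues if i.severity is Severity.BLOCKING]
print(f"{len(issues)} issues recorded, {len(blocking)} blocking")
```

Forcing every finding through the three-way classification makes it obvious when an issue has been logged without a decision.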

Step 2: Comprehensive quality checks

Direct AI to run quality checks on the FitPro data. Be specific about what to look for:

Run a comprehensive quality check on the FitPro data:
- Check for duplicate rows on (member_id, transaction_type, transaction_date, class_type)
- Check for null patterns in each column
- Check for impossible values: negative revenue, future dates, members with multiple active memberships
- Check consistency of categorical values: location names, membership types, class types
- Check value ranges for amount_crc by location and membership type
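The checks above can be prototyped directly before handing them to AI, which also gives you something to verify AI's output against. A stdlib-only sketch on a tiny synthetic sample -- the column names mirror the checklist, but the rows and values are invented:

```python
from collections import Counter
from datetime import date

# Invented sample rows using the column names from the checklist above
rows = [
    {"member_id": 101, "transaction_type": "membership",
     "transaction_date": date(2024, 3, 1), "class_type": "Yoga", "amount_crc": 25000},
    {"member_id": 101, "transaction_type": "membership",
     "transaction_date": date(2024, 3, 1), "class_type": "Yoga", "amount_crc": 25000},  # duplicate
    {"member_id": 102, "transaction_type": "class",
     "transaction_date": date(2024, 3, 2), "class_type": "YOGA", "amount_crc": -5000},  # negative
]

def key(r):
    return (r["member_id"], r["transaction_type"], r["transaction_date"], r["class_type"])

# Duplicate rows on the proposed key
dupes = [k for k, n in Counter(key(r) for r in rows).items() if n > 1]

# Impossible values: negative revenue, future dates
negatives = [r for r in rows if r["amount_crc"] < 0]
future = [r for r in rows if r["transaction_date"] > date.today()]

# Categorical consistency: one value appearing under multiple casings
variants = {}
for r in rows:
    variants.setdefault(r["class_type"].lower(), set()).add(r["class_type"])
inconsistent = {k: v for k, v in variants.items() if len(v) > 1}

print(len(dupes), "duplicate keys;", len(negatives), "negative amounts;",
      len(future), "future dates; inconsistent casings:", inconsistent)
```

Null-pattern and range checks follow the same shape: a comprehension per column, a summary per finding.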

AI commonly generates generic quality checks -- the same checks regardless of whether you are building a revenue dashboard or a geographic segmentation. Watch for this. The checks that matter depend on what Diego is asking for. Revenue amounts matter a lot. Postal codes do not.

Step 3: Classifying findings

Work through the findings and classify each one. Fill in the quality assessment template's classification table.

Some findings will be straightforward. Inconsistent casing in class names ("YOGA" vs "yoga" vs "Yoga") is fixable -- standardize and move on.
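The fix for a casing issue is a single normalization pass. A sketch using the class names from the example (the `spin` values are added for illustration):

```python
raw = ["YOGA", "yoga", "Yoga", "spin", "SPIN"]

# Strip whitespace and normalize to title case
standardized = [name.strip().title() for name in raw]

print(sorted(set(raw)), "->", sorted(set(standardized)))
```

Five raw variants collapse to two standardized values, which is exactly what "fixable" means: a mechanical transformation with no business question attached.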

Other findings need investigation. Members with zero revenue in their first month could be a data error or could be a business pattern. Revenue amounts at one location that are consistently higher than at other locations could be a data issue or could be a real pricing difference. These are flaggable -- you cannot classify them until you understand the business context.

If you find something that would make the analysis unreliable -- duplicate primary keys, data from the wrong time period, impossible values in critical fields -- that is blocking. The professional response is to stop and report, not to work around it.
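"Stop and report" can be enforced in code: fail loudly when a blocking condition appears instead of silently deduplicating around it. A sketch assuming the same key columns as in Step 2:

```python
from collections import Counter

def assert_unique_key(rows, key_cols=("member_id", "transaction_type",
                                      "transaction_date", "class_type")):
    """Raise on a primary-key violation rather than working around it."""
    counts = Counter(tuple(r[c] for c in key_cols) for r in rows)
    dupes = {k: n for k, n in counts.items() if n > 1}
    if dupes:
        raise ValueError(
            f"BLOCKING: {len(dupes)} duplicate key(s) on {key_cols} -- stop and report")

# Passes silently on clean data (row values are invented)
clean = [{"member_id": 1, "transaction_type": "class",
          "transaction_date": "2024-03-01", "class_type": "Yoga"}]
assert_unique_key(clean)
```

Making the pipeline refuse to continue is the code-level equivalent of the professional response described above.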

Step 4: Ask Diego

The flaggable issues point to questions only Diego can answer. Email Diego about what you have found.

If members have zero revenue in their first month, ask about it. If one location's prices are noticeably different, ask about location-specific pricing. If trainer names appear in unusual patterns, ask about the staffing setup.

Diego responds fast. He is on his phone. He will answer what you ask -- but he will not volunteer information you did not ask about. The questions you ask determine what you discover.

Step 5: Business impact

Now that Diego has explained the context behind the flaggable issues, write the business impact section of the quality assessment. Translate technical findings into terms Diego and his investors would understand.

"12% of new members in the past 6 months show zero revenue for their first month" is a technical finding. "These are referral-promotion members who are active but not yet paying -- if included in 'average revenue per member,' the metric is deflated by approximately X CRC" is a business impact. The second version tells Diego what it means for the dashboard he is asking for.

Step 6: Cross-check the assessment

Direct a second AI -- a different model or a fresh context window -- to review your quality assessment. Give the second AI only the data and the completed assessment:

Here is a data quality assessment for a gym chain's management data. Review the classification of each issue. Are any "fixable" issues actually "flaggable"? Are any "flaggable" issues actually "blocking"? Challenge the business impact statements.

Cross-checking works because a fresh context does not carry the assumptions you built up while writing the assessment. Issues that felt obvious to you may look different to a reviewer who has not been inside the analysis for the past hour.

✓ Check

How many quality issues did you classify? How many are fixable, how many flaggable, how many blocking? Can you explain the business impact of each flaggable issue in one sentence?