Step 1: Data quality assessment
Open materials/analysis-template.md and work through the Data Quality Assessment section. This is a professional deliverable -- a structured evaluation that someone reading your work six months from now could use to understand the data's fitness for use.
For each of the three sources, assess four dimensions:
Completeness. What percentage of fields are populated? Where are the nulls? The taproom's payment_type column has some blanks. The distribution's payment_status has some blanks on standard invoices. The online discount_code is null for ~75% of orders (no discount applied -- that's expected, not a quality issue).
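A completeness check like this is easy to script. Here is a minimal sketch that computes per-column null percentages; the rows and column names are illustrative stand-ins for the taproom export, not the real data:

```python
# Sketch of a per-column completeness check, assuming each source has been
# loaded as a list of dicts. Column names here are illustrative only.
def null_percentages(rows):
    """Return {column: % of rows where the value is missing or blank}."""
    if not rows:
        return {}
    counts = {col: 0 for col in rows[0]}
    for row in rows:
        for col, value in row.items():
            if value is None or str(value).strip() == "":
                counts[col] += 1
    return {col: 100.0 * n / len(rows) for col, n in counts.items()}

# Toy rows standing in for taproom transactions.
rows = [
    {"date": "2024-03-01", "product": "IPA",   "payment_type": "card"},
    {"date": "2024-03-01", "product": "Lager", "payment_type": ""},
    {"date": "2024-03-02", "product": "IPA",   "payment_type": "cash"},
    {"date": "2024-03-02", "product": "Stout", "payment_type": None},
]
print(null_percentages(rows))  # payment_type comes back 50.0 on this toy data
```

Running this against each of the three sources gives you the exact percentages the template asks for, rather than an impression.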
Consistency. Do the three sources agree where they overlap? If the same beer was sold through all three channels in March, do the date ranges align? Do the product catalog's prices match the unit prices in the transaction data?
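The price-match part of the consistency check can be sketched like this. The catalog entries, field names, and tolerance are assumptions for illustration -- and note that a channel with markup (like taproom pints) needs its own expected price rather than the raw catalog value:

```python
# Sketch of a catalog-vs-transaction price consistency check.
# Product names, prices, and field names are illustrative assumptions.
catalog = {"IPA": 4.20, "Lager": 3.80}  # EUR per liter, from the product catalog

transactions = [
    {"product": "IPA",   "unit_price_eur": 4.20},
    {"product": "IPA",   "unit_price_eur": 4.50},  # deliberate mismatch
    {"product": "Lager", "unit_price_eur": 3.80},
]

# Flag any transaction whose unit price differs from the catalog by more
# than a small tolerance (to avoid spurious floating-point mismatches).
mismatches = [
    t for t in transactions
    if abs(t["unit_price_eur"] - catalog.get(t["product"], 0.0)) > 0.01
]
print(mismatches)  # the 4.50 IPA row is the only flagged transaction
```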
Accuracy. Can you spot-check known values? If the product catalog says IPA costs 4.20 EUR per liter, does the taproom's unit_price_eur for IPA make sense given the markup for pint sales?
Timeliness. What's the most recent record in each source? Are there gaps -- days with zero transactions that might indicate missing data rather than genuinely no sales?
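Gap detection for the timeliness check can be sketched as follows; the dates are toy values, and a flagged gap is only a candidate for missing data until you rule out genuinely quiet days (e.g. days the taproom is closed):

```python
from datetime import date, timedelta

# Sketch of a gap check: list calendar days with zero transactions between
# the first and last observed dates. Dates here are illustrative.
observed = {date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 5)}

start, end = min(observed), max(observed)
gaps = [
    start + timedelta(days=i)
    for i in range((end - start).days + 1)
    if start + timedelta(days=i) not in observed
]
print(gaps)  # March 3 and March 4 have no transactions
```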
Step 2: Document cleaning decisions
Fill in the Cleaning Decisions Log section of the analysis template. For each decision you made during the project:
- Product name mapping -- how you matched names across systems. Which rules you used. How many names required manual correction after the AI's fuzzy matching.
- Date parsing -- how you handled the three formats. How you verified the parse was correct.
- Consignment return handling -- which option you chose and why. Revenue impact.
- Currency conversion -- which option you chose and why. How much online revenue was affected.
For each decision, note the trade-off. What information was lost or approximated? How many rows were affected? This isn't bureaucracy -- it's the record that makes the analysis reproducible. The next analyst who works with Carmen's data should know exactly what you did and why.
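As an example of what a documented decision looks like in code, here is a sketch of the date-parsing step. The three format strings are illustrative guesses -- substitute the formats you actually found -- and the verification tactic (round-tripping and range-checking) is one option, not the only one:

```python
from datetime import datetime

# Sketch of multi-format date parsing. The three formats below are
# illustrative assumptions; replace them with the formats in your sources.
FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"]

def parse_date(raw):
    """Try each known format in order; fail loudly on anything unrecognized."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unparseable date: {raw!r}")

# One way to verify the parse: confirm that the same calendar day written
# in all three formats resolves to the same value, then range-check every
# parsed date against the expected project window.
print(parse_date("2024-03-15"), parse_date("15/03/2024"), parse_date("15.03.2024"))
```

Failing loudly on an unrecognized format is itself a cleaning decision worth logging: silent fallbacks are how wrong dates slip into the unified dataset.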
Step 3: Write the metric definitions section
Fill in the Metric Definitions section of the analysis template with the final versions of revenue, volume, and growth. Each metric needs:
- Plain-language definition
- SQL expression
- Edge cases documented
- Cross-check findings (what the second AI found, what you fixed)
These definitions are the governance record. They prevent the next analyst from re-litigating what "revenue" means. They prevent Carmen's sales manager from computing his own version with a different formula.
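To make the pairing concrete, here is a sketch of one metric definition with its plain-language description and SQL expression side by side, run against an in-memory SQLite table. The table name, columns, and return-handling rule are assumptions for illustration -- your actual definitions come from the decisions you logged in Step 2:

```python
import sqlite3

# Sketch: a metric definition paired with its SQL expression.
# Schema and values are illustrative assumptions, not the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE unified_sales (
        channel TEXT, product TEXT, quantity_liters REAL,
        unit_price_eur REAL, is_return INTEGER
    )
""")
conn.executemany(
    "INSERT INTO unified_sales VALUES (?, ?, ?, ?, ?)",
    [
        ("taproom",      "IPA", 0.5,  6.00, 0),
        ("distribution", "IPA", 30.0, 4.20, 0),
        ("distribution", "IPA", 10.0, 4.20, 1),  # consignment return, excluded
    ],
)

# Revenue (plain language): total sales value in EUR across all channels,
# excluding consignment returns.
REVENUE_SQL = """
    SELECT SUM(quantity_liters * unit_price_eur)
    FROM unified_sales
    WHERE is_return = 0
"""
print(conn.execute(REVENUE_SQL).fetchone()[0])  # 0.5*6.00 + 30*4.20 = 129.0
```

Keeping the prose definition and the SQL adjacent like this is what stops the two from drifting apart.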
Step 4: Review the analysis template
Read the complete analysis template from top to bottom. Is it a document a colleague could pick up and understand?
Check: does every metric definition have both a plain-language description and a SQL expression? Are all four quality dimensions assessed for all three sources? Are all cleaning decisions documented with their trade-offs?
If any section is thin -- a quality dimension with "looks fine" instead of specific findings -- fill it in. The quality assessment is a professional deliverable from this project, alongside the dashboard.
Step 5: Final commit and push
Commit all remaining work and push to GitHub:
git add -A
git commit -m "Complete analysis template: quality assessment, metric definitions, cleaning decisions"
git push
The repository should now contain: the unified dataset, the metric definitions, the cleaning decisions, the segmentation analysis, and the Metabase dashboard configuration. Carmen has her Monday-morning dashboard. The next analyst has the documentation to understand every number on it.
Check: Open your analysis template. Does every metric definition have both a plain-language description and a SQL expression? Are all cleaning decisions documented with their trade-offs?