Learn by Directing AI
Unit 4

The inferential analysis

Step 1: Run the regression model

Direct AI to fit a linear regression predicting monthly confirmed bookings. Use statsmodels OLS for the full coefficient output -- you need coefficients, standard errors, confidence intervals, and p-values, not just predictions.

The predictors:

  • marketing_shift (binary) -- the key variable
  • Seasonal indicators (month dummies or peak-season flag)
  • Luxor launch indicator (binary)

Ask for the full model summary. Read it carefully. The marketing_shift coefficient is the central finding. It tells Hassan: "after controlling for seasonality and the Luxor launch, the shift to digital marketing is associated with approximately X additional bookings per month."

Look at the R-squared: it tells you what share of the variation in monthly bookings the model explains. Look at the F-statistic: it tests whether the predictors, taken together, explain more variation than a model with no predictors at all. Both describe the model as a whole. The individual coefficients are what answer Hassan's question.
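As a rough sketch of what the fitted model might look like, the block below runs statsmodels OLS on synthetic data. Everything except the name marketing_shift is an assumption made for illustration: the column names bookings, peak_season, and luxor_launch, the timing of the shift and the launch, the peak months, and all the numbers. Hassan's real data will differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the booking data: 36 months of invented values.
rng = np.random.default_rng(0)
n_months = 36
month_index = np.arange(n_months)

df = pd.DataFrame({
    "month": month_index % 12 + 1,
    # Hypothetical timing: shift at month 18, Luxor launch at month 24.
    "marketing_shift": (month_index >= 18).astype(int),
    "luxor_launch": (month_index >= 24).astype(int),
})
# Hypothetical peak season (October through April), used as a single flag.
df["peak_season"] = df["month"].isin([10, 11, 12, 1, 2, 3, 4]).astype(int)
df["bookings"] = (
    200
    + 45 * df["marketing_shift"]
    + 60 * df["peak_season"]
    + 25 * df["luxor_launch"]
    + rng.normal(0, 15, n_months)
)

# Fit OLS with the three predictors named in this step.
results = smf.ols(
    "bookings ~ marketing_shift + peak_season + luxor_launch", data=df
).fit()

# Full summary: coefficients, standard errors, CIs, p-values, R-squared, F-statistic.
print(results.summary())

# Pull out the pieces for the key variable.
coef = results.params["marketing_shift"]
ci_low, ci_high = results.conf_int().loc["marketing_shift"]
p_value = results.pvalues["marketing_shift"]
print(f"marketing_shift: {coef:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f}), p = {p_value:.4f}")
```

The formula interface is used here only because it keeps the predictor list readable. Passing arrays to statsmodels.api.OLS with an added constant should give the same fit.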

Step 2: Interpret the coefficients

Each coefficient has a specific meaning. Direct AI to interpret them in plain language:

  • The marketing_shift coefficient: the estimated change in monthly bookings associated with the shift to digital marketing, holding other factors constant. Is it positive? Is the confidence interval narrow or wide? Does it cross zero?
  • The seasonal indicators: which months have significantly higher or lower bookings than the baseline month? These confirm the pattern you saw in the descriptive analysis.
  • The Luxor launch coefficient: the estimated change associated with the new premium packages. If significant, the Luxor packages contribute to growth independently of the marketing shift.
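If the seasonal indicators are month dummies, every month coefficient is a comparison against one baseline month that is left out of the model. The sketch below shows one way to set that baseline explicitly in a statsmodels formula. The data are synthetic, and the choice of June as the baseline, the column names month and bookings, and the invented winter lift are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: 36 months, with an invented lift in December through February.
rng = np.random.default_rng(2)
idx = np.arange(36)
df = pd.DataFrame({
    "month": idx % 12 + 1,
    "marketing_shift": (idx >= 18).astype(int),  # hypothetical shift date
})
winter_lift = np.where(df["month"].isin([12, 1, 2]), 50, 0)
df["bookings"] = 200 + 40 * df["marketing_shift"] + winter_lift + rng.normal(0, 10, 36)

# Treat month as categorical and use June (6) as the baseline month.
res = smf.ols(
    "bookings ~ marketing_shift + C(month, Treatment(reference=6))", data=df
).fit()

# Each month term is the estimated difference from June, holding marketing_shift constant.
month_terms = res.params.filter(like="C(month")
month_cis = res.conf_int().filter(like="C(month", axis=0)
print(pd.concat([month_terms.rename("coef"), month_cis], axis=1))
```

Eleven month terms appear, one for every month except the baseline. Choosing a different baseline changes each month coefficient but not the marketing_shift estimate.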

The confidence intervals matter as much as the point estimates. A coefficient of 47 with a 95% CI of [16, 78] means the data are consistent with a true effect anywhere from 16 to 78 additional bookings per month, all of them positive. A coefficient of 47 with a 95% CI of [-10, 104] includes zero, so the data cannot rule out that there is no effect at all.
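The two intervals above can be checked with a few lines of arithmetic. The helper below is a hypothetical convenience function, not part of statsmodels. It reports the half-width of an interval and whether it excludes zero.

```python
def describe_interval(coef, ci_low, ci_high):
    """Summarize a coefficient's confidence interval: its half-width and whether it excludes zero."""
    return {
        "coef": coef,
        "half_width": (ci_high - ci_low) / 2,
        "excludes_zero": ci_low > 0 or ci_high < 0,
    }

# The two examples from the text.
narrow = describe_interval(47, 16, 78)
wide = describe_interval(47, -10, 104)

print(narrow)  # half_width 31.0, excludes_zero True
print(wide)    # half_width 57.0, excludes_zero False
```

The point estimate is the same in both cases. Only the precision differs, and that is what decides whether the finding can support a decision.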

Step 3: Compute effect sizes

A p-value tells you whether the effect is statistically distinguishable from zero. It does not tell you whether the effect is large enough to act on. Hassan needs both.

Direct AI to compute effect sizes for the marketing_shift coefficient. Options include Cohen's d (the difference in mean bookings before and after the shift, expressed in standard deviation units), partial eta-squared (the share of the variance not explained by the other predictors that the marketing shift accounts for), or the raw coefficient interpreted in business terms (additional bookings per month).

A p-value of 0.003 with a Cohen's d of 0.15 points to an effect that is probably real but too small to change strategy over. A p-value of 0.003 with a Cohen's d of 0.8 points to a large effect, the kind that could justify doubling the digital budget. The effect size is what converts a statistical finding into a business decision.

AI commonly reports significance without computing effect sizes. If AI's output includes p-values but no effect sizes, direct it explicitly: "compute the effect size for the marketing_shift coefficient."
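A minimal sketch of two of these effect sizes, computed from quantities a regression output already contains. For a term with one coefficient, partial eta-squared equals t² / (t² + residual degrees of freedom). Dividing the coefficient of a binary predictor by the residual standard deviation gives a Cohen's-d-style standardized difference, which is one common convention among several. The function names and the input numbers are invented for illustration.

```python
def partial_eta_squared(t_value: float, df_resid: float) -> float:
    """Partial eta-squared for a single-coefficient term: t^2 / (t^2 + df_resid)."""
    t_sq = t_value ** 2
    return t_sq / (t_sq + df_resid)


def standardized_difference(coef: float, residual_sd: float) -> float:
    """Cohen's-d-style effect: binary predictor's coefficient over the residual SD."""
    return coef / residual_sd


# Invented inputs. With a fitted statsmodels result these would come from
# results.tvalues["marketing_shift"], results.df_resid,
# results.params["marketing_shift"], and results.mse_resid ** 0.5.
t_value, df_resid = 3.2, 32
coef, residual_sd = 47.0, 50.0

eta_sq = partial_eta_squared(t_value, df_resid)
d_like = standardized_difference(coef, residual_sd)

print(f"partial eta-squared: {eta_sq:.3f}")
print(f"standardized difference: {d_like:.2f}")
```

The two measures sit on different scales and should not be compared with each other. Report whichever one is chosen alongside the raw coefficient in bookings per month.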

Step 4: Check what AI defaults to

Ask AI to suggest the "best approach" for Hassan's data without telling it that you want inference rather than prediction. Something like: "Given booking data and marketing spend data, what's the best way to analyze whether the business is growing and why?"

Read AI's suggestion. Compare it with the inferential approach you chose. AI will likely suggest a prediction model -- a time series forecast, a random forest, or something with train/test splits and accuracy metrics. That approach answers "how many bookings next quarter?" not "did the marketing shift work?"

Document the comparison in the methodology memo. Note what AI suggested, why it is not wrong (prediction is a valid question type), and why it does not serve Hassan's actual decision (he needs to know which factors are driving growth, not what the forecast is).

The documented comparison is useful evidence: it shows that the choice of question type was deliberate, not a default.

Step 5: Address the confounding

The marketing shift happened at a specific point in time. Other things also changed around the same time: the general trend of increasing international tourism to Egypt, the exchange rate making Egypt cheaper for European visitors, the Luxor packages launching. The regression controls for what it can measure -- seasonal patterns and the Luxor launch -- but it cannot separate the marketing effect from unmeasured confounders like the exchange rate or tourism trends.

This means the finding is an association, not a causal claim. The regression can say "the marketing shift is associated with an increase of X bookings per month after controlling for seasonality and the Luxor launch." It cannot say "the marketing shift caused X additional bookings." The distinction matters for Hassan's decision: the evidence supports increasing the digital budget, but it does not prove that the digital marketing alone produced the growth.

Direct AI to write a clear statement of this limitation. Include it in the methodology memo and in the findings.
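To see why an unmeasured confounder matters, the simulation below builds synthetic data in which bookings depend on both a marketing shift and a steady upward trend in tourism. It then fits the model with and without the trend. All names other than marketing_shift (tourism_trend, bookings) and all numbers are invented. The sketch only illustrates the mechanism and says nothing about the size of any bias in Hassan's data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_months = 48
t = np.arange(n_months)

sim = pd.DataFrame({
    "marketing_shift": (t >= 24).astype(int),  # hypothetical shift at the midpoint
    "tourism_trend": t / n_months,             # hypothetical confounder, rising over time
})
true_shift_effect = 20
sim["bookings"] = (
    200
    + true_shift_effect * sim["marketing_shift"]
    + 80 * sim["tourism_trend"]
    + rng.normal(0, 5, n_months)
)

# Model that cannot see the trend: the shift coefficient absorbs part of it.
omitted = smf.ols("bookings ~ marketing_shift", data=sim).fit()
# Model that can see the trend: the shift coefficient lands near the true value.
controlled = smf.ols("bookings ~ marketing_shift + tourism_trend", data=sim).fit()

print(f"true effect:               {true_shift_effect}")
print(f"estimate, trend omitted:   {omitted.params['marketing_shift']:.1f}")
print(f"estimate, trend included:  {controlled.params['marketing_shift']:.1f}")
```

In the simulation the trend can be added back because it was generated on purpose. In the real analysis the exchange rate and the tourism trend are not in the data, so only the first kind of model is available. That is why the finding has to be reported as an association.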

Step 6: Update the methodology memo

Direct AI to update the methodology memo with everything from this unit:

  • Model specification: OLS regression, monthly confirmed bookings as the dependent variable, marketing_shift + seasonal indicators + Luxor launch as predictors
  • Coefficients and interpretation: the marketing_shift coefficient with CI and p-value, interpreted in business terms
  • Effect sizes: Cohen's d or partial eta-squared for the marketing_shift coefficient
  • AI's prediction default: what AI suggested, why it was not chosen
  • Confounding limitations: what the regression controls for, what it cannot control for, why the finding is association not causation

The methodology memo, which follows the structure in materials/methodology-memo-template.md, is the backbone of the technical appendix Hassan's silent partner will receive. Keep it precise and honest.

✓ Check

Check: Regression model run with marketing_shift as key predictor. Coefficients interpreted with confidence intervals. Effect sizes computed (not just p-values). AI's prediction default observed and documented. Confounding limitations stated honestly. Methodology memo captures the full inferential rationale.