The Brief
Samir Bouzid runs Kabylie Gold, an olive oil producer and exporter based in the Béjaïa province of Algeria. Three generations of Kabylie olive groves. Forty wholesale buyers across France, Germany, Canada, and the US placing orders through a platform his son Yacine commissioned from a development agency in Algiers. Docker containers on a rented server. A monitoring stack the developer set up that Samir has never looked at.
The problem is France. French food safety inspectors are asking about digital security practices as part of EU export certification. Traceability compliance -- they want proof that buyer data and shipment records are secure. France is 45% of Kabylie Gold's revenue. If Samir can't show the inspectors a proper security assessment, he could lose the certification.
He reaches out on Slack. He types fast, goes on tangents about olive varieties, drops French words when the English escapes him, and sends five short messages where one long one would do. Underneath the enthusiasm, the stakes are clear: his business depends on this.
Your Role
You are assessing the ordering platform and the server it runs on. The deliverable is a report Samir can hand to the French inspectors.
The scope is wider than before. The ordering platform is the application target, but the Docker containers themselves are now part of the assessment. So is the monitoring stack. You are looking at infrastructure, not just an application inside it.
This time, the TTP selection guide describes categories of testing -- passive reconnaissance, active reconnaissance, web application testing, container security testing -- without listing every specific test. You decide what to test within each category. You also receive a threat model template and fill it in yourself for Samir's system. Cross-checking enters as a verification technique: a second AI reviews your detection rules with fresh context, catching what self-review normalizes.
What's New
Last time you assessed a web application with multi-vulnerability exploitation, wrote Sigma rules for each attack type, remediated in priority order, and produced a multi-finding report. You know the purple team pipeline. You know that different attacks produce different log signatures. You know that the AI's field names in Sigma rules may not match the actual log format.
Two new domains enter at once.
Passive reconnaissance. Before you run any scans, you search certificate transparency logs, Shodan, and Google dorks for what's already publicly visible about the target. This is intelligence you gather without touching Samir's infrastructure -- and it reveals things he may not know are exposed. The challenge is that the AI treats every result as equally relevant. You impose scope.
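Imposing scope can be as simple as filtering the raw results down to the target domain. The sketch below, a minimal example and not part of the assessment materials, parses the JSON that crt.sh returns (fetched separately, e.g. from `https://crt.sh/?q=%25.<domain>&output=json`) into a deduplicated, in-scope hostname list. The `kabyliegold.example` domain is a hypothetical placeholder.

```python
import json

def in_scope_hosts(crtsh_json: str, domain: str) -> list[str]:
    """Extract unique hostnames from a crt.sh JSON response,
    keeping only names under the target domain (scope discipline)."""
    entries = json.loads(crtsh_json)
    hosts = set()
    for entry in entries:
        # crt.sh packs multiple SAN entries into name_value, newline-separated
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lstrip("*.").lower()
            if name == domain or name.endswith("." + domain):
                hosts.add(name)
    return sorted(hosts)

# Sample shaped like a crt.sh response; the real one comes from the API above
sample = json.dumps([
    {"name_value": "orders.kabyliegold.example\n*.kabyliegold.example"},
    {"name_value": "grafana.kabyliegold.example"},
    {"name_value": "unrelated.other.example"},  # out of scope -> dropped
])
print(in_scope_hosts(sample, "kabyliegold.example"))
# → ['grafana.kabyliegold.example', 'kabyliegold.example', 'orders.kabyliegold.example']
```

Everything that falls outside the target domain is discarded before you ever look at it, which is the scoping discipline the AI will not apply on its own.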
Container hardening. The Docker containers running Samir's platform were built with default configurations. After the assessment, you harden them -- non-root execution, pinned base images, vulnerability scanning, a CI pipeline that catches problems before deployment. This is the first time you work with the infrastructure layer rather than just the application layer. Build-time security decisions differ fundamentally from the runtime fixes you have done before.
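A hardened Dockerfile for a service like Samir's ordering platform might look like the sketch below. This is illustrative only: the base image, digest placeholder, and app layout are assumptions, not details from the brief.

```dockerfile
# Pin the base image to a digest, not a floating tag
# (the digest is a placeholder -- resolve the real one with
#  `docker images --digests` after pulling)
FROM python:3.12-slim@sha256:<digest>

# Create an unprivileged user instead of running as root
RUN useradd --create-home --shell /usr/sbin/nologin app
WORKDIR /home/app

COPY --chown=app:app requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY --chown=app:app . .
USER app

EXPOSE 8000
CMD ["python", "app.py"]
```

Scanning the built image (for example with `trivy image <image-name>`) before it ships is what moves these decisions from runtime firefighting to build time.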
The hard part is holding both perspectives at once. Passive OSINT and container hardening look like separate concerns, but they connect through the same principle: least privilege extends from what the public can see about your infrastructure to what a compromised container can access on the host.
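At runtime, the container side of that principle can be expressed directly in configuration. A hedged sketch in Compose syntax follows; the service name, image, and UID are placeholders, not values from Samir's environment.

```yaml
services:
  orders:
    image: kabylie/orders@sha256:<digest>   # placeholder digest
    user: "10001:10001"          # run as a non-root UID/GID
    read_only: true              # immutable root filesystem
    cap_drop: [ALL]              # drop every Linux capability
    security_opt:
      - no-new-privileges:true   # block escalation via setuid binaries
    tmpfs:
      - /tmp                     # writable scratch space only where needed
```

A container configured this way has little to offer an attacker who lands inside it, which is the host-side mirror of minimizing what the public can see.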
Tools
- crt.sh -- certificate transparency log search. New.
- Shodan/Censys -- passive infrastructure discovery. New.
- Google dorks -- targeted search engine reconnaissance. New.
- Nmap -- multi-protocol scanning, now including UDP and OS detection. Extended.
- ffuf -- content discovery and directory brute-forcing. New.
- Wireshark/tshark -- packet-level network analysis. New.
- Trivy -- container image vulnerability scanning. New.
- GitHub Actions -- CI pipeline for automated security scanning. New.
- Docker -- running the vulnerable application and monitoring stack, then hardening the containers.
- Grafana/Loki/Alloy -- log viewing and monitoring. Continuing.
- pySigma -- Sigma rule to LogQL conversion. Continuing.
- Claude Code -- AI agent directing all tool execution.
- Git/GitHub -- version control and project submission.
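To show how the Trivy and GitHub Actions entries above fit together, here is one possible CI workflow that fails the build on high-severity image findings. The image name is a placeholder and the action version is illustrative; pin whatever release is current.

```yaml
name: image-scan
on: [push]
jobs:
  trivy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t kabylie/orders:ci .
      - name: Scan with Trivy
        uses: aquasecurity/trivy-action@0.28.0
        with:
          image-ref: kabylie/orders:ci
          exit-code: "1"           # non-zero exit fails the job on findings
          severity: CRITICAL,HIGH
```

This is the "catches problems before deployment" piece: a vulnerable image never reaches the rented server because the pipeline refuses to pass it.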
Materials
- Scope document -- assessment boundaries, now including container infrastructure and monitoring.
- TTP selection guide -- testing categories without enumerated tests. You choose the specific tests.
- Threat model template -- blank STRIDE template for you to fill in for Samir's system.
- Sigma rule starter -- YAML structure template for detection rule authoring.
- Report template -- assessment report structure with compliance framing for the French inspectors.
- Docker environment -- the ordering platform, database, Grafana, Loki, and Alloy running in containers.
- CLAUDE.md -- project governance file with tickets and verification targets.
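For orientation, the Sigma rule starter's YAML structure is roughly as follows. The logsource labels and field names here are illustrative and must be checked against what Alloy actually ships to Loki -- the field-name mismatch the brief warns about.

```yaml
title: Repeated Failed Logins Against the Ordering Platform
id: 00000000-0000-0000-0000-000000000000   # generate a real UUID
status: experimental
description: Multiple failed login attempts from a single source.
logsource:
  product: webserver        # illustrative -- match your Alloy/Loki labels
  service: orders-app
detection:
  selection:
    event: login_failed     # field name must match the actual log format
  condition: selection
level: medium
falsepositives:
  - Users mistyping passwords
```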