The hottest AI demos break in the one place you can’t afford “close enough”. If your pen test plan can’t be defended in a safety review or an audit, it’s not an OT pen test. It’s a lab experiment.
OT/ICS testing is different because a failed test isn’t just “data loss”. It can mean downtime, damaged equipment, environmental impact, or safety incidents.
Where LLMs still fall short for OT/ICS pen tests:
1) Reliability
LLMs can hallucinate protocol behavior, device capabilities, CVE applicability, or remediation steps. In enterprise IT, that’s wasted time. In OT, it can drive unsafe actions.
2) Determinism and traceability
Assessments need repeatable steps, evidence, and clear provenance. “The model suggested…” is not a defensible control narrative. (A minimal evidence log is sketched after this list.)
3) Safety-first constraints
OT testing requires strict change control, defined stop conditions, and an understanding of process state. LLMs don’t inherently reason about physical consequence or operational context. (A stop-condition guard is sketched after this list.)
4) Liability and accountability
When guidance is wrong, who owns the risk: the tester, the vendor, the model provider? In regulated or safety-critical environments, that ambiguity is unacceptable.
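To make point 2 concrete: the provenance an auditor expects looks less like a chat transcript and more like an append-only evidence log, where every step records what ran, who ran it, when, and a hash of the raw output. A minimal sketch in Python; record_step, the field names, and the JSONL file are illustrative assumptions, not any specific framework:

import hashlib
import json
from datetime import datetime, timezone

def record_step(step_id, command, output, operator, log_path="evidence.jsonl"):
    # One append-only record per executed test step: what ran, who ran it,
    # when, and a hash of the raw output so the evidence can't drift later.
    entry = {
        "step_id": step_id,
        "command": command,
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "operator": operator,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")

record_step("S-012", "read-only Modbus scan of 10.0.5.0/24", "…raw output…", "j.doe")

The control narrative then becomes “step S-012, run by this operator, produced this output”, which is defensible in a way “the model suggested it” never is.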
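And for point 3, stop conditions can be explicit code rather than tribal knowledge: the harness checks process state against pre-agreed safe bands before and during every probe, and fails safe when a tag is out of band or simply unreadable. Another hedged sketch; the tag names and bands are made up for illustration:

def check_stop_conditions(process_state, safe_bands):
    # Return the first violated stop condition, or None if testing may continue.
    # A missing or unreadable tag counts as a violation: fail safe.
    for tag, (low, high) in safe_bands.items():
        value = process_state.get(tag)
        if value is None or not (low <= value <= high):
            return f"STOP: {tag}={value} outside safe band [{low}, {high}]"
    return None

safe_bands = {"tank_pressure_psi": (10, 85), "pump_temp_c": (5, 70)}
violation = check_stop_conditions({"tank_pressure_psi": 92, "pump_temp_c": 40}, safe_bands)
if violation:
    print(violation)  # halt all active probes and notify operations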
AI still has a role, just not as the decision-maker.
Use LLMs to accelerate low-consequence work: summarizing vendor docs, drafting test plans for human review, parsing logs, mapping findings to standards, generating reporting language.
But keep final calls human-led: what to probe, how far to go, when to stop, and what is safe to recommend.
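One way to wire that in: the model only ever produces drafts, and nothing reaches the report or the test plan without a named human explicitly approving it. A minimal sketch; llm_draft is a stand-in for whatever model call you actually use, and the standards mapping in it is placeholder text:

def llm_draft(finding):
    # Stand-in for a real model call; output is a draft, nothing more.
    return f"Draft: '{finding}' may map to IEC 62443-3-3 SR 1.1 (unverified)."

def human_gate(draft):
    # Nothing leaves draft status without explicit human approval.
    print(draft)
    return draft if input("Approve? [y/N] ").strip().lower() == "y" else None

final = human_gate(llm_draft("shared operator account on the HMI"))
if final is None:
    print("Rejected: the tester revises or discards the draft.")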
If you’re building AI for OT security, the bar isn’t “helpful”. It’s defensible, deterministic, and safe under audit.