Why LLMs still don’t belong in OT/ICS pen tests (and what to automate instead)
Hot take: the biggest risk isn’t that LLMs will miss vulnerabilities. It’s that they’ll make you overconfident and move faster than your safety controls can tolerate.
OT/ICS testing is not a web app sprint.
You are working in safety- and uptime-critical environments where:
– A wrong assumption can trigger downtime
– “Probably safe” actions can have physical, real-world consequences
– Context lives in diagrams, vendor quirks, and plant procedures, not in prompts
Where LLMs are risky in OT/ICS:
– AI-led exploitation: hallucinated commands, wrong protocol details, unsafe payloads
– Autonomous decision-making: chaining actions without understanding process state
– “Confident” triage: misranking findings when risk is process-dependent
What to automate instead (high leverage, low blast radius):
– Pre-engagement: scope drafting, rules of engagement, outage windows, asset lists
– Documentation: turning notes into clean test evidence, timelines, and reports
– Data wrangling: log parsing, packet metadata summaries, config diffing
– Test readiness: checklists, safety gates, runbooks, peer-review prompts
– Comms: stakeholder updates, change-control language, finding summaries
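To make "low blast radius" concrete: here is a minimal sketch of the config-diffing idea, using only Python's standard library. The filenames and config lines are hypothetical; the point is that this runs entirely offline on exported text and never touches the control network.

```python
import difflib

def config_diff(old_lines, new_lines,
                label_old="baseline.cfg", label_new="current.cfg"):
    """Return a unified diff between two device config exports.

    Pure offline text comparison: reads nothing from, and sends
    nothing to, the live control network.
    """
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=label_old, tofile=label_new,
        lineterm=""))

# Made-up example config lines, purely illustrative:
old = ["mode: run", "watchdog: enabled", "port 502: open"]
new = ["mode: run", "watchdog: disabled", "port 502: open"]

for line in config_diff(old, new):
    print(line)
```

A change like `watchdog: enabled` → `watchdog: disabled` surfaces immediately in the diff, which is exactly the kind of evidence an AI assistant can summarize for a report without ever issuing a command to a device.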
Principle: use AI to accelerate preparation and clarity, not to drive actions on live control networks.
If you are building or buying “AI for OT security,” ask one question:
What stops the model from doing something unsafe when it is almost right?