I'm deploying AI agents that can call external APIs – process refunds,
send emails, modify databases. The agent decides what to do based on
user input and LLM reasoning.
My concern: the agent sometimes attempts actions it shouldn't, and
there's no clear audit trail of what it did or why.
Current options I see:
1. Trust the agent fully (scary)
2. Manual review of every action (defeats automation)
3. Some kind of permission/approval layer (does this exist? rough sketch of what I'm picturing at the end of this post)
For those running AI agents in production:
- How do you limit what the agent CAN do?
- Do you require approval for high-risk operations?
- How do you audit what happened after the fact?
Curious what patterns have worked.
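For concreteness, here's a rough sketch of what I'm picturing for option 3. The tool names, risk tiers, and the gate_tool_call helper are all made up for illustration, not any real library: every tool call passes through a gate that checks an allowlist, parks high-risk actions for human approval, and appends everything (including the agent's stated reasoning) to an audit log.

    # Sketch only: allowlist + approval gate + audit trail for agent tool calls.
    import json, time, uuid

    ALLOWED_TOOLS = {"send_email", "issue_refund"}   # the agent can only ever call these
    NEEDS_APPROVAL = {"issue_refund"}                # high-risk: a human must sign off

    AUDIT_LOG = []  # in practice this would be an append-only store, not an in-memory list

    def gate_tool_call(tool, args, reasoning, approver=None):
        record = {
            "id": str(uuid.uuid4()),
            "ts": time.time(),
            "tool": tool,
            "args": args,
            "reasoning": reasoning,   # capture *why* the agent wanted this
            "decision": None,
        }
        if tool not in ALLOWED_TOOLS:
            record["decision"] = "denied: not allowlisted"
            AUDIT_LOG.append(record)
            raise PermissionError(record["decision"])
        if tool in NEEDS_APPROVAL and approver is None:
            record["decision"] = "pending: human approval required"
            AUDIT_LOG.append(record)
            return {"status": "pending", "request_id": record["id"]}
        record["decision"] = f"approved by {approver or 'policy'}"
        AUDIT_LOG.append(record)
        return {"status": "execute", "tool": tool, "args": args}

    # Low-risk call goes straight through; the refund waits for a human.
    print(gate_tool_call("send_email", {"to": "x@example.com"}, "user asked for a receipt"))
    print(gate_tool_call("issue_refund", {"order": 123, "amount": 40}, "item arrived damaged"))
    print(json.dumps(AUDIT_LOG, indent=2))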
Also: you should never connect an agent directly to a sensitive database server, an order/fulfillment system, etc. Instead, route requests through a middleware proxy that arbitrates each request, consults a policy engine, and logs the processing context before relaying the request to the target system.
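A minimal sketch of that shape, with every name and endpoint made up for illustration: the agent only ever talks to the proxy, the proxy holds the real credentials, asks a policy engine for a verdict, and logs the full context before relaying anything.

    # Sketch only: proxy arbitrates agent requests; the agent never holds backend credentials.
    import logging, time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("agent-proxy")

    def policy_engine(action, params, context):
        # Stand-in for a real policy service (OPA, Cedar, a rules table, ...).
        if action == "db.delete" or params.get("amount", 0) > 500:
            return "deny"
        if action.startswith("order."):
            return "require_approval"
        return "allow"

    def relay_to_target(action, params):
        # Stand-in for the call to the real backend, made with the proxy's
        # own credentials -- the agent never sees them.
        return {"action": action, "result": "ok"}

    def proxy_request(agent_id, action, params, context):
        verdict = policy_engine(action, params, context)
        log.info("agent=%s action=%s params=%s context=%s verdict=%s ts=%s",
                 agent_id, action, params, context, verdict, time.time())
        if verdict == "deny":
            return {"status": "denied"}
        if verdict == "require_approval":
            return {"status": "queued_for_human"}   # parked until someone approves
        return relay_to_target(action, params)

    # The agent's tool call becomes a proxy call, never a direct DB/API call:
    print(proxy_request("agent-7", "refund.create",
                        {"order_id": 991, "amount": 40},
                        {"user_msg": "my order arrived broken"}))

The useful property is that the audit trail and the enforcement point live in one place you control, regardless of what the model decides to attempt.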
Also consider the subtleties of the threat model and the types of attack vectors, including how many systems the agent(s) connect to concurrently. See the lethal trifecta: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/