Security for Agents

2026-03-12

by Uri Walevski

AI agents are client-side code. Not literally, but the security model is the same. Your users interact with the agent directly, they can say anything to it, and you have no control over what they'll try. Just like JavaScript running in a user's browser, the agent executes in an environment you don't fully control. Every decision you make about what the agent can access and what it can do needs to start from that assumption.

This isn't a theoretical concern. If your agent has access to API keys, users will try to extract them. If your agent can see other users' conversations, someone will find a way to get at them. If your agent hallucinates, users will trust the wrong information and blame you.

Most agent platforms don't have an answer for this. Tools like OpenClaw give the LLM direct access to a machine, and whatever credentials are on that machine are fair game. The LLM is the runtime. There's no separation between the brain making decisions and the hands executing them, so there's no place to insert security controls.

prompt2bot has a different architecture. The outer LLM, the one talking to your users, is separate from the VM where code runs. The LLM decides what to do, and a separate coding agent on the VM carries it out. This separation creates a security boundary. You can control what crosses it. Secrets, user context, tool access, all of it flows through a layer you own, not through the LLM's context window. That's what makes the rest of this post possible.

Secrets

Your agent needs API keys to do real work. GitHub tokens, database credentials, cloud provider keys. The challenge is that prompt2bot agents run on VMs with a full coding agent inside. That coding agent can write and execute arbitrary code, inspect environment variables, read files, and make network requests. It's a general-purpose programmer. So how do you give it access to a secret without it being able to leak it?

If you inject the secret as an environment variable, the coding agent can just run printenv and read it. If you put it in a file, it can cat the file. If you pass it through the outer LLM's context, the user can social-engineer it out. Any approach where the secret exists in cleartext anywhere on the VM is fundamentally broken.

Our solution: the secret never reaches the VM at all. When you add a secret, you provide a name, a value, and a list of hostnames it's allowed to be used with. The value gets AES-GCM encrypted immediately. It's never shown again, not even to you.

The outer LLM, the one talking to your users, never sees the secret value either. It knows the secret exists and which hosts it's for, but the actual credential is not in its context window, not in any tool result, nowhere it can access.

On the VM, the coding agent gets an HMAC placeholder instead of the real value. It's a deterministic hash computed from the encrypted value, keyed to that specific machine. Meaningless on any other server, and meaningless to anyone who reads it. When the coding agent makes an outbound HTTP request to an approved host, a reverse proxy intercepts the request, finds any HMAC placeholders in headers or the request body, and hot-swaps them for the real decrypted values right before the request leaves our infrastructure. The real secret exists in cleartext for a fraction of a second, at the network boundary, and nowhere else.
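The derivation can be sketched in a few lines. This is a minimal illustration using HMAC-SHA256 with an assumed per-machine key; the names, placeholder format, and truncation length are illustrative, not the actual prompt2bot implementation:

```python
import hashlib
import hmac

def derive_placeholder(encrypted_value: bytes, machine_key: bytes) -> str:
    # Deterministic hash of the encrypted secret, keyed to one machine.
    # The placeholder is stable on that VM but meaningless anywhere else.
    digest = hmac.new(machine_key, encrypted_value, hashlib.sha256).hexdigest()
    return f"<<SECRET_{digest[:12]}>>"

ciphertext = b"...aes-gcm ciphertext of the real key..."
on_vm_a = derive_placeholder(ciphertext, b"per-machine-key-A")
on_vm_b = derive_placeholder(ciphertext, b"per-machine-key-B")
```

Because the HMAC is keyed per machine, the same secret produces unrelated placeholders on different VMs, and reading one tells you nothing about the underlying value.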

The host restriction is key. Even if the coding agent tries to send the placeholder to some other server, the proxy won't swap it. The placeholder arrives as a meaningless hash. The secret only materializes for requests to hosts you explicitly approved.
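The proxy's substitution logic reduces to a host check before every swap. A sketch, with an invented placeholder and token (in practice decryption happens at swap time; here the mapping is pre-resolved for clarity):

```python
# Placeholder -> (decrypted value, hosts it may be sent to).
SECRETS = {
    "<<SECRET_4f2a91c0b7d3>>": ("ghp_realGitHubToken", {"api.github.com"}),
}

def swap_placeholders(request_host: str, payload: str) -> str:
    """Reverse-proxy substitution: swap only for explicitly approved hosts."""
    for placeholder, (value, allowed_hosts) in SECRETS.items():
        if request_host in allowed_hosts:
            payload = payload.replace(placeholder, value)
    return payload

approved = swap_placeholders("api.github.com",
                             "Authorization: Bearer <<SECRET_4f2a91c0b7d3>>")
denied = swap_placeholders("attacker.example.com",
                           "Authorization: Bearer <<SECRET_4f2a91c0b7d3>>")
```

A request to the approved host goes out with the real credential; a request anywhere else carries only the meaningless hash.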

Users also paste API keys into chat constantly. "Here's my GitHub token, set it up for me." We detect these automatically using secret identification heuristics and regexes. The secret gets replaced with a <<SECRET_xxxxxxxxxxxx>> placeholder before the message reaches the model. The actual value is encrypted and stored. The LLM only ever sees the placeholder.
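The redaction step looks roughly like this. The two patterns shown are well-known token shapes; the real detection layers many more patterns plus entropy heuristics, and the vault here stands in for encrypted storage:

```python
import re
import secrets

# A few well-known token shapes (illustrative subset).
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key ID
]

def redact_message(message: str, vault: dict) -> str:
    """Swap detected credentials for placeholders before the model sees them."""
    for pattern in SECRET_PATTERNS:
        for value in pattern.findall(message):
            placeholder = f"<<SECRET_{secrets.token_hex(6)}>>"
            vault[placeholder] = value    # real value goes to encrypted storage
            message = message.replace(value, placeholder)
    return message
```

By the time the message reaches the model, the credential is gone; only the placeholder remains in context.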

Alice & Bot for Encrypted Agent Communication

When agents interact with users over the web, the communication channel itself is a security surface. Most chat integrations send messages through a third-party server in plaintext. The platform can read everything, store everything, and change anything.

Alice & Bot is different. It's an end-to-end encrypted communication protocol built for agent-to-human and agent-to-agent interaction. Identities are Ed25519 key pairs you own, not accounts on someone else's server. Messages are encrypted client-side before they leave the browser or the agent's process. The server relays ciphertext it can't read.
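The relay's role can be sketched in a few lines. The field names here are illustrative, not the actual Alice & Bot wire format:

```python
def relay(envelope: dict, deliver) -> None:
    """Route an envelope by public metadata without reading the body.

    The server sees who a message is for (a public key, not an account)
    and a blob of ciphertext encrypted client-side. It never holds a
    decryption key, so it can only forward bytes it cannot read.
    """
    deliver(envelope["recipient_pubkey"], envelope["ciphertext"])

inbox = {}
relay(
    {
        "sender_pubkey": "ed25519:9f3c...",    # identity = a key pair you own
        "recipient_pubkey": "ed25519:72ab...",
        "ciphertext": b"opaque bytes, encrypted before leaving the browser",
    },
    lambda recipient, blob: inbox.setdefault(recipient, []).append(blob),
)
```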

This matters for agents that handle sensitive data. A medical assistant, a financial advisor, a legal research bot. The conversation content shouldn't be readable by the infrastructure operator. With Alice & Bot, it isn't.

Audio calls work the same way. The E2EE channel handles WebRTC signaling, so even the call setup metadata is encrypted. The bridge server that transcodes media does process raw audio frames, but the signaling and control messages remain end-to-end encrypted.

Isolating Users from Each Other

There are two fundamentally different kinds of agents. A personal assistant should remember context across conversations and build up knowledge over time. A therapist bot or customer service agent must not leak one user's conversation into another user's session.

The isolateUsers toggle controls this. When enabled, the agent can't access cross-user tools. Specifically, tools that read conversation history across users or access information about other users get filtered out. The agent literally doesn't know those tools exist.

This isn't a prompt-level instruction that the LLM might ignore. It's structural. The tools aren't in the function schema, so the model can't call them even if prompted to. A user can't social-engineer the bot into revealing another user's data because the bot has no mechanism to access it.

When isolateUsers is off, the agent has full cross-conversation context. It can look up past interactions with any user, reference previous conversations, and build a holistic picture. That's what you want for a personal assistant. It's catastrophic for a customer service bot.
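The structural filter runs over the function schema before it is handed to the model. A sketch with invented tool names, assuming the schema is a list of tool definitions:

```python
CROSS_USER_TOOLS = {"search_all_conversations", "lookup_user_profile"}

def visible_tools(all_tools: list[dict], isolate_users: bool) -> list[dict]:
    """Build the function schema the model will see.

    With isolateUsers on, cross-user tools are dropped from the schema
    itself. The model can't call a function it was never told about,
    no matter how the user phrases the request.
    """
    if not isolate_users:
        return all_tools
    return [t for t in all_tools if t["name"] not in CROSS_USER_TOOLS]

tools = [{"name": "send_reply"}, {"name": "search_all_conversations"}]
isolated = visible_tools(tools, isolate_users=True)
```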

Hallucination Handling

Hallucinations are a fact of life in 2026. Every LLM will confidently state things that aren't true. You can't prevent it, but you can detect and correct it.

After the bot sends a message, a separate supervision model checks the response against the full conversation history. It compares tool results, prior messages, and the user's question against what the bot actually said, and evaluates whether the response is consistent with the data.

If the supervisor detects a hallucination, it injects a corrective note into the conversation, the agent re-runs with this correction, and admins get an email notification. The user gets a corrected response instead of fabricated information.
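The control flow is roughly this, with the supervision model call, the agent re-run, and the admin notification passed in as stand-ins (this is a sketch of the flow described above, not the actual pipeline):

```python
from typing import Callable, Optional

def supervised_reply(conversation: list[dict], response: str,
                     check: Callable[[list, str], Optional[str]],
                     rerun: Callable[[list], str],
                     notify_admins: Callable[[str], None]) -> str:
    """Check a response against the conversation; correct it if needed.

    `check` stands in for the supervision model: it returns a corrective
    note, or None when the response is consistent with the data.
    """
    note = check(conversation, response)
    if note is None:
        return response
    # Inject the correction, re-run the agent with it in context,
    # and let the admins know.
    conversation.append({"role": "system", "content": f"Correction: {note}"})
    notify_admins(note)
    return rerun(conversation)
```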

It's not perfect, no hallucination detection system is. But it catches the common failure modes: inventing data that contradicts tool results, fabricating details that were never in the conversation, and confidently answering questions the bot doesn't have information for.

Malicious User Detection

Going back to the client-side code analogy: just as you expect users to try extracting your JavaScript source, you should expect users to try extracting your agent's prompt. Prompt injection is the XSS of AI applications.

The agent includes built-in instructions (for non-admin users) to never reveal its system prompt, tool configurations, or internal instructions. When the bot detects a prompt extraction attempt, it bans the user temporarily and notifies admins.

There's also rate limiting to prevent brute-force prompt extraction and simple denial-of-service through message flooding.

But here's the important part. Assume prompts will be extracted eventually. That's why the architecture treats prompts as public information from a security standpoint. Secrets don't live in the prompt. Tool implementations are server-side calls, not inline code. Even if a user extracts the full system message, they get instructions and tool descriptions, not credentials or business logic. The damage is limited to seeing how the bot is configured, which is annoying but not a security breach.

Remote Tools and User Context Signing

When your agent calls external APIs (remote tools), those APIs need to know who's making the request. Not just "a bot is calling," but which specific user, in which conversation, on which platform. Without this, any user could impersonate another by manipulating the conversation.

The solution is server-side context injection. When the agent makes a remote tool call, the server attaches metadata about the user, conversation, and platform, then cryptographically signs the entire payload. The LLM constructs the tool parameters, but the context and signature are added after the model returns its response. The model never sees or produces them.

Your API endpoint verifies the signature and trusts the user identity because it was injected by trusted infrastructure, not by the AI. The signature covers the full payload, so any tampering invalidates it. Timestamps prevent replay attacks.

A user interacting with the bot can't make it call your API as a different user. The identity fields are ground truth from the server, not LLM output.

The Frontend for Security-Conscious Startups

Most agent platforms optimize for ease of setup. Get a bot running in 5 minutes, worry about security later. That works for demos and internal tools. It doesn't work when your agent handles real user data, real credentials, and real money.

prompt2bot is a frontend for AI agents that takes the security layer seriously from the start. Encrypted secrets the agent can't access, E2EE communication channels, user isolation, hallucination supervision, anti-hacking protections, and cryptographically signed API calls. You write the prompt and the business logic. The security infrastructure is already there.

The hard part of deploying agents isn't making them smart. It's making them safe. That's the problem we're solving.
