
AI agent sandbox isolation is a security technique that confines autonomous AI agents—which independently execute code and tools—within isolated environments with restricted file, process, and network access, thereby preventing unauthorized access to internal data and credentials. Since agents make their own decisions and take actions, simply prohibiting certain behaviors through instructions is insufficient; it is necessary to constrain what can be executed at the environment level itself.
This article is intended for information systems and security personnel at Japanese companies responsible for deploying and operating AI agents. It covers everything from why isolation is necessary to a three-step implementation process encompassing the execution environment, network, and audit logging. By the end, readers will be equipped to design "what to block, and at which layer" for their own agents.
Because autonomous agents "act on their own judgment within the scope of permissions granted by humans," the potential damage when unexpected operations occur is significant. This section first addresses the threats unique to autonomous agents and explains why application-layer controls alone are insufficient to prevent them.
AI agents operate by combining powerful capabilities such as tool execution, file read/write access, and external communication. When these capabilities are exploited by attackers or through malfunctions, they primarily lead to three types of threats:
Particularly problematic is indirect prompt injection. If malicious instructions are embedded in external data or web pages that the agent reads, the agent may misinterpret them as legitimate instructions and carry out the operations described above. OWASP has also organized the risks specific to LLM applications, and AI Security Lessons from the OWASP LLM Top 10 serves as a useful starting point for countermeasures.
Application-layer measures—such as "instructing the agent via prompt to avoid prohibited actions" or "restricting operations in application code"—are necessary, but they alone provide a thin line of defense. The reason is straightforward: application-layer controls can be broken through bugs or workarounds.
For example, prohibitions stated in prompts can be overridden by cleverly crafted prompt injections. If there are gaps in the application's checks, unexpected file paths or commands may slip through. In configurations where the agent can execute arbitrary code, the very assumptions underlying the application layer can be circumvented.
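A minimal sketch of one such gap, assuming a hypothetical workspace directory `/srv/agent/work` (the directory and both function names are illustrative and not from any particular framework). A naive string-prefix check accepts a path that climbs out of the workspace via `..`, while a check on the normalized path rejects it.

```python
import posixpath

WORKSPACE = "/srv/agent/work"  # hypothetical agent working directory


def naive_is_allowed(path: str) -> bool:
    """Flawed check: compares only the raw string prefix."""
    return path.startswith(WORKSPACE + "/")


def normalized_is_allowed(path: str) -> bool:
    """Stricter check: collapse '..' segments before comparing.

    normpath does not resolve symlinks, so this remains an
    application-layer check and not a substitute for kernel enforcement.
    """
    normalized = posixpath.normpath(path)
    if not posixpath.isabs(normalized):
        return False  # relative paths are rejected outright
    return posixpath.commonpath([WORKSPACE, normalized]) == WORKSPACE


escape = WORKSPACE + "/../../../etc/passwd"
print(naive_is_allowed(escape))       # the traversal slips through
print(normalized_is_allowed(escape))  # the traversal is rejected
```

Even the stricter version can have blind spots of its own, which is the argument for enforcing the same boundary again below the application.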
This is precisely why, in addition to "request-based" application-layer controls, a defense-in-depth approach is needed—one that makes certain actions simply impossible to execute or reach at the OS and network layers. Sandbox isolation serves as the last line of defense at this lowest layer. It is worth understanding, in conjunction with the LLM Security Implementation Guide, that sandbox isolation only becomes meaningful when combined with secure implementation at the application layer.
Sandbox isolation is easier to reason about when considered across four protection domains: "files," "processes," "network," and "permission policies." Before diving into implementation steps, it is helpful to first survey what each domain is designed to protect.
The file and process domains restrict which files an agent can access and which operations it can perform. In a Linux environment, this can be achieved using OS kernel features.
Kernel features such as Landlock (which restricts filesystem access) and seccomp (which filters system calls) form the foundation for creating a state where "even if an agent runs amok, the scope of what it can touch is inherently narrow." The critical point is that enforcement is handled by the kernel, not by the goodwill of the application. Even if a bug in the application causes a check to be missed, the Landlock and seccomp restrictions continue to be enforced at the kernel level.
The network domain covers communication. Data exfiltration and lateral movement using stolen credentials most often occur over the network. Therefore, it is effective to restrict outbound communication from the agent's execution environment to only necessary destinations.
Concretely, this means denying all external communication by default and adding only the endpoints required for the task (such as the LLM API in use and permitted internal services) to an allowlist. When an agent connects to external tools or data sources, those connection targets should also be limited.
Even in configurations using the Model Context Protocol (MCP), which standardizes external tool integration, layering network-level control over "which tools and which destinations are permitted" can compensate for misconfigurations on the tool definition side. Since model invocations are themselves network communications, fixing the endpoints in use and blocking transmissions to unintended destinations is the most direct way to prevent the leakage of credentials and sensitive data.
The permission-policy domain defines permissions declaratively and operates them under the principle of least privilege. "Declarative" means explicitly writing out allow/deny rules in configuration files and the like, rather than relying on conditional branches scattered throughout the code.
The principle of least privilege means "granting only the permissions an agent genuinely needs to perform its task." For example, a task that only requires reading is not granted write permission, and an agent is permitted access only to the specific directory it uses.
There are two benefits to combining declarative policies with least privilege. First, since the scope of permissions is visible at a glance, review and auditing are straightforward. Second, adding or removing permissions is handled entirely through configuration changes, and the change history can be tracked. For building the organizational structure needed to operationalize this, AgentOps (AI Agent Operations Organization Design) is also a useful reference.
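As a rough illustration of a declarative, least-privilege policy, the sketch below keeps every permission in one data structure and denies anything not listed. The action and resource names (`tickets`, `kb_articles`, `ticket_comments`) are hypothetical placeholders, and in practice the structure would more likely be loaded from a reviewed configuration file.

```python
# Hypothetical policy: each action maps to the resources it may touch.
# Anything absent from this mapping is denied.
POLICY: dict[str, frozenset[str]] = {
    "read": frozenset({"tickets", "kb_articles"}),
    "write": frozenset({"ticket_comments"}),
}


def is_permitted(action: str, resource: str) -> bool:
    """Default deny: permit only (action, resource) pairs listed in POLICY."""
    return resource in POLICY.get(action, frozenset())


print(is_permitted("read", "tickets"))     # listed, so permitted
print(is_permitted("write", "tickets"))    # read-only resource, so denied
print(is_permitted("delete", "tickets"))   # unknown action, so denied
```

Because the whole permission scope lives in `POLICY`, a reviewer can audit it at a glance and every change shows up as a diff to that one definition.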
Before implementing isolation, take stock of "what permissions this agent needs" and "what needs to be protected." Skipping this step will result in isolation that is either too loose to be meaningful or too strict to allow work to proceed.
The first step is to identify the target agent's responsibilities (what it does) and the permissions required to carry them out. Writing things out from the following perspectives makes this easier to organize.
The accuracy of this inventory determines the quality of the subsequent permission design. A common mistake is "provisionally granting broad permissions," which is the exact opposite of least privilege. Starting with a narrow configuration and adding only what is needed for the work to function will, in the end, result in a safer setup. For risks related to agents proliferating in the field without oversight, Shadow AI Risks and Governance is also worth consulting.
In parallel with the permissions inventory, identify the "assets to protect." This involves checking whether anything that would be problematic if leaked is present in or around the agent's execution environment.
Credentials in particular are often stored in plain text in environment variables or configuration files, and if they are in a location readable by the agent, the risk increases dramatically. The basic approach is to isolate them in a secrets management system so they are not visible from the agent's working directory. For Japanese companies operating across ASEAN countries, the handling of personal data is also subject to local data protection laws. For guidance on organizing governance during multi-country expansion, please refer to Building an AI Governance Framework for Companies Expanding into ASEAN.
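One small, partial measure for the environment-variable case can be sketched as follows: start the agent process with an explicitly allowlisted environment instead of inheriting the parent's. The variable names and the `launch_agent` helper are assumptions for illustration, and this does not replace moving the secrets themselves into a secrets management system.

```python
import subprocess

# Hypothetical minimal set of variables the agent process actually needs.
ALLOWED_ENV_VARS = ("PATH", "LANG", "TZ")


def build_agent_env(parent_env: dict[str, str]) -> dict[str, str]:
    """Copy only allowlisted variables; everything else is dropped."""
    return {k: parent_env[k] for k in ALLOWED_ENV_VARS if k in parent_env}


def launch_agent(command: list[str], parent_env: dict[str, str]) -> subprocess.CompletedProcess:
    """Run the agent command with the reduced environment (not called here)."""
    return subprocess.run(command, env=build_agent_env(parent_env), check=True)


parent = {
    "PATH": "/usr/bin:/bin",
    "LANG": "C.UTF-8",
    "EXAMPLE_API_KEY": "not-a-real-key",  # placeholder secret in the parent env
}
print(sorted(build_agent_env(parent)))  # the secret is absent from the child env
```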
In Step 1, the agent runs inside a "disposable sandbox" isolated from the production environment. This involves selecting between a container or MicroVM, and configuring file and process boundaries.
It is common practice to base execution environment isolation on either a container or a MicroVM. The choice between the two involves a trade-off between isolation strength and startup cost.
| Method | Isolation Strength | Startup Speed | Suitable For |
|---|---|---|---|
| Container | Medium (shared kernel) | Fast | Code with a reasonable level of trust, high startup frequency |
| MicroVM | High (isolated kernel) | Somewhat slow | Executing low-trust code, cases requiring strong isolation |
In short, if you handle externally sourced code or instructions and need strong isolation, a MicroVM is the baseline choice; if you prioritize startup efficiency for internal use, containers are the standard option. Even with containers, apply additional hardening: do not run as root, drop unnecessary capabilities, and default to a read-only filesystem. Regardless of the method, building a fresh environment for each task and discarding it afterward prevents contamination from one task carrying over to the next.
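For the container case, the hardening points above map onto standard `docker run` flags. The sketch below only assembles the argument list; the image name, UID/GID, and the choice of a `/tmp` tmpfs for scratch space are assumptions for illustration, and a real deployment would likely add further restrictions.

```python
import shlex


def hardened_docker_args(image: str, command: list[str],
                         uid: int = 10001, gid: int = 10001) -> list[str]:
    """Build a `docker run` argument list for a disposable, restricted container."""
    return [
        "docker", "run",
        "--rm",                     # discard the container when the task ends
        "--user", f"{uid}:{gid}",   # run as a non-root user
        "--cap-drop", "ALL",        # drop all Linux capabilities
        "--read-only",              # root filesystem is read-only
        "--tmpfs", "/tmp",          # writable scratch space kept in memory
        image, *command,
    ]


# "agent-runtime:latest" is a hypothetical image name.
args = hardened_docker_args("agent-runtime:latest", ["python", "task.py"])
print(shlex.join(args))
```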
Once the isolated environment is in place, further narrow the file and process boundaries within it. Using the aforementioned Landlock and seccomp, configure the following:
The key to configuration is to use an allowlist approach: start by denying everything, then open up only the minimum required for the task at hand. A denylist approach—blocking individual items that seem dangerous—will inevitably leave gaps. After configuration, have the agent run through its normal operations and verify via logs that the necessary access is permitted without excess or deficiency, and that unexpected access is being denied.
In Step 2, outbound communications from the agent environment are restricted using "default deny + allowlist" rules. Since data exfiltration and lateral movement using stolen credentials almost always occur via network communication, this is the critical control point for preventing leaks.
Network policies should start with a default deny-all (including outbound traffic). From there, only the destinations required for business operations are added to the allowlist.
Typical destinations to include in the allowlist are as follows:
Scope each allowance not only by domain or IP address but also by port and protocol. Allowing broad ranges in bulk can turn those ranges into exfiltration paths. Take particular care with DNS and other seemingly harmless services: allowing them unconditionally leaves room for them to be exploited as channels for sending data out in small chunks. Review the allowlist periodically and remove destinations that are no longer in use.
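The decision logic behind "default deny + allowlist" can be sketched as a lookup keyed on host, port, and protocol together. The hostnames below are hypothetical, and actual enforcement would sit in a firewall, network policy, or egress proxy outside the agent's control; this only illustrates how narrowly each entry is scoped.

```python
from typing import NamedTuple


class Destination(NamedTuple):
    host: str
    port: int
    protocol: str


# Hypothetical allowlist: one LLM API endpoint and one internal service.
EGRESS_ALLOWLIST = frozenset({
    Destination("llm-api.example.com", 443, "tcp"),
    Destination("tickets.internal.example", 443, "tcp"),
})


def egress_allowed(host: str, port: int, protocol: str) -> bool:
    """Default deny: allow only exact (host, port, protocol) matches."""
    candidate = Destination(host.lower().rstrip("."), port, protocol.lower())
    return candidate in EGRESS_ALLOWLIST


print(egress_allowed("llm-api.example.com", 443, "tcp"))  # listed entry
print(egress_allowed("llm-api.example.com", 80, "tcp"))   # same host, wrong port
print(egress_allowed("paste.example.org", 443, "tcp"))    # unlisted host
```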
When introducing network policies (and permission policies in general), switching immediately to full enforce mode (blocking violations) risks disrupting business operations. For this reason, audit mode (logging violations but allowing them through) and enforce mode should be used in combination.
This phased approach avoids the incident of "the policy was too strict and brought operations to a halt," while ultimately achieving strong control. Logs recorded during the audit phase also serve as material for refining the allowlist. Even after transitioning to enforce mode, audit logs should be retained to monitor for unexpected communication attempts.
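The difference between the two modes comes down to what happens after a violation is logged. A minimal sketch, with the `Mode` enum, the `decide` function, and the logger name all chosen for illustration:

```python
import logging
from enum import Enum

logger = logging.getLogger("agent.policy")  # hypothetical logger name


class Mode(Enum):
    AUDIT = "audit"      # log violations but let them through
    ENFORCE = "enforce"  # log violations and block them


def decide(is_violation: bool, mode: Mode, description: str) -> bool:
    """Return True if the action may proceed under the given mode."""
    if not is_violation:
        return True
    # Violations are logged in both modes so the trail stays complete.
    logger.warning("policy violation (%s mode): %s", mode.value, description)
    return mode is Mode.AUDIT


print(decide(True, Mode.AUDIT, "connect to unlisted host"))    # allowed, logged
print(decide(True, Mode.ENFORCE, "connect to unlisted host"))  # blocked, logged
```

Keeping the logging path identical in both modes means the switch to enforce changes only the outcome, not what is recorded.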
Step 3 incorporates audit logs that allow the agent's actions to be traced after the fact, along with a mechanism for human approval before high-risk operations are executed. Where isolation narrows down "what is possible," this layer addresses the remaining risks through operational controls.
Audit logs should record "what the agent attempted to do, and whether it was permitted or denied." At a minimum, the following information should be retained:
Having this allow/deny trail enables both incident investigation (what happened) and policy improvement (where the gaps or excesses are). Logs should be aggregated in a location that is resistant to tampering, configured so that the agent itself cannot modify them. For guidance on building an operational structure that continuously reviews logs, AgentOps Design is a useful reference.
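One way to make tampering at least detectable is to chain each log entry to the hash of the previous one. This is a technique swapped in here for illustration, not something the approach above prescribes; the field names are assumptions, and hash chaining complements rather than replaces shipping logs to a store the agent cannot write to.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, actor: str, action: str, target: str, decision: str) -> dict:
    """Append an allow/deny record linked to the previous entry's hash."""
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "target": target,
        "decision": decision,  # "allow" or "deny"
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash and link; any edited entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list = []
append_entry(audit_log, "agent-1", "read_file", "/srv/agent/work/report.txt", "allow")
append_entry(audit_log, "agent-1", "connect", "203.0.113.10:443", "deny")
print(verify_chain(audit_log))
```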
Isolation and policies can prevent most issues, but there remain operations that are "acceptable to execute, but costly if done incorrectly." These include deletion of production data, outbound transmissions, and operations involving payments or contracts. Human-in-the-loop (HITL) approval should be incorporated for these.
The fundamental design principle is to classify operations by risk level:
For a concrete example of implementing an approval flow with human confirmation built into tool execution, MCP Tool Execution and HITL for a Lao Language AI Agent is a practical reference. Adding too many operations requiring approval will make the workflow unmanageable, so the design should limit human intervention to operations that are "truly costly if wrong." Presenting the content and scope of impact of an operation alongside the log as decision-making material enables reviewers to make fast and accurate judgments.
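The core of such an approval gate can be sketched in a few lines: operations classified as high risk call out to a human approver before running, and everything else proceeds directly. The operation names and the `approve` callback are hypothetical; a real flow would present the operation's content and impact to the reviewer and record the decision in the audit log.

```python
from typing import Callable

# Hypothetical set of operations classified as high risk.
HIGH_RISK_OPERATIONS = frozenset({
    "delete_production_data",
    "send_external_message",
    "execute_payment",
})


def run_operation(name: str,
                  execute: Callable[[], str],
                  approve: Callable[[str], bool]) -> str:
    """Run low-risk operations directly; gate high-risk ones on approval."""
    if name in HIGH_RISK_OPERATIONS and not approve(name):
        return "rejected"
    return execute()


def deny_all(operation: str) -> bool:
    """Stand-in for a human reviewer who rejects every request."""
    return False


print(run_operation("read_ticket", lambda: "done", deny_all))      # no approval needed
print(run_operation("execute_payment", lambda: "done", deny_all))  # blocked by reviewer
```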
Here are common pitfalls when implementing sandbox isolation, along with ways to avoid them.
Pitfall 1: Isolation is too loose to be meaningful. The agent is simply "put in a container" while broad permissions and unrestricted communication are still allowed inside. The fix is to enforce least privilege and default-deny, permitting only what has been explicitly opened via an allowlist.
Pitfall 2: Isolation is too strict and disrupts operations. Even necessary communication and file access are blocked, and the agent stops functioning. The fix is to run actual workloads in audit mode first, identify the required permissions, and then switch to enforce mode.
Pitfall 3: Credentials are visible to the agent. Plaintext keys stored in environment variables or configuration files can be read from within the isolated environment. The fix is to move secrets into a dedicated secrets management system and keep them out of the working directory.
Pitfall 4: Logs are not retained or can be tampered with. What happened during an incident cannot be traced, or the agent itself can modify the logs. The fix is to aggregate audit logs externally and make them immutable.
In all cases, treating this as something to "configure and forget" is a mistake. Approaching it as an ongoing process of adjustment during operations makes it easier to avoid both extremes of too-loose and too-strict.
Q1. Is putting it in a container enough? Containerization is a starting point, and on its own it is often insufficient. If the container runs with root privileges, excessive capabilities, or unrestricted external communication, damage can still occur inside it. Isolation only truly exists when least privilege, default-deny, and file/process boundary configurations are all in place.
Q2. Is all of this necessary even for a small team? The answer depends on the sensitivity of the data involved. For agents that handle personal information or credentials, a minimum level of isolation—least privilege, communication restrictions, and audit logging—is worth implementing regardless of team size. If you are unsure where to start, the checklist in AI × Cyber Risk Measures for SMBs is a good place to get organized.
Q3. If we use a cloud managed service, is isolation unnecessary? Even in a managed environment, the design of permissions granted to the agent, the scope of communication, and data access controls remain the responsibility of the user. The robustness of the underlying infrastructure and the permission design your organization builds on top of it are separate concerns. The prerequisite review and policy design covered in this article should still be carried out.
Q4. Does isolation degrade agent performance? When designed properly, all access necessary for business operations is permitted, so actual operational performance is largely maintained. The real issue is not performance but rather "over- or under-permissioning." The practical approach is to identify required access in audit mode before switching to enforce mode.
Sandbox isolation for AI agents is the foundation for protecting internal data and credentials from autonomously operating agents. Because controls at the prompt and application layers can be circumvented, a defense-in-depth approach is essential—one that creates a state where actions are simply not executable or reachable at the OS, network, and permission policy layers.
Implementation can be organized into three steps: isolate the execution environment using containers or MicroVMs and restrict file and process boundaries (Step 1); control the network with default-deny and an allowlist (Step 2); and incorporate audit logging and human approval for high-risk operations (Step 3). All of these rest on the shared principles of least privilege, default-deny, and a phased transition from audit to enforce.
For Japanese companies operating across ASEAN countries, design must also account for local data protection laws. We support the safe deployment of AI agents and the development of governance frameworks. If you would like to work with us to design which controls to apply at each layer for your own agents, please feel free to get in touch.
Chi
Majored in Information Science at the National University of Laos, where he contributed to the development of statistical software, building a practical foundation in data analysis and programming. He began his career in web and application development in 2021, and from 2023 onward gained extensive hands-on experience across both frontend and backend domains. At our company, he is responsible for the design and development of AI-powered web services, and is involved in projects that integrate natural language processing (NLP), machine learning, and generative AI and large language models (LLMs) into business systems. He has a voracious appetite for keeping up with the latest technologies and places great value on moving swiftly from technical validation to production implementation.