Cloud desktops for AI agents: a practical guide
Why AI agents need full virtual machines, not sandboxes. Architecture, tradeoffs, setup, and what we learned building Le Bureau.
A code execution sandbox lets an AI agent run scripts. A full virtual machine with a desktop, file system, and network access lets it do what a developer does: browse the web, edit code in an IDE, install packages, and pick up tomorrow where it left off today.
We built Le Bureau around this idea. This post explains why we think full desktops are the right abstraction for serious agent work, how the architecture works, and how to get one running.
What an agent desktop actually is
It is a Linux VM with a graphical desktop environment. The agent connects through VNC to see and click things, through a terminal for command-line work, and through a chat interface to receive instructions and report back.
What makes it different from a regular VM:
- Agent runtime pre-installed. The VM ships with OpenClaw ready to go, so no setup is needed and the agent can start working immediately.
- Persistent disk. Files, git repos, configs all survive reboots. The agent picks up where it left off.
- BYOK API access. You bring your own API keys (Anthropic, OpenAI, OpenRouter). Keys are injected securely at provisioning time and scoped to the VM.
- Mission Control. A dashboard for watching what agents do, approving sensitive actions, and stepping in when something goes wrong.
- API-driven provisioning. Create, start, stop, and destroy desktops through a REST API. Useful for fleet management and CI/CD.
Remote desktops have existed for decades. The difference here is packaging VM isolation, agent tooling, and a management layer designed for non-human operators.
Why full desktops, not sandboxes
The first generation of agent infrastructure was sandboxes: lightweight ephemeral environments for running code snippets. They work for narrow tasks (run this Python script, execute this test suite) but hit walls fast.
Agents need screens
Computer use APIs like Anthropic's let agents interact with GUIs: clicking buttons, filling forms, navigating applications. A typical code sandbox has no display server, so there is nothing to see or click. A desktop with VNC gives the agent a real screen buffer it can observe and control.
Agents need state that persists
A multi-day coding project means files, git history, and environment have to survive between sessions. Sandboxes throw everything away on exit. A persistent desktop keeps it all, same as your own workstation.
Agents need to multitask
Real work means switching between a browser, a terminal, an IDE, and docs. Sandboxes typically expose one thing (a code execution endpoint). A desktop lets the agent run Chrome, VS Code, and a terminal side by side.
Agents need root
Installing packages, modifying configs, running background services: these need a real OS, not a locked-down execution environment. A full Ubuntu VM with sudo access gives agents the same capabilities you have.
Constrained environments cause failures
When an agent hits a missing dependency, an unsupported file format, or a tool that only works as a GUI app, it stops making progress. Full desktops reduce these dead ends.
The tradeoff is resources. A desktop VM needs more CPU, RAM, and storage than a sandbox. For agents doing complex work across multiple tools over multiple days, that cost is usually repaid by fewer dead ends and less repeated setup.
Desktop vs sandbox vs container
The right choice depends on the workload.
| | Cloud desktop (VM) | Sandbox (microVM) | Container |
|---|---|---|---|
| GUI | Full (VNC/RDP) | None or limited | None |
| Persistence | Persistent disk | Ephemeral | Volume mounts (fragile) |
| Isolation | Hardware-level (KVM) | Hardware-level (Firecracker) | Kernel-level (namespaces, cgroups) |
| Boot time | 30-90s | 1-5s | 1-3s |
| RAM overhead | 2-16 GB | 128-512 MB | 64-256 MB |
| System access | Full OS (sudo) | Restricted | Root in namespace |
| Multi-tool | Browser + IDE + terminal | Single tool | Single process |
| State after restart | Preserved | Lost | Lost (unless volume) |
When to use each:
- Desktop: The agent browses the web, uses GUI apps, works across sessions, or needs a full OS. This is where Le Bureau lives.
- Sandbox: The agent runs a code snippet and returns a result. No GUI, no persistence needed.
- Container: The agent runs a single service in a reproducible environment. Build pipelines, not interactive work.
Most production setups combine these. We provide full desktops for the primary workspace; sandboxes can handle isolated subtasks.
How it works under the hood
1. Virtualization
Each desktop is a QEMU/KVM virtual machine running on Proxmox. VMs are isolated from each other at the hardware level, and lifecycle operations (create, start, stop, destroy) go through the Proxmox API.
The base image is a golden snapshot: Ubuntu 22.04, XFCE4, KasmVNC, OpenClaw, Chrome, VS Code. New desktops clone from this template, which keeps startup under 90 seconds.
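As a rough illustration of the clone-from-template step, here is a minimal sketch using Proxmox's `qm` command-line tool, which exposes the same clone and start operations as the Proxmox API. This is not Le Bureau's provisioning code: the `provision_desktop` name, the template ID (9000), the new VM ID (201), and the `QM` override that allows a dry run with `QM=echo` are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: clone a desktop from a golden template and boot it
# with Proxmox's qm CLI. Template and VM IDs are illustrative values.
# Set QM=echo for a dry run that only prints the qm commands.
set -eu

QM="${QM:-qm}"

provision_desktop() {
  local template_id="$1" new_id="$2" name="$3"
  # A full clone copies the template's disk, so the new desktop's disk
  # no longer depends on the template afterwards.
  "$QM" clone "$template_id" "$new_id" --name "$name" --full 1
  "$QM" start "$new_id"
}

# Example (on a Proxmox host): provision_desktop 9000 201 dev-agent-01
```

Proxmox also supports linked clones (omit `--full 1` when cloning a template), which are faster to create but keep depending on the template's base disk; the post does not say which kind Le Bureau uses.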
2. Networking
Each VM gets a private IP on the host network. External access goes through authenticated WebSocket proxies:
- VNC proxy connects the browser-based viewer to KasmVNC on the VM.
- Terminal proxy connects xterm.js in the browser to the VM's shell via ttyd.
- Chat proxy routes messages between the web UI and OpenClaw via SSH.
All proxy connections require a valid session token. No VM ports are directly exposed.
3. Agent layer
OpenClaw runs inside the VM as the agent runtime. It receives instructions via CLI, executes them using computer use (screen observation, mouse/keyboard control) and shell commands, then reports results through the chat channel.
The agent authenticates to LLM APIs using the user's own API key, injected at VM provisioning time. The key is scoped to that VM and is only sent from the VM to the LLM provider. Usage is tracked per desktop.
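One common way to keep a provider key scoped to a single VM is to write it into a file that only the agent's user can read and have the runtime load it from there. The sketch below shows that idea only; the `write_key_file` name and the destination path are hypothetical, and `ANTHROPIC_API_KEY` follows the Anthropic SDK's naming convention, but whether OpenClaw reads such a file is an assumption.

```shell
#!/bin/sh
# Hypothetical sketch: store a provider API key in an env file that only
# its owner can read. Path and variable name are illustrative; the agent
# runtime would need to load this file itself.
set -eu

write_key_file() {
  local key="$1" dest="$2"
  mkdir -p "$(dirname "$dest")"
  # umask 077 inside a subshell: a newly created file starts out as 0600.
  ( umask 077 && printf 'ANTHROPIC_API_KEY=%s\n' "$key" > "$dest" )
  # chmod also tightens permissions if the file already existed.
  chmod 600 "$dest"
}

# Example (inside the VM): write_key_file "$KEY" /home/agent/.config/agent/llm.env
```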
4. Management layer
The web app provides Mission Control: create desktops, monitor agents, manage API keys, configure workspaces. Everything in the UI is also available via REST API.
Architecture (simplified)
```text
Browser
  |
  +-- VNC Viewer ------> WebSocket Proxy --> VM:6080 (KasmVNC)
  +-- xterm.js --------> WebSocket Proxy --> VM:7681 (ttyd)
  +-- Chat Panel ------> REST API ---------> VM:SSH --> OpenClaw CLI
  +-- Mission Control -> REST API ---------> PostgreSQL + Proxmox API
  |
Agent (inside VM)
  +-- OpenClaw --------> LLM API (user's key)
  +-- Browser, IDE, Terminal (local to VM)
```
Getting started
The whole process takes under three minutes.
Prerequisites
- A Le Bureau account (lebureau.talentai.fr)
- A workspace (created automatically on first login)
Step 1: Create a desktop
From Mission Control, click New Computer. Pick a tier:
| Tier | vCPU | RAM | Storage | Price |
|---|---|---|---|---|
| Free | 2 | 4 GB | 20 GB | Free during beta |
| Pro | 4 | 8 GB | 40 GB | EUR 49/month |
| Max | 4 | 16 GB | 80 GB | EUR 149/month |
Name it (e.g., dev-agent-01) and click Provision. The VM boots from the golden image and is ready in about 60 seconds.
Step 2: Connect
Once the status shows Running, click the desktop to open the viewer. It has three panels:
- VNC viewer on the left: the agent's screen in real time. You can watch or take manual control.
- Terminal at the bottom: a shell session inside the VM.
- Chat panel on the right: send instructions to the agent and read its responses.
Step 3: Give it something to do
In the chat panel, type a task:
Clone the repository at github.com/myorg/myproject and set up the development environment.
The agent opens a terminal, runs git commands, installs dependencies, and reports progress. You can watch it work through the VNC viewer.
Step 4: Use the API (optional)
Generate a Le Bureau API key (separate from your LLM provider keys) in Mission Control under Settings, then call the REST API:
```shell
# List desktops
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://lebureau.talentai.fr/api/desktops

# Create a desktop
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "ci-agent", "workspaceId": "ws_xxx"}' \
  https://lebureau.talentai.fr/api/desktops

# Send a task to a desktop (replace DESKTOP_ID)
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "Run the test suite and report results"}' \
  https://lebureau.talentai.fr/api/desktops/DESKTOP_ID/chat
```
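Because provisioning takes about a minute, a script that creates a desktop and then sends it a task usually has to wait for the desktop to be running first. The `wait_for_status` helper below is a generic polling sketch with a name of our own choosing. The commented status reader is hypothetical: it assumes a `GET /api/desktops/DESKTOP_ID` endpoint returning JSON with a `status` field, which this post does not document, so check the real response shape and status spelling before relying on it.

```shell
#!/bin/sh
# Sketch: run a command repeatedly until it prints the wanted status.
# Usage: wait_for_status WANTED MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 once COMMAND prints WANTED, 1 when MAX_ATTEMPTS is exhausted.
set -eu

wait_for_status() {
  local wanted="$1" max="$2" interval="$3" attempt=1 status
  shift 3
  while [ "$attempt" -le "$max" ]; do
    # A failing command counts as "error" instead of aborting the loop.
    status="$("$@")" || status="error"
    if [ "$status" = "$wanted" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 1
}

# Hypothetical status reader: endpoint and "status" field are assumptions.
# desktop_status() {
#   curl -fsS -H "Authorization: Bearer $LEBUREAU_API_KEY" \
#     "https://lebureau.talentai.fr/api/desktops/$1" | jq -r '.status'
# }
# wait_for_status running 30 5 desktop_status "$DESKTOP_ID"
```

Keeping the status reader as a separate command means the polling logic does not depend on the exact JSON shape, and it can be tested with a fake command.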
Step 5: Stop or destroy
Stop a desktop to preserve its state, or destroy it to free resources. Both actions are available from Mission Control and the API. A stopped desktop keeps its disk; destroying a desktop deletes everything on it.
Monitoring with Mission Control
One agent is manageable. At ten, you need proper visibility.
Each desktop shows its state (provisioning, running, stopped, error) in real time via server-sent events. No page refresh needed.
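Server-sent events are plain text over a long-lived HTTP response: each event is a group of `field: value` lines ended by a blank line, and the payload travels in `data:` lines. The `sse_data` filter below (our own name) extracts those payloads in shell. The events URL in the comment is hypothetical, since the post does not document one, and the filter prints each `data:` line separately instead of joining multi-line events the way a full SSE client would.

```shell
#!/bin/sh
# Sketch: print the payload of each SSE "data:" line read from stdin.
# Simplification: multi-line events are not joined; other fields
# (event:, id:, retry:) and ":" comment lines are ignored.
set -eu

sse_data() {
  local line="" cr=""
  cr="$(printf '\r')"
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%"$cr"}"            # tolerate CRLF line endings
    case "$line" in
      data:*)
        line="${line#data:}"
        printf '%s\n' "${line# }"   # SSE strips one leading space
        ;;
    esac
  done
}

# Hypothetical usage (the endpoint is an assumption):
# curl -N -H "Authorization: Bearer $LEBUREAU_API_KEY" \
#   https://lebureau.talentai.fr/api/events | sse_data
```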
All conversations are stored in the database, so you can review what an agent did days later, even after restarts. You can also see which desktops are active, how long they have been running, and what they are working on.
For sensitive operations (deploying to production, making purchases, modifying infrastructure), the agent pauses and asks for human approval before proceeding.
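The approval step follows a simple pattern: before a sensitive action runs, execution blocks until a human answers. The toy terminal version below only illustrates that pattern; it is not how Mission Control implements approvals, and `require_approval` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch: run a command only after a human explicitly approves it.
# The answer is read from stdin; anything other than y/Y (including EOF) denies.
set -eu

require_approval() {
  local description="$1" answer=""
  shift
  printf 'Approve "%s"? [y/N] ' "$description" >&2
  IFS= read -r answer || answer=""
  case "$answer" in
    y|Y) "$@" ;;
    *) echo "denied: $description" >&2; return 1 ;;
  esac
}

# Example: require_approval "deploy to production" ./deploy.sh
```

Defaulting to "deny" on anything but an explicit yes means a closed input stream or a typo never lets the action through.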
Alerts notify you of agent errors, unusual activity, or crossed resource thresholds. Fleet management lets you create, start, stop, and destroy desktops in bulk, organized by workspace with cost tracking.
Security
Handing a full OS to an AI agent raises obvious questions. Here is how we handle them.
Each desktop runs in its own KVM virtual machine with a dedicated kernel, filesystem, and network stack. A compromised agent cannot reach other VMs or the host. This is hardware-level isolation, not container namespaces.
You bring your own API keys. They are injected into the VM at provisioning and scoped to that desktop. No keys pass through our servers at runtime.
Firewall rules restrict VM-to-VM communication. Desktops in different workspaces cannot reach each other, and outbound traffic can be locked to specific domains.
All API requests, chat messages, and management actions are logged, giving a complete audit trail. Hard CPU, memory, and storage caps per tier prevent noisy-neighbor problems. With outbound traffic locked down, the agent gets access to its LLM API and its own filesystem, nothing else.
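On Proxmox, per-VM caps like these map onto the VM's configured cores, memory, and disk size. The sketch below applies the limits from the tier table with the `qm` CLI; it is one possible way to enforce them, not necessarily Le Bureau's. The `apply_tier` name, the disk name `scsi0`, and the `QM` dry-run override are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: apply a tier's CPU, RAM and disk limits to a Proxmox VM.
# Values come from the tier table; qm expects memory in MiB.
# Assumptions: the system disk is scsi0; set QM=echo for a dry run.
set -eu

QM="${QM:-qm}"

apply_tier() {
  local vmid="$1" tier="$2" cores mem_mib disk
  case "$tier" in
    free) cores=2; mem_mib=4096;  disk=20G ;;
    pro)  cores=4; mem_mib=8192;  disk=40G ;;
    max)  cores=4; mem_mib=16384; disk=80G ;;
    *) echo "unknown tier: $tier" >&2; return 1 ;;
  esac
  "$QM" set "$vmid" --cores "$cores" --memory "$mem_mib"
  # An absolute size sets the disk's new size; qm resize cannot shrink a disk.
  "$QM" resize "$vmid" scsi0 "$disk"
}
```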
Pricing
| | Le Bureau Free | Le Bureau Pro | Le Bureau Max | E2B | Scrapybara |
|---|---|---|---|---|---|
| Price | Free (beta) | EUR 49/mo | EUR 149/mo | ~$0.10/hr | Usage-based |
| Environment | Full Linux desktop | Full Linux desktop | Full Linux desktop | Firecracker microVM | Cloud browser |
| vCPU | 2 | 4 | 4 | 1-2 | Shared |
| RAM | 4 GB | 8 GB | 16 GB | 256-512 MB | Shared |
| Storage | 20 GB persistent | 40 GB persistent | 80 GB persistent | Ephemeral | Ephemeral |
| GUI | Yes (VNC) | Yes (VNC) | Yes (VNC) | No | Browser only |
| Persistence | Yes | Yes | Yes | No | No |
| Agent runtime | OpenClaw included | OpenClaw included | OpenClaw included | BYO | BYO |
| Computer use | Full desktop | Full desktop | Full desktop | No | Browser |
Full details are on the pricing page.
Choosing between providers
Le Bureau targets agents that need a real desktop: browsing, coding, GUI apps, persistent projects. Fixed monthly pricing works well for always-on agents.
E2B focuses on short-lived code execution. If your agent just runs Python snippets, per-second billing may cost less.
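A quick way to compare flat and hourly pricing is the break-even point: the monthly price divided by the hourly rate gives the billed hours per month at which both cost the same. The sketch uses the list prices from the pricing table (EUR 49/month for Pro, about $0.10/hour for E2B) and, as a simplifying assumption, ignores the EUR/USD difference and the very different resources each plan includes. The function names are our own.

```shell
#!/bin/sh
# Sketch: break-even hours between a flat monthly price and an hourly rate.
# Both prices must be in the same currency; convert first if they differ.
set -eu

breakeven_hours() {
  # $1 = monthly price, $2 = hourly rate; prints whole hours.
  awk -v m="$1" -v h="$2" 'BEGIN { printf "%.0f\n", m / h }'
}

monthly_cost_at_hours() {
  # $1 = hourly rate, $2 = billed hours per month; prints cost with 2 decimals.
  awk -v h="$1" -v n="$2" 'BEGIN { printf "%.2f\n", h * n }'
}

breakeven_hours 49 0.10          # → 490
monthly_cost_at_hours 0.10 730   # → 73.00 (always-on is about 730 hours/month)
```

Under these assumptions, flat pricing wins once an agent is billed for more than about 490 hours a month, roughly two-thirds of an always-on month.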
Scrapybara covers web automation. If your agent only interacts with websites, a cloud browser might be enough.
Browserbase provides headless browser infra. Good for high-volume scraping, not general agent work.
Windows 365 for Agents is Microsoft's offering in this space. Worth evaluating if you need Windows and are already on Azure, but it is Windows-only and enterprise-priced.
FAQ
What is the difference between a cloud desktop and a sandbox?
A desktop is a full VM with a GUI, persistent storage, and sudo access; the agent uses it the way you use your computer. A sandbox is ephemeral, with no GUI and limited system access. Use a desktop for complex work and a sandbox for isolated code execution.
How fast is provisioning?
About 60 seconds. The VM boots from a pre-built image with Ubuntu 22.04, XFCE4, OpenClaw, Chrome, and VS Code. No manual setup needed.
Is it secure?
Each desktop is a KVM virtual machine with its own kernel and network stack. API keys live inside the VM (BYOK model) and do not pass through our servers at runtime. Firewall rules block traffic between desktops in different workspaces. All actions are logged. See the security section above.
Can I run multiple agents on one desktop?
You can, but we recommend one agent per desktop for better isolation and independent monitoring. The API makes spinning up multiple desktops easy.
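Spinning up several desktops is a loop over the create call from the API example. The sketch below reuses that `POST /api/desktops` endpoint and payload; the `create_fleet` name, the `NAME-01` naming scheme, and the `LEBUREAU_API_KEY`, `LEBUREAU_URL`, and `CURL` variables (the last one allows a dry run with a fake command) are our own assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: create COUNT desktops named PREFIX-01, PREFIX-02, ...
# in one workspace via POST /api/desktops. Names must not contain quotes,
# since the JSON body is built by string interpolation.
set -eu

LEBUREAU_URL="${LEBUREAU_URL:-https://lebureau.talentai.fr}"
CURL="${CURL:-curl}"

create_fleet() {
  local prefix="$1" count="$2" workspace="$3" i=1 name
  while [ "$i" -le "$count" ]; do
    name="$(printf '%s-%02d' "$prefix" "$i")"
    "$CURL" -fsS -X POST \
      -H "Authorization: Bearer ${LEBUREAU_API_KEY:?set LEBUREAU_API_KEY first}" \
      -H "Content-Type: application/json" \
      -d "{\"name\": \"$name\", \"workspaceId\": \"$workspace\"}" \
      "$LEBUREAU_URL/api/desktops"
    echo   # newline between JSON responses
    i=$((i + 1))
  done
}

# Example: LEBUREAU_API_KEY=... create_fleet agent 5 ws_xxx
```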
How does persistence work?
Each desktop has a virtual disk that survives reboots. Stop a desktop: disk preserved. Restart it: everything is where the agent left it. Only destroying a desktop deletes storage.
What tools can agents use?
Anything that runs on Linux. The golden image includes Chrome, VS Code, Python 3, Node.js, and git. Agents can install additional packages from Ubuntu repos, pip, npm, cargo, or any other source.
Can I use my own agent framework?
Yes. The desktop is a standard Linux VM. OpenClaw comes pre-installed, but you can run whatever you want. The VNC and terminal interfaces work regardless of what is running inside.
What comes next
We are working on fleet orchestration, approval workflows, and CI/CD integration. The goal is to make managing 50 agent desktops feel the same as managing 5.
If you want to try it, there is a free tier during beta. Provision a desktop, give your agent a task, and see how far it gets. The setup instructions above take about three minutes.
For a step-by-step walkthrough, see our guide on deploying your first AI agent desktop.
Le Bureau is a cloud desktop platform for AI agents. lebureau.talentai.fr