2026-03-10

Cloud desktops for AI agents: a practical guide

Why AI agents need full virtual machines, not sandboxes. Architecture, tradeoffs, setup, and what we learned building Le Bureau.

A code execution sandbox lets an AI agent run scripts. A full virtual machine with a desktop, file system, and network access lets it do what a developer does: browse the web, edit code in an IDE, install packages, and pick up tomorrow where it left off today.

We built Le Bureau around this idea. This post explains why we think full desktops are the right abstraction for serious agent work, how the architecture works, and how to get one running.

What an agent desktop actually is

It is a Linux VM with a graphical desktop environment. The agent connects through VNC to see and click things, through a terminal for command-line work, and through a chat interface to receive instructions and report back.

What makes it different from a regular VM:

Agent runtime pre-installed. The VM ships with OpenClaw ready to go. No setup is needed: the agent starts working immediately.
Persistent disk. Files, git repos, configs all survive reboots. The agent picks up where it left off.
BYOK API access. You bring your own API keys (Anthropic, OpenAI, OpenRouter). Keys are injected securely at provisioning time and scoped to the VM.
Mission Control. A dashboard for watching what agents do, approving sensitive actions, and stepping in when something goes wrong.
API-driven provisioning. Create, start, stop, destroy desktops via REST. Good for fleet management and CI/CD.

Remote desktops have existed for decades. The difference here is the packaging: VM isolation, agent tooling, and a management layer designed for non-human operators, delivered together.

Why full desktops, not sandboxes

The first generation of agent infrastructure was sandboxes: lightweight ephemeral environments for running code snippets. They work for narrow tasks (run this Python script, execute this test suite) but hit walls fast.

Agents need screens

Computer use APIs like Anthropic's let agents interact with GUIs: clicking buttons, filling forms, navigating applications. A sandbox has no display server. A desktop with VNC gives the agent a real screen buffer it can observe and control.

Agents need state that persists

A multi-day coding project means files, git history, and environment have to survive between sessions. Sandboxes throw everything away on exit. A persistent desktop keeps it all, same as your own workstation.

Agents need to multitask

Real work means switching between a browser, a terminal, an IDE, and docs. Sandboxes typically expose one thing (a code execution endpoint). A desktop lets the agent run Chrome, VS Code, and a terminal side by side.

Agents need root

Installing packages, modifying configs, running background services: these need a real OS, not a locked-down execution environment. A full Ubuntu VM with sudo access gives agents the same capabilities you have.

Constrained environments cause failures

When an agent hits a missing dependency, an unsupported file format, or a tool that only works as a GUI app, it stops making progress. Full desktops reduce these dead ends.

The tradeoff is resources. A desktop VM needs more CPU, RAM, and storage than a sandbox. For agents doing complex work across multiple tools over multiple days, that cost pays for itself quickly.

Desktop vs sandbox vs container

The right choice depends on the workload.

	Cloud desktop (VM)	Sandbox (microVM)	Container
GUI	Full (VNC/RDP)	None or limited	None
Persistence	Persistent disk	Ephemeral	Volume mounts (fragile)
Isolation	Hardware-level (dedicated VM)	Hardware-level (Firecracker)	Kernel-level (namespaces, cgroups)
Boot time	30-90s	1-5s	1-3s
RAM overhead	2-16 GB	128-512 MB	64-256 MB
System access	Full OS (sudo)	Restricted	Root in namespace
Multi-tool	Browser + IDE + terminal	Single tool	Single process
State after restart	Preserved	Lost	Lost (unless volume)

When to use each:

Desktop: The agent browses the web, uses GUI apps, works across sessions, or needs a full OS. This is where Le Bureau lives.
Sandbox: The agent runs a code snippet and returns a result. No GUI, no persistence needed.
Container: The agent runs a single service in a reproducible environment. Build pipelines, not interactive work.
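
The guidance above can be condensed into a small decision helper. This is only a sketch: the function name choose_environment and its yes/no flags are invented for illustration and are not part of any Le Bureau tooling.

```shell
# Hypothetical helper: pick an environment type from three yes/no answers.
# Arguments: needs_gui needs_persistence single_service (each "yes" or "no")
choose_environment() {
  needs_gui="$1"
  needs_persistence="$2"
  single_service="$3"

  if [ "$needs_gui" = "yes" ] || [ "$needs_persistence" = "yes" ]; then
    # GUI apps or state that must survive between sessions: full desktop VM.
    echo "desktop"
  elif [ "$single_service" = "yes" ]; then
    # One reproducible service, such as a build pipeline: container.
    echo "container"
  else
    # Run a snippet, return a result, keep nothing: sandbox.
    echo "sandbox"
  fi
}

choose_environment yes no no    # prints "desktop"
choose_environment no no yes    # prints "container"
choose_environment no no no     # prints "sandbox"
```

Real workloads rarely reduce to three flags, which is why most setups end up combining environment types.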

Most production setups combine these. We provide full desktops for the primary workspace; sandboxes can handle isolated subtasks.

How it works under the hood

1. Virtualization

Each desktop is a dedicated virtual machine with its own kernel. VMs are isolated from each other at the hardware level, and an orchestration API handles lifecycle operations.

The base image is a golden snapshot: Ubuntu 22.04, XFCE4, in-browser desktop streaming, OpenClaw, Chrome, VS Code. New desktops clone from this template, which keeps startup under 90 seconds.

2. Networking

Each VM gets a private IP on the host network. External access goes through authenticated WebSocket proxies:

VNC proxy connects the browser-based viewer to the live desktop stream on the VM.
Terminal proxy connects xterm.js in the browser to the VM's shell via ttyd.
Chat proxy routes messages between the web UI and OpenClaw via SSH.

All proxy connections require a valid session token. No VM ports are directly exposed.

3. Agent layer

OpenClaw runs inside the VM as the agent runtime. It receives instructions via CLI, executes them using computer use (screen observation, mouse/keyboard control) and shell commands, then reports results through the chat channel.

The agent authenticates to LLM APIs using the user's own API key, injected at VM provisioning time. The key is scoped to the VM and never leaves it. Usage tracking happens per-desktop.

4. Management layer

The web app provides Mission Control: create desktops, monitor agents, manage API keys, configure workspaces. Everything in the UI is also available via REST API.

Architecture (simplified)

Browser
    |
    +-- VNC Viewer ----> WebSocket Proxy --> VM:6080 (desktop stream)
    +-- xterm.js ------> WebSocket Proxy --> VM:7681 (ttyd)
    +-- Chat Panel -----> REST API --------> VM:SSH --> OpenClaw CLI
    +-- Mission Control-> REST API --------> database + VM orchestration API
    |
Agent (inside VM)
    +-- OpenClaw -------> LLM API (user's key)
    +-- Browser, IDE, Terminal (local to VM)

Getting started

The whole process takes under three minutes.

Prerequisites

A Le Bureau account (lebureau.talentai.fr)
A workspace (created automatically on first login)

Step 1: Create a desktop

From Mission Control, click New Computer. Pick a tier:

Tier	vCPU	RAM	Storage	Price
Free	2	4 GB	20 GB	Free during beta
Pro	4	8 GB	40 GB	EUR 49/month
Max	4	16 GB	80 GB	EUR 149/month

Name it (e.g., dev-agent-01) and click Provision. The VM boots from the golden image and is ready in about 60 seconds.

Step 2: Connect

Once the status shows Running, click the desktop to open the viewer. It has three panels:

VNC viewer on the left: the agent's screen in real time. You can watch or take manual control.
Terminal at the bottom: a shell session inside the VM.
Chat panel on the right: send instructions to the agent and read its responses.

Step 3: Give it something to do

In the chat panel, type a task:

Clone the repository at github.com/myorg/myproject and set up the development environment.

The agent opens a terminal, runs git commands, installs dependencies, and reports progress. You can watch it work through the VNC viewer.

Step 4: Use the API (optional)

Generate an API key from Mission Control under Settings, then pass it as a bearer token:

## List desktops
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://lebureau.talentai.fr/api/desktops

## Create a desktop
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "ci-agent", "workspaceId": "ws_xxx"}' \
  https://lebureau.talentai.fr/api/desktops

## Send a task
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "Run the test suite and report results"}' \
  https://lebureau.talentai.fr/api/desktops/DESKTOP_ID/chat
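
In a script, you would typically wait for a new desktop to reach the running state before sending it a task. The sketch below shows one way to poll for that. It rests on assumptions the post does not confirm: that GET /api/desktops/DESKTOP_ID exists and returns JSON with a "status" field holding one of the states listed under Monitoring, and that the key lives in a LEBUREAU_API_KEY environment variable. The function names are made up for illustration.

```shell
API_BASE="https://lebureau.talentai.fr/api"

# Extract the first "status" value from a JSON document on stdin.
parse_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Fetch the current status of one desktop (ASSUMED endpoint and response shape).
fetch_status() {
  curl -s -H "Authorization: Bearer $LEBUREAU_API_KEY" \
    "$API_BASE/desktops/$1" | parse_status
}

# Poll until the desktop is running.
# Returns 0 when running, 1 on error or stopped, 2 when attempts run out.
wait_until_running() {
  desktop_id="$1"
  max_attempts="${2:-30}"
  delay_seconds="${3:-5}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$(fetch_status "$desktop_id")"
    case "$status" in
      running) return 0 ;;
      error | stopped) return 1 ;;
    esac
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
  return 2
}
```

A typical use would be to call wait_until_running DESKTOP_ID and only send the chat request shown above if it returns 0. The default of 30 attempts at 5-second intervals comfortably covers the roughly 60-second provisioning time.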

Step 5: Stop or destroy

Stop a desktop to preserve its state, or destroy it to free resources. Both actions are available from Mission Control or the API. A stopped desktop keeps its disk. Destroying a desktop deletes everything, including the disk.

Monitoring with Mission Control

One agent is manageable. At ten, you need proper visibility.

Each desktop shows its state (provisioning, running, stopped, error) in real time via server-sent events. No page refresh needed.
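
Server-sent events are a plain-text stream in which each event carries one or more "data:" lines and events are separated by blank lines. The sketch below shows how a script could consume such a stream. The post does not document the events endpoint or the payload format, so the sample payload and the function name print_sse_data are assumptions.

```shell
# Print the payload of every "data:" line read from stdin.
print_sse_data() {
  cr="$(printf '\r')"
  while IFS= read -r line; do
    # Strip a trailing carriage return in case the server sends CRLF.
    line="${line%"$cr"}"
    case "$line" in
      "data: "*) printf '%s\n' "${line#data: }" ;;
      "data:"*) printf '%s\n' "${line#data:}" ;;
    esac
  done
}

# Demo with a hand-written event instead of a live stream.
printf 'event: status\ndata: {"status":"running"}\n\n' | print_sse_data
# prints {"status":"running"}
```

Against a live endpoint, the same function could be fed by curl -N -s with an "Accept: text/event-stream" header, where -N turns off output buffering so that events arrive as they are sent.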

All conversations are stored in the database, so you can review what an agent did days later, even after restarts. You can also see which desktops are active, how long they have been running, and what they are working on.

For sensitive operations (deploying to production, making purchases, modifying infrastructure), the agent pauses and asks for human approval before proceeding.

Alerts notify you of agent errors, unusual activity, or resource thresholds. Fleet management lets you create, start, stop, and destroy desktops in bulk, organized by workspace with cost tracking.

Security

Handing a full OS to an AI agent raises obvious questions. Here is how we handle them.

Each desktop runs in its own dedicated virtual machine with a dedicated kernel, filesystem, and network stack. A compromised agent cannot reach other VMs or the host. This is hardware-level isolation, not container namespaces.

You bring your own API keys. They are injected into the VM at provisioning and scoped to that desktop. No keys pass through our servers at runtime.

Firewall rules restrict VM-to-VM communication. Desktops in different workspaces cannot reach each other, and outbound traffic can be locked to specific domains.
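
To make "locked to specific domains" concrete, here is a sketch of the matching logic an outbound allowlist implies. It is not how Le Bureau's firewall is implemented, which the post does not describe. The function name is_allowed_host and the example domains are illustrative.

```shell
# Return 0 if the host equals an allowed domain or is a subdomain of one.
# Usage: is_allowed_host HOST ALLOWED_DOMAIN...
is_allowed_host() {
  host="$1"
  shift
  for allowed in "$@"; do
    case "$host" in
      "$allowed" | *".$allowed") return 0 ;;
    esac
  done
  return 1
}

is_allowed_host api.anthropic.com anthropic.com openai.com && echo "allowed"
# prints "allowed"
```

Requiring a leading dot before the allowed domain matters: without it, a lookalike such as evilanthropic.com would pass as a suffix match for anthropic.com.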

All API requests, chat messages, and management actions are logged for a complete audit trail. Hard CPU, memory, and storage caps per tier prevent noisy-neighbor problems. The agent gets access to LLM APIs and its own filesystem, nothing else.

Pricing

	Le Bureau Free	Le Bureau Pro	Le Bureau Max	E2B	Scrapybara
Price	Free (beta)	EUR 49/mo	EUR 149/mo	~$0.10/hr	Usage-based
Environment	Full Linux desktop	Full Linux desktop	Full Linux desktop	Firecracker microVM	Cloud browser
vCPU	2	4	4	1-2	Shared
RAM	4 GB	8 GB	16 GB	256-512 MB	Shared
Storage	20 GB persistent	40 GB persistent	80 GB persistent	Ephemeral	Ephemeral
GUI	Yes (VNC)	Yes (VNC)	Yes (VNC)	No	Browser only
Persistence	Yes	Yes	Yes	No	No
Agent runtime	OpenClaw included	OpenClaw included	OpenClaw included	BYO	BYO
Computer use	Full desktop	Full desktop	Full desktop	No	Browser

Details on the pricing page.

Choosing between providers

Le Bureau targets agents that need a real desktop: browsing, coding, GUI apps, persistent projects. Fixed monthly pricing works well for always-on agents.

E2B focuses on short-lived code execution. If your agent just runs Python snippets, per-second billing may cost less.

Scrapybara covers web automation. If your agent only interacts with websites, a cloud browser might be enough.

Browserbase provides headless browser infra. Good for high-volume scraping, not general agent work.

Windows 365 for Agents is Microsoft's offering in this space. Worth evaluating if you need Windows and are already on Azure, but it is Windows-only and enterprise-priced.
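
The fixed-versus-hourly tradeoff mentioned for E2B comes down to simple arithmetic. The sketch below uses the prices from the pricing table (EUR 49/month for Pro, roughly $0.10/hr for E2B) and treats euros and dollars as equal purely to keep the numbers simple. That is an assumption, not an exchange rate, and the two products are not like-for-like environments.

```shell
# Hours per month at which hourly billing costs as much as a flat plan.
# Both arguments are in cents to stay within integer arithmetic.
break_even_hours() {
  monthly_price_cents="$1"
  hourly_rate_cents="$2"
  echo $((monthly_price_cents / hourly_rate_cents))
}

break_even_hours 4900 10    # prints 490
```

A month has roughly 730 hours, so an agent that runs around the clock is well past the 490-hour mark, while one that runs an hour a day (about 30 hours a month) is far below it.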

FAQ

What is the difference between a cloud desktop and a sandbox?

A desktop is a full VM with a GUI, persistent storage, and sudo access. The agent uses it the way you use your computer. A sandbox is ephemeral, with no GUI and limited system access. Use a desktop for complex work and a sandbox for isolated code execution.

How fast is provisioning?

About 60 seconds. The VM boots from a pre-built image with Ubuntu 22.04, XFCE4, OpenClaw, Chrome, and VS Code. No manual setup needed.

Is it secure?

Each desktop is a dedicated virtual machine with its own kernel and network stack. API keys stay inside the VM (BYOK model). Firewall rules prevent desktops in different workspaces from reaching each other. All actions are logged. See the security section above.

Can I run multiple agents on one desktop?

You can, but we recommend one agent per desktop for better isolation and independent monitoring. The API makes spinning up multiple desktops easy.
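
As a sketch of what that could look like, the loop below builds one request per desktop using the POST /api/desktops call from Step 4. The numbered naming scheme, the LEBUREAU_API_KEY variable, and the DRY_RUN switch are illustrative assumptions. The payload builder also assumes simple names without quotes or backslashes.

```shell
# Build the JSON body for one desktop. $1 = name, $2 = workspace id.
desktop_payload() {
  printf '{"name": "%s", "workspaceId": "%s"}' "$1" "$2"
}

# Create COUNT desktops named PREFIX-01, PREFIX-02, ...
# With DRY_RUN=1 (the default here) the payloads are printed, not sent.
create_fleet() {
  prefix="$1"
  count="$2"
  workspace_id="$3"
  i=1
  while [ "$i" -le "$count" ]; do
    name="$(printf '%s-%02d' "$prefix" "$i")"
    payload="$(desktop_payload "$name" "$workspace_id")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      printf '%s\n' "$payload"
    else
      curl -s -X POST \
        -H "Authorization: Bearer $LEBUREAU_API_KEY" \
        -H "Content-Type: application/json" \
        -d "$payload" \
        https://lebureau.talentai.fr/api/desktops
    fi
    i=$((i + 1))
  done
}

create_fleet ci-agent 3 ws_xxx
```

Running it as written prints three payloads, for ci-agent-01 through ci-agent-03. Setting DRY_RUN=0 sends the requests instead.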

How does persistence work?

Each desktop has a virtual disk that survives reboots. Stop a desktop: disk preserved. Restart it: everything is where the agent left it. Only destroying a desktop deletes storage.

What tools can agents use?

Anything that runs on Linux. The golden image includes Chrome, VS Code, Python 3, Node.js, and git. Agents can install additional packages from Ubuntu repos, pip, npm, cargo, or wherever.

Can I use my own agent framework?

Yes. The desktop is a standard Linux VM. OpenClaw comes pre-installed, but you can run whatever you want. The VNC and terminal interfaces work regardless of what is running inside.

What comes next

We are working on fleet orchestration, approval workflows, and CI/CD integration. The goal is to make managing 50 agent desktops feel the same as managing 5.

If you want to try it, there is a free tier during beta. Provision a desktop, give your agent a task, and see how far it gets. The setup instructions above take about three minutes.

For a step-by-step walkthrough, see our guide on deploying your first AI agent desktop.

Le Bureau is a cloud desktop platform for AI agents. lebureau.talentai.fr
