Sandbox Agent provides a managed virtual desktop (Xvfb + openbox) that you can control programmatically. This is useful for browser automation, GUI testing, and AI computer-use workflows.
Start and stop
import { SandboxAgent } from "sandbox-agent";
const sdk = await SandboxAgent.connect({
baseUrl: "http://127.0.0.1:2468",
});
const status = await sdk.startDesktop({
width: 1920,
height: 1080,
dpi: 96,
});
console.log(status.state); // "active"
console.log(status.display); // ":99"
// When done
await sdk.stopDesktop();
All fields in the start request are optional. Defaults are 1440x900 at 96 DPI.
Start request options
| Field | Type | Default | Description |
|---|---|---|---|
| width | number | 1440 | Desktop width in pixels |
| height | number | 900 | Desktop height in pixels |
| dpi | number | 96 | Display DPI |
| displayNum | number | 99 | Starting X display number. The runtime probes from this number upward to find an available display. |
| stateDir | string | (auto) | Desktop state directory for home, logs, recordings |
| streamVideoCodec | string | "vp8" | WebRTC video codec (vp8, vp9, h264) |
| streamAudioCodec | string | "opus" | WebRTC audio codec (opus, g722) |
| streamFrameRate | number | 30 | Streaming frame rate (1-60) |
| webrtcPortRange | string | "59050-59070" | UDP port range for WebRTC media |
| recordingFps | number | 30 | Default recording FPS when not specified in startDesktopRecording (1-60) |
The streaming and recording options configure defaults for the desktop session. They take effect when streaming or recording is started later.
const status = await sdk.startDesktop({
width: 1920,
height: 1080,
streamVideoCodec: "h264",
streamFrameRate: 60,
webrtcPortRange: "59100-59120",
recordingFps: 15,
});
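The ranges listed in the options table can be checked on the client before the request is sent. The following is a minimal sketch, not part of the SDK: `StartOptions` and `validateStartOptions` are hypothetical names, the allowed codecs and the 1-60 ranges are copied from the table, and the 1-65535 port bound is the general UDP port range rather than anything specific to Sandbox Agent.

```typescript
// Hypothetical helper (not an SDK export): validates the streaming and
// recording defaults against the ranges documented in the options table.
interface StartOptions {
  streamVideoCodec?: string;
  streamAudioCodec?: string;
  streamFrameRate?: number;
  webrtcPortRange?: string;
  recordingFps?: number;
}

const VIDEO_CODECS = ["vp8", "vp9", "h264"];
const AUDIO_CODECS = ["opus", "g722"];

function validateStartOptions(opts: StartOptions): string[] {
  const errors: string[] = [];
  // True when a value is present but falls outside [lo, hi] (NaN counts as outside).
  const outOfRange = (n: number | undefined, lo: number, hi: number): boolean =>
    n !== undefined && !(n >= lo && n <= hi);

  if (opts.streamVideoCodec !== undefined && VIDEO_CODECS.indexOf(opts.streamVideoCodec) === -1) {
    errors.push(`streamVideoCodec must be one of ${VIDEO_CODECS.join(", ")}`);
  }
  if (opts.streamAudioCodec !== undefined && AUDIO_CODECS.indexOf(opts.streamAudioCodec) === -1) {
    errors.push(`streamAudioCodec must be one of ${AUDIO_CODECS.join(", ")}`);
  }
  if (outOfRange(opts.streamFrameRate, 1, 60)) {
    errors.push("streamFrameRate must be between 1 and 60");
  }
  if (outOfRange(opts.recordingFps, 1, 60)) {
    errors.push("recordingFps must be between 1 and 60");
  }
  if (opts.webrtcPortRange !== undefined) {
    // Expect "start-end" with both ends valid UDP ports and start <= end.
    const match = /^(\d+)-(\d+)$/.exec(opts.webrtcPortRange);
    const start = match ? Number(match[1]) : NaN;
    const end = match ? Number(match[2]) : NaN;
    if (!(start >= 1 && end <= 65535 && start <= end)) {
      errors.push('webrtcPortRange must look like "59050-59070"');
    }
  }
  return errors;
}
```

An empty array means the options passed these checks; the server remains the authority on what it accepts.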
Status
const status = await sdk.getDesktopStatus();
console.log(status.state); // "inactive" | "active" | "failed" | ...
Screenshots
Capture the full desktop or a specific region. Optionally include the cursor position.
// Full screenshot (PNG by default)
const png = await sdk.takeDesktopScreenshot();
// JPEG at 70% quality, half scale
const jpeg = await sdk.takeDesktopScreenshot({
format: "jpeg",
quality: 70,
scale: 0.5,
});
// Include cursor overlay
const withCursor = await sdk.takeDesktopScreenshot({
showCursor: true,
});
// Region screenshot
const region = await sdk.takeDesktopRegionScreenshot({
x: 100,
y: 100,
width: 400,
height: 300,
});
Screenshot options
| Param | Type | Default | Description |
|---|---|---|---|
| format | string | "png" | Output format: png, jpeg, or webp |
| quality | number | 85 | Compression quality (1-100, JPEG/WebP only) |
| scale | number | 1.0 | Scale factor (0.1-1.0) |
| showCursor | boolean | false | Composite a crosshair at the cursor position |
When showCursor is enabled, the cursor position is captured at the moment of the screenshot and a red crosshair is drawn at that location. This is useful for AI agents that need to see where the cursor is in the screenshot.
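When an agent picks a click target from a downscaled screenshot, the pixel position has to be mapped back to desktop coordinates before it is passed to a mouse call. The sketch below assumes that `scale` shrinks both axes uniformly, so a pixel at (x, y) in the image corresponds to (x / scale, y / scale) on the desktop; `toDesktopPoint` is a hypothetical helper, not an SDK function.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical helper: converts a pixel position in a scaled screenshot
// back to desktop coordinates. Assumes uniform scaling on both axes.
function toDesktopPoint(p: Point, scale: number): Point {
  if (!(scale >= 0.1 && scale <= 1.0)) {
    throw new RangeError("scale must be between 0.1 and 1.0");
  }
  return { x: Math.round(p.x / scale), y: Math.round(p.y / scale) };
}

// A target found at (250, 150) in a half-scale screenshot
const target = toDesktopPoint({ x: 250, y: 150 }, 0.5);
// target is { x: 500, y: 300 }
```

The result can then be handed to a call such as `sdk.clickDesktop(target)`.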
Mouse
// Get current position
const pos = await sdk.getDesktopMousePosition();
console.log(pos.x, pos.y);
// Move
await sdk.moveDesktopMouse({ x: 500, y: 300 });
// Click (left by default)
await sdk.clickDesktop({ x: 500, y: 300 });
// Right click
await sdk.clickDesktop({ x: 500, y: 300, button: "right" });
// Double click
await sdk.clickDesktop({ x: 500, y: 300, clickCount: 2 });
// Drag
await sdk.dragDesktopMouse({
startX: 100, startY: 100,
endX: 400, endY: 400,
});
// Scroll
await sdk.scrollDesktop({ x: 500, y: 300, deltaY: -3 });
Keyboard
// Type text
await sdk.typeDesktopText({ text: "Hello, world!" });
// Press a key with modifiers
await sdk.pressDesktopKey({
key: "c",
modifiers: { ctrl: true },
});
// Low-level key down/up
await sdk.keyDownDesktop({ key: "Shift_L" });
await sdk.keyUpDesktop({ key: "Shift_L" });
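With the low-level calls, a key that is pressed but never released stays held for whatever runs next. One way to guard against that is a small wrapper that always sends the key-up, even if the action in between throws. This is a sketch under assumptions: `KeyClient` and `withKeyHeld` are hypothetical names, and the interface only mirrors the two calls shown above.

```typescript
// Minimal structural type covering the two SDK calls used above.
interface KeyClient {
  keyDownDesktop(request: { key: string }): Promise<unknown>;
  keyUpDesktop(request: { key: string }): Promise<unknown>;
}

// Hypothetical helper: holds `key` down while `action` runs and releases it
// afterwards, including when `action` rejects.
async function withKeyHeld<T>(
  client: KeyClient,
  key: string,
  action: () => Promise<T>,
): Promise<T> {
  await client.keyDownDesktop({ key });
  try {
    return await action();
  } finally {
    await client.keyUpDesktop({ key });
  }
}
```

A shift-click could then be written as `await withKeyHeld(sdk, "Shift_L", () => sdk.clickDesktop({ x: 500, y: 300 }))`, assuming the connected client satisfies `KeyClient`.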
Clipboard
Read and write the X11 clipboard programmatically.
// Read clipboard
const clipboard = await sdk.getDesktopClipboard();
console.log(clipboard.text);
// Read primary selection (mouse-selected text)
const primary = await sdk.getDesktopClipboard({ selection: "primary" });
// Write to clipboard
await sdk.setDesktopClipboard({ text: "Pasted via API" });
// Write to both clipboard and primary selection
await sdk.setDesktopClipboard({
text: "Synced text",
selection: "both",
});
The selection parameter controls which X11 selection to read or write:
| Value | Description |
|---|---|
| clipboard (default) | The standard clipboard (Ctrl+C / Ctrl+V) |
| primary | The primary selection (text selected with the mouse) |
| both | Write to both clipboard and primary selection (write only) |
Display and windows
const display = await sdk.getDesktopDisplayInfo();
console.log(display.resolution); // { width: 1920, height: 1080, dpi: 96 }
const { windows } = await sdk.listDesktopWindows();
for (const win of windows) {
console.log(win.title, win.x, win.y, win.width, win.height);
}
The windows endpoint filters out noise automatically: window manager internals (Openbox), windows with empty titles, and tiny helper windows (under 120x80) are excluded. The currently active/focused window is always included regardless of filters.
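A common follow-up is to locate a window by title and click in the middle of it. The sketch below works on the list returned by `listDesktopWindows()`; `findWindowCenter` is a hypothetical helper, and it assumes that `x` and `y` are the top-left corner of the window in desktop coordinates.

```typescript
// Shape of the window fields used in the example above.
interface WindowInfo {
  id: string;
  title: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: returns the center point of the first window whose
// title contains `titlePart` (case-insensitive), or undefined if none match.
function findWindowCenter(
  windows: WindowInfo[],
  titlePart: string,
): { x: number; y: number } | undefined {
  const needle = titlePart.toLowerCase();
  for (const win of windows) {
    if (win.title.toLowerCase().indexOf(needle) !== -1) {
      return {
        x: Math.round(win.x + win.width / 2),
        y: Math.round(win.y + win.height / 2),
      };
    }
  }
  return undefined;
}
```

Because untitled and tiny windows are filtered out of the list, a missing match may mean the window exists but was excluded, not only that the app is not running.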
Focused window
Get the currently focused window without listing all windows.
const focused = await sdk.getDesktopFocusedWindow();
console.log(focused.title, focused.id);
The endpoint returns 404 if no window currently has focus.
Window management
Focus, move, and resize windows by their X11 window ID.
const { windows } = await sdk.listDesktopWindows();
const win = windows[0];
// Bring window to foreground
await sdk.focusDesktopWindow(win.id);
// Move window
await sdk.moveDesktopWindow(win.id, { x: 100, y: 50 });
// Resize window
await sdk.resizeDesktopWindow(win.id, { width: 1280, height: 720 });
All three endpoints return the updated window info so you can verify the operation took effect. The window manager may adjust the requested position or size.
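Since the window manager may adjust what was requested, it can help to compare the returned window info against the request. A minimal sketch, with `geometryMismatches` as a hypothetical helper and the field names taken from the window examples on this page:

```typescript
interface Geometry {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: lists every requested field whose actual value
// differs from the request by more than `tolerance` pixels.
function geometryMismatches(
  requested: Partial<Geometry>,
  actual: Geometry,
  tolerance = 0,
): string[] {
  const mismatches: string[] = [];
  const keys: (keyof Geometry)[] = ["x", "y", "width", "height"];
  for (const key of keys) {
    const want = requested[key];
    if (want !== undefined && Math.abs(actual[key] - want) > tolerance) {
      mismatches.push(`${key}: requested ${want}, got ${actual[key]}`);
    }
  }
  return mismatches;
}
```

For example, `geometryMismatches({ width: 1280, height: 720 }, updated)` returns an empty array when the resize was honored exactly, where `updated` is the window info returned by the resize call.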
App launching
Launch applications or open files/URLs on the desktop without needing to shell out.
// Launch an app by name
const result = await sdk.launchDesktopApp({
app: "firefox",
args: ["--private"],
});
console.log(result.processId); // "proc_7"
// Launch and wait for the window to appear
const withWindow = await sdk.launchDesktopApp({
app: "xterm",
wait: true,
});
console.log(withWindow.windowId); // "12345" or null if timed out
// Open a URL with the default handler
const opened = await sdk.openDesktopTarget({
target: "https://example.com",
});
console.log(opened.processId);
The returned processId can be used with the Process API to read logs (GET /v1/processes/{id}/logs) or stop the application (POST /v1/processes/{id}/stop).
When wait is true, the API polls for up to 5 seconds for a window to appear. If the window appears, its ID is returned in windowId. If it times out, windowId is null but the process is still running.
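Slow-starting applications may need longer than the built-in wait. One option is to keep polling the window list after a `null` result. The sketch below is an assumption-laden example rather than SDK behavior: `waitForWindow` is a hypothetical helper, it matches on a title fragment you supply, and the attempt count and delay are arbitrary defaults.

```typescript
interface ListedWindow {
  id: string;
  title: string;
}

// Hypothetical helper: polls a window-listing function until a window whose
// title contains `titlePart` appears, or gives up and returns null.
async function waitForWindow(
  listWindows: () => Promise<{ windows: ListedWindow[] }>,
  titlePart: string,
  opts: { attempts?: number; delayMs?: number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<string | null> {
  const attempts = opts.attempts ?? 20;
  const delayMs = opts.delayMs ?? 500;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const needle = titlePart.toLowerCase();

  for (let i = 0; i < attempts; i++) {
    const { windows } = await listWindows();
    for (const win of windows) {
      if (win.title.toLowerCase().indexOf(needle) !== -1) {
        return win.id;
      }
    }
    // Pause between attempts, but not after the last one.
    if (i < attempts - 1) {
      await sleep(delayMs);
    }
  }
  return null;
}
```

Usage might look like `const id = withWindow.windowId ?? (await waitForWindow(() => sdk.listDesktopWindows(), "xterm"))`.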
Launch/Open vs the Process API: Both launch and open are convenience wrappers around the Process API. They create managed processes (with owner: "desktop") that you can inspect, log, and stop through the same Process endpoints. The difference is that launch validates the binary exists in PATH first and can optionally wait for a window to appear, while open delegates to the system default handler (xdg-open). Use the Process API directly when you need full control over command, environment, working directory, or restart policies.
Recording
Record the desktop to MP4.
const recording = await sdk.startDesktopRecording({ fps: 30 });
console.log(recording.id);
// ... do things ...
const stopped = await sdk.stopDesktopRecording();
// List all recordings
const { recordings } = await sdk.listDesktopRecordings();
// Download
const mp4 = await sdk.downloadDesktopRecording(recording.id);
// Clean up
await sdk.deleteDesktopRecording(recording.id);
Desktop processes
The desktop runtime manages several background processes (Xvfb, openbox, neko, ffmpeg). These are all registered with the general Process API under the desktop owner, so you can inspect logs, check status, and troubleshoot using the same tools you use for any other managed process.
// List all processes, including desktop-owned ones
const { processes } = await sdk.listProcesses();
const desktopProcs = processes.filter((p) => p.owner === "desktop");
for (const p of desktopProcs) {
console.log(p.id, p.command, p.status);
}
// Read logs from a specific desktop process
const logs = await sdk.getProcessLogs(desktopProcs[0].id, { tail: 50 });
for (const entry of logs.entries) {
console.log(entry.stream, atob(entry.data));
}
The desktop status endpoint also includes a summary of running processes:
const status = await sdk.getDesktopStatus();
for (const proc of status.processes) {
console.log(proc.name, proc.pid, proc.running);
}
| Process | Role | Restart policy |
|---|---|---|
| Xvfb | Virtual X11 framebuffer | Auto-restart while desktop is active |
| openbox | Window manager | Auto-restart while desktop is active |
| neko | WebRTC streaming server (started by startDesktopStream) | No auto-restart |
| ffmpeg | Screen recorder (started by startDesktopRecording) | No auto-restart |
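Because neko and ffmpeg are not restarted automatically, a health check has to notice when they stop. The sketch below inspects the `processes` summary from `getDesktopStatus()`. `findStoppedProcesses` is a hypothetical helper, and it assumes the `name` field matches the process names in the table, compared case-insensitively since the exact casing is not specified here.

```typescript
// Fields used in the status example above.
interface DesktopProcessSummary {
  name: string;
  pid?: number | null;
  running: boolean;
}

// Hypothetical helper: returns the expected process names that are either
// missing from the summary or reported as not running.
function findStoppedProcesses(
  processes: DesktopProcessSummary[],
  expected: string[] = ["Xvfb", "openbox"],
): string[] {
  return expected.filter(
    (name) =>
      !processes.some((p) => p.name.toLowerCase() === name.toLowerCase() && p.running),
  );
}
```

While streaming or recording, pass the extra names explicitly, for example `findStoppedProcesses(status.processes, ["Xvfb", "openbox", "neko"])`.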
Live streaming
Start a WebRTC stream for real-time desktop viewing in a browser.
await sdk.startDesktopStream();
// Check stream status
const status = await sdk.getDesktopStreamStatus();
console.log(status.active); // true
console.log(status.processId); // "proc_5"
// Connect via the React DesktopViewer component or
// use the WebSocket signaling endpoint directly
// at ws://127.0.0.1:2468/v1/desktop/stream/signaling
await sdk.stopDesktopStream();
For a drop-in React component, see React Components.
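If you connect to the signaling endpoint yourself, the WebSocket URL can be derived from the same base URL used for `SandboxAgent.connect`. A minimal sketch: `signalingUrl` is a hypothetical helper, the path comes from the comment in the example above, and mapping `https` to `wss` is the usual convention rather than something this page states.

```typescript
// Hypothetical helper: derives the signaling WebSocket URL from the HTTP
// base URL (http -> ws, https -> wss) and appends the signaling path.
function signalingUrl(baseUrl: string): string {
  const trimmed = baseUrl.replace(/\/+$/, "");
  if (!/^https?:\/\//.test(trimmed)) {
    throw new Error("baseUrl must start with http:// or https://");
  }
  return trimmed.replace(/^http/, "ws") + "/v1/desktop/stream/signaling";
}

const url = signalingUrl("http://127.0.0.1:2468");
// url is "ws://127.0.0.1:2468/v1/desktop/stream/signaling"
```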
API reference
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/desktop/start | Start the desktop runtime |
| POST | /v1/desktop/stop | Stop the desktop runtime |
| GET | /v1/desktop/status | Get desktop runtime status |
| GET | /v1/desktop/screenshot | Capture full desktop screenshot |
| GET | /v1/desktop/screenshot/region | Capture a region screenshot |
| GET | /v1/desktop/mouse/position | Get current mouse position |
| POST | /v1/desktop/mouse/move | Move the mouse |
| POST | /v1/desktop/mouse/click | Click the mouse |
| POST | /v1/desktop/mouse/down | Press mouse button down |
| POST | /v1/desktop/mouse/up | Release mouse button |
| POST | /v1/desktop/mouse/drag | Drag from one point to another |
| POST | /v1/desktop/mouse/scroll | Scroll at a position |
| POST | /v1/desktop/keyboard/type | Type text |
| POST | /v1/desktop/keyboard/press | Press a key with optional modifiers |
| POST | /v1/desktop/keyboard/down | Press a key down (hold) |
| POST | /v1/desktop/keyboard/up | Release a key |
| GET | /v1/desktop/display/info | Get display info |
| GET | /v1/desktop/windows | List visible windows |
| GET | /v1/desktop/windows/focused | Get focused window info |
| POST | /v1/desktop/windows/{id}/focus | Focus a window |
| POST | /v1/desktop/windows/{id}/move | Move a window |
| POST | /v1/desktop/windows/{id}/resize | Resize a window |
| GET | /v1/desktop/clipboard | Read clipboard contents |
| POST | /v1/desktop/clipboard | Write to clipboard |
| POST | /v1/desktop/launch | Launch an application |
| POST | /v1/desktop/open | Open a file or URL |
| POST | /v1/desktop/recording/start | Start recording |
| POST | /v1/desktop/recording/stop | Stop recording |
| GET | /v1/desktop/recordings | List recordings |
| GET | /v1/desktop/recordings/{id} | Get recording metadata |
| GET | /v1/desktop/recordings/{id}/download | Download recording |
| DELETE | /v1/desktop/recordings/{id} | Delete recording |
| POST | /v1/desktop/stream/start | Start WebRTC streaming |
| POST | /v1/desktop/stream/stop | Stop WebRTC streaming |
| GET | /v1/desktop/stream/status | Get stream status |
| GET | /v1/desktop/stream/signaling | WebSocket for WebRTC signaling |
TypeScript SDK methods
| Method | Returns | Description |
|---|---|---|
| startDesktop(request?) | DesktopStatusResponse | Start the desktop |
| stopDesktop() | DesktopStatusResponse | Stop the desktop |
| getDesktopStatus() | DesktopStatusResponse | Get desktop status |
| takeDesktopScreenshot(query?) | Uint8Array | Capture screenshot |
| takeDesktopRegionScreenshot(query) | Uint8Array | Capture region screenshot |
| getDesktopMousePosition() | DesktopMousePositionResponse | Get mouse position |
| moveDesktopMouse(request) | DesktopMousePositionResponse | Move mouse |
| clickDesktop(request) | DesktopMousePositionResponse | Click mouse |
| mouseDownDesktop(request) | DesktopMousePositionResponse | Mouse button down |
| mouseUpDesktop(request) | DesktopMousePositionResponse | Mouse button up |
| dragDesktopMouse(request) | DesktopMousePositionResponse | Drag mouse |
| scrollDesktop(request) | DesktopMousePositionResponse | Scroll |
| typeDesktopText(request) | DesktopActionResponse | Type text |
| pressDesktopKey(request) | DesktopActionResponse | Press key |
| keyDownDesktop(request) | DesktopActionResponse | Key down |
| keyUpDesktop(request) | DesktopActionResponse | Key up |
| getDesktopDisplayInfo() | DesktopDisplayInfoResponse | Get display info |
| listDesktopWindows() | DesktopWindowListResponse | List windows |
| getDesktopFocusedWindow() | DesktopWindowInfo | Get focused window |
| focusDesktopWindow(id) | DesktopWindowInfo | Focus a window |
| moveDesktopWindow(id, request) | DesktopWindowInfo | Move a window |
| resizeDesktopWindow(id, request) | DesktopWindowInfo | Resize a window |
| getDesktopClipboard(query?) | DesktopClipboardResponse | Read clipboard |
| setDesktopClipboard(request) | DesktopActionResponse | Write clipboard |
| launchDesktopApp(request) | DesktopLaunchResponse | Launch an app |
| openDesktopTarget(request) | DesktopOpenResponse | Open file/URL |
| startDesktopRecording(request?) | DesktopRecordingInfo | Start recording |
| stopDesktopRecording() | DesktopRecordingInfo | Stop recording |
| listDesktopRecordings() | DesktopRecordingListResponse | List recordings |
| getDesktopRecording(id) | DesktopRecordingInfo | Get recording |
| downloadDesktopRecording(id) | Uint8Array | Download recording |
| deleteDesktopRecording(id) | void | Delete recording |
| startDesktopStream() | DesktopStreamStatusResponse | Start streaming |
| stopDesktopStream() | DesktopStreamStatusResponse | Stop streaming |
| getDesktopStreamStatus() | DesktopStreamStatusResponse | Stream status |
Customizing the desktop environment
The desktop runs inside the sandbox filesystem, so you can customize it using the File System API before or after starting the desktop. The desktop HOME directory is located at ~/.local/state/sandbox-agent/desktop/home (or $XDG_STATE_HOME/sandbox-agent/desktop/home if XDG_STATE_HOME is set).
All configuration files below are written to paths relative to this HOME directory.
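The examples in this section spell out the default HOME path in full. If the sandbox sets XDG_STATE_HOME, the prefix changes, so it can be convenient to build paths from one place. The sketch below is an illustration only: `desktopHome` and `desktopPath` are hypothetical helpers, the `env` argument stands for the sandbox's environment (not your local machine's), an empty XDG_STATE_HOME is treated as unset following the XDG convention, and a custom `stateDir` passed to `startDesktop` is not accounted for.

```typescript
interface SandboxEnv {
  HOME?: string;
  XDG_STATE_HOME?: string;
}

// Hypothetical helper: resolves the desktop HOME directory from the
// sandbox environment, preferring XDG_STATE_HOME when it is set.
function desktopHome(env: SandboxEnv): string {
  const stateHome =
    env.XDG_STATE_HOME !== undefined && env.XDG_STATE_HOME.length > 0
      ? env.XDG_STATE_HOME
      : `${env.HOME ?? "~"}/.local/state`;
  return `${stateHome.replace(/\/+$/, "")}/sandbox-agent/desktop/home`;
}

// Hypothetical helper: joins a HOME-relative config path onto the desktop HOME.
function desktopPath(env: SandboxEnv, relative: string): string {
  return `${desktopHome(env)}/${relative.replace(/^\/+/, "")}`;
}
```

For instance, `desktopPath({}, ".config/openbox/rc.xml")` yields the same `~/.local/state/...` path used in the examples that follow.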
Window manager (openbox)
The desktop uses openbox as its window manager. You can customize its behavior, theme, and keyboard shortcuts by writing an rc.xml config file.
const openboxConfig = `<?xml version="1.0" encoding="UTF-8"?>
<openbox_config xmlns="http://openbox.org/3.4/rc">
<theme>
<name>Clearlooks</name>
<titleLayout>NLIMC</titleLayout>
<font place="ActiveWindow"><name>DejaVu Sans</name><size>10</size></font>
</theme>
<desktops><number>1</number></desktops>
<keyboard>
<keybind key="A-F4"><action name="Close"/></keybind>
<keybind key="A-Tab"><action name="NextWindow"/></keybind>
</keyboard>
</openbox_config>`;
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/rc.xml" },
openboxConfig,
);
Autostart programs
Openbox runs the script at ~/.config/openbox/autostart (relative to the desktop HOME directory) on startup. Use this to launch applications, set the background, or configure the environment.
const autostart = `#!/bin/sh
# Set a solid background color
xsetroot -solid "#1e1e2e" &
# Launch a terminal
xterm -geometry 120x40+50+50 &
# Launch a browser
firefox --no-remote &
`;
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" },
autostart,
);
The autostart script runs when openbox starts, which happens during startDesktop(). Write the autostart file before calling startDesktop() for it to take effect.
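Each command in the autostart examples ends with `&` so that a long-running program does not block the ones after it. When the list of programs is built dynamically, the script can be generated from an array. This is a minimal sketch with a hypothetical `buildAutostart` helper; it does no shell quoting, so each entry must already be a valid shell command.

```typescript
// Hypothetical helper: builds an openbox autostart script that starts each
// command in the background. Blank entries are skipped and a trailing "&"
// on an entry is normalized so it is not doubled.
function buildAutostart(commands: string[]): string {
  const lines: string[] = ["#!/bin/sh"];
  for (const raw of commands) {
    const cmd = raw.trim().replace(/\s*&$/, "");
    if (cmd.length > 0) {
      lines.push(`${cmd} &`);
    }
  }
  return lines.join("\n") + "\n";
}

const script = buildAutostart([
  'xsetroot -solid "#1e1e2e"',
  "xterm -geometry 120x40+50+50 &",
]);
```

The resulting string can be written with the same `writeFsFile` call shown above, before `startDesktop()` is called.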
Background
There is no wallpaper set by default (the background is the X root window default). You can set it using xsetroot in the autostart script (as shown above), or use feh if you need an image:
// Upload a wallpaper image
import fs from "node:fs";
const wallpaper = await fs.promises.readFile("./wallpaper.png");
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/wallpaper.png" },
wallpaper,
);
// Set the autostart to apply it
const autostart = `#!/bin/sh
feh --bg-fill ~/wallpaper.png &
`;
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" },
autostart,
);
feh is not installed by default. Install it via the Process API before starting the desktop: await sdk.runProcess({ command: "apt-get", args: ["install", "-y", "feh"] }). If the image ships with an empty package index, run apt-get update first.
Fonts
Only fonts-dejavu-core is installed by default. To add more fonts, install them with your system package manager or copy font files into the sandbox:
// Install a font package
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "fonts-noto", "fonts-liberation"],
});
// Or copy a custom font file
import fs from "node:fs";
const font = await fs.promises.readFile("./CustomFont.ttf");
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.local/share/fonts" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.local/share/fonts/CustomFont.ttf" },
font,
);
// Rebuild the font cache
await sdk.runProcess({ command: "fc-cache", args: ["-fv"] });
Cursor theme
await sdk.runProcess({
command: "apt-get",
args: ["install", "-y", "dmz-cursor-theme"],
});
const xresources = `Xcursor.theme: DMZ-White\nXcursor.size: 24\n`;
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.Xresources" },
xresources,
);
Run xrdb -merge ~/.Xresources (via the autostart or process API) after writing the file for changes to take effect.
Shell and terminal
No terminal emulator or shell is launched by default. Add one to the openbox autostart:
# In ~/.config/openbox/autostart
xterm -geometry 120x40+50+50 &
To use a different shell, set the SHELL environment variable in your Dockerfile or install your preferred shell and configure the terminal to use it.
GTK theme
Applications using GTK will pick up settings from ~/.config/gtk-3.0/settings.ini:
const gtkSettings = `[Settings]
gtk-theme-name=Adwaita
gtk-icon-theme-name=Adwaita
gtk-font-name=DejaVu Sans 10
gtk-cursor-theme-name=DMZ-White
gtk-cursor-theme-size=24
`;
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0" });
await sdk.writeFsFile(
{ path: "~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0/settings.ini" },
gtkSettings,
);
Summary of configuration paths
All paths are relative to the desktop HOME directory (~/.local/state/sandbox-agent/desktop/home).
| What | Path | Notes |
|---|---|---|
| Openbox config | .config/openbox/rc.xml | Window manager theme, keybindings, behavior |
| Autostart | .config/openbox/autostart | Shell script run on desktop start |
| Custom fonts | .local/share/fonts/ | TTF/OTF files; run fc-cache -fv afterward |
| Cursor theme | .Xresources | Requires xrdb -merge to apply |
| GTK 3 settings | .config/gtk-3.0/settings.ini | Theme, icons, fonts for GTK apps |
| Wallpaper | Any path, referenced from autostart | Requires feh or similar tool |