Sandbox Agent provides a managed virtual desktop (Xvfb + openbox) that you can control programmatically. This is useful for browser automation, GUI testing, and AI computer-use workflows.

Start and stop

import { SandboxAgent } from "sandbox-agent";

const sdk = await SandboxAgent.connect({
  baseUrl: "http://127.0.0.1:2468",
});

const status = await sdk.startDesktop({
  width: 1920,
  height: 1080,
  dpi: 96,
});

console.log(status.state); // "active"
console.log(status.display); // ":99"

// When done
await sdk.stopDesktop();
All fields in the start request are optional. Defaults are 1440x900 at 96 DPI.

Start request options

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| width | number | 1440 | Desktop width in pixels |
| height | number | 900 | Desktop height in pixels |
| dpi | number | 96 | Display DPI |
| displayNum | number | 99 | Starting X display number. The runtime probes from this number upward to find an available display. |
| stateDir | string | (auto) | Desktop state directory for home, logs, recordings |
| streamVideoCodec | string | "vp8" | WebRTC video codec (vp8, vp9, h264) |
| streamAudioCodec | string | "opus" | WebRTC audio codec (opus, g722) |
| streamFrameRate | number | 30 | Streaming frame rate (1-60) |
| webrtcPortRange | string | "59050-59070" | UDP port range for WebRTC media |
| recordingFps | number | 30 | Default recording FPS when not specified in startDesktopRecording (1-60) |

The streaming and recording options configure defaults for the desktop session. They take effect when streaming or recording is started later.
const status = await sdk.startDesktop({
  width: 1920,
  height: 1080,
  streamVideoCodec: "h264",
  streamFrameRate: 60,
  webrtcPortRange: "59100-59120",
  recordingFps: 15,
});
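
The ranges and codec names in the options table can be checked on the client before calling startDesktop. The sketch below is a hypothetical pre-check, not part of the SDK: the DesktopStartOptions interface and validateStartOptions function are illustrative names, and the port bounds (1-65535, low end first) are an assumption about what the server accepts.

```typescript
// Hypothetical shape mirroring the documented start request fields.
interface DesktopStartOptions {
  width?: number;
  height?: number;
  dpi?: number;
  displayNum?: number;
  streamVideoCodec?: string;
  streamAudioCodec?: string;
  streamFrameRate?: number;
  webrtcPortRange?: string;
  recordingFps?: number;
}

// Codec names taken from the options table.
const VIDEO_CODECS = ["vp8", "vp9", "h264"];
const AUDIO_CODECS = ["opus", "g722"];

// Returns a list of human-readable problems; an empty list means the options look valid.
function validateStartOptions(opts: DesktopStartOptions): string[] {
  const problems: string[] = [];

  const checkRange = (name: string, value: number | undefined, min: number, max: number): void => {
    if (value !== undefined && !(value >= min && value <= max)) {
      problems.push(`${name} must be between ${min} and ${max}`);
    }
  };
  checkRange("streamFrameRate", opts.streamFrameRate, 1, 60);
  checkRange("recordingFps", opts.recordingFps, 1, 60);

  if (opts.streamVideoCodec !== undefined && !VIDEO_CODECS.includes(opts.streamVideoCodec)) {
    problems.push(`streamVideoCodec must be one of ${VIDEO_CODECS.join(", ")}`);
  }
  if (opts.streamAudioCodec !== undefined && !AUDIO_CODECS.includes(opts.streamAudioCodec)) {
    problems.push(`streamAudioCodec must be one of ${AUDIO_CODECS.join(", ")}`);
  }

  if (opts.webrtcPortRange !== undefined) {
    // Assumption: the range is written "low-high" with valid UDP port numbers.
    const match = /^(\d+)-(\d+)$/.exec(opts.webrtcPortRange);
    const low = match ? Number(match[1]) : NaN;
    const high = match ? Number(match[2]) : NaN;
    if (!match || low < 1 || high > 65535 || low > high) {
      problems.push("webrtcPortRange must look like \"59050-59070\" with low <= high");
    }
  }
  return problems;
}
```

Calling validateStartOptions with the options from the example above returns an empty array, so the request can be passed on to sdk.startDesktop unchanged.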

Status

const status = await sdk.getDesktopStatus();
console.log(status.state); // "inactive" | "active" | "failed" | ...

Screenshots

Capture the full desktop or a specific region. Optionally include the cursor position.
// Full screenshot (PNG by default)
const png = await sdk.takeDesktopScreenshot();

// JPEG at 70% quality, half scale
const jpeg = await sdk.takeDesktopScreenshot({
  format: "jpeg",
  quality: 70,
  scale: 0.5,
});

// Include cursor overlay
const withCursor = await sdk.takeDesktopScreenshot({
  showCursor: true,
});

// Region screenshot
const region = await sdk.takeDesktopRegionScreenshot({
  x: 100,
  y: 100,
  width: 400,
  height: 300,
});

Screenshot options

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| format | string | "png" | Output format: png, jpeg, or webp |
| quality | number | 85 | Compression quality (1-100, JPEG/WebP only) |
| scale | number | 1.0 | Scale factor (0.1-1.0) |
| showCursor | boolean | false | Composite a crosshair at the cursor position |

When showCursor is enabled, the cursor position is captured at the moment of the screenshot and a red crosshair is drawn at that location. This is useful for AI agents that need to see where the cursor is in the screenshot.
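
When a screenshot is taken with a scale below 1.0, or from a region, pixel positions in the image no longer match desktop coordinates. An agent that picks a click target from the image has to map it back first. The sketch below assumes the scale is applied uniformly to both axes and that region screenshots are captured at full scale; toDesktopPoint and regionToDesktopPoint are illustrative helpers, not SDK functions.

```typescript
interface Point {
  x: number;
  y: number;
}

// Map a pixel in a scaled full-desktop screenshot back to desktop coordinates.
// Assumption: the image is the desktop shrunk uniformly by `scale` (0.1-1.0).
function toDesktopPoint(imagePoint: Point, scale: number): Point {
  if (!(scale >= 0.1 && scale <= 1)) {
    throw new RangeError("scale must be between 0.1 and 1.0");
  }
  return {
    x: Math.round(imagePoint.x / scale),
    y: Math.round(imagePoint.y / scale),
  };
}

// Map a pixel in an unscaled region screenshot back to desktop coordinates
// by adding the region's top-left corner.
function regionToDesktopPoint(imagePoint: Point, regionOrigin: Point): Point {
  return {
    x: regionOrigin.x + imagePoint.x,
    y: regionOrigin.y + imagePoint.y,
  };
}
```

For example, a target found at (250, 150) in a half-scale screenshot maps to (500, 300), which can then be passed to sdk.clickDesktop.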

Mouse

// Get current position
const pos = await sdk.getDesktopMousePosition();
console.log(pos.x, pos.y);

// Move
await sdk.moveDesktopMouse({ x: 500, y: 300 });

// Click (left by default)
await sdk.clickDesktop({ x: 500, y: 300 });

// Right click
await sdk.clickDesktop({ x: 500, y: 300, button: "right" });

// Double click
await sdk.clickDesktop({ x: 500, y: 300, clickCount: 2 });

// Drag
await sdk.dragDesktopMouse({
  startX: 100, startY: 100,
  endX: 400, endY: 400,
});

// Scroll
await sdk.scrollDesktop({ x: 500, y: 300, deltaY: -3 });
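
Some applications only react to a drag when the pointer passes through intermediate positions. One way to approximate that is to move the mouse along a straight line in small steps between a button press and release. The sketch below only computes the waypoints; interpolatePath is an illustrative helper, and the request shapes of mouseDownDesktop and mouseUpDesktop are not shown on this page, so they are left out.

```typescript
interface Point {
  x: number;
  y: number;
}

// Return `steps` evenly spaced waypoints on the line from start to end.
// The start point itself is excluded; the last waypoint is always `end`.
function interpolatePath(start: Point, end: Point, steps: number): Point[] {
  if (!Number.isInteger(steps) || steps < 1) {
    throw new RangeError("steps must be a positive integer");
  }
  const points: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    points.push({
      x: Math.round(start.x + (end.x - start.x) * t),
      y: Math.round(start.y + (end.y - start.y) * t),
    });
  }
  return points;
}
```

Each waypoint can be passed to sdk.moveDesktopMouse in order, for example inside a for...of loop that awaits every call.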

Keyboard

// Type text
await sdk.typeDesktopText({ text: "Hello, world!" });

// Press a key with modifiers
await sdk.pressDesktopKey({
  key: "c",
  modifiers: { ctrl: true },
});

// Low-level key down/up
await sdk.keyDownDesktop({ key: "Shift_L" });
await sdk.keyUpDesktop({ key: "Shift_L" });
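
With the low-level calls, a key that is pressed but never released stays held on the virtual desktop, for example when the code between keyDownDesktop and keyUpDesktop throws. A try/finally wrapper avoids that. The sketch below is a hypothetical helper (withKeyHeld and the KeyClient interface are illustrative names); it relies only on the two methods and the { key } request shape shown above.

```typescript
// Minimal structural type: any object with these two methods works, including the SDK client.
interface KeyClient {
  keyDownDesktop(request: { key: string }): Promise<unknown>;
  keyUpDesktop(request: { key: string }): Promise<unknown>;
}

// Hold `key` down while `action` runs, and release it even if `action` throws.
async function withKeyHeld<T>(
  client: KeyClient,
  key: string,
  action: () => Promise<T>,
): Promise<T> {
  await client.keyDownDesktop({ key });
  try {
    return await action();
  } finally {
    await client.keyUpDesktop({ key });
  }
}
```

A shift-click could then be written as withKeyHeld(sdk, "Shift_L", () => sdk.clickDesktop({ x: 500, y: 300 })).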

Clipboard

Read and write the X11 clipboard programmatically.
// Read clipboard
const clipboard = await sdk.getDesktopClipboard();
console.log(clipboard.text);

// Read primary selection (mouse-selected text)
const primary = await sdk.getDesktopClipboard({ selection: "primary" });

// Write to clipboard
await sdk.setDesktopClipboard({ text: "Pasted via API" });

// Write to both clipboard and primary selection
await sdk.setDesktopClipboard({
  text: "Synced text",
  selection: "both",
});
The selection parameter controls which X11 selection to read or write:

| Value | Description |
| --- | --- |
| clipboard (default) | The standard clipboard (Ctrl+C / Ctrl+V) |
| primary | The primary selection (text selected with the mouse) |
| both | Write to both clipboard and primary selection (write only) |

Display and windows

const display = await sdk.getDesktopDisplayInfo();
console.log(display.resolution); // { width: 1920, height: 1080, dpi: 96 }

const { windows } = await sdk.listDesktopWindows();
for (const win of windows) {
  console.log(win.title, win.x, win.y, win.width, win.height);
}
The windows endpoint filters out noise automatically: window manager internals (Openbox), windows with empty titles, and tiny helper windows (under 120x80) are excluded. The currently active/focused window is always included regardless of filters.
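
The filtering rules above can be pictured as a single predicate over the raw window list. The sketch below is an illustration of those rules, not the server's implementation. It reads "under 120x80" as narrower than 120 pixels or shorter than 80 pixels, which is an assumption, and it leaves out the Openbox-internal check because this page does not say how those windows are identified. DesktopWindow and filterVisibleWindows are illustrative names.

```typescript
interface DesktopWindow {
  id: string;
  title: string;
  width: number;
  height: number;
}

// Size threshold from the text; the "either dimension" reading is an assumption.
const MIN_WIDTH = 120;
const MIN_HEIGHT = 80;

function filterVisibleWindows(
  windows: DesktopWindow[],
  activeId: string | null,
): DesktopWindow[] {
  return windows.filter((win) => {
    // The active window is always kept, whatever its title or size.
    if (win.id === activeId) return true;
    // Windows with empty titles are treated as noise.
    if (win.title.trim() === "") return false;
    // Tiny helper windows are dropped.
    if (win.width < MIN_WIDTH || win.height < MIN_HEIGHT) return false;
    return true;
  });
}
```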

Focused window

Get the currently focused window without listing all windows.
const focused = await sdk.getDesktopFocusedWindow();
console.log(focused.title, focused.id);
Returns 404 if no window currently has focus.

Window management

Focus, move, and resize windows by their X11 window ID.
const { windows } = await sdk.listDesktopWindows();
const win = windows[0];

// Bring window to foreground
await sdk.focusDesktopWindow(win.id);

// Move window
await sdk.moveDesktopWindow(win.id, { x: 100, y: 50 });

// Resize window
await sdk.resizeDesktopWindow(win.id, { width: 1280, height: 720 });
All three endpoints return the updated window info so you can verify the operation took effect. The window manager may adjust the requested position or size.
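
Because the window manager may adjust a request, it can help to compare what was asked for with what came back. The sketch below is a hypothetical helper that reports only the fields that differ. It assumes the returned window info exposes x, y, width, and height, as the window list entries above do; Geometry and geometryAdjustments are illustrative names.

```typescript
interface Geometry {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Return the actual values of every requested field that the window manager changed.
// An empty object means the request was applied exactly.
function geometryAdjustments(
  requested: Partial<Geometry>,
  actual: Geometry,
): Partial<Geometry> {
  const keys: (keyof Geometry)[] = ["x", "y", "width", "height"];
  const adjusted: Partial<Geometry> = {};
  for (const key of keys) {
    const wanted = requested[key];
    if (wanted !== undefined && wanted !== actual[key]) {
      adjusted[key] = actual[key];
    }
  }
  return adjusted;
}
```

For a resize, the result of sdk.resizeDesktopWindow can be passed as actual together with the original request; a non-empty result shows which dimensions were changed and to what.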

App launching

Launch applications or open files/URLs on the desktop without needing to shell out.
// Launch an app by name
const result = await sdk.launchDesktopApp({
  app: "firefox",
  args: ["--private"],
});
console.log(result.processId); // "proc_7"

// Launch and wait for the window to appear
const withWindow = await sdk.launchDesktopApp({
  app: "xterm",
  wait: true,
});
console.log(withWindow.windowId); // "12345" or null if timed out

// Open a URL with the default handler
const opened = await sdk.openDesktopTarget({
  target: "https://example.com",
});
console.log(opened.processId);
The returned processId can be used with the Process API to read logs (GET /v1/processes/{id}/logs) or stop the application (POST /v1/processes/{id}/stop). When wait is true, the API polls for up to 5 seconds for a window to appear. If the window appears, its ID is returned in windowId. If it times out, windowId is null but the process is still running.
Launch/Open vs the Process API: Both launch and open are convenience wrappers around the Process API. They create managed processes (with owner: "desktop") that you can inspect, log, and stop through the same Process endpoints. The difference is that launch validates the binary exists in PATH first and can optionally wait for a window to appear, while open delegates to the system default handler (xdg-open). Use the Process API directly when you need full control over command, environment, working directory, or restart policies.
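
Slow-starting applications may need more than the 5-second wait. Since the process keeps running after a timeout, one option is to keep polling the window list on the client. The sketch below is a hypothetical helper: waitForWindow, its default timeout and interval, and matching by title are all assumptions, and it takes the list function as a parameter so it does not depend on the SDK.

```typescript
interface WindowSummary {
  id: string;
  title: string;
}

// Poll `listWindows` until a window satisfies `matches` or the timeout elapses.
// Resolves to the matching window, or null on timeout.
async function waitForWindow(
  listWindows: () => Promise<{ windows: WindowSummary[] }>,
  matches: (win: WindowSummary) => boolean,
  timeoutMs = 15000,
  intervalMs = 250,
): Promise<WindowSummary | null> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const { windows } = await listWindows();
    const found = windows.find(matches);
    if (found) return found;
    // Give up if another sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) return null;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the SDK this might be called as waitForWindow(() => sdk.listDesktopWindows(), (w) => w.title.includes("xterm")), where the title text to match depends on the application.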

Recording

Record the desktop to MP4.
const recording = await sdk.startDesktopRecording({ fps: 30 });
console.log(recording.id);

// ... do things ...

const stopped = await sdk.stopDesktopRecording();

// List all recordings
const { recordings } = await sdk.listDesktopRecordings();

// Download
const mp4 = await sdk.downloadDesktopRecording(recording.id);

// Clean up
await sdk.deleteDesktopRecording(recording.id);

Desktop processes

The desktop runtime manages several background processes (Xvfb, openbox, neko, ffmpeg). These are all registered with the general Process API under the desktop owner, so you can inspect logs, check status, and troubleshoot using the same tools you use for any other managed process.
// List all processes, including desktop-owned ones
const { processes } = await sdk.listProcesses();

const desktopProcs = processes.filter((p) => p.owner === "desktop");
for (const p of desktopProcs) {
  console.log(p.id, p.command, p.status);
}

// Read logs from a specific desktop process
const logs = await sdk.getProcessLogs(desktopProcs[0].id, { tail: 50 });
for (const entry of logs.entries) {
  console.log(entry.stream, atob(entry.data));
}
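
Note that atob returns a binary string with one character per byte, so log lines containing non-ASCII characters come out garbled when printed directly. If the log data is base64-encoded UTF-8 text (an assumption; this page does not state the encoding), it can be decoded through TextDecoder instead. decodeLogData is an illustrative helper name.

```typescript
// Decode a base64 payload as UTF-8 text.
// Assumption: the bytes behind entry.data are UTF-8.
function decodeLogData(base64: string): string {
  const binary = atob(base64);
  // Each character of the binary string holds one byte value (0-255).
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}
```

In the loop above, decodeLogData(entry.data) would take the place of atob(entry.data).
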
The desktop status endpoint also includes a summary of running processes:
const status = await sdk.getDesktopStatus();
for (const proc of status.processes) {
  console.log(proc.name, proc.pid, proc.running);
}

| Process | Role | Restart policy |
| --- | --- | --- |
| Xvfb | Virtual X11 framebuffer | Auto-restart while desktop is active |
| openbox | Window manager | Auto-restart while desktop is active |
| neko | WebRTC streaming server (started by startDesktopStream) | No auto-restart |
| ffmpeg | Screen recorder (started by startDesktopRecording) | No auto-restart |

Live streaming

Start a WebRTC stream for real-time desktop viewing in a browser.
await sdk.startDesktopStream();

// Check stream status
const status = await sdk.getDesktopStreamStatus();
console.log(status.active); // true
console.log(status.processId); // "proc_5"

// Connect via the React DesktopViewer component or
// use the WebSocket signaling endpoint directly
// at ws://127.0.0.1:2468/v1/desktop/stream/signaling

await sdk.stopDesktopStream();
For a drop-in React component, see React Components.
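
When connecting to the signaling endpoint directly, the WebSocket URL can be derived from the same base URL used for SandboxAgent.connect. The sketch below is a hypothetical helper (desktopSignalingUrl is an illustrative name). It assumes an https base URL maps to wss, and it resolves the endpoint path against the origin, so any path prefix on the base URL is dropped.

```typescript
// Build the WebSocket signaling URL from the HTTP base URL.
function desktopSignalingUrl(baseUrl: string): string {
  const url = new URL("/v1/desktop/stream/signaling", baseUrl);
  // Assumption: TLS deployments expose the socket over wss.
  url.protocol = url.protocol === "https:" ? "wss:" : "ws:";
  return url.toString();
}
```

For the local base URL used throughout this page, this yields ws://127.0.0.1:2468/v1/desktop/stream/signaling, matching the address in the comment above.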

API reference

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/desktop/start | Start the desktop runtime |
| POST | /v1/desktop/stop | Stop the desktop runtime |
| GET | /v1/desktop/status | Get desktop runtime status |
| GET | /v1/desktop/screenshot | Capture full desktop screenshot |
| GET | /v1/desktop/screenshot/region | Capture a region screenshot |
| GET | /v1/desktop/mouse/position | Get current mouse position |
| POST | /v1/desktop/mouse/move | Move the mouse |
| POST | /v1/desktop/mouse/click | Click the mouse |
| POST | /v1/desktop/mouse/down | Press mouse button down |
| POST | /v1/desktop/mouse/up | Release mouse button |
| POST | /v1/desktop/mouse/drag | Drag from one point to another |
| POST | /v1/desktop/mouse/scroll | Scroll at a position |
| POST | /v1/desktop/keyboard/type | Type text |
| POST | /v1/desktop/keyboard/press | Press a key with optional modifiers |
| POST | /v1/desktop/keyboard/down | Press a key down (hold) |
| POST | /v1/desktop/keyboard/up | Release a key |
| GET | /v1/desktop/display/info | Get display info |
| GET | /v1/desktop/windows | List visible windows |
| GET | /v1/desktop/windows/focused | Get focused window info |
| POST | /v1/desktop/windows/{id}/focus | Focus a window |
| POST | /v1/desktop/windows/{id}/move | Move a window |
| POST | /v1/desktop/windows/{id}/resize | Resize a window |
| GET | /v1/desktop/clipboard | Read clipboard contents |
| POST | /v1/desktop/clipboard | Write to clipboard |
| POST | /v1/desktop/launch | Launch an application |
| POST | /v1/desktop/open | Open a file or URL |
| POST | /v1/desktop/recording/start | Start recording |
| POST | /v1/desktop/recording/stop | Stop recording |
| GET | /v1/desktop/recordings | List recordings |
| GET | /v1/desktop/recordings/{id} | Get recording metadata |
| GET | /v1/desktop/recordings/{id}/download | Download recording |
| DELETE | /v1/desktop/recordings/{id} | Delete recording |
| POST | /v1/desktop/stream/start | Start WebRTC streaming |
| POST | /v1/desktop/stream/stop | Stop WebRTC streaming |
| GET | /v1/desktop/stream/status | Get stream status |
| GET | /v1/desktop/stream/signaling | WebSocket for WebRTC signaling |

TypeScript SDK methods

| Method | Returns | Description |
| --- | --- | --- |
| startDesktop(request?) | DesktopStatusResponse | Start the desktop |
| stopDesktop() | DesktopStatusResponse | Stop the desktop |
| getDesktopStatus() | DesktopStatusResponse | Get desktop status |
| takeDesktopScreenshot(query?) | Uint8Array | Capture screenshot |
| takeDesktopRegionScreenshot(query) | Uint8Array | Capture region screenshot |
| getDesktopMousePosition() | DesktopMousePositionResponse | Get mouse position |
| moveDesktopMouse(request) | DesktopMousePositionResponse | Move mouse |
| clickDesktop(request) | DesktopMousePositionResponse | Click mouse |
| mouseDownDesktop(request) | DesktopMousePositionResponse | Mouse button down |
| mouseUpDesktop(request) | DesktopMousePositionResponse | Mouse button up |
| dragDesktopMouse(request) | DesktopMousePositionResponse | Drag mouse |
| scrollDesktop(request) | DesktopMousePositionResponse | Scroll |
| typeDesktopText(request) | DesktopActionResponse | Type text |
| pressDesktopKey(request) | DesktopActionResponse | Press key |
| keyDownDesktop(request) | DesktopActionResponse | Key down |
| keyUpDesktop(request) | DesktopActionResponse | Key up |
| getDesktopDisplayInfo() | DesktopDisplayInfoResponse | Get display info |
| listDesktopWindows() | DesktopWindowListResponse | List windows |
| getDesktopFocusedWindow() | DesktopWindowInfo | Get focused window |
| focusDesktopWindow(id) | DesktopWindowInfo | Focus a window |
| moveDesktopWindow(id, request) | DesktopWindowInfo | Move a window |
| resizeDesktopWindow(id, request) | DesktopWindowInfo | Resize a window |
| getDesktopClipboard(query?) | DesktopClipboardResponse | Read clipboard |
| setDesktopClipboard(request) | DesktopActionResponse | Write clipboard |
| launchDesktopApp(request) | DesktopLaunchResponse | Launch an app |
| openDesktopTarget(request) | DesktopOpenResponse | Open file/URL |
| startDesktopRecording(request?) | DesktopRecordingInfo | Start recording |
| stopDesktopRecording() | DesktopRecordingInfo | Stop recording |
| listDesktopRecordings() | DesktopRecordingListResponse | List recordings |
| getDesktopRecording(id) | DesktopRecordingInfo | Get recording |
| downloadDesktopRecording(id) | Uint8Array | Download recording |
| deleteDesktopRecording(id) | void | Delete recording |
| startDesktopStream() | DesktopStreamStatusResponse | Start streaming |
| stopDesktopStream() | DesktopStreamStatusResponse | Stop streaming |
| getDesktopStreamStatus() | DesktopStreamStatusResponse | Stream status |

Customizing the desktop environment

The desktop runs inside the sandbox filesystem, so you can customize it using the File System API before or after starting the desktop. The desktop HOME directory is located at ~/.local/state/sandbox-agent/desktop/home (or $XDG_STATE_HOME/sandbox-agent/desktop/home if XDG_STATE_HOME is set). All configuration files below are written to paths relative to this HOME directory.
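
The HOME resolution rule above can be written as a small helper so that configuration paths are built in one place. This is a sketch with illustrative names (desktopHome and desktopPath). The env argument must describe the sandbox's environment, not the machine running the SDK client, and whether the File System API expands a leading ~ is taken from the examples on this page rather than verified here.

```typescript
// Resolve the desktop HOME directory from the sandbox's environment variables.
function desktopHome(env: Record<string, string | undefined>): string {
  const stateHome = env.XDG_STATE_HOME;
  const base = stateHome !== undefined && stateHome !== "" ? stateHome : "~/.local/state";
  return `${base}/sandbox-agent/desktop/home`;
}

// Build the full path of a config file given its path relative to the desktop HOME.
function desktopPath(env: Record<string, string | undefined>, relative: string): string {
  return `${desktopHome(env)}/${relative}`;
}
```

For example, desktopPath({}, ".config/openbox/rc.xml") gives the rc.xml path used in the openbox example on this page.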

Window manager (openbox)

The desktop uses openbox as its window manager. You can customize its behavior, theme, and keyboard shortcuts by writing an rc.xml config file.
const openboxConfig = `<?xml version="1.0" encoding="UTF-8"?>
<openbox_config xmlns="http://openbox.org/3.4/rc">
  <theme>
    <name>Clearlooks</name>
    <titleLayout>NLIMC</titleLayout>
    <font place="ActiveWindow"><name>DejaVu Sans</name><size>10</size></font>
  </theme>
  <desktops><number>1</number></desktops>
  <keyboard>
    <keybind key="A-F4"><action name="Close"/></keybind>
    <keybind key="A-Tab"><action name="NextWindow"/></keybind>
  </keyboard>
</openbox_config>`;

await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/rc.xml" },
  openboxConfig,
);

Autostart programs

Openbox runs the shell script at ~/.config/openbox/autostart (relative to the desktop HOME directory) on startup. Use this to launch applications, set the background, or configure the environment.
const autostart = `#!/bin/sh
# Set a solid background color
xsetroot -solid "#1e1e2e" &

# Launch a terminal
xterm -geometry 120x40+50+50 &

# Launch a browser
firefox --no-remote &
`;

await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" },
  autostart,
);
The autostart script runs when openbox starts, which happens during startDesktop(). Write the autostart file before calling startDesktop() for it to take effect.

Background

There is no wallpaper set by default (the background is the X root window default). You can set it using xsetroot in the autostart script (as shown above), or use feh if you need an image:
import fs from "node:fs";

// Upload a wallpaper image
const wallpaper = await fs.promises.readFile("./wallpaper.png");
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/wallpaper.png" },
  wallpaper,
);

// Set the autostart to apply it
const autostart = `#!/bin/sh
feh --bg-fill ~/wallpaper.png &
`;

await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" },
  autostart,
);
feh is not installed by default. Install it via the Process API before starting the desktop: await sdk.runProcess({ command: "apt-get", args: ["install", "-y", "feh"] }).

Fonts

Only fonts-dejavu-core is installed by default. To add more fonts, install them with your system package manager or copy font files into the sandbox:
import fs from "node:fs";

// Install a font package
await sdk.runProcess({
  command: "apt-get",
  args: ["install", "-y", "fonts-noto", "fonts-liberation"],
});

// Or copy a custom font file
const font = await fs.promises.readFile("./CustomFont.ttf");
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.local/share/fonts" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.local/share/fonts/CustomFont.ttf" },
  font,
);

// Rebuild the font cache
await sdk.runProcess({ command: "fc-cache", args: ["-fv"] });

Cursor theme

await sdk.runProcess({
  command: "apt-get",
  args: ["install", "-y", "dmz-cursor-theme"],
});

const xresources = `Xcursor.theme: DMZ-White\nXcursor.size: 24\n`;
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.Xresources" },
  xresources,
);
Run xrdb -merge ~/.Xresources (via the autostart or process API) after writing the file for changes to take effect.

Shell and terminal

No terminal emulator or shell is launched by default. Add one to the openbox autostart:
# In ~/.config/openbox/autostart
xterm -geometry 120x40+50+50 &
To use a different shell, set the SHELL environment variable in your Dockerfile or install your preferred shell and configure the terminal to use it.

GTK theme

Applications using GTK will pick up settings from ~/.config/gtk-3.0/settings.ini:
const gtkSettings = `[Settings]
gtk-theme-name=Adwaita
gtk-icon-theme-name=Adwaita
gtk-font-name=DejaVu Sans 10
gtk-cursor-theme-name=DMZ-White
gtk-cursor-theme-size=24
`;

await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0/settings.ini" },
  gtkSettings,
);

Summary of configuration paths

All paths are relative to the desktop HOME directory (~/.local/state/sandbox-agent/desktop/home).

| What | Path | Notes |
| --- | --- | --- |
| Openbox config | .config/openbox/rc.xml | Window manager theme, keybindings, behavior |
| Autostart | .config/openbox/autostart | Shell script run on desktop start |
| Custom fonts | .local/share/fonts/ | TTF/OTF files, run fc-cache -fv after |
| Cursor theme | .Xresources | Requires xrdb -merge to apply |
| GTK 3 settings | .config/gtk-3.0/settings.ini | Theme, icons, fonts for GTK apps |
| Wallpaper | Any path, referenced from autostart | Requires feh or similar tool |