Computer Use

Sandbox Agent provides a managed virtual desktop (Xvfb + openbox) that you can control programmatically. This is useful for browser automation, GUI testing, and AI computer-use workflows.

Start and stop

import { SandboxAgent } from "sandbox-agent";

const sdk = await SandboxAgent.connect({
  baseUrl: "http://127.0.0.1:2468",
});

const status = await sdk.startDesktop({
  width: 1920,
  height: 1080,
  dpi: 96,
});

console.log(status.state); // "active"
console.log(status.display); // ":99"

// When done
await sdk.stopDesktop();

All fields in the start request are optional. Defaults are 1440x900 at 96 DPI.

Start request options

Field	Type	Default	Description
`width`	number	1440	Desktop width in pixels
`height`	number	900	Desktop height in pixels
`dpi`	number	96	Display DPI
`displayNum`	number	99	Starting X display number. The runtime probes from this number upward to find an available display.
`stateDir`	string	(auto)	Desktop state directory for home, logs, recordings
`streamVideoCodec`	string	`"vp8"`	WebRTC video codec (`vp8`, `vp9`, `h264`)
`streamAudioCodec`	string	`"opus"`	WebRTC audio codec (`opus`, `g722`)
`streamFrameRate`	number	30	Streaming frame rate (1-60)
`webrtcPortRange`	string	`"59050-59070"`	UDP port range for WebRTC media
`recordingFps`	number	30	Default recording FPS when not specified in `startDesktopRecording` (1-60)

The streaming and recording options configure defaults for the desktop session. They take effect when streaming or recording is started later.

const status = await sdk.startDesktop({
  width: 1920,
  height: 1080,
  streamVideoCodec: "h264",
  streamFrameRate: 60,
  webrtcPortRange: "59100-59120",
  recordingFps: 15,
});
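The documented ranges (frame rates of 1-60 and a `start-end` port range) can be checked on the client before calling `startDesktop`. The following is a minimal sketch; `parsePortRange` and `validateStartOptions` are hypothetical helpers, not part of the SDK, and the 1-65535 port bound is a general UDP fact rather than something this page states.

```typescript
// Hypothetical client-side checks; these helpers are not part of the SDK.
interface StartOptions {
  streamFrameRate?: number;
  recordingFps?: number;
  webrtcPortRange?: string;
}

interface PortRange {
  start: number;
  end: number;
}

// Parse a "start-end" UDP port range such as "59050-59070".
function parsePortRange(range: string): PortRange {
  const match = /^(\d+)-(\d+)$/.exec(range);
  if (!match) {
    throw new Error(`Invalid port range: ${range}`);
  }
  const start = Number(match[1]);
  const end = Number(match[2]);
  if (start < 1 || end > 65535 || start > end) {
    throw new Error(`Port range out of bounds: ${range}`);
  }
  return { start, end };
}

// Return a list of problems; an empty list means the options look valid.
function validateStartOptions(options: StartOptions): string[] {
  const problems: string[] = [];
  const rates: Array<[string, number | undefined]> = [
    ["streamFrameRate", options.streamFrameRate],
    ["recordingFps", options.recordingFps],
  ];
  for (const [name, value] of rates) {
    // Both rates are documented as 1-60; NaN also fails this check.
    if (value !== undefined && !(value >= 1 && value <= 60)) {
      problems.push(`${name} must be between 1 and 60`);
    }
  }
  if (options.webrtcPortRange !== undefined) {
    try {
      parsePortRange(options.webrtcPortRange);
    } catch (err) {
      problems.push((err as Error).message);
    }
  }
  return problems;
}
```

An empty result from `validateStartOptions` means the same object can be passed on to `sdk.startDesktop`.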

Status

const status = await sdk.getDesktopStatus();
console.log(status.state); // "inactive" | "active" | "failed" | ...

Screenshots

Capture the full desktop or a specific region. Optionally overlay a crosshair at the cursor position.

// Full screenshot (PNG by default)
const png = await sdk.takeDesktopScreenshot();

// JPEG at 70% quality, half scale
const jpeg = await sdk.takeDesktopScreenshot({
  format: "jpeg",
  quality: 70,
  scale: 0.5,
});

// Include cursor overlay
const withCursor = await sdk.takeDesktopScreenshot({
  showCursor: true,
});

// Region screenshot
const region = await sdk.takeDesktopRegionScreenshot({
  x: 100,
  y: 100,
  width: 400,
  height: 300,
});

Screenshot options

Param	Type	Default	Description
`format`	string	`"png"`	Output format: `png`, `jpeg`, or `webp`
`quality`	number	85	Compression quality (1-100, JPEG/WebP only)
`scale`	number	1.0	Scale factor (0.1-1.0)
`showCursor`	boolean	`false`	Composite a crosshair at the cursor position

When showCursor is enabled, the cursor position is captured at the moment of the screenshot and a red crosshair is drawn at that location. This is useful for AI agents that need to see where the cursor is in the screenshot.
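When a screenshot is taken with `scale` below 1.0, a point picked in the image no longer matches desktop pixel coordinates. The sketch below converts it back before clicking. `toDesktopCoords` is a hypothetical helper, and it assumes the scale factor applies uniformly to both axes, which this page does not state.

```typescript
interface Point {
  x: number;
  y: number;
}

// Convert a point in a scaled screenshot back to desktop pixel coordinates.
// Assumes `scale` is the same factor that was passed to takeDesktopScreenshot
// and that it applies equally to width and height.
function toDesktopCoords(point: Point, scale: number): Point {
  if (!(scale >= 0.1 && scale <= 1.0)) {
    throw new RangeError("scale must be between 0.1 and 1.0");
  }
  return {
    x: Math.round(point.x / scale),
    y: Math.round(point.y / scale),
  };
}
```

For example, a point an agent picks at (250, 150) in a half-scale screenshot maps to (500, 300), which can then be passed to `sdk.clickDesktop`.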

Mouse

// Get current position
const pos = await sdk.getDesktopMousePosition();
console.log(pos.x, pos.y);

// Move
await sdk.moveDesktopMouse({ x: 500, y: 300 });

// Click (left by default)
await sdk.clickDesktop({ x: 500, y: 300 });

// Right click
await sdk.clickDesktop({ x: 500, y: 300, button: "right" });

// Double click
await sdk.clickDesktop({ x: 500, y: 300, clickCount: 2 });

// Drag
await sdk.dragDesktopMouse({
  startX: 100, startY: 100,
  endX: 400, endY: 400,
});

// Scroll
await sdk.scrollDesktop({ x: 500, y: 300, deltaY: -3 });

Keyboard

// Type text
await sdk.typeDesktopText({ text: "Hello, world!" });

// Press a key with modifiers
await sdk.pressDesktopKey({
  key: "c",
  modifiers: { ctrl: true },
});

// Low-level key down/up
await sdk.keyDownDesktop({ key: "Shift_L" });
await sdk.keyUpDesktop({ key: "Shift_L" });
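With the low-level calls, a key that is pressed down stays held until it is released, so an error between the two calls can leave a modifier stuck. One way to guard against that is a `try`/`finally` wrapper. This is a sketch only: `withKeyHeld` and the `KeyboardClient` interface are hypothetical and describe just the two SDK calls shown above.

```typescript
// Minimal shape of the two SDK calls used here, so the sketch type-checks alone.
interface KeyboardClient {
  keyDownDesktop(request: { key: string }): Promise<unknown>;
  keyUpDesktop(request: { key: string }): Promise<unknown>;
}

// Hold `key` down while `action` runs, and always release it afterwards,
// even if `action` throws.
async function withKeyHeld<T>(
  client: KeyboardClient,
  key: string,
  action: () => Promise<T>,
): Promise<T> {
  await client.keyDownDesktop({ key });
  try {
    return await action();
  } finally {
    await client.keyUpDesktop({ key });
  }
}
```

A call such as `withKeyHeld(sdk, "Shift_L", () => sdk.clickDesktop({ x: 500, y: 300 }))` would then perform a shift-click and release the key afterwards.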

Clipboard

Read and write the X11 clipboard programmatically.

// Read clipboard
const clipboard = await sdk.getDesktopClipboard();
console.log(clipboard.text);

// Read primary selection (mouse-selected text)
const primary = await sdk.getDesktopClipboard({ selection: "primary" });

// Write to clipboard
await sdk.setDesktopClipboard({ text: "Pasted via API" });

// Write to both clipboard and primary selection
await sdk.setDesktopClipboard({
  text: "Synced text",
  selection: "both",
});

The selection parameter controls which X11 selection to read or write:

Value	Description
`clipboard` (default)	The standard clipboard (Ctrl+C / Ctrl+V)
`primary`	The primary selection (text selected with the mouse)
`both`	Write to both clipboard and primary selection (write only)

Display and windows

const display = await sdk.getDesktopDisplayInfo();
console.log(display.resolution); // { width: 1920, height: 1080, dpi: 96 }

const { windows } = await sdk.listDesktopWindows();
for (const win of windows) {
  console.log(win.title, win.x, win.y, win.width, win.height);
}

The windows endpoint filters out noise automatically: window manager internals (Openbox), windows with empty titles, and tiny helper windows (under 120x80) are excluded. The currently active/focused window is always included regardless of filters.
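The filtering rules can be pictured as a single predicate. This is an illustration of the described behavior, not the server's actual implementation: the `isActive` field name is hypothetical, "under 120x80" is read here as either dimension falling below the threshold, and detection of window manager internals is left out because this page does not say how they are identified.

```typescript
interface WindowCandidate {
  title: string;
  width: number;
  height: number;
  isActive: boolean; // hypothetical field name for "currently focused"
}

// Decide whether a window would appear in the list under the described rules.
function isListedWindow(win: WindowCandidate): boolean {
  // The active window is always included, regardless of the other filters.
  if (win.isActive) return true;
  // Windows with empty titles are excluded.
  if (win.title.trim() === "") return false;
  // Tiny helper windows are excluded (assumed: either dimension too small).
  if (win.width < 120 || win.height < 80) return false;
  return true;
}
```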

Focused window

Get the currently focused window without listing all windows.

const focused = await sdk.getDesktopFocusedWindow();
console.log(focused.title, focused.id);

Returns 404 if no window currently has focus.

Window management

Focus, move, and resize windows by their X11 window ID.

const { windows } = await sdk.listDesktopWindows();
const win = windows[0];

// Bring window to foreground
await sdk.focusDesktopWindow(win.id);

// Move window
await sdk.moveDesktopWindow(win.id, { x: 100, y: 50 });

// Resize window
await sdk.resizeDesktopWindow(win.id, { width: 1280, height: 720 });

All three endpoints return the updated window info so you can verify the operation took effect. The window manager may adjust the requested position or size.
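Because the window manager may adjust what was requested, it can help to compare the request with the returned window info. A minimal sketch follows, with a hypothetical `geometryDiff` helper; it assumes the returned info exposes `x`, `y`, `width`, and `height`, as the window list example does.

```typescript
interface Geometry {
  x?: number;
  y?: number;
  width?: number;
  height?: number;
}

// List the requested fields whose returned value differs from the request.
function geometryDiff(requested: Geometry, actual: Geometry): string[] {
  const keys: Array<keyof Geometry> = ["x", "y", "width", "height"];
  return keys.filter(
    (key) => requested[key] !== undefined && requested[key] !== actual[key],
  );
}
```

Passing the resize request and the value returned by `sdk.resizeDesktopWindow` yields an empty array when the window manager honored the request exactly.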

App launching

Launch applications or open files/URLs on the desktop without needing to shell out.

// Launch an app by name
const result = await sdk.launchDesktopApp({
  app: "firefox",
  args: ["--private"],
});
console.log(result.processId); // "proc_7"

// Launch and wait for the window to appear
const withWindow = await sdk.launchDesktopApp({
  app: "xterm",
  wait: true,
});
console.log(withWindow.windowId); // "12345" or null if timed out

// Open a URL with the default handler
const opened = await sdk.openDesktopTarget({
  target: "https://example.com",
});
console.log(opened.processId);

The returned `processId` can be used with the Process API to read logs (`GET /v1/processes/{id}/logs`) or stop the application (`POST /v1/processes/{id}/stop`). When `wait` is `true`, the API polls for up to 5 seconds for a window to appear. If a window appears, its ID is returned in `windowId`. If the wait times out, `windowId` is `null` but the process is still running.
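Slow-starting applications may need longer than the 5-second wait. Since the process keeps running after a timeout, the client can keep polling on its own. The sketch below is a generic polling helper; `pollUntil` is hypothetical and not part of the SDK.

```typescript
// Poll `check` until it returns a non-null value or `timeoutMs` elapses.
// Returns null if the deadline passes without a result.
async function pollUntil<T>(
  check: () => Promise<T | null>,
  timeoutMs: number,
  intervalMs: number,
): Promise<T | null> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await check();
    if (value !== null) return value;
    // Stop if another sleep would run past the deadline.
    if (Date.now() + intervalMs > deadline) return null;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The `check` callback could, for example, call `sdk.listDesktopWindows()` and return the first window whose title matches the launched application, or `null` if none is found yet.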

Launch/Open vs the Process API: Both launch and open are convenience wrappers around the Process API. They create managed processes (with owner: "desktop") that you can inspect, log, and stop through the same Process endpoints. The difference is that launch validates the binary exists in PATH first and can optionally wait for a window to appear, while open delegates to the system default handler (xdg-open). Use the Process API directly when you need full control over command, environment, working directory, or restart policies.

Recording

Record the desktop to MP4.

const recording = await sdk.startDesktopRecording({ fps: 30 });
console.log(recording.id);

// ... do things ...

const stopped = await sdk.stopDesktopRecording();

// List all recordings
const { recordings } = await sdk.listDesktopRecordings();

// Download
const mp4 = await sdk.downloadDesktopRecording(recording.id);

// Clean up
await sdk.deleteDesktopRecording(recording.id);
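Per the start request options, `recordingFps` supplies the frame rate when `fps` is omitted from `startDesktopRecording`, and its own default is 30. The fallback order can be written out as below; `effectiveRecordingFps` is a hypothetical helper that only illustrates the documented precedence, not server code.

```typescript
// Resolve the recording frame rate: explicit request value first,
// then the session's recordingFps, then the documented default of 30.
function effectiveRecordingFps(
  requestFps?: number,
  sessionRecordingFps?: number,
): number {
  return requestFps ?? sessionRecordingFps ?? 30;
}
```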

Desktop processes

The desktop runtime manages several background processes (Xvfb, openbox, neko, ffmpeg). These are all registered with the general Process API under the desktop owner, so you can inspect logs, check status, and troubleshoot using the same tools you use for any other managed process.

// List all processes, including desktop-owned ones
const { processes } = await sdk.listProcesses();

const desktopProcs = processes.filter((p) => p.owner === "desktop");
for (const p of desktopProcs) {
  console.log(p.id, p.command, p.status);
}

// Read logs from a specific desktop process
const logs = await sdk.getProcessLogs(desktopProcs[0].id, { tail: 50 });
for (const entry of logs.entries) {
  console.log(entry.stream, atob(entry.data));
}

The desktop status endpoint also includes a summary of running processes:

const status = await sdk.getDesktopStatus();
for (const proc of status.processes) {
  console.log(proc.name, proc.pid, proc.running);
}

Process	Role	Restart policy
Xvfb	Virtual X11 framebuffer	Auto-restart while desktop is active
openbox	Window manager	Auto-restart while desktop is active
neko	WebRTC streaming server (started by `startDesktopStream`)	No auto-restart
ffmpeg	Screen recorder (started by `startDesktopRecording`)	No auto-restart

Live streaming

Start a WebRTC stream for real-time desktop viewing in a browser.

await sdk.startDesktopStream();

// Check stream status
const status = await sdk.getDesktopStreamStatus();
console.log(status.active); // true
console.log(status.processId); // "proc_5"

// Connect via the React DesktopViewer component or
// use the WebSocket signaling endpoint directly
// at ws://127.0.0.1:2468/v1/desktop/stream/signaling

await sdk.stopDesktopStream();

For a drop-in React component, see React Components.

API reference

Endpoints

Method	Path	Description
`POST`	`/v1/desktop/start`	Start the desktop runtime
`POST`	`/v1/desktop/stop`	Stop the desktop runtime
`GET`	`/v1/desktop/status`	Get desktop runtime status
`GET`	`/v1/desktop/screenshot`	Capture full desktop screenshot
`GET`	`/v1/desktop/screenshot/region`	Capture a region screenshot
`GET`	`/v1/desktop/mouse/position`	Get current mouse position
`POST`	`/v1/desktop/mouse/move`	Move the mouse
`POST`	`/v1/desktop/mouse/click`	Click the mouse
`POST`	`/v1/desktop/mouse/down`	Press mouse button down
`POST`	`/v1/desktop/mouse/up`	Release mouse button
`POST`	`/v1/desktop/mouse/drag`	Drag from one point to another
`POST`	`/v1/desktop/mouse/scroll`	Scroll at a position
`POST`	`/v1/desktop/keyboard/type`	Type text
`POST`	`/v1/desktop/keyboard/press`	Press a key with optional modifiers
`POST`	`/v1/desktop/keyboard/down`	Press a key down (hold)
`POST`	`/v1/desktop/keyboard/up`	Release a key
`GET`	`/v1/desktop/display/info`	Get display info
`GET`	`/v1/desktop/windows`	List visible windows
`GET`	`/v1/desktop/windows/focused`	Get focused window info
`POST`	`/v1/desktop/windows/{id}/focus`	Focus a window
`POST`	`/v1/desktop/windows/{id}/move`	Move a window
`POST`	`/v1/desktop/windows/{id}/resize`	Resize a window
`GET`	`/v1/desktop/clipboard`	Read clipboard contents
`POST`	`/v1/desktop/clipboard`	Write to clipboard
`POST`	`/v1/desktop/launch`	Launch an application
`POST`	`/v1/desktop/open`	Open a file or URL
`POST`	`/v1/desktop/recording/start`	Start recording
`POST`	`/v1/desktop/recording/stop`	Stop recording
`GET`	`/v1/desktop/recordings`	List recordings
`GET`	`/v1/desktop/recordings/{id}`	Get recording metadata
`GET`	`/v1/desktop/recordings/{id}/download`	Download recording
`DELETE`	`/v1/desktop/recordings/{id}`	Delete recording
`POST`	`/v1/desktop/stream/start`	Start WebRTC streaming
`POST`	`/v1/desktop/stream/stop`	Stop WebRTC streaming
`GET`	`/v1/desktop/stream/status`	Get stream status
`GET`	`/v1/desktop/stream/signaling`	WebSocket for WebRTC signaling

TypeScript SDK methods

Method	Returns	Description
`startDesktop(request?)`	`DesktopStatusResponse`	Start the desktop
`stopDesktop()`	`DesktopStatusResponse`	Stop the desktop
`getDesktopStatus()`	`DesktopStatusResponse`	Get desktop status
`takeDesktopScreenshot(query?)`	`Uint8Array`	Capture screenshot
`takeDesktopRegionScreenshot(query)`	`Uint8Array`	Capture region screenshot
`getDesktopMousePosition()`	`DesktopMousePositionResponse`	Get mouse position
`moveDesktopMouse(request)`	`DesktopMousePositionResponse`	Move mouse
`clickDesktop(request)`	`DesktopMousePositionResponse`	Click mouse
`mouseDownDesktop(request)`	`DesktopMousePositionResponse`	Mouse button down
`mouseUpDesktop(request)`	`DesktopMousePositionResponse`	Mouse button up
`dragDesktopMouse(request)`	`DesktopMousePositionResponse`	Drag mouse
`scrollDesktop(request)`	`DesktopMousePositionResponse`	Scroll
`typeDesktopText(request)`	`DesktopActionResponse`	Type text
`pressDesktopKey(request)`	`DesktopActionResponse`	Press key
`keyDownDesktop(request)`	`DesktopActionResponse`	Key down
`keyUpDesktop(request)`	`DesktopActionResponse`	Key up
`getDesktopDisplayInfo()`	`DesktopDisplayInfoResponse`	Get display info
`listDesktopWindows()`	`DesktopWindowListResponse`	List windows
`getDesktopFocusedWindow()`	`DesktopWindowInfo`	Get focused window
`focusDesktopWindow(id)`	`DesktopWindowInfo`	Focus a window
`moveDesktopWindow(id, request)`	`DesktopWindowInfo`	Move a window
`resizeDesktopWindow(id, request)`	`DesktopWindowInfo`	Resize a window
`getDesktopClipboard(query?)`	`DesktopClipboardResponse`	Read clipboard
`setDesktopClipboard(request)`	`DesktopActionResponse`	Write clipboard
`launchDesktopApp(request)`	`DesktopLaunchResponse`	Launch an app
`openDesktopTarget(request)`	`DesktopOpenResponse`	Open file/URL
`startDesktopRecording(request?)`	`DesktopRecordingInfo`	Start recording
`stopDesktopRecording()`	`DesktopRecordingInfo`	Stop recording
`listDesktopRecordings()`	`DesktopRecordingListResponse`	List recordings
`getDesktopRecording(id)`	`DesktopRecordingInfo`	Get recording
`downloadDesktopRecording(id)`	`Uint8Array`	Download recording
`deleteDesktopRecording(id)`	`void`	Delete recording
`startDesktopStream()`	`DesktopStreamStatusResponse`	Start streaming
`stopDesktopStream()`	`DesktopStreamStatusResponse`	Stop streaming
`getDesktopStreamStatus()`	`DesktopStreamStatusResponse`	Stream status

Customizing the desktop environment

The desktop runs inside the sandbox filesystem, so you can customize it using the File System API before or after starting the desktop. The desktop HOME directory is located at `~/.local/state/sandbox-agent/desktop/home` (or `$XDG_STATE_HOME/sandbox-agent/desktop/home` if `XDG_STATE_HOME` is set). All configuration files below are written to paths relative to this HOME directory.
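The examples in this section hardcode the default location. If the sandbox sets `XDG_STATE_HOME`, the path can be resolved with a small helper instead. This is a sketch: `desktopHome` and `desktopPath` are hypothetical, the `env` argument must describe the sandbox's environment rather than the machine running the SDK, and treating an empty `XDG_STATE_HOME` as unset follows the XDG Base Directory convention rather than anything stated on this page.

```typescript
type Env = Record<string, string | undefined>;

// Resolve the desktop HOME directory from the sandbox's environment variables.
function desktopHome(env: Env): string {
  const xdgStateHome = env["XDG_STATE_HOME"];
  const stateRoot =
    xdgStateHome !== undefined && xdgStateHome.length > 0
      ? xdgStateHome
      : `${env["HOME"] ?? "~"}/.local/state`;
  return `${stateRoot}/sandbox-agent/desktop/home`;
}

// Join a path relative to the desktop HOME, e.g. ".config/openbox/rc.xml".
function desktopPath(env: Env, relative: string): string {
  return `${desktopHome(env)}/${relative.replace(/^\/+/, "")}`;
}
```

With an empty environment map, `desktopPath({}, ".config/openbox/rc.xml")` produces the same `~/.local/state/...` path used in the examples below.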

Window manager (openbox)

The desktop uses openbox as its window manager. You can customize its behavior, theme, and keyboard shortcuts by writing an rc.xml config file.

const openboxConfig = `<?xml version="1.0" encoding="UTF-8"?>
<openbox_config xmlns="http://openbox.org/3.4/rc">
  <theme>
    <name>Clearlooks</name>
    <titleLayout>NLIMC</titleLayout>
    <font place="ActiveWindow"><name>DejaVu Sans</name><size>10</size></font>
  </theme>
  <desktops><number>1</number></desktops>
  <keyboard>
    <keybind key="A-F4"><action name="Close"/></keybind>
    <keybind key="A-Tab"><action name="NextWindow"/></keybind>
  </keyboard>
</openbox_config>`;

await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/rc.xml" },
  openboxConfig,
);

Autostart programs

Openbox runs the script at `~/.config/openbox/autostart` on startup. Use this to launch applications, set the background, or configure the environment.

const autostart = `#!/bin/sh
# Set a solid background color
xsetroot -solid "#1e1e2e" &

# Launch a terminal
xterm -geometry 120x40+50+50 &

# Launch a browser
firefox --no-remote &
`;

await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" },
  autostart,
);

The autostart script runs when openbox starts, which happens during startDesktop(). Write the autostart file before calling startDesktop() for it to take effect.
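Each long-running command in the autostart script ends with `&` so that it runs in the background and does not block the commands after it. When the list of programs is assembled in code, the script can be generated rather than written by hand. `buildAutostart` below is a hypothetical helper, not part of the SDK.

```typescript
// Build an openbox autostart script that starts each command in the background.
function buildAutostart(commands: string[]): string {
  const lines = commands.map((command) => `${command} &`);
  // A trailing empty entry makes the script end with a newline.
  return ["#!/bin/sh", ...lines, ""].join("\n");
}
```

The resulting string can be written with `sdk.writeFsFile` in the same way as the handwritten script above.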

Background

There is no wallpaper set by default (the background is the X root window default). You can set it using xsetroot in the autostart script (as shown above), or use feh if you need an image:

// Upload a wallpaper image
import fs from "node:fs";

const wallpaper = await fs.promises.readFile("./wallpaper.png");
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/wallpaper.png" },
  wallpaper,
);

// Set the autostart to apply it
const autostart = `#!/bin/sh
feh --bg-fill ~/wallpaper.png &
`;

await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.config/openbox/autostart" },
  autostart,
);

`feh` is not installed by default. Install it via the Process API before starting the desktop: `await sdk.runProcess({ command: "apt-get", args: ["install", "-y", "feh"] })`.

Fonts

Only fonts-dejavu-core is installed by default. To add more fonts, install them with your system package manager or copy font files into the sandbox:

// Install a font package
await sdk.runProcess({
  command: "apt-get",
  args: ["install", "-y", "fonts-noto", "fonts-liberation"],
});

// Or copy a custom font file
import fs from "node:fs";

const font = await fs.promises.readFile("./CustomFont.ttf");
await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.local/share/fonts" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.local/share/fonts/CustomFont.ttf" },
  font,
);

// Rebuild the font cache
await sdk.runProcess({ command: "fc-cache", args: ["-fv"] });

Cursor theme

await sdk.runProcess({
  command: "apt-get",
  args: ["install", "-y", "dmz-cursor-theme"],
});

const xresources = `Xcursor.theme: DMZ-White\nXcursor.size: 24\n`;
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.Xresources" },
  xresources,
);

Run `xrdb -merge ~/.Xresources` (from the autostart script or through the Process API) after writing the file for the changes to take effect.

Shell and terminal

No terminal emulator or shell is launched by default. Add one to the openbox autostart:

# In ~/.config/openbox/autostart
xterm -geometry 120x40+50+50 &

To use a different shell, set the SHELL environment variable in your Dockerfile or install your preferred shell and configure the terminal to use it.

GTK theme

Applications using GTK will pick up settings from ~/.config/gtk-3.0/settings.ini:

const gtkSettings = `[Settings]
gtk-theme-name=Adwaita
gtk-icon-theme-name=Adwaita
gtk-font-name=DejaVu Sans 10
gtk-cursor-theme-name=DMZ-White
gtk-cursor-theme-size=24
`;

await sdk.mkdirFs({ path: "~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0" });
await sdk.writeFsFile(
  { path: "~/.local/state/sandbox-agent/desktop/home/.config/gtk-3.0/settings.ini" },
  gtkSettings,
);

Summary of configuration paths

All paths are relative to the desktop HOME directory (~/.local/state/sandbox-agent/desktop/home).

What	Path	Notes
Openbox config	`.config/openbox/rc.xml`	Window manager theme, keybindings, behavior
Autostart	`.config/openbox/autostart`	Shell script run on desktop start
Custom fonts	`.local/share/fonts/`	TTF/OTF files, run `fc-cache -fv` after
Cursor theme	`.Xresources`	Requires `xrdb -merge` to apply
GTK 3 settings	`.config/gtk-3.0/settings.ini`	Theme, icons, fonts for GTK apps
Wallpaper	Any path, referenced from autostart	Requires `feh` or similar tool
