
Computer Use

BOR can operate your actual Mac: read the screen, find UI elements, and click, type, and press keys in any application. This is how it does things that aren’t on the web and aren’t in a file — “rename these layers in Figma”, “send this in the Mail app”, “click Run in Xcode”. Computer-use tools live in server/protocol/handlers/computer.js.

The tools

| Tool | Purpose |
|---|---|
| screen_capture | Capture your real Mac screen and return a screenshot (BOR hides its own presence during capture by default, so you see your actual app). |
| locate_screen_element | Use the configured vision model to find a visible UI element and return click coordinates. |
| computer_click | Click at macOS coordinates (usually from locate_screen_element). |
| computer_type | Type text into the focused app. |
| computer_key | Press a key or keyboard shortcut. |
| open_app | Launch a macOS application. |
| computer_use | A higher-level loop that combines the tools above to accomplish a goal. |
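computer_type and computer_key drive macOS through System Events (see "How it acts"). The following is a rough, hypothetical sketch of how a string or a shortcut such as "cmd+s" could be turned into a System Events AppleScript command. It is not BOR's handler code: the function names, the "+"-separated shortcut syntax, and the small key-code table are assumptions for illustration. The `keystroke`, `key code`, and `using {command down}` forms are standard System Events AppleScript.

```javascript
// Hypothetical sketch, not BOR's real handler: it only builds AppleScript source.
// Running it would be a separate step, e.g.
// require("node:child_process").execFile("osascript", ["-e", script]),
// and needs the Accessibility permission on macOS.

// Shortcut modifier names (assumed syntax) mapped to System Events modifier keywords.
const MODIFIERS = {
  cmd: "command down",
  command: "command down",
  shift: "shift down",
  alt: "option down",
  option: "option down",
  ctrl: "control down",
  control: "control down",
};

// Named keys sent by virtual key code (values for a standard Mac keyboard).
const KEY_CODES = {
  return: 36, tab: 48, space: 49, delete: 51, escape: 53,
  left: 123, right: 124, down: 125, up: 126,
};

// Quote text as an AppleScript string literal (escape backslashes, then quotes).
function appleScriptString(text) {
  return '"' + text.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// computer_type-style: type literal text into the focused app.
function typeScript(text) {
  return `tell application "System Events" to keystroke ${appleScriptString(text)}`;
}

// computer_key-style: "return", "cmd+s", "cmd+shift+z", ...
function keyScript(shortcut) {
  const parts = shortcut.toLowerCase().split("+").map((p) => p.trim());
  const key = parts.pop();
  const modifiers = parts.map((m) => {
    if (!Object.hasOwn(MODIFIERS, m)) throw new Error(`Unknown modifier: ${m}`);
    return MODIFIERS[m];
  });
  // Named keys use their key code; anything else is sent as a keystroke.
  const action = Object.hasOwn(KEY_CODES, key)
    ? `key code ${KEY_CODES[key]}`
    : `keystroke ${appleScriptString(key)}`;
  const using = modifiers.length ? ` using {${modifiers.join(", ")}}` : "";
  return `tell application "System Events" to ${action}${using}`;
}

console.log(keyScript("cmd+s"));
// tell application "System Events" to keystroke "s" using {command down}
```

Named keys go through `key code` because `keystroke` sends characters, and keys like Return or the arrow keys are more predictable as key codes.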

How it sees

screen_capture returns an image of your screen. By default BOR hides its own presence and thought bubble during the capture, so the screenshot shows the app underneath rather than BOR’s overlay. The presence preview then shows the captured image inline, so BOR doesn’t fetch it a second time. locate_screen_element asks your configured provider’s vision model a question such as “where is the blue Share button?” and returns both the coordinate the model reported and the corresponding macOS click coordinates, which you can hand straight to computer_click.
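The model's coordinate and the macOS click coordinate can differ when the screenshot the model sees is not at the same scale as macOS screen points, for example when a Retina capture is downscaled before being sent to the model. A minimal sketch of that conversion, assuming one uniformly scaled capture of a single display; the function name and the `image`/`display` parameter shapes are hypothetical, not BOR's.

```javascript
// Hypothetical sketch: convert a point reported by a vision model (in the pixel space of
// the image the model was shown) into global macOS click coordinates (points).
// Assumes the image is a uniformly scaled capture of one whole display whose top-left
// corner sits at `display.origin` in global points (0,0 for the main display).
function modelToMacPoint(modelPoint, image, display) {
  const scaleX = display.widthPoints / image.width;
  const scaleY = display.heightPoints / image.height;
  const origin = display.origin ?? { x: 0, y: 0 };
  return {
    x: Math.round(origin.x + modelPoint.x * scaleX),
    y: Math.round(origin.y + modelPoint.y * scaleY),
  };
}

// Example: a 1440×900-point Retina display captured at 2880×1800 pixels and
// downscaled to 1280×800 before being sent to the model.
const click = modelToMacPoint(
  { x: 640, y: 400 },
  { width: 1280, height: 800 },
  { widthPoints: 1440, heightPoints: 900 },
);
console.log(click); // { x: 720, y: 450 }
```

The raw capture resolution (2880×1800) drops out: only the size of the image the model actually saw and the display size in points matter.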

How it acts

computer_click, computer_type, and computer_key drive macOS through System Events. BOR follows a “truth rule”: a click, type, or key result only proves the command was sent, not that anything succeeded. So before claiming a result, BOR re-captures the screen to verify what actually happened. computer_use wraps this into a goal-directed loop (capture → locate → act → re-capture → repeat) that runs until the goal is met or it hits a real blocker, at which point it asks you.
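The loop described above can be sketched as follows. This is a hypothetical outline, not BOR's computer_use implementation: the injected `tools` object, the `plan` step (standing in for whatever model call decides the next action), and the return statuses are all assumptions. What it illustrates is the truth rule: after every action the loop re-captures the screen, and the next decision is made from that fresh screenshot rather than from the action's return value.

```javascript
// Hypothetical sketch of a capture → locate → act → re-capture loop.
// `tools` holds async functions supplied by the caller; their names and shapes
// are assumptions, not BOR's real handler API.
async function runComputerUse(goal, tools, { maxSteps = 10 } = {}) {
  let screenshot = await tools.capture();
  for (let step = 1; step <= maxSteps; step++) {
    // Decide from the *current* screen: finished, blocked, or the next action.
    const plan = await tools.plan(goal, screenshot);
    if (plan.done) return { status: "done", steps: step - 1 };
    if (plan.blocked) return { status: "needs_user", reason: plan.blocked, steps: step - 1 };

    // Resolve the target to click coordinates, then act.
    if (plan.target) {
      const point = await tools.locate(plan.target, screenshot);
      if (!point) {
        return { status: "needs_user", reason: `could not find ${plan.target}`, steps: step - 1 };
      }
      await tools.click(point);
    }
    if (plan.text) await tools.type(plan.text);

    // Truth rule: the action only proves the command was sent, so look again.
    screenshot = await tools.capture();
  }
  return { status: "gave_up", steps: maxSteps };
}
```

A step cap plus an explicit "needs_user" outcome keeps the loop from clicking indefinitely when the screen never reaches the goal.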

Permissions

Computer use needs macOS permissions:
  • Screen Recording: for screen_capture and locate_screen_element (the host running BOR, Electron or Terminal, must be allowed).
  • Accessibility: for computer_click, computer_type, and computer_key (the same host must be allowed).
BOR can probe these (/api/system/permissions) and open the right settings pane (/api/system/permissions/open) so you can grant them.
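Which permission a tool needs follows directly from the list above. A hypothetical sketch of checking that before running a tool; the `status` object shape (`{ screenRecording, accessibility }`) is an assumption, and the real response of /api/system/permissions may look different. computer_use is assumed to need both, since it combines capture and locate with click, type, and key.

```javascript
// Hypothetical mapping from tool to the macOS permissions this page lists for it.
// open_app is omitted because the page doesn't list a permission for it.
const REQUIRED_PERMISSIONS = {
  screen_capture: ["screenRecording"],
  locate_screen_element: ["screenRecording"],
  computer_click: ["accessibility"],
  computer_type: ["accessibility"],
  computer_key: ["accessibility"],
  // Assumed: computer_use combines all of the above, so it needs both.
  computer_use: ["screenRecording", "accessibility"],
};

// `status` is an assumed shape such as { screenRecording: true, accessibility: false }.
// Returns the permissions that still need to be granted for `tool`.
function missingPermissions(tool, status) {
  const required = Object.hasOwn(REQUIRED_PERMISSIONS, tool) ? REQUIRED_PERMISSIONS[tool] : [];
  return required.filter((perm) => status[perm] !== true);
}

console.log(missingPermissions("computer_use", { screenRecording: true, accessibility: false }));
// [ 'accessibility' ]
```

If the list is non-empty, that is the point at which BOR could open the matching settings pane via /api/system/permissions/open instead of attempting the action and failing.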

Computer use vs. the browser

  • For web tasks, prefer browser_action (a controlled headless browser) or the visible Browser app. Both are faster and more reliable than driving a browser window with screenshots and clicks.
  • For native app tasks (Figma, Xcode, Mail, Finder, anything not web), computer use is the way.
When a browser task can’t be done with browser_action, BOR can fall back to driving the visible browser with computer use.
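The routing rule in this section can be sketched as a small decision function. This is a hypothetical illustration, not BOR's dispatcher: `isWebTask`, `browserAction`, and `computerUse` are assumed callbacks, and "can't be done with browser_action" is modeled as `browserAction` throwing.

```javascript
// Hypothetical routing sketch: web tasks try browser_action first and fall back to
// driving the visible browser with computer use; native-app tasks go straight to
// computer use. All three callbacks are assumptions supplied by the caller.
async function runTask(task, { isWebTask, browserAction, computerUse }) {
  if (!isWebTask(task)) {
    return { via: "computer_use", result: await computerUse(task) };
  }
  try {
    return { via: "browser_action", result: await browserAction(task) };
  } catch (err) {
    // browser_action couldn't do it: operate the visible browser window instead.
    const reason = err instanceof Error ? err.message : String(err);
    return { via: "computer_use", fallbackFrom: reason, result: await computerUse(task) };
  }
}
```

Recording `fallbackFrom` keeps the reason for the slower path visible, which helps when deciding whether the task really needed computer use.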

Use cases

  • “In Figma, select the frame tool.” → screen_capture, locate_screen_element("the Figma frame tool"), computer_click.
  • “Open Xcode and click Run.” → open_app, then locate + click.
  • “What’s on my screen right now?” → screen_capture and describe it.
  • “Fill in this native form.” → locate each field, computer_click + computer_type, re-capture to verify.
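The native-form use case, written out as tool calls, might look like the sketch below. It is hypothetical: the `tools` object wraps the documented tool names as async functions, but the argument and return shapes shown here are assumptions. Following the truth rule, it re-captures after each field so every later step (and the final check) works from what is actually on screen.

```javascript
// Hypothetical sketch of the "fill in this native form" use case.
// `tools` maps the documented tool names to async functions; their argument and
// return shapes are assumptions, not BOR's real API.
async function fillNativeForm(fields, tools) {
  const log = [];
  let shot = await tools.screen_capture();
  for (const { label, value } of fields) {
    // Find the field on the latest screenshot, then click into it and type.
    const point = await tools.locate_screen_element(`the "${label}" field`, shot);
    if (!point) throw new Error(`Field not found: ${label}`);
    await tools.computer_click(point);
    await tools.computer_type(value);
    // The click/type results only prove the commands were sent; look again.
    shot = await tools.screen_capture();
    log.push({ label, point });
  }
  // The caller (or a vision check) verifies the entered values on this screenshot.
  return { finalScreenshot: shot, log };
}
```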