Computer Use
BOR can operate your actual Mac: read the screen, find UI elements, and click, type, and press keys in any application. This is how it handles tasks that are neither on the web nor in a file: “rename these layers in Figma”, “send this in the Mail app”, “click Run in Xcode”. Computer-use tools live in server/protocol/handlers/computer.js.
The tools
| Tool | Purpose |
|---|---|
| screen_capture | Capture your real Mac screen and return a screenshot (BOR hides its own presence during capture by default, so you see your actual app). |
| locate_screen_element | Use the configured vision model to find a visible UI element and return click coordinates. |
| computer_click | Click at macOS coordinates (usually from locate_screen_element). |
| computer_type | Type into the focused app. |
| computer_key | Press a key / shortcut. |
| open_app | Launch a macOS application. |
| computer_use | A higher-level loop that combines the above to accomplish a goal. |
How it sees
screen_capture returns an image of your screen. By default BOR hides its own presence/thought bubble during the capture, so the screenshot shows the app underneath — not BOR’s overlay. The presence preview shows the captured image inline (so BOR doesn’t double-fetch it).
locate_screen_element then asks your configured provider’s vision model “where is the blue Share button?” and returns both the model coordinate and macOS click coordinates you can hand straight to computer_click.
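To illustrate why two coordinate pairs come back, here is a minimal sketch of one way a model coordinate could be mapped to a macOS click coordinate. It assumes the model reports pixel positions within the screenshot and that the screenshot covers exactly one display; the function name `modelToScreenPoint` and the shapes of its arguments are hypothetical, not BOR's actual implementation.

```javascript
// Hypothetical helper: maps a point in screenshot pixels to macOS screen points.
// Assumption: the screenshot covers one display whose origin and size (in points)
// are given by `display`; on a Retina display the image is typically 2x larger.
function modelToScreenPoint(modelPoint, imageSize, display) {
  const scaleX = display.width / imageSize.width;   // points per image pixel
  const scaleY = display.height / imageSize.height;
  return {
    x: Math.round(display.x + modelPoint.x * scaleX),
    y: Math.round(display.y + modelPoint.y * scaleY),
  };
}

// Example: a 2880x1800 screenshot of a 1440x900-point display.
const clickPoint = modelToScreenPoint(
  { x: 1440, y: 900 },
  { width: 2880, height: 1800 },
  { x: 0, y: 0, width: 1440, height: 900 }
);
console.log(clickPoint);
```

Because locate_screen_element already returns the macOS click coordinates, you would not normally need to do this conversion yourself.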
How it acts
computer_click / computer_type / computer_key drive macOS through System Events. The “truth rule” BOR follows: a click/type result only proves the command was sent — not that anything succeeded. So BOR re-captures the screen to verify what actually happened before claiming a result.
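As a rough sketch of what driving macOS through System Events can look like, the snippet below builds AppleScript `keystroke` commands and sends them with `osascript`. Running the script through `osascript` from Node, and the helper names `typeScript`, `shortcutScript`, and `send`, are assumptions made for illustration; the source only states that System Events is used. Note that the result reports only that the command was sent, in line with the truth rule.

```javascript
const { execFile } = require('node:child_process');

// Quote a string as an AppleScript double-quoted literal.
function appleScriptString(text) {
  return '"' + text.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

// AppleScript that types text into whichever app currently has focus.
function typeScript(text) {
  return `tell application "System Events" to keystroke ${appleScriptString(text)}`;
}

// AppleScript for a shortcut, e.g. shortcutScript('s', ['command']) for Cmd+S.
function shortcutScript(key, modifiers = []) {
  const using = modifiers.length
    ? ` using {${modifiers.map((m) => `${m} down`).join(', ')}}`
    : '';
  return `tell application "System Events" to keystroke ${appleScriptString(key)}${using}`;
}

// Hypothetical sender: reports only whether the command was sent.
// Verification still requires a fresh screen capture.
function send(script, callback) {
  execFile('osascript', ['-e', script], (error) => {
    callback({ sent: !error, verified: false });
  });
}
```

A caller would pass `typeScript('hello')` to `send`, then re-capture the screen before reporting success.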
computer_use wraps this into a goal-directed loop: capture → locate → act → re-capture → repeat, until the goal is met or it hits a real blocker (then it asks you).
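The loop can be sketched as follows. This is an illustration under stated assumptions, not the code in computer.js: the `tools` object and its methods (`capture`, `isGoalMet`, `locate`, `act`) are hypothetical stand-ins for the real tools, the calls are synchronous only to keep the sketch short, and the `maxSteps` limit is an added safeguard that the source does not mention.

```javascript
// Hypothetical goal loop: capture -> locate -> act -> re-capture -> repeat.
// `tools` is injected so the loop can be exercised without touching the real screen.
function runGoalLoop(goal, tools, maxSteps = 10) {
  for (let step = 0; step < maxSteps; step++) {
    // Capture (on later iterations this is the re-capture that verifies the last action).
    const shot = tools.capture();
    if (tools.isGoalMet(goal, shot)) {
      return { status: 'done', steps: step };
    }

    // Locate the next element to act on; no target counts as a blocker.
    const target = tools.locate(goal, shot);
    if (!target) {
      return { status: 'blocked', steps: step, reason: 'target not found' };
    }

    // Act. The return value is ignored on purpose: it only proves the command was sent.
    tools.act(target);
  }
  return { status: 'blocked', steps: maxSteps, reason: 'step limit reached' };
}
```

A 'blocked' result corresponds to the point where BOR would stop and ask you what to do.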
Permissions
Computer use needs macOS permissions:
- Screen Recording: for screen_capture and locate_screen_element (the Electron/Terminal host must be allowed).
- Accessibility: for computer_click, computer_type, and computer_key (the host must be allowed).
BOR can check permission status (/api/system/permissions) and open the right settings pane (/api/system/permissions/open) so you can grant them.
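The mapping from tools to permissions can be written down as a small lookup. The sketch below is illustrative only: `REQUIRED_PERMISSION` and `missingPermissions` are hypothetical names, and the `granted` list stands in for whatever /api/system/permissions actually returns, since its response shape is not documented here.

```javascript
// Which macOS permission each tool depends on, as described above.
const REQUIRED_PERMISSION = {
  screen_capture: 'Screen Recording',
  locate_screen_element: 'Screen Recording',
  computer_click: 'Accessibility',
  computer_type: 'Accessibility',
  computer_key: 'Accessibility',
};

// Hypothetical helper: given the tools a task will use and the permissions
// already granted to the host, list the permissions still missing.
function missingPermissions(toolNames, granted) {
  const missing = new Set();
  for (const name of toolNames) {
    const needed = REQUIRED_PERMISSION[name];
    if (needed && !granted.includes(needed)) {
      missing.add(needed);
    }
  }
  return [...missing];
}

// A click-after-locate task on a host that only has Screen Recording:
console.log(
  missingPermissions(['screen_capture', 'locate_screen_element', 'computer_click'], ['Screen Recording'])
);
```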
Computer use vs. the browser
- For web tasks, prefer browser_action (a controlled headless browser) or the visible Browser app; both are faster and more reliable than operating a browser window by hand.
- For native app tasks (Figma, Xcode, Mail, Finder, anything not web), computer use is the way.
If a web task cannot be completed with browser_action, BOR can fall back to driving the visible browser with computer use.
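The choice between the two approaches amounts to a short decision rule. The sketch below is a hypothetical illustration: the function `chooseDriver` and the task fields `isWeb` and `browserActionFailed` are invented for this example and do not describe how BOR decides internally.

```javascript
// Hypothetical routing rule between browser_action and computer use.
function chooseDriver(task) {
  if (!task.isWeb) {
    return 'computer_use';        // native apps: Figma, Xcode, Mail, Finder, ...
  }
  if (task.browserActionFailed) {
    return 'computer_use';        // fallback: drive the visible browser by hand
  }
  return 'browser_action';        // default for web tasks
}

console.log(chooseDriver({ isWeb: true, browserActionFailed: false }));
console.log(chooseDriver({ isWeb: false }));
```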
Use cases
- “In Figma, select the frame tool.” → screen_capture, locate_screen_element("the Figma frame tool"), computer_click.
- “Open Xcode and click Run.” → open_app, then locate + click.
- “What’s on my screen right now?” → screen_capture and describe it.
- “Fill in this native form.” → locate each field, computer_click + computer_type, re-capture to verify.