# Computer use

BOR can operate your actual Mac: read the screen, find UI elements, and click, type, and press keys in *any* application. This is how it handles tasks that live neither on the web nor in a file, such as "rename these layers in Figma", "send this in the Mail app", or "click Run in Xcode". The computer-use tools live in `server/protocol/handlers/computer.js`.

## The tools

| Tool                    | Purpose                                                                                                                                  |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `screen_capture`        | Capture your real Mac screen and return a screenshot (BOR hides its own presence during capture by default, so you see your actual app). |
| `locate_screen_element` | Use the configured vision model to find a visible UI element and return click coordinates.                                               |
| `computer_click`        | Click at macOS coordinates (usually from `locate_screen_element`).                                                                       |
| `computer_type`         | Type into the focused app.                                                                                                               |
| `computer_key`          | Press a key / shortcut.                                                                                                                  |
| `open_app`              | Launch a macOS application.                                                                                                              |
| `computer_use`          | A higher-level loop that combines the above to accomplish a goal.                                                                        |

## How it sees

`screen_capture` returns an image of your screen. By default BOR **hides its own presence/thought bubble** during the capture, so the screenshot shows the app *underneath*, not BOR's overlay. The presence preview then shows the captured image inline, so BOR doesn't have to fetch it a second time.

`locate_screen_element` then asks your configured provider's **vision model** "where is the blue Share button?" and returns both the model coordinate and macOS click coordinates you can hand straight to `computer_click`.
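One reason the two coordinate sets can differ is display scaling: a screenshot is measured in pixels, while macOS clicks are addressed in points. The sketch below shows that kind of conversion under stated assumptions. The function name `toMacPoint` and the `scaleFactor`, `originX`, and `originY` fields are hypothetical and are not taken from BOR's code; `locate_screen_element` already returns click-ready coordinates, so you would not normally do this yourself.

```javascript
// Hypothetical helper, not part of BOR: map a point in screenshot pixel space
// to macOS screen points. Assumes the screenshot covers a single display that
// was captured at `scaleFactor` pixels per point (2 on most Retina displays).
function toMacPoint(modelPoint, display = {}) {
  const { scaleFactor = 2, originX = 0, originY = 0 } = display;
  return {
    x: Math.round(originX + modelPoint.x / scaleFactor),
    y: Math.round(originY + modelPoint.y / scaleFactor),
  };
}

// A button found at pixel (200, 100) on a 2x display sits at point (100, 50).
console.log(toMacPoint({ x: 200, y: 100 }, { scaleFactor: 2 }));
```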

## How it acts

`computer_click` / `computer_type` / `computer_key` drive macOS through System Events. BOR follows a "truth rule": the result of a click, type, or key command only proves that the command was *sent*, not that it had the intended effect. BOR therefore re-captures the screen to verify what actually happened before it reports a result.

`computer_use` wraps this into a goal-directed loop: capture → locate → act → re-capture → repeat, until the goal is met or it hits a real blocker (then it asks you).
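The loop can be pictured roughly as follows. This is a minimal sketch, not BOR's implementation: the `tools` object, its method names (`screenCapture`, `locate`, `click`), the `steps` shape, and `maxAttempts` are all hypothetical stand-ins, and the real tool signatures are not documented on this page.

```javascript
// Sketch of a capture -> locate -> act -> re-capture loop.
// `tools` is a hypothetical object whose async methods stand in for BOR's tools.
// Each step has a `target` description and a `verify(screenshot)` check.
async function runGoal(tools, steps, { maxAttempts = 3 } = {}) {
  for (const step of steps) {
    let verified = false;
    for (let attempt = 1; attempt <= maxAttempts && !verified; attempt++) {
      const shot = await tools.screenCapture();
      const target = await tools.locate(shot, step.target);
      if (!target) continue; // element not found; capture again on the next attempt
      await tools.click(target.x, target.y);
      // Truth rule: the click result only proves the command was sent,
      // so the outcome is checked against a fresh screenshot.
      const after = await tools.screenCapture();
      verified = await step.verify(after);
    }
    // A step that never verifies is treated as a blocker to report to the user.
    if (!verified) return { done: false, blockedOn: step.target };
  }
  return { done: true };
}
```

Passing the tools in as an object keeps the loop easy to exercise with fakes, and the `blockedOn` field gives the caller something concrete to ask the user about.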

## Permissions

Computer use needs macOS permissions:

* **Screen Recording** — for `screen_capture` and `locate_screen_element` (the Electron/Terminal host must be allowed).
* **Accessibility** — for `computer_click`, `computer_type`, and `computer_key` (the host must be allowed).

BOR can probe these (`/api/system/permissions`) and open the right settings pane (`/api/system/permissions/open`) so you can grant them.
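A client could combine the two endpoints along these lines. Only the two paths come from this page; everything else is an assumption made for illustration: the base URL, the use of `GET` for the probe and `POST` for the opener, the `pane` request field, and a probe response shaped like `{ screenRecording: boolean, accessibility: boolean }`. Check the actual API before relying on any of it.

```javascript
// Sketch only. Assumed (not confirmed by the docs): GET for the probe,
// POST with a JSON `pane` field for the opener, and a probe response of
// the form { screenRecording: boolean, accessibility: boolean }.
async function ensurePermissions(baseUrl, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/system/permissions`);
  const status = await res.json();

  // Collect every permission the probe reports as not granted.
  const missing = ["screenRecording", "accessibility"].filter((name) => !status[name]);

  // Ask BOR to open the settings pane for each missing permission.
  for (const pane of missing) {
    await fetchImpl(`${baseUrl}/api/system/permissions/open`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ pane }),
    });
  }
  return missing;
}
```

The function returns the list of missing permissions so the caller can tell the user what still needs to be granted in System Settings.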

## Computer use vs. the browser

* For **web** tasks, prefer [`browser_action`](the-browser.md) (a controlled headless browser) or the visible Browser app — faster and more reliable than operating a browser window by hand.
* For **native app** tasks (Figma, Xcode, Mail, Finder, anything not web), computer use is the right tool.

When a browser task can't be done with `browser_action`, BOR can fall back to driving the *visible* browser with computer use.
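The preference order described above can be sketched as a small router. This is an illustration rather than BOR's actual dispatch logic: `task.kind`, the `runners` map, and both function names are hypothetical.

```javascript
// Hypothetical router: web tasks try `browser_action` first and fall back to
// `computer_use`; everything else goes straight to `computer_use`.
function candidateTools(task) {
  return task.kind === "web" ? ["browser_action", "computer_use"] : ["computer_use"];
}

// `runners` maps a tool name to an async function that performs the task.
async function runWithFallback(task, runners) {
  const errors = [];
  for (const name of candidateTools(task)) {
    try {
      return { tool: name, result: await runners[name](task) };
    } catch (err) {
      errors.push(`${name}: ${err.message}`); // remember why this tool failed
    }
  }
  throw new Error(`All tools failed (${errors.join("; ")})`);
}
```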

## Use cases

* *"In Figma, select the frame tool."* → `screen_capture`, `locate_screen_element("the Figma frame tool")`, `computer_click`.
* *"Open Xcode and click Run."* → `open_app`, then locate + click.
* *"What's on my screen right now?"* → `screen_capture` and describe it.
* *"Fill in this native form."* → locate each field, `computer_click` + `computer_type`, re-capture to verify.
