{"data":{"kind":"file","path":"README.md","version_id":"e1qb771rpo00xm60dmmj366i","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5647,"modified_at":"2026-01-30T00:24:50.128000","content_hash":"1f3c82f881d3a242c792bf6da7eb84477d32f605c053d31576113c631d489a5f"},"entries":[],"content":"# Browser CUA Mode Example\n\nA simple example environment demonstrating **CUA (Computer Use Agent) mode** browser automation using [Browserbase](https://browserbase.com).\n\nCUA mode uses vision-based primitives to control the browser through screenshots, similar to how a human would interact with a screen.\n\n## How CUA Mode Works\n\nCUA mode provides low-level vision-based operations:\n- **click(x, y)**: Click at screen coordinates\n- **type_text(text)**: Type text into focused element\n- **scroll(direction)**: Scroll the page\n- **screenshot()**: Capture current screen state\n- **navigate(url)**: Go to a URL\n\nThe agent sees screenshots and decides which actions to take based on visual understanding.\n\n## Installation\n\n```bash\n# Install browser extras\nuv pip install -e \".[browser]\"\n\n# Install this example environment\nuv pip install -e ./environments/browser_cua_example\n```\n\n## Configuration\n\n### Required Environment Variables\n\n```bash\n# Browserbase credentials\nexport BROWSERBASE_API_KEY=\"your-api-key\"\nexport BROWSERBASE_PROJECT_ID=\"your-project-id\"\n\n# API key for agent model\nexport OPENAI_API_KEY=\"your-openai-key\"\n```\n\n<!-- TODO: Update this section when MODEL_API_KEY support is added to CUA server -->\nNote: When running in manual server mode, ensure `OPENAI_API_KEY` is set in the terminal where the CUA server runs (Stagehand requires it internally).\n\n## Usage\n\n### Quick Test Commands\n\n```bash\n# Default - pre-built image (fastest)\nprime eval run browser-cua-example -m openai/gpt-4o-mini\n\n# Binary upload (custom server)\nprime eval run browser-cua-example -m openai/gpt-4o-mini -a '{\"use_prebuilt_image\": false}'\n\n# Local development\nprime eval run browser-cua-example -m openai/gpt-4o-mini -a '{\"use_sandbox\": false}'\n```\n\n### Pre-built Docker Image (Default, Fastest)\n\nBy default, CUA mode uses a pre-built Docker image (`deepdream19/cua-server:latest`) for fastest startup. The image includes the CUA server binary and all dependencies pre-installed:\n\n```bash\nprime eval run browser-cua-example -m openai/gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY\n```\n\nThis is the recommended approach for production use. Startup is ~5-10 seconds compared to ~30-60 seconds with binary upload.\n\n### Binary Upload Mode (Custom Server)\n\nIf you need to use a custom version of the CUA server, disable the prebuilt image to build and upload the binary at runtime:\n\n```bash\nprime eval run browser-cua-example -m openai/gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{\"use_prebuilt_image\": false}'\n```\n\nThis mode:\n1. Builds the CUA server binary via Docker (first run only)\n2. Uploads the binary to a sandbox container\n3. Installs dependencies (curl) in the sandbox\n4. Starts the server\n\n### Manual Server Mode (Local Development)\n\nFor local development, you can run the CUA server manually:\n\n1. **Start the CUA server** (in a separate terminal):\n   ```bash\n   cd assets/templates/browserbase/cua\n   export OPENAI_API_KEY=\"your-openai-key\"\n   pnpm dev\n   ```\n\n   The server runs on `http://localhost:3000` by default.\n\n2. **Run the evaluation with sandbox disabled**:\n   ```bash\n   prime eval run browser-cua-example -m openai/gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{\"use_sandbox\": false}'\n   ```\n\n### Custom Server URL\n\nIf running the CUA server on a different port:\n```bash\nprime eval run browser-cua-example -m openai/gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{\"use_sandbox\": false, \"server_url\": \"http://localhost:8080\"}'\n```\n\n## Environment Arguments\n\n| Argument | Default | Description |\n|----------|---------|-------------|\n| `max_turns` | `15` | Maximum conversation turns (recommended: 50 for complex tasks) |\n| `judge_model` | `\"gpt-4o-mini\"` | Model for task completion judging |\n| `use_sandbox` | `True` | Auto-deploy CUA server to sandbox |\n| `use_prebuilt_image` | `True` | Use pre-built Docker image (fastest startup) |\n| `prebuilt_image` | `\"deepdream19/cua-server:latest\"` | Docker image to use when `use_prebuilt_image=True` |\n| `server_url` | `\"http://localhost:3000\"` | CUA server URL (only used when `use_sandbox=False`) |\n| `viewport_width` | `1024` | Browser viewport width |\n| `viewport_height` | `768` | Browser viewport height |\n| `save_screenshots` | `False` | Save screenshots during execution |\n\n## Execution Modes Summary\n\n| Mode | Flag | Startup Time | Use Case |\n|------|------|--------------|----------|\n| **Pre-built image** (default) | None | ~5-10s | Production, fastest startup |\n| **Binary upload** | `use_prebuilt_image=false` | ~30-60s | Custom server version |\n| **Manual server** | `use_sandbox=false` | Instant | Local development |\n\n## Building a Custom Docker Image\n\nTo build and push a custom CUA server image:\n\n```bash\ncd assets/templates/browserbase/cua\n./build-and-push.sh                    # Push as :latest\n./build-and-push.sh v1.0.0             # Push with version tag\nDOCKERHUB_USER=myuser ./build-and-push.sh  # Use different Docker Hub user\n```\n\nThen use your custom image:\n```bash\nprime eval run browser-cua-example -m openai/gpt-4.1-mini -a '{\"prebuilt_image\": \"myuser/cua-server:v1.0.0\"}'\n```\n\n## DOM vs CUA Mode Comparison\n\n| Aspect | DOM Mode | CUA Mode |\n|--------|----------|----------|\n| **Control** | Natural language via Stagehand | Vision-based coordinates |\n| **Server** | None required | CUA server (auto-deployed) |\n| **MODEL_API_KEY** | Required (for Stagehand) | Not required |\n| **Best for** | Structured web interactions | Visual/complex UIs |\n| **Speed** | Faster (direct DOM) | Slower (screenshots) |\n\n## Requirements\n\n- Python >= 3.10\n- Browserbase account with API credentials\n- OpenAI API key\n","encoding":"utf-8","truncated":false,"total_bytes":5647},"status":null}