{"data":{"kind":"file","path":"README.md","version_id":"aa7hnk8vvc2np1vcvhbezilb","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2297,"modified_at":"2026-01-30T00:24:50.103000","content_hash":"e3a30f816a7865a62c4c26d56c62bd6bd796d4a982220f1e766ee478b5cd5663"},"entries":[],"content":"# Browser DOM Mode Example\n\nA simple example environment demonstrating **DOM mode** browser automation using [Browserbase](https://browserbase.com) and [Stagehand](https://github.com/browserbase/stagehand).\n\nDOM mode uses the Stagehand SDK to translate natural language commands into browser actions.\n\n## How DOM Mode Works\n\nDOM mode provides these natural language operations:\n- **act**: Perform actions like clicking buttons, filling forms\n- **observe**: Get information about visible elements\n- **extract**: Extract structured data from the page\n- **navigate**: Go to URLs\n\nStagehand uses an LLM (configured via `stagehand_model`) to understand the page DOM and execute the appropriate browser actions.\n\n## Installation\n\n```bash\n# Install browser extras\nuv pip install -e \".[browser]\"\n\n# Install this example environment\nuv pip install -e ./environments/browser_dom_example\n```\n\n## Configuration\n\n### Required Environment Variables\n\n```bash\n# Browserbase credentials\nexport BROWSERBASE_API_KEY=\"your-api-key\"\nexport BROWSERBASE_PROJECT_ID=\"your-project-id\"\n\n# API keys for models\nexport OPENAI_API_KEY=\"your-openai-key\"    # For agent model\nexport MODEL_API_KEY=\"your-openai-key\"     # For Stagehand (can be same as OPENAI_API_KEY)\n```\n\n### Why MODEL_API_KEY?\n\nStagehand needs its own LLM to understand the DOM and translate natural language to actions. The `MODEL_API_KEY` environment variable provides the API key for this internal Stagehand model.\n\n## Usage\n\n```bash\nprime eval run browser-dom-example -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY\n```\n\n## Environment Arguments\n\n| Argument | Default | Description |\n|----------|---------|-------------|\n| `max_turns` | `10` | Maximum conversation turns (recommended: 50 for complex tasks) |\n| `judge_model` | `\"gpt-4o-mini\"` | Model for task completion judging |\n| `stagehand_model` | `\"openai/gpt-4o-mini\"` | Model for Stagehand DOM operations |\n\n## Example Task\n\nThe smoke test navigates to the Prime Intellect homepage and asks the agent to read the headline. The agent uses DOM mode operations to:\n1. Navigate to the page\n2. Observe visible text\n3. Extract the headline content\n4. Report the answer\n\n## Requirements\n\n- Python >= 3.10\n- Browserbase account with API credentials\n- OpenAI API key (for agent and Stagehand)\n","encoding":"utf-8","truncated":false,"total_bytes":2297},"status":null}