{"data":{"kind":"file","path":"README.md","version_id":"kyenjwrcwziae9dn77ws17yf","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3473,"modified_at":"2026-02-05T02:34:04.479000","content_hash":"286d858b4d805a6915c2eb17e8f0010ca61a7747762415b3308f94336351e688"},"entries":[],"content":"# onlineMind2Web Browser Benchmark (No Anti-Bot)\n\nA browser benchmark environment for evaluating LLM agents on Mind2Web web navigation tasks using [Browserbase](https://browserbase.com).\n\nThis version uses a **filtered dataset** that excludes websites with anti-bot protection for more reliable evaluation.\n\nMind2Web contains tasks with varying difficulty levels. Tasks are evaluated based on successful completion rather than explicit ground-truth answers.\n\n## Dataset\n\n- **Total tasks**: 276 tasks (92.0% of original 300 tasks)\n- **Excluded sites**: gamestop.com, cargurus.com, justice.gov, macys.com, reddit.com, apartments.com, phys.org, adidas.com, expedia.com\n- **Removed tasks**: 24 tasks from 9 sites with anti-bot detection\n- **Task format**: Web navigation tasks\n- **Evaluation**: Task completion judging via LLM\n\n## Installation\n\nFirst, install the browser extras for verifiers:\n```bash\nuv pip install -e \".[browser]\"\n```\n\nThen install the mind2web-no-anti-bot environment locally:\n```bash\nuv pip install -e ./environments/mind2web_no_anti_bot\n```\n\nOr install from Prime hub:\n```bash\nprime env install browserbase/mind2web-no-anti-bot\n```\n\n## Usage\n\n### Quick Start\n\n```bash\n# Run Mind2Web benchmark with OpenAI (clean dataset)\nprime eval run mind2web-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY\n```\n\n### Configuration\n\nSet your Browserbase credentials:\n```bash\nexport BROWSERBASE_API_KEY=\"your-api-key\"\nexport BROWSERBASE_PROJECT_ID=\"your-project-id\"\n```\n\nFor DOM mode (default), you'll also need:\n```bash\nexport OPENAI_API_KEY=\"your-openai-key\"  # For agent model and judge\nexport MODEL_API_KEY=\"your-openai-key\"   # For Stagehand browser operations\n```\n\n### Difficulty Levels\n\nMind2Web has three difficulty levels (task counts from clean dataset):\n\n```bash\n# Run all difficulty levels (default)\nprime eval run mind2web-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY\n\n# Run easy tasks only\nprime eval run mind2web-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{\"difficulty_level\": \"easy\"}'\n\n# Run medium tasks\nprime eval run mind2web-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{\"difficulty_level\": \"medium\"}'\n\n# Run hard tasks\nprime eval run mind2web-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{\"difficulty_level\": \"hard\"}'\n```\n\n### Browser Modes\n\n**DOM Mode** (default): Uses Stagehand SDK for natural language browser control.\n```bash\nprime eval run mind2web-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY\n```\n\n**CUA Mode**: Uses vision-based primitives via a CUA server.\n```bash\nprime eval run mind2web-no-anti-bot -m gpt-4.1-mini -b https://api.openai.com/v1 -k OPENAI_API_KEY -a '{\"mode\": \"cua\", \"server_url\": \"http://localhost:3000\"}'\n```\n\n## Environment Arguments\n\n| Argument | Default | Description |\n|----------|---------|-------------|\n| `mode` | `\"dom\"` | Browser control mode (`\"dom\"` or `\"cua\"`) |\n| `max_turns` | `50` | Maximum conversation turns (recommended: 50 for complex tasks) |\n| `judge_model` | `\"gpt-4o-mini\"` | Model for task completion judging |\n| `num_examples` | `-1` | Number of examples (-1 for all) |\n| `difficulty_level` | `None` | Task difficulty (`\"easy\"`, `\"medium\"`, `\"hard\"`, or `None` for all) |\n\n## Requirements\n\n- Python >= 3.10\n- Browserbase account with API credentials\n- OpenAI API key (for agent model, judge, and Stagehand)\n","encoding":"utf-8","truncated":false,"total_bytes":3473},"status":null}