{"data":{"kind":"file","path":"README.md","version_id":"m8gmmkxv08of0d181rc0s37x","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3168,"modified_at":"2026-02-18T04:29:31.500000","content_hash":"59d979ecff99ead8e0dda7c76050bf10b74416ff9c3c9785c1d2f8cbe723dfb4"},"entries":[],"content":"# Browser Navigation Challenge\n\nRL environment for training a small open-weights model to solve [Brett Adcock's 30-step browser navigation challenge](https://serene-frangipane-7fd25b.netlify.app/) in under 5 minutes.\n\n> *\"Solve all 30 challenges on this website in under 5 minutes and I'll offer you $500k/year in cash plus several million in equity\"*\n> — [@adcock_brett](https://x.com/adcock_brett/status/2018417226895028414), Feb 2 2026 ([1M+ views](https://x.com/adcock_brett/status/2018417226895028414))\n\n## The problem\n\nThe challenge site has 30 steps, each a different browser puzzle — overlapping popups that block each other, codes hidden in DOM attributes, decoy buttons, radio forms inside scrollable modals, and more. You need an agent that can solve all 30 in under 5 minutes.\n\nThe obvious approaches don't work:\n\n- **Big frontier models (Opus 4.5, GPT-5.2)** are smart enough but way too slow. They top out at step 5/30 within the turn budget, and inference latency alone blows past the 5-minute mark at scale.\n- **Small models (Qwen3-30B-A3B, 3B active params)** are fast but can't reason through multi-step UI puzzles out of the box. The instruct variant scores 0/30.\n- **CUA / screenshot-based approaches** add vision latency to every turn. When you need hundreds of actions in 5 minutes, you can't afford to send screenshots back and forth.\n\nThis environment enables post-training a small, fast model to work directly on DOM accessibility trees — no screenshots, just text. The model gets a numbered list of interactive elements with visibility markers, z-index stacking info, and parent context, then acts via tool calls.\n\nBuilt with [verifiers](https://github.com/PrimeIntellect-ai/verifiers) and [Prime Intellect's hosted RL](https://app.primeintellect.ai/dashboard/training).\n\n## Tools (9)\n\n| Tool | What it does |\n|------|-------------|\n| `get_page_state` | URL, numbered interactive elements with `[visible]`/`[obscured]` markers, z-index, parent context, condensed page text |\n| `click(ref)` | Click an element |\n| `type_text(ref, text)` | Type into an input |\n| `select_option(ref, value)` | Select dropdown/radio option |\n| `hover(ref)` | Hover to reveal tooltips |\n| `scroll(direction, amount, ref?)` | Scroll page or a specific container/modal |\n| `keypress(keys)` | Press keys (Enter, Escape, etc.) |\n| `navigate(url)` | Go to a URL (step URLs blocked to prevent reward hacking) |\n| `inspect_element(ref)` | Full DOM details — all attributes, computed styles, innerHTML, parent chain |\n\n## Quick start\n\n```bash\npip install -e .\nplaywright install chromium\n\n# eval\nprime eval run browser-nav-challenge -m openai/gpt-4.1-mini -n 1 -r 1 -t 2048\n\n# eval with more turns\nprime eval run browser-nav-challenge -m anthropic/claude-opus-4.5 -n 1 -r 1 -a '{\"max_turns\": 300}'\n\n# watch it in a browser\nprime eval run browser-nav-challenge -a '{\"headless\": false}' -m openai/gpt-4.1-mini -n 1 -r 1\n\n# train\nprime rl run configs/train-fast.toml\n```\n\n## Training\n\nUses Prime Intellect's hosted RL. Current config (`configs/train-fast.toml`):\n\n- Qwen3-30B-A3B-Instruct, 1000 steps, batch=16, rollouts=4\n- max_turns=300, lr=1e-4, oversampling=2x\n\n","encoding":"utf-8","truncated":false,"total_bytes":3168},"status":null}