{"data":{"kind":"file","path":"README.md","version_id":"p20n35hfl5wzk17x2eveltct","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2491,"modified_at":"2026-03-26T20:46:43.663000","content_hash":"f75168774c89f8dc94de5804781238a9b2a94e769a8074a679d304c90ddadc40"},"entries":[],"content":"# molmo-browserbase\n\nBrowserEnv training environment for improving WebVoyager-style browser agents with MolmoWeb task prompts.\n\n## Overview\n\n- Environment ID: `molmo-browserbase`\n- Reference env: `prime/webvoyager-no-anti-bot`\n- Task type: Browser tool use via `BrowserEnv` in `cua` mode by default\n- Reward: Single binary LLM-judge reward using the same yes/no completion pattern as WebVoyager\n\n## Dataset\n\n- Source: public `allenai/MolmoWeb-SyntheticTrajs`, config `task_seeded_wv`\n- Why this source: it is the Molmo slice explicitly seeded from WebVoyager-style tasks\n- Selection logic:\n  - use `instruction.low_level` first, then fall back to `mid_level` / `high_level`\n  - keep tasks on WebVoyager-overlap sites\n  - drop login / checkout / subscription-heavy tasks\n  - score tasks by prompt difficulty, not by trajectory length\n- Fetch mode:\n  - read the public parquet shards directly\n  - project only `sample_id`, `instruction`, and `trajectory`\n  - stop once the requested filtered prompt pool is assembled\n\n## Quickstart\n\nInstall locally:\n\n```bash\nprime env install ./environments/molmo_browserbase\n```\n\nSmoke eval:\n\n```bash\n./.venv/bin/dotenv -f .env run -- prime eval run molmo-browserbase -n 5 -r 1 -s\n```\n\n## Environment arguments\n\n| Arg | Default | Meaning |\n| --- | --- | --- |\n| `split` | `\"train[:800]\"` | Local split name, supports `train`, `val`, `train[:N]`, `val[:N]` |\n| `eval_split` | `\"val[:200]\"` | Held-out split used by `prime eval run` by default |\n| `max_examples` | `-1` | Additional cap after split selection |\n| `eval_max_examples` | `-1` | Additional cap for eval split |\n| `dataset_pool_size` | `1000` | Minimum filtered prompt pool to assemble before train/val split |\n| `min_difficulty_score` | `0` | Prompt difficulty threshold |\n| `shuffle_seed` | `42` | Deterministic shuffle seed |\n| `train_fraction` | `0.8` | Train/val partition fraction |\n| `require_webvoyager_domains` | `true` | Keep only tasks on WebVoyager-overlap sites |\n| `mode` | `\"cua\"` | Browser control mode |\n| `max_turns` | `15` | Max interaction turns |\n| `judge_model` | `\"gpt-4o-mini\"` | Judge model for the binary reward |\n\n## Notes\n\n- This environment is intended for RL training on Molmo-derived task prompts while preserving a WebVoyager-like interaction contract.\n- It keeps the Hugging Face source public and avoids storing a baked local task dataset in the repo.\n- Online evals should point at `prime/webvoyager-no-anti-bot` to measure transfer to the true target benchmark.\n","encoding":"utf-8","truncated":false,"total_bytes":2491},"status":null}