# tldraw

## Overview

- **Environment ID**: `tldraw`
- **Short description**: Single‑turn tool‑use environment that validates tldraw actions in a real UI using Playwright.
- **Tags**: tldraw, tool-use, ui-validation

## Dataset

- **Source**: Curated prompt list in `dataset.py` (`get_example_prompts`).
- **Split sizes**: Small fixed set used for local evals.

## Task

- **Type**: Single‑turn tool use
- **Parser**: JSON extraction from model output
- **Rubric**: Parses `actions`, runs them through the validator UI, and returns `reward=1` only when validation returns no errors.

## Quickstart

Install and run an eval:

```bash
prime env install --path ./environments/tldraw
prime eval run tldraw -m openai/gpt-4.1-mini -n 1 -r 1 \
  -a '{"validator_url":"http://127.0.0.1:5173/validator.html","pool_size":1,"headless":true}'
```

## Environment Arguments

| Arg                | Type | Default                                  | Description                                                                  |
| ------------------ | ---- | ---------------------------------------- | ---------------------------------------------------------------------------- |
| `validator_url`    | str  | `"http://localhost:5173/validator.html"` | URL of the validator page. If localhost, the env auto‑starts the dev server. |
| `pool_size`        | int  | `5`                                      | Playwright page pool size.                                                   |
| `headless`         | bool | `True`                                   | Run Chromium headless.                                                       |
| `save_screenshots` | bool | `True`                                   | Save screenshots for validation runs.                                        |
| `screenshot_dir`   | str  | `"outputs/screenshots"`                  | Where screenshots are written.                                               |
| `log_errors`       | bool | `True`                                   | Persist validation errors to JSONL.                                          |
| `error_log_dir`    | str  | `"outputs/errors"`                       | Where error logs are written.                                                |

## Bootstrap behavior

When `validator_url` points to localhost, the environment will:

- Install Node.js via `nvm` (Node 24)
- Install JS dependencies in `tldraw-agent/`
- Start the Vite dev server (serves `validator.html`)
- Ensure Playwright Chromium is installed

If `validator_url` points to a remote host, the environment will **not** start a server; the validator page must already be reachable.

## System Prompt

The environment reads a fixed prompt from:

```
./system_prompt.py
```

## Outputs

- Screenshots: `outputs/screenshots/`
- Error logs: `outputs/errors/`
- Validator logs: `outputs/validator/validator.log`
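The Rubric bullet under Task describes a binary reward: the `actions` are parsed from the model's JSON output, validated, and scored 1 only when no errors come back. The sketch below is a hypothetical reconstruction of that rule, not the environment's actual code. Several parts are assumptions: the names `extract_json` and `score`, the parsing strategy (whole reply first, then the outermost `{...}` span), and a reward of 0 in every other case. The `validate` argument stands in for the Playwright-driven validator and can be any callable that returns a list of error strings.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def extract_json(text: str) -> Optional[Dict[str, Any]]:
    # Assumed strategy: parse the whole reply first, then fall back to the
    # span between the first "{" and the last "}".
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            obj = json.loads(text[start : end + 1])
        except json.JSONDecodeError:
            return None
    return obj if isinstance(obj, dict) else None


def score(text: str, validate: Callable[[List[Any]], List[str]]) -> float:
    # Reward 1.0 only when an `actions` list parses and the validator
    # reports no errors; every other case is assumed to score 0.0.
    parsed = extract_json(text)
    if parsed is None or not isinstance(parsed.get("actions"), list):
        return 0.0
    errors = validate(parsed["actions"])
    return 1.0 if not errors else 0.0
```

Passing a stub such as `lambda actions: []` as `validate` exercises the parsing path without starting a browser or the dev server.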