{"data":{"kind":"file","path":"README.md","version_id":"cyjzv9j8203aonvclvdh6k0r","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1615,"modified_at":"2025-11-12T17:29:19.286000","content_hash":"f2578a411281bcf414d385caa2a22777342448a1881720ec5e09de2b08e8a851"},"entries":[],"content":"# dojo\n\n### Overview\n\n- **Environment ID**: `chakra-labs/dojo`\n- **Short description**: Multi-turn agent evaluation using Dojo infrastructure for task execution\n- **Tags**: multi-turn, tool-use, benchmark, agent-evaluation, multimodal, dojo, web\n\n### Datasets\n\n- **Primary dataset(s)**: `dojo-mini-bench` - Collection of multi-turn tasks, including LinkedIn, Linear, and Gmail\n- **Source links**: [Dojo Documentation](https://docs.trydojo.ai)\n\n### Task\n\n- **Type**: multi-turn, tool use\n- **Parser**: OpenAI-compatible tool calling format\n- **Rubric overview**: Task-specific verification logic\n\n### Quickstart\n\n[Get your Dojo API key](https://trydojo.ai/)\n\nRun an evaluation with default settings:\n\n```bash\nDOJO_API_KEY=\"your_key\" uv run vf-eval dojo\n```\n\nConfigure model and sampling:\n\n```bash\nDOJO_API_KEY=\"your_key\" uv run vf-eval dojo -m gpt-4.1-mini -n 10 -r 1\n```\n\nTo run with Browserbase:\n\n```bash\nDOJO_API_KEY=\"your_key\" BROWSERBASE_PROJECT_ID=\"project_id\" BROWSERBASE_API_KEY=\"your_browserbase_key\" DOJO_ENGINE=browserbase BROWSERBASE_CONCURRENT_LIMIT=1 uv run vf-eval dojo -m gpt-4.1-mini -n 10 -r 1\n```\n\nNotes:\n\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Metrics\n\nEach task uses task-specific verification that returns a score between 0.0 and 1.0. 
Failure scores 0.0, partial success scores above 0.0 but below 1.0, and full success scores 1.0.\n\n| Metric   | Meaning                   |\n| -------- | ------------------------- |\n| `reward` | Score between 0.0 and 1.0 |\n\n---\n\nFor more information, see the [Dojo Verifiers Integration Documentation](https://docs.trydojo.ai/integrations/verifiers).\n","encoding":"utf-8","truncated":false,"total_bytes":1615},"status":null}