# ExcaliGym-Geometry

## Overview

- **Environment ID**: `excaligym-geometry`
- **Short description**: Visual RL environment for training agents to draw geometric shapes on an Excalidraw canvas, with deterministic verification
- **Tags**: browser, multimodal, canvas, tool-use, verifiable-reward

## Datasets

- **Primary dataset**: Built-in tasks across 7 categories (basic shapes, line orientations, multiple identical shapes, multiple different shapes, containment, lines and angles, complex compositions)
- **Source**: Generated programmatically from `tasks.py`
- **Split sizes**: 35 tasks (configurable via `num_examples`)

## Task

- **Type**: multi-turn, tool use, visual
- **Parser**: Standard tool parser with Excalidraw API
- **Rubric overview**: Fully deterministic verification via `excalidrawAPI.getSceneElements()`; checks element types, dimensions, counts, orientations, and spatial relationships

## Quickstart

Install the Playwright Chromium browser:

```bash
playwright install chromium
```

Install the environment:

```bash
prime env install excaligym-geometry
```

Run an evaluation:

```bash
prime eval run excaligym-geometry -m gpt-4o -n 35
```

Configure with environment arguments:

```bash
prime eval run excaligym-geometry -m gpt-4o -a '{"tolerance": 0.15, "max_turns": 15}'
```

## Environment Arguments

| Arg | Type | Default | Description |
|-----|------|---------|-------------|
| tolerance | float | 0.1 | How closely width and height must match for squares and circles (0.1 requires a width/height match of at least 90%) |
| max_turns | int | 10 | Maximum tool calls allowed per task |
| headless | bool | True | Run the browser in headless mode |

## Metrics

| Metric | Meaning |
|--------|---------|
| reward | Binary reward (1.0 if the task is completed correctly, 0.0 otherwise) |
| shape_correctness | Same as reward; verifies element types, counts, dimensions, and relationships |
| num_turns | Number of conversation turns used |
| total_tool_calls | Total number of tool calls during the episode |
| select_tool_calls | Number of `select_tool` calls |
| draw_shape_calls | Number of `draw_shape` calls |
| submit_calls | Number of `submit` calls (submission attempts) |

## How It Works

1. **Environment Loader** (`excaligym_geometry.py`):
   - Loads 35 tasks from `tasks.py` with prompts and verification schemas
   - Builds a Verifiers-compatible dataset
   - Launches a Chromium browser running Excalidraw (headless by default)

2. **Browser Controller** (`browser_controller.py`):
   - Playwright wrapper for keyboard shortcuts and mouse events
   - Serves `excalidraw.html` via a local HTTP server
   - Exposes `select_tool()`, `drag()`, `screenshot()`, `get_scene_elements()`

3. **Deterministic Rubric** (`rubric.py`):
   - Verifies shapes by inspecting the scene graph returned by `excalidrawAPI.getSceneElements()`
   - Checks element types, dimension ratios, counts, orientations, containment, and nesting
   - No VLM required, so rewards are fully reproducible

## Tools

| Tool | Description |
|------|-------------|
| `select_tool(name)` | Select a drawing tool: rectangle, ellipse, diamond, arrow, line, freedraw |
| `draw_shape(x1, y1, x2, y2)` | Draw by dragging from the start point to the end point (canvas: 1920x1080) |
| `submit()` | Submit the drawing for evaluation |

## Task Categories

| Category | Count | Difficulty | Examples |
|----------|-------|------------|----------|
| Basic Shapes | 7 | Easy | rectangle, square, circle, ellipse, diamond, arrow, line |
| Line Orientations | 4 | Easy | horizontal/vertical lines and arrows |
| Multiple Identical | 5 | Medium | 2 circles, 3 rectangles, 4 diamonds |
| Multiple Different | 5 | Medium | rectangle + circle, square + diamond |
| Containment | 5 | Hard | circle inside rectangle, nested squares |
| Lines & Angles | 4 | Hard | parallel lines, perpendicular lines, triangle |
| Complex Compositions | 5 | Hard | concentric circles, nested squares, multi-element |

## Local Testing

```bash
python environments/excaligym_geometry/test_integration.py
```

Expected output: `ALL TESTS PASSED!`
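
## Verification Sketch

The deterministic rubric works on plain element records from `excalidrawAPI.getSceneElements()` rather than on pixels. The Python below is a minimal, hypothetical sketch of what the dimension-ratio, orientation, and containment checks could look like. It is not the code in `rubric.py`. It assumes each element is a dict carrying Excalidraw's `type`, `x`, `y`, `width`, `height`, `isDeleted`, and (for lines and arrows) `points` fields. The helper names, the `max_slope` threshold, and the reading of `tolerance` as "shorter side / longer side >= 1 - tolerance" are all assumptions.

```python
"""Hypothetical sketch of scene-graph checks; not the actual rubric.py code."""
from typing import Any

# Assumption: elements are dicts shaped like Excalidraw scene elements.
Element = dict[str, Any]


def live_elements(elements: list[Element]) -> list[Element]:
    """Drop elements that Excalidraw has marked as deleted."""
    return [el for el in elements if not el.get("isDeleted", False)]


def is_square_like(el: Element, tolerance: float = 0.1) -> bool:
    """Assumed tolerance rule: shorter side >= (1 - tolerance) * longer side."""
    w, h = abs(el["width"]), abs(el["height"])
    if w == 0 or h == 0:
        return False
    return min(w, h) / max(w, h) >= 1.0 - tolerance


def line_orientation(el: Element, max_slope: float = 0.1) -> str:
    """Classify a line/arrow by its first and last points.

    `points` are offsets relative to the element's (x, y); `max_slope` is a
    hypothetical threshold for how far a line may tilt and still count.
    """
    (x0, y0), (x1, y1) = el["points"][0], el["points"][-1]
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    if dx > 0 and dy / dx <= max_slope:
        return "horizontal"
    if dy > 0 and dx / dy <= max_slope:
        return "vertical"
    return "diagonal"


def contains(outer: Element, inner: Element) -> bool:
    """True if inner's bounding box lies entirely within outer's bounding box."""
    return (
        outer["x"] <= inner["x"]
        and outer["y"] <= inner["y"]
        and inner["x"] + inner["width"] <= outer["x"] + outer["width"]
        and inner["y"] + inner["height"] <= outer["y"] + outer["height"]
    )


if __name__ == "__main__":
    # Example scene: a near-circle inside a rectangle, plus a horizontal line.
    scene = [
        {"type": "rectangle", "x": 100, "y": 100, "width": 400, "height": 300},
        {"type": "ellipse", "x": 200, "y": 150, "width": 150, "height": 145},
        {"type": "line", "x": 600, "y": 500, "width": 300, "height": 0,
         "points": [[0, 0], [300, 0]]},
    ]
    rect, circle, line = live_elements(scene)
    print(is_square_like(circle))   # True: 145 / 150 is about 0.967, above 0.9
    print(contains(rect, circle))   # True
    print(line_orientation(line))   # horizontal
```

Bounding-box containment is the simplest choice but is conservative for ellipses and diamonds, whose visible outline is smaller than their box; a stricter check could test the actual outline instead.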