# Book Extractor

### Overview
- **Environment ID**: `versa/book-extractor`
- **Short description**: Train a model to extract knowledge accurately from a reference document by navigating it strategically with tools
- **Tags**: `tool-use`, `knowledge-extraction`, `train`, `eval`

### Datasets
- **Primary dataset**: 30 extraction questions embedded in the environment (no external downloads)
- **Split sizes**: 30 train examples across 3 difficulty levels
- **Source**: A distillation of *Becoming a Supple Leopard* (Kelly Starrett), a mobility coaching reference (~12K characters)

### Task
- **Type**: Multi-turn tool use
- **Output format**: Plain-text responses containing the key facts from the source document
- **Rubric overview**:
  - `key_fact_recall` (weight 0.6): the fraction of expected key facts found in the response by case-insensitive substring match. A 60% completeness threshold applies: if fewer than 60% of the key facts are found, the score is 0.0 (no partial credit for shallow answers). Key facts are multi-word phrases that must be extracted from the source.
  - `tool_diversity_reward` (weight 0.4): a reward that scales with the number of distinct tools used: 0.0, 0.25, 0.5, or 1.0 for 0, 1, 2, or 3+ tools.

### Difficulty Levels

| Level | Name | Questions | Max Turns | What It Tests |
| --- | --- | --- | --- | --- |
| 1 | Factual | 10 | 3 | Single lookup: one tool call finds a direct answer |
| 2 | Structural | 14 | 4 | Cross-referencing: reading tables and connecting concepts across sections |
| 3 | Applied | 6 | 6 | Multi-step synthesis: diagnosing faults, designing sessions, explaining exceptions |

Use the `level` argument to train on a specific tier:

```bash
prime rl run configs/lab/book-extractor-l1.toml   # factual only
prime rl run configs/lab/book-extractor-l3.toml   # applied only
prime rl run configs/lab/book-extractor.toml      # all levels
```

### Tools (11)

| Tool | Description |
| --- | --- |
| `search(query)` | Search for sections matching keywords |
| `list_sections()` | List all section headers (table of contents) |
| `read_section(name)` | Read the full content of a named section |
| `read_range(offset, limit)` | Read a character range from the raw document |
| `find_table(query)` | Find markdown tables matching a query |
| `find_definition(term)` | Find where a term is defined |
| `get_context(query, chars)` | Get the text surrounding a match |
| `count_matches(query)` | Count occurrences of a term across sections |
| `cross_reference(term1, term2)` | Find sections that mention both terms |
| `get_document_stats()` | Get overview statistics for the document |
| `run_code(code)` | Execute Python code with the document available as the `doc` variable |

### Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| `level` | int \| None | `None` | Difficulty level (1 = factual, 2 = structural, 3 = applied); `None` uses all levels |

### Quickstart

```bash
prime env push environments/book_extractor          # push to hub
prime rl run configs/lab/book-extractor.toml         # launch training
prime rl run configs/lab/book-extractor-l1.toml      # factual only
prime rl logs <run-id> -f                            # stream logs
```

> **Note:** `prime eval run` requires paid credits; use `prime rl` during the free beta.

### Metrics

| Metric | Meaning |
| --- | --- |
| `reward` | Weighted sum: 0.6 × `key_fact_recall` + 0.4 × `tool_diversity_reward` |
| `key_fact_recall` | Fraction of expected key facts found (0.0 if fewer than 60% are found, otherwise 0.6–1.0) |
| `tool_diversity_reward` | 0.0 / 0.25 / 0.5 / 1.0 for 0 / 1 / 2 / 3+ distinct tools used |
| `total_tool_calls` | Number of tool calls per rollout (tracked automatically) |
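The scoring described under **Metrics** can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the environment's actual rubric code: the function names, the plain list of tool names as input, and returning 0.0 for a question with no key facts are all hypothetical. Only the weights, the 60% threshold, the case-insensitive substring match, and the diversity steps come from this README.

```python
from typing import Iterable, Sequence

# Weights and threshold as stated in the README.
RECALL_WEIGHT = 0.6
DIVERSITY_WEIGHT = 0.4
COMPLETENESS_THRESHOLD = 0.6


def key_fact_recall(response: str, key_facts: Sequence[str]) -> float:
    """Fraction of key facts found by case-insensitive substring match,
    zeroed when fewer than 60% are found (no partial credit)."""
    if not key_facts:
        # Assumption: a question with no key facts scores 0.0.
        return 0.0
    text = response.lower()
    found = sum(1 for fact in key_facts if fact.lower() in text)
    fraction = found / len(key_facts)
    return fraction if fraction >= COMPLETENESS_THRESHOLD else 0.0


def tool_diversity_reward(tools_called: Iterable[str]) -> float:
    """0.0 / 0.25 / 0.5 / 1.0 for 0 / 1 / 2 / 3+ distinct tool names.
    Assumption: tool usage is given as a list of tool names."""
    distinct = len(set(tools_called))
    return {0: 0.0, 1: 0.25, 2: 0.5}.get(distinct, 1.0)


def reward(response: str, key_facts: Sequence[str], tools_called: Iterable[str]) -> float:
    """Weighted sum: 0.6 * key_fact_recall + 0.4 * tool_diversity_reward."""
    return (RECALL_WEIGHT * key_fact_recall(response, key_facts)
            + DIVERSITY_WEIGHT * tool_diversity_reward(tools_called))


if __name__ == "__main__":
    # Illustrative key facts, not taken from the environment's question set.
    facts = ["neutral spine", "braced position", "hip extension"]
    answer = "Keep a neutral spine and set up in a braced position."
    # Roughly 0.6: recall 2/3 (above the threshold), diversity 0.5 for 2 tools.
    print(reward(answer, facts, ["search", "read_section"]))
```

Because of the threshold, an answer that finds only one of the three example facts would score 0.0 on recall, so its reward would come entirely from tool diversity.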