{"data":{"kind":"file","path":"README.md","version_id":"yjiftc86booj3p4fowm0696y","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5717,"modified_at":"2026-02-02T22:30:53.532000","content_hash":"7ad680db54343043e5dbfd2c80ee81292210308ad5e6534d4d4383554ff385a1"},"entries":[],"content":"# BM25 Search Environment\n\n### Overview\n- **Environment ID**: `bm25-search`\n- **Short description**: Environment for training agentic multi-hop search using BM25 lexical retrieval\n- **Tags**: `bm25`, `agentic-search`, `multi-hop`, `tool-use`, `single-turn`\n\n### Datasets\n- **Primary dataset(s)**: [ragent_qa_pairs](https://huggingface.co/datasets/diegi97/ragent_qa_pairs), [ragent_data_sources](https://huggingface.co/datasets/diegi97/ragent_data_sources)\n- **Source links**: [GitHub](https://github.com/Diegi97/ragent)\n- **Split sizes**: 2,374 train / 229 eval QA pairs across 9 data sources\n\n### Task\n- **Type**: Multi-turn tool use\n- **Parser**: XML-based tool parser\n- **Rubric overview**: LLM judge (0.8) + format compliance (0.2)\n\n---\n\n## About RAGent\n\nRAGent is a research project focused on training language models to perform **agentic search** through reinforcement learning. The goal is to train models that can autonomously perform multi-hop searches across document collections, breaking down complex information needs into sequences of search queries, reading relevant documents, and synthesizing answers.\n\nThis environment focuses on **lexical search using BM25**. The simplicity of BM25 makes it easy to deploy for training while still giving models the tools and instructions needed to perform successful multi-hop searches. 
Models trained here are intended to:\n\n- Formulate effective search queries\n- Navigate search results and read relevant documents\n- Perform iterative refinement when initial queries don't return useful results\n- Synthesize information from multiple documents into coherent answers\n\n## Datasets (Detailed)\n\n### QA Pairs\n\n| Split | Total | gitlab_handbook | common_pile_peps | rust_rfcs | rfc_all | owasp_cheatsheets | nampdn_ai_devdocs_io | diegi97_mythology | diegi97_marine_biology | diegi97_space_exploration |\n|-------|-------|-----------------|------------------|-----------|---------|-------------------|----------------------|-------------------|------------------------|---------------------------|\n| Train | 2,374 | 361 | 355 | 355 | 318 | 225 | 213 | 183 | 183 | 181 |\n| Eval | 229 | 39 | 35 | 34 | 31 | 22 | 18 | 16 | 17 | 17 |\n\n### Data Sources\n\nThe document corpora include:\n\n- **GitLab Handbook** - Internal documentation from GitLab\n- **IETF RFCs** (`rfc_all`) - Internet Engineering Task Force Request for Comments\n- **Python Enhancement Proposals** (`common_pile_peps`) - PEPs from the Python community\n- **DevDocs.io** (`nampdn_ai_devdocs_io`) - Developer documentation aggregator\n- **OWASP Cheat Sheets** - Security best practices documentation\n- **Rust RFCs** - Rust language design proposals\n- **Domain-specific corpora** - Wikipedia extractions on marine biology, space exploration, and mythology\n\n### QA Generation Pipeline\n\nQA pairs are generated using the **Explorer Agent Pipeline** (`ragent_core/pipelines/explorer_agent/explorer_agent.py`), inspired by the agentic search synthesis methodology described in [DeepSeek-v3.2](https://arxiv.org/abs/2512.02556).\n\nThe pipeline generates QA pairs of varying complexity through the following stages:\n\n1. **Concept Extraction**: Documents are sampled from the corpus and key concepts/entities are extracted using an LLM.\n\n2. 
**Breadth & Depth Assignment**: Each concept is assigned `breadth` and `depth` parameters based on its occurrence frequency in the corpus:\n   - **Breadth** (1-3): Controls how many related sub-questions are generated for a concept\n   - **Depth** (1-3): Controls how many reasoning hops are required to answer questions\n\n3. **Breadth Agent**: For each concept, generates multiple surface-level questions exploring different facets of the entity. The agent has access to retrieval tools to ground questions in actual document content.\n\n4. **Depth Agent**: Takes each breadth question and iteratively deepens it, creating multi-hop questions that require chaining information across documents.\n\n5. **Synthesis Agent**: Combines the breadth and depth QA pairs for each concept into a final, synthesized question-answer pair that captures the full complexity.\n\nThis approach ensures questions require genuine multi-hop reasoning rather than simple lookup, making them effective for training agentic search behavior.\n\n## Task (Detailed)\n\n### Available Tools\n\n| Tool | Description |\n|------|-------------|\n| `search_tool(queries)` | Execute BM25 search queries against the document corpus and return the most relevant chunks |\n| `read_tool(doc_ids)` | Read full content of documents by their IDs |\n| `text_scan_tool(pattern, ...)` | Scan documents for specific text patterns |\n\n### Reward Functions\n\n| Reward Function | Weight | Description |\n|-----------------|--------|-------------|\n| `judge_reward` | 0.8 | LLM-based judge evaluating answer correctness and completeness |\n| `format_reward` | 0.2 | Compliance with expected output format |\n\n---\n\n## Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nprime eval run bm25-search\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run bm25-search \\\n  -m gpt-4.1-mini \\\n  -n 20 \\\n  -r 3 \\\n  -t 1024 \\\n  -T 0.7\n```\n\nThis environment does not require custom arguments; it uses the 
standard verifiers environment parameters.\n\n## Metrics\n\n| Metric | Meaning |\n|--------|---------|\n| `reward` | Main scalar reward: 0.8 × `judge_reward` + 0.2 × `format_reward` |\n| `judge_reward` | LLM judge score for answer quality |\n| `format_reward` | Format compliance score |\n\n---\n\n## Active Development\n\nThis environment is under **active development**. Planned improvements include:\n\n- Expanding the number and diversity of data sources\n- Increasing the quantity of QA pairs\n- Improving the synthetic pipeline to produce higher-quality multi-hop questions","encoding":"utf-8","truncated":false,"total_bytes":5717},"status":null}