{"data":{"kind":"file","path":"README.md","version_id":"g4nhol1lmyiahlob6x5gisy1","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3572,"modified_at":"2026-01-30T02:19:16.263000","content_hash":"1065475909bcfbfac334b5041197b37f121e4436d419daa0c23d2c26b65878de"},"entries":[],"content":"# wiki-search\n\n<a href=\"https://github.com/PrimeIntellect-ai/verifiers/tree/main/environments/wiki_search\">\n<img src=\"https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white\" alt=\"Source Code\">\n</a>\n\n### Overview\n- **Environment ID**: `wiki-search`\n- **Short description**: Multi-turn tool-use QA over a small Wikipedia corpus using ChromaDB and OpenAI embeddings, with judge-based scoring.\n- **Tags**: retrieval, tools, multi-turn, embeddings, judge\n\n### Datasets\n- **Primary dataset(s)**: `willcb/wiki-trivia-questions` (HF) and a Wikipedia corpus indexed in ChromaDB (from `willcb/rare-wiki-pages`, indexed at `.chroma_db` on first run)\n- **Source links**: Hugging Face Datasets, ChromaDB\n- **Split sizes**: Uses the `train` split for prompts\n\n### Task\n- **Type**: multi-turn tool use\n- **Rubric overview**: Combines the default tool rubric with a `JudgeRubric` for answer quality\n\n### How it works\n- **Corpus load**: Reads `willcb/rare-wiki-pages` (HF) into memory: `id → title`, `id → content`.\n- **Indexing**: Creates/opens a persistent Chroma collection `wiki_titles` under `.chroma_db`, using OpenAI embeddings to index page titles. Missing titles are upserted in small batches on first run.\n- **Tools**:\n  - `search_pages(query)`: Embedding search over titles; returns top 10 `{page_id, title}`.\n  - `view_sections(page_id)`: Parses the page content for Markdown-style headings (`# ...`) and returns section ids/names. Falls back to a single `full` section if no headings.\n  - `read_section(section_id)`: Returns the content slice for the requested section (or full page).\n- **Scoring**: Adds a `JudgeRubric` on top of the default tool rubric for answer quality.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nprime eval run wiki-search\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run wiki-search \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"judge_model\": \"gpt-4.1-mini\", \"judge_base_url\": \"https://api.openai.com/v1\", \"judge_api_key_var\": \"OPENAI_API_KEY\", \"embed_model\": \"text-embedding-3-small\", \"embed_base_url\": \"https://api.openai.com/v1\", \"embed_api_key_var\": \"OPENAI_API_KEY\"}'\n```\n\nNotes:\n- The first run builds the Chroma index and may take a few minutes.\n\n### Required Environment Variables\n\n| Variable | Description |\n| -------- | ----------- |\n| `OPENAI_API_KEY` | Required for judge and embedding calls (or set custom vars via `judge_api_key_var`/`embed_api_key_var`) |\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `judge_model` | str | `\"gpt-4.1-mini\"` | Judge model name |\n| `judge_base_url` | str | `\"https://api.openai.com/v1\"` | Judge provider base URL |\n| `judge_api_key_var` | str | `\"OPENAI_API_KEY\"` | Env var for judge API key |\n| `embed_model` | str | `\"text-embedding-3-small\"` | Embedding model name |\n| `embed_base_url` | str | `\"https://api.openai.com/v1\"` | Embedding provider base URL |\n| `embed_api_key_var` | str | `\"OPENAI_API_KEY\"` | Env var for embed API key |\n| `corpus_dataset` | str | `\"willcb/rare-wiki-pages\"` | HF dataset id containing pages |\n| `corpus_split` | str | `\"train\"` | HF split to load |\n| `chroma_db_dir` | str | `.chroma_db` | Path to ChromaDB index |\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| ToolRubric metrics | Tool execution success and format adherence |\n| JudgeRubric metrics | Judge-scored answer quality |\n\n### Changelog\n\n#### v0.1.22 (Jan 22, 2026)\n- Make ChromaDB initialization lazy to allow multiple env instances to run concurrently","encoding":"utf-8","truncated":false,"total_bytes":3572},"status":null}