{"data":{"kind":"file","path":"README.md","version_id":"nemanpw92fmvt3mzbh2nx07b","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1738,"modified_at":"2025-08-29T15:15:33.386000","content_hash":"7ce0f8bac965012284bb6a975e1b2279ae9896257ddf87cf2c92a2d716f98159"},"entries":[],"content":"# mrcr\n\n### Overview\n\n- **Environment ID**: `mrcr`\n- **Short description**: Long-context multi-needle retrieval; model must prepend a random string and return the i-th requested writing; graded by similarity.\n- **Tags**: mrcr, long-context, single-turn, chat\n\n### Dataset\n\n- **Primary dataset**: `openai/mrcr` OpenAI MRCR is a long context benchmark where needles are hidden among distractors; the model must retrieve the specified instance and prepend a required hash.\n- **Source links**: <https://huggingface.co/datasets/openai/mrcr>\n- **Split sizes**: train split ~2.4k rows\n\n### Task\n\n- **Type**: single-turn\n- **Parser**: none (completion is coerced to text)\n- **Rubric**:\n  - `ratio`: requires exact `random_string_to_prepend` prefix; then uses difflib SequenceMatcher ratio on the stripped bodies\n\n### Quickstart\n\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval mrcr\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval mrcr -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7 -a '{\"key\": \"value\"}'\n```\n\nNotes:\n\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `split` | str | `\"train\"` | HF split to load |\n| `num_examples` | int | `-1` | Limit dataset size (>0 to enable) |\n\n### Metrics\n\n- **ratio**: 0.0 if the required prefix is missing; otherwise `difflib.SequenceMatcher(None, pred, truth).ratio()` on prefix-stripped strings\n\n## Evaluation Reports\n\n<!-- Do not edit below this line. Content is auto-generated. -->\n<!-- vf:begin:reports -->\n<p>No reports found. Run <code>uv run vf-eval mrcr -a '{\"key\": \"value\"}'</code> to generate one.</p>\n<!-- vf:end:reports -->\n","encoding":"utf-8","truncated":false,"total_bytes":1738},"status":null}