{"data":{"kind":"file","path":"README.md","version_id":"axqvasprmycw70zfw3plmm3x","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":6855,"modified_at":"2026-03-27T16:03:36.164000","content_hash":"01a225534e1ad260426ce93d484d0b68944f1ca5893f1c3f587042d4cf6edacb"},"entries":[],"content":"# context-1\n\nmultiturn retrieval-subagent environment inspired by the chroma context-1 report.\n\n### overview\n- **environment id**: `context-1`\n- **type**: `statefultoolenv` / multiturn tool-use\n- **goal**: search a corpus over multiple turns, manage a bounded retained-context budget with pruning, and submit a final answer plus ranked supporting chunk ids\n\n### data loading model\nthis env supports three sources, in this order:\n\n1. existing generated local task files under `generated_tasks_dir`\n2. a hugging face task dataset via `task_dataset_name`\n3. automatic bootstrap from the `context-1-data-gen` repo\n\nfor local or hf tasks, rows are expected to contain chroma-style fields:\n- `question`\n- `truth`\n- `supporting_items`\n- either `items_and_contents` or a companion corpus dataset passed via `corpus_dataset_name`\n\nthe loader also accepts rows where tasks are nested under a top-level `tasks` field and skips rows with `passed_verification: false`.\n\nif no local tasks exist and no hf task dataset is given, the env will:\n- reuse an existing local clone of the datagen repo if present\n- otherwise clone the repo\n- reuse existing generated task files if already present\n- otherwise require the domain’s api keys before attempting generation\n\nif the required env vars are missing, the env fails fast and does not run generation.\n\ndomain-specific generation notes:\n- `email`: can use the repo’s default download path, or `local_input_path` to point at `epstein_only.json`\n- `web`: uses the repo seed file by default, or `seed_input_path` for a custom seeds file\n- `finance`: requires `identity=\"name email@example.com\"` for sec edgar\n- `legal`: requires `seed_input_path` for the patent application numbers file\n\n### tools\n- `search_corpus(query, k)`\n- `grep_corpus(pattern, k)`\n- `read_document(doc_id)`\n- `prune_chunks(doc_ids)`\n- `submit_answer(answer, supporting_doc_ids)`\n\n### reward rubric\nthe main reward mirrors the report’s retrieval-first setup:\n- recall-heavy `f_beta` on the final retained evidence with `beta=4`\n- trajectory recall over all relevant documents encountered during search\n- `answer_found` bonus when final supporting docs include answer-bearing evidence\n- penalty for repeated consecutive prune calls beyond a streak of 3\n- penalty for very long trajectories\n\nbecause `vf-eval` evaluates the acting model directly rather than a downstream answering model, the env adds a small exact-answer bonus to the final reward.\n\n### quickstart\nexample with a chroma-style task dataset on hugging face:\n\n```bash\nuv run vf-eval -s context-1 -a '{\n  \"task_dataset_name\": \"your-org/context-1-tasks\",\n  \"task_dataset_split\": \"train\"\n}'\n```\n\nexample with a separate corpus dataset:\n\n```bash\nuv run vf-eval -s context-1 -a '{\n  \"task_dataset_name\": \"your-org/context-1-tasks\",\n  \"corpus_dataset_name\": \"your-org/context-1-corpus\",\n  \"corpus_id_field\": \"id\",\n  \"corpus_text_field\": \"text\"\n}'\n```\n\nexample using the datagen repo flow for the email domain:\n\n```bash\nuv run vf-eval -s context-1 -a '{\n  \"task_dataset_name\": null,\n  \"domain\": \"email\",\n  \"auto_prepare_data\": true\n}'\n```\n\n### environment arguments\n| arg | type | default | description |\n| --- | ---- | ------- | ----------- |\n| `task_dataset_name` | `str \\| null` | `null` | hugging face task dataset repo id; if null, use local/generated tasks flow |\n| `task_dataset_split` | `str` | `\"train\"` | split to load from the task dataset |\n| `task_dataset_config` | `str \\| null` | `null` | optional hf config name for the task dataset |\n| `corpus_dataset_name` | `str \\| null` | `null` | optional separate hf corpus dataset repo id |\n| `corpus_dataset_split` | `str` | `\"train\"` | split to load from the corpus dataset |\n| `corpus_dataset_config` | `str \\| null` | `null` | optional hf config name for the corpus dataset |\n| `corpus_id_field` | `str` | `\"id\"` | corpus document id field |\n| `corpus_text_field` | `str` | `\"text\"` | corpus document text field |\n| `chunk_size_chars` | `int` | `1200` | chunk size for each source document |\n| `chunk_overlap_chars` | `int` | `200` | overlap used when chunking source documents |\n| `token_budget_chars` | `int` | `12000` | retained-context budget before pruning is required |\n| `max_search_results` | `int` | `8` | maximum results returned from search or grep |\n| `max_read_chars` | `int` | `2400` | max characters returned by `read_document` |\n| `max_turns` | `int` | `24` | multiturn rollout cap |\n| `max_examples` | `int` | `100` | max task records to materialize |\n| `max_corpus_docs` | `int` | `-1` | optional corpus truncation for debugging |\n| `domain` | `str` | `\"email\"` | context-1 domain: `email`, `web`, `finance`, or `legal` |\n| `data_gen_repo_url` | `str` | `https://github.com/chroma-core/context-1-data-gen.git` | datagen repo clone url |\n| `data_gen_repo_dir` | `str \\| null` | `null` | local clone directory override |\n| `generated_tasks_dir` | `str \\| null` | `null` | directory containing generated json/jsonl tasks |\n| `collection_name` | `str \\| null` | `null` | chroma collection name used if generation is needed |\n| `num_explorations` | `int` | `10` | datagen exploration count |\n| `max_workers` | `int` | `8` | datagen worker count |\n| `max_iterations` | `int` | `20` | datagen per-exploration iteration cap |\n| `seed_input_path` | `str \\| null` | `null` | custom seeds file for `web`, `finance`, or `legal` generation |\n| `local_input_path` | `str \\| null` | `null` | local `epstein_only.json` override for `email` generation |\n| `identity` | `str \\| null` | `null` | required sec edgar identity string for `finance` generation |\n| `extension_rounds` | `int` | `0` | web/sec extension rounds when generation is needed |\n| `auto_prepare_data` | `bool` | `true` | clone/generate tasks automatically when no local data exists |\n| `system_prompt` | `str \\| null` | `null` | override the default retrieval-agent prompt |\n\n### notes\n- generated tasks are cached under `~/.cache/prime-environments/context_1/` by default.\n- the env will not regenerate tasks if json/jsonl task files already exist in `generated_tasks_dir`.\n- the env will not reclone the datagen repo if `data_gen_repo_dir` already exists as a git repo.\n- required env vars vary by domain and follow the upstream repo:\n  - `email`: `CHROMA_API_KEY`, `CHROMA_DATABASE`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`\n  - `web`: `SERPER_API_KEY`, `JINA_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `CHROMA_API_KEY`, `CHROMA_DATABASE`\n  - `finance`: `CHROMA_API_KEY`, `CHROMA_DATABASE`, `OPENAI_API_KEY`, `BASETEN_API_KEY`, `ANTHROPIC_API_KEY`\n  - `legal`: `USPTO_API_KEY`, `SEARCH_API_KEY`, `DATALAB_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `CHROMA_API_KEY`, `CHROMA_DATABASE`\n- this implementation mirrors the report’s search/prune loop and reward shape, but uses a lightweight local lexical retriever rather than chroma cloud infrastructure.\n","encoding":"utf-8","truncated":false,"total_bytes":6855},"status":null}