# langchain-deep-agents-wikispeedia

Trains and evaluates LangChain Deep Agents on Wikispeedia navigation, packaged as a v1 `Taskset`/`Harness` environment.

### Overview
- **Environment ID**: `langchain-deep-agents-wikispeedia`
- **Short description**: Multi-turn navigation through the Wikispeedia article graph using LangChain `create_deep_agent` (todos, virtual files, sub-agents) plus two task tools, `click_link` and `go_back`.
- **Tags**: v1, taskset, harness, multi-turn, tool-use, langchain, deep-agents, wikispeedia, navigation

### Datasets
- **Source**: SNAP Wikispeedia ([snap.stanford.edu/data/wikispeedia](https://snap.stanford.edu/data/wikispeedia.html)): 4,604 Wikipedia articles, ~120K hyperlinks, a precomputed shortest-path distance matrix, and aggregate human-play statistics.
- **Splits**: up to 50K train pairs and 1K eval pairs by default (`train_size`/`eval_size`). When `stratify_path_length` is enabled, pairs are sampled evenly across shortest-path buckets within `min_path_length..max_path_length`. Train and eval target articles are **disjoint**, so no target appears in both splits. The split is deterministic given `split_seed`.

### Task
- **Type**: `vf.Env` combining a Wikispeedia `vf.Taskset` with a LangChain Deep Agents `vf.Harness`.
- **Goal**: navigate from a source Wikipedia article to a target article using only on-page hyperlinks.
- **Boundary**: the taskset owns the Wikispeedia graph, the `click_link`/`go_back` tools, rewards, and metrics. The harness only adapts the resolved taskset tools into LangChain Deep Agents.
- **Output format**: the agent calls `click_link(article)` until it reaches the target. The `TARGET REACHED` tool message then tells the agent to stop and reply briefly.
- **Scoring**: binary `reached_target` reward plus zero-weight path and tool metrics. `path_efficiency` becomes a weighted reward when `efficiency_weight > 0`.

### Quickstart

Install the environment locally:
```bash
prime env install ./environments/langchain_deep_agents_wikispeedia
```

Run an evaluation with default settings:
```bash
prime eval run langchain-deep-agents-wikispeedia
```

Configure the model and difficulty band:
```bash
prime eval run langchain-deep-agents-wikispeedia \
  -m openai/gpt-4.1-mini \
  -n 20 -r 3 -t 4096 -T 0.7 \
  -a '{"config": {"taskset": {"min_path_length": 4, "max_path_length": 6, "max_turns": 40}}}'
```

Disable `go_back` to force planning over backtracking:
```bash
prime eval run langchain-deep-agents-wikispeedia \
  -m openai/gpt-4.1-mini -n 20 -r 3 \
  -a '{"config": {"taskset": {"allow_go_back": false}}}'
```

Notes:
- The first run downloads ~5MB of SNAP data into `~/.cache/wikispeedia`. Override this location with `cache_dir`.
- Set `OPENAI_API_KEY`, or whichever credential your policy endpoint expects, before running the agent.

### LangSmith tracing

Deep Agents uses the native LangSmith tracing built into LangGraph/LangChain. To enable it, export the standard LangSmith environment variables before running the eval:

```bash
export LANGSMITH_TRACING=true
export LANGSMITH_API_KEY=...
export LANGSMITH_PROJECT=verifiers-wikispeedia
prime eval run langchain-deep-agents-wikispeedia
```

### Taskset Config

| Field | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `cache_dir` | str \| None | `None` | SNAP cache directory (defaults to `~/.cache/wikispeedia`). |
| `min_path_length` | int | `3` | Drop pairs whose shortest path is shorter than this. |
| `max_path_length` | int | `6` | Drop pairs whose shortest path is longer than this. Only ~470 pairs exist at distance 8, and only 5 at distance 9. |
| `train_size` | int | `50000` | Number of train pairs to sample. |
| `eval_size` | int | `1000` | Number of eval pairs to sample. |
| `eval_target_fraction` | float | `0.1` | Fraction of articles reserved as eval-only targets. |
| `split_seed` | int | `0` | Seed for the deterministic train/eval split. |
| `links_only` | bool | `False` | Render each article as only its link menu. This ablation tests whether the agent navigates from semantic content or from link names alone. |
| `allow_go_back` | bool | `True` | Expose the `go_back` tool. |
| `max_turns` | int | `50` | Per-rollout LangGraph recursion limit, stored on each task row. This is not a literal model-turn count, because Deep Agents may spend several graph steps per model/tool cycle. |
| `efficiency_weight` | float | `0.0` | If `> 0`, add `path_efficiency` to the reward at this weight. A shortest-path route earns `1 + efficiency_weight`, and any route that reaches the target earns at least `1`. The default `0.0` keeps the reward as pure binary reachability. |
| `stratify_path_length` | bool | `True` | Sample equal counts from each shortest-path bucket in `[min_path_length, max_path_length]`, with the per-bucket count capped by the size of the smallest non-empty bucket. The SNAP graph's natural distribution skews heavily toward the low end of any band (for the 4-6 band, 83% of pairs have sp=4). Without stratification the policy over-trains on that trivial floor. Set `False` to recover the natural distribution. |

### Harness Config

| Field | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `max_turns` | int | `50` | Fallback LangGraph recursion limit, used when the runtime config does not provide one. It does not map one-to-one onto model turns. |
| `timeout_seconds` | float | `1200.0` | Per-rollout wall-clock cap, in seconds. |

### Metrics
| Metric | Meaning |
| ------ | ------- |
| `reward` | weighted sum of the reward functions (defaults to `reached_target` alone) |
| `reached_target` | 1.0 if the agent navigated to the target (always a weighted reward, weight 1.0) |
| `path_efficiency` | `shortest_path / path_length` if the target was reached, else 0. Zero-weight by default. It becomes a weighted reward at `efficiency_weight` when that arg is `> 0` |
| `path_length` | number of edges traversed (zero-weight) |
| `shortest_path` | precomputed shortest-path length for the pair (zero-weight) |
| `agent_timeout` | 1.0 if the rollout hit `timeout_seconds` |
| `calls_click_link`, `calls_go_back` | navigation tool call counts (zero-weight) |
| `calls_write_todos`, `calls_write_file`, `calls_read_file`, `calls_ls`, `calls_edit_file`, `calls_grep`, `calls_task` | deep-agent tool call counts (zero-weight) |
| `total_tool_calls`, `assistant_turns` | trajectory shape (zero-weight) |
| `invalid_link_rate` | fraction of `click_link` calls that named a non-existent link (hallucination canary, zero-weight) |

### Notes
- By default the reward is `reached_target` alone. It is exact and deterministic, and it needs no judge. The deep-agent structural metrics are zero-weight, so they appear in eval tables without shaping the policy.
- `min_path_length=4, max_path_length=6` is the calibrated RL difficulty band for Nemotron-30B-A3B-BF16. Note that this is not the default band, which is `3..6`. The 4-6 band targets a predicted ~0.3-0.4 reach rate, the zone where the reward still gives a useful gradient. The 3-5 band landed at 0.61 mean reach, dominated by the trivial sp=3 floor where the deep-agent scaffolding is decorative. The 5-7 band landed at 0.13 with 27% timeouts.
- This is the primary LangChain Deep Agents example because tool use is load-bearing: the model cannot reach the target without invoking `click_link`.
- `max_turns` is passed through to LangGraph as `recursion_limit`. It caps graph execution steps, not model calls, so the observed number of model/tool cycles can be lower than the configured value.
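The reward described under **Scoring** and in the `efficiency_weight` row can be written as a weighted sum. Below is a minimal sketch. It assumes the weights compose exactly as the Metrics table states: `reached_target` at weight 1.0 and `path_efficiency` at `efficiency_weight`. The function names are hypothetical and are not the taskset's API.

```python
def path_efficiency(reached: bool, shortest_path: int, path_length: int) -> float:
    """shortest_path / path_length if the target was reached, else 0 (hypothetical helper)."""
    if not reached or path_length <= 0:
        return 0.0
    return shortest_path / path_length


def rollout_reward(
    reached: bool,
    shortest_path: int,
    path_length: int,
    efficiency_weight: float = 0.0,
) -> float:
    """Weighted sum: reached_target at weight 1.0, path_efficiency at efficiency_weight.

    ASSUMPTION: mirrors the README's description, not the taskset's actual code.
    """
    reward = 1.0 if reached else 0.0  # reached_target, always weighted 1.0
    if efficiency_weight > 0:
        # path_efficiency only contributes when the weight is positive
        reward += efficiency_weight * path_efficiency(reached, shortest_path, path_length)
    return reward


# A shortest-path route (4 hops, sp=4) earns 1 + efficiency_weight;
# a 4-hop-optimal pair solved in 8 hops earns less but still at least 1.
print(rollout_reward(True, 4, 4, efficiency_weight=0.5))
print(rollout_reward(True, 4, 8, efficiency_weight=0.5))
```

With `efficiency_weight=0.0`, every successful rollout scores exactly 1.0 regardless of route length, which matches the default of pure binary reachability.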
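The target-disjoint train/eval split driven by `eval_target_fraction` and `split_seed` can be sketched in a few lines. The helper names and the sort-shuffle-slice strategy are assumptions made for illustration. The taskset's actual split code may choose eval targets differently, but the invariant is the one the **Splits** bullet states: a target article belongs to exactly one split.

```python
import random


def split_targets(
    articles: list[str], eval_target_fraction: float, split_seed: int
) -> tuple[set[str], set[str]]:
    """Partition articles into (train_targets, eval_targets) (hypothetical helper)."""
    ordered = sorted(articles)  # sort first so input order cannot change the split
    random.Random(split_seed).shuffle(ordered)  # seeded, so the split is deterministic
    n_eval = int(len(ordered) * eval_target_fraction)
    return set(ordered[n_eval:]), set(ordered[:n_eval])


def assign_pairs(
    pairs: list[tuple[str, str]], train_targets: set[str], eval_targets: set[str]
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Route each (source, target) pair to the split that owns its target."""
    train = [p for p in pairs if p[1] in train_targets]
    evaluation = [p for p in pairs if p[1] in eval_targets]
    return train, evaluation
```

Splitting on targets rather than on pairs keeps eval honest. A policy cannot memorize routes into an eval target during training, because no training pair ever ends there.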
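The `stratify_path_length` behaviour can be illustrated with a sampler that buckets pairs by shortest-path length and draws the same number from each bucket. The per-bucket count is capped by the smallest non-empty bucket in the band. This is a hypothetical sketch: `stratified_sample` and the `(source, target, shortest_path)` tuple layout are illustrative assumptions, and the real taskset may round or order draws differently.

```python
import random
from collections import defaultdict


def stratified_sample(
    pairs: list[tuple[str, str, int]],
    min_path_length: int,
    max_path_length: int,
    size: int,
    seed: int = 0,
) -> list[tuple[str, str, int]]:
    """Draw equal counts per shortest-path bucket (hypothetical helper).

    Each pair is (source, target, shortest_path). The per-bucket count is
    size // n_buckets, capped by the smallest non-empty bucket in range.
    """
    buckets: dict[int, list[tuple[str, str, int]]] = defaultdict(list)
    for pair in pairs:
        if min_path_length <= pair[2] <= max_path_length:
            buckets[pair[2]].append(pair)  # only in-band pairs create buckets
    if not buckets:
        return []
    per_bucket = min(size // len(buckets), min(len(b) for b in buckets.values()))
    rng = random.Random(seed)
    sample: list[tuple[str, str, int]] = []
    for sp in sorted(buckets):  # sorted keys keep the draw deterministic
        sample.extend(rng.sample(buckets[sp], per_bucket))
    rng.shuffle(sample)
    return sample
```

Because of the cap, a band with a sparse top bucket yields fewer than `size` pairs. This is why the split sizes above are upper bounds rather than guarantees.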
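The semantics of `click_link` and `go_back`, together with the `invalid_link_rate` canary, can be illustrated with a small in-memory navigator over an adjacency map. Everything in it is a hypothetical sketch rather than the taskset's implementation: the `NavState` class, its return messages, and the choice not to count `go_back` toward `path_length` are all assumptions.

```python
class NavState:
    """Hypothetical per-rollout navigation state (names are illustrative)."""

    def __init__(self, links: dict[str, set[str]], source: str, target: str) -> None:
        self.links = links          # article -> set of outgoing link titles
        self.target = target
        self.history = [source]     # visited articles; last entry is the current page
        self.edges_traversed = 0    # ASSUMPTION: only forward clicks count toward path_length
        self.click_calls = 0
        self.invalid_clicks = 0
        self.reached = False

    @property
    def current(self) -> str:
        return self.history[-1]

    def click_link(self, article: str) -> str:
        self.click_calls += 1
        if article not in self.links.get(self.current, set()):
            self.invalid_clicks += 1  # named a link that does not exist on this page
            return f"Error: '{article}' is not a link on '{self.current}'."
        self.history.append(article)
        self.edges_traversed += 1
        if article == self.target:
            self.reached = True
            return "TARGET REACHED"
        return f"Now on '{article}'."

    def go_back(self) -> str:
        if len(self.history) == 1:
            return "Error: already at the source article."
        self.history.pop()
        return f"Back on '{self.current}'."

    def invalid_link_rate(self) -> float:
        """Fraction of click_link calls that named a non-existent link."""
        return self.invalid_clicks / self.click_calls if self.click_calls else 0.0
```

For example, on the graph `A -> {B, C}`, `B -> {D}`, `C -> {D}`, one hallucinated click followed by `B`, `go_back`, `C`, `D` reaches the target with an invalid link rate of 0.25.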
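The precedence between the two `max_turns` settings can be sketched as follows. The per-task value stored by the taskset wins, and the harness value is the fallback. The result is then handed to LangGraph under the `recursion_limit` key of the run config. The `resolve_run_config` name and the assumption that the task row stores the value under a `"max_turns"` key are illustrative only.

```python
from typing import Any, Optional


def resolve_run_config(task_row: dict[str, Any], harness_max_turns: int = 50) -> dict[str, int]:
    """Build a LangGraph-style run config from a task row (hypothetical helper).

    ASSUMPTION: the task row stores the taskset's max_turns under "max_turns".
    """
    per_task: Optional[int] = task_row.get("max_turns")
    # Explicit None check, so a per-task value is never silently replaced by the fallback.
    limit = per_task if per_task is not None else harness_max_turns
    # LangGraph counts graph steps (not model calls) against recursion_limit.
    return {"recursion_limit": int(limit)}
```

The returned dict has the shape LangGraph accepts as `config=` on a compiled graph's `invoke`. This is why a limit of 40 usually allows fewer than 40 model/tool cycles.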