# WikiNav-CC: Frozen Corridor Benchmark

A reproducible, zero-network environment for testing RLM capabilities on graph traversal tasks.

## How It Works

Instead of querying Wikipedia live, this environment uses **pre-computed corridors**.
For each task (Start → Target), we pre-scrape:

1. **The Spine**: the pages on the optimal path, with full summaries and ALL of their original links
2. **Hydrated Distractors**: sampled 1-hop neighbors with their real content
3. **Dead Ends**: links that are visible but not hydrated; navigating to them fails

The agent sees **realistic branching factors** (100-500 links per page) but can only
navigate successfully to hydrated pages. Failed navigation attempts cost steps.

## Quick Start

### 1. Install

```bash
pip install -e .
```

### 2. Get the Dataset

Generate a benchmark file, or skip this step and let the environment download a pre-built one (see **Data Handling**):

```bash
python generate_corridor.py --tasks 50 --output corridors.parquet
```

### 3. Run the Environment

```python
from wiki_nav import load_environment

env = load_environment(dataset_path="corridors.parquet")
```

## Data Handling

The environment loads a pre-computed corridor dataset (`corridors.parquet`), resolved in this order:

1. **Local file:** if `corridors.parquet` exists in the working directory, it is used.
2. **Hugging Face:** otherwise, the dataset is downloaded automatically from Hugging Face (`ascl1u/wiki-nav`) and cached.

To build a new dataset instead, run `generate_corridor.py` as shown in Quick Start step 2.

## Tools

| Tool | Description |
|------|-------------|
| `scan_neighbors(filter, start_index=0)` | Find links matching the regex `filter`. Use `start_index` to paginate. |
| `move_to(page)` | Navigate to a new page (wipes the current observation). Fails if the page is not hydrated. |
| `go_back()` | Return to the previous page (costs a step) |
| `scratchpad_write(text)` | Append to persistent notes (survive moves) |
| `read_infobox()` | Get structured page metadata |
| `finish(answer)` | Declare victory once the target is reached |

## Navigation Mechanics

- **Observation Wipe**: `move_to` erases the previous page's content, forcing the agent to use the scratchpad
- **Dead Ends**: navigating to an unhydrated page returns `NAVIGATION FAILED` and costs a step
- **Backtracking**: `go_back()` returns to the previous page when stuck
- **Pagination**: `scan_neighbors` shows 50 links at a time; use `start_index` to see more

## Reward Function

Combines **potential-based reward shaping** with a compute cost to discourage loitering:

```
Reward = R_goal + R_progress - C_compute
```

| Component | Value | Purpose |
|-----------|-------|---------|
| R_goal | +10 | Reaching the target page |
| R_progress | Δ similarity to the target | Rewards moving closer, penalizes moving away |
| C_compute | 0.5 per step + 0.001 per token (subtracted) | Forces efficiency |

## Dataset Format

The `corridors.parquet` file contains:

| Column | Description |
|--------|-------------|
| `task_id` | Unique identifier |
| `start_page` | Starting Wikipedia page |
| `target_page` | Goal Wikipedia page |
| `spine` | List of pages forming the solution path |
| `pages_json` | JSON object containing all hydrated pages plus corridor metadata |

## Design Goals

- **Context Economy**: frozen text yields identical token counts across runs
- **Realistic Branching**: full Wikipedia link lists force semantic reasoning
- **Dead-End Handling**: unhydrated links test the exploration/exploitation tradeoff
- **Non-Local Inference**: no network calls, so failures are reasoning failures rather than network flakes

## Dependencies

- `pandas` / `pyarrow`: dataset loading
- `sentence-transformers`: embedding similarity for heuristic rewards
- `httpx` (optional): only needed for generating new corridors
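The formula in **Reward Function** can be sketched per step as follows. This is a minimal illustration, not the `wiki_nav` implementation: the helper names `cosine` and `step_reward` are hypothetical, the sketch assumes precomputed embedding vectors (for example from `sentence-transformers`), it takes a page's potential to be its cosine similarity to the target with no discount, and it assumes the compute cost is charged on every step using that step's token count.

```python
import math

# Constants from the Reward Function table; the rest of this sketch is an assumption.
GOAL_REWARD = 10.0
STEP_COST = 0.5
TOKEN_COST = 0.001


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def step_reward(prev_emb: list[float], next_emb: list[float], target_emb: list[float],
                reached_target: bool, tokens_used: int) -> float:
    """Hypothetical per-step reward: R_goal + R_progress - C_compute."""
    # Potential-based shaping with phi(page) = similarity to target and gamma = 1:
    # R_progress = phi(next) - phi(prev), so these terms telescope over an episode.
    r_progress = cosine(next_emb, target_emb) - cosine(prev_emb, target_emb)
    r_goal = GOAL_REWARD if reached_target else 0.0
    c_compute = STEP_COST + TOKEN_COST * tokens_used  # positive cost, subtracted below
    return r_goal + r_progress - c_compute
```

Because the progress term telescopes, a round trip A → B → A earns zero net progress reward but still pays two step costs, so loitering never pays off under this shaping.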
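The rules in **Navigation Mechanics** can be modeled with a small in-memory class. This is a sketch under stated assumptions, not the `wiki_nav` code: the class name `CorridorNav`, its fields, and every return string other than `NAVIGATION FAILED` are hypothetical; it assumes `scan_neighbors` does not cost a step and that `go_back()` with no history fails; and it does not model the observation wipe or the scratchpad.

```python
import re
from dataclasses import dataclass, field

PAGE_SIZE = 50  # scan_neighbors shows 50 links at a time


@dataclass
class CorridorNav:
    """Hypothetical toy model of corridor navigation."""
    links: dict[str, list[str]]  # hydrated page -> all of its original outgoing links
    current: str
    history: list[str] = field(default_factory=list)
    steps: int = 0

    def scan_neighbors(self, filter: str, start_index: int = 0) -> list[str]:
        """Return one page of links on the current page that match the regex."""
        pattern = re.compile(filter)
        matches = [link for link in self.links[self.current] if pattern.search(link)]
        return matches[start_index:start_index + PAGE_SIZE]

    def move_to(self, page: str) -> str:
        self.steps += 1  # every attempt costs a step, including failed ones
        if page not in self.links:  # link may be visible, but the page was never hydrated
            return "NAVIGATION FAILED"
        self.history.append(self.current)
        self.current = page
        return f"Now at {page}"

    def go_back(self) -> str:
        self.steps += 1
        if not self.history:  # assumption: backtracking from the start page fails
            return "NAVIGATION FAILED"
        self.current = self.history.pop()
        return f"Back at {self.current}"
```

Keying hydration on the `links` dictionary mirrors the corridor design: dead ends still appear in link lists, but only pages with their own entry can be entered.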