{"data":{"kind":"file","path":"README.md","version_id":"wn7m7pjh58guiuv9d18kfxyp","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3127,"modified_at":"2025-11-23T22:25:11.034000","content_hash":"11537ac23ad7975aed563c21752abf3dea3b126ec2ba158f911d3955f14acf5b"},"entries":[],"content":"# traveling-salesman\n\nSingle-turn TSP routing environment for verifiers / Prime-RL. Each example is a small TSP instance (4–7 cities) specified by coordinates. The agent must return a tour as space-separated city indices starting/ending at city 0. Reward is `optimal_distance / tour_distance` (1.0 = optimal, 0 for invalid).\n\n### Overview\n- **Environment ID**: `traveling-salesman`\n- **Short description**: Generate TSP tours for randomly sampled small graphs.\n- **Tags**: `tsp`, `rl`, `graphs`, `routing`, `eval`\n\n### Data splits\n- Train: 48 synthetic instances (by default)\n- Eval: 16 synthetic instances (by default)\n- Deterministic sampling via `seed` env arg; each row stores coords, distance matrix, optimal tour/length.\n\n### Task\n- **Type**: single-turn chat (completion also works)\n- **Parser**: default verifiers `Parser`\n- **Rubric**: parses the route, checks feasibility, computes tour length, reward = `optimal/tour` (clipped to [0,1]).\n\n### Output format\n- Return a tour as space-separated city indices, starting/ending at city 0 (e.g., `0 2 3 1 0`).\n- No extra text or units; only the sequence (one line).\n- The parser enforces:\n  - Starts/ends at start city (0)\n  - Visits every city exactly once\n  - No out-of-range indices\n  - No empty outputs\n\n### Quickstart\nEvaluate with defaults:\n```bash\nuv run vf-eval traveling-salesman\n```\n\nChange model/sampling and override env args:\n```bash\nuv run vf-eval traveling-salesman \\\n  -m gpt-4o-mini \\\n  -n 20 -r 3 \\\n  -a '{\"train_examples\": 64, \"eval_examples\": 32, \"min_cities\": 4, \"max_cities\": 7, \"seed\": 42}'\n```\n\nSampling defaults baked into rollout:\n- `response_format={\"type\": \"text\"}`\n- `temperature=0`\n- `max_tokens=128`\n- Parser will use the first line containing numbers; invalid/empty outputs get -1.\n- You can add `-S '{\"stop\":[\"\\\\n\",\".\",\",\"]}'` to vf-eval to further trim verbosity if a model is chatty.\n\n### Environment arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `train_examples` | int | `48` | Synthetic train rows to generate |\n| `eval_examples` | int | `16` | Synthetic eval rows to generate |\n| `min_cities` | int | `4` | Minimum cities per instance |\n| `max_cities` | int | `7` | Maximum cities per instance |\n| `seed` | int | `13` | RNG seed for reproducible instances |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `tsp_reward` | Main scalar reward (1.0 optimal, 0 invalid) |\n| `tour_length` | Length of returned tour |\n| `optimal_length` | Optimal tour length for the instance |\n| `gap` | `tour_length - optimal_length` |\n| `feasible` | 1 if the tour is valid/visits all cities once; else 0 |\n\n### Notes on scoring\n- Invalid or malformed routes get `tsp_reward = -1.0`.\n- Feasible but suboptimal routes get a fractional reward: `optimal_distance / tour_distance` (clipped to [0,1]).\n- Optimal route yields `tsp_reward = 1.0`.\n\n### Outputs directory\nAn `outputs/` directory is packaged with the environment (contains a README and .gitkeep) to support automated evaluators that expect a writable outputs path during integration tests. You can also drop rollout artifacts there if needed.\n","encoding":"utf-8","truncated":false,"total_bytes":3127},"status":null}