{"data":{"kind":"file","path":"README.md","version_id":"p7qrxo5fdfoc2q1ba0n6fjgc","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2091,"modified_at":"2026-02-16T04:41:39.363000","content_hash":"8d5eddd595505ae70e97e1c0ae39172708e31b00f8116ffc359753a8d849ebe2"},"entries":[],"content":"# grid-nav\n\nNavigate procedurally generated ASCII mazes using text-only reasoning.\n\n## Task\n\nThe agent receives an ASCII grid maze and must navigate from start (S) to goal (G) by issuing directional moves (UP, DOWN, LEFT, RIGHT). The agent sees the full maze and its current position (@) every turn.\n\n```\n#######\n#S#...#\n#.#.#.#\n#...#.#\n#.###.#\n#....G#\n#######\n```\n\n## Research Context\n\n**Physical AI mapping:** Robot navigation, warehouse AMR pathfinding, drone planning.\n\n**Core question:** Can RL train an LLM to perform systematic search (BFS-like reasoning) in spatial environments? Base models fail at grid navigation beyond small sizes because they pattern-match rather than search. RL should close this gap.\n\n## Datasets\n\nMazes are procedurally generated using a DFS recursive backtracker, with a configurable fraction of extra walls removed to create loops. Start and goal are placed at maximum BFS distance. 
Difficulty is controlled by maze size:\n\n| Tier | Logical Size | Display Grid | Typical Optimal Steps |\n|------|-------------|-------------|----------------------|\n| Easy | 3x3 | 7x7 | 6-10 |\n| Medium | 4x4 | 9x9 | 10-18 |\n| Hard | 6x6 | 13x13 | 18-30 |\n\n## Quickstart\n\n```bash\n# Install\nprime env install grid-nav\n\n# Evaluate\nprime eval run grid-nav -m gpt-4.1-nano\n\n# Train\nprime train run configs/lab/grid-nav.toml\n```\n\n## Environment Arguments\n\n| Argument | Default | Description |\n|----------|---------|-------------|\n| num_examples | 500 | Total maze tasks to generate |\n| maze_sizes | \"3,4,6\" | Comma-separated logical maze sizes (display = 2n+1) |\n| extra_passage_frac | 0.15 | Fraction of walls removed to create loops (0 = perfect maze) |\n| seed | 42 | Random seed for reproducible generation |\n| max_turns | 100 | Max turns per rollout |\n\n## Metrics\n\n| Metric | Weight | Description |\n|--------|--------|-------------|\n| reached_goal | 1.0 | 1.0 if agent reached goal |\n| path_efficiency | 0.5 | optimal_steps / actual_steps (when goal reached) |\n| progress_metric | 0.0 | Manhattan distance progress toward goal |\n| invalid_move_rate | 0.0 | Fraction of moves that hit walls |\n","encoding":"utf-8","truncated":false,"total_bytes":2091},"status":null}