{"data":{"kind":"file","path":"README.md","version_id":"ctp5x2wvef78p4ic6hoe96lv","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3207,"modified_at":"2025-09-21T01:19:02.229000","content_hash":"aba6c098ddcda82fc4ebd979b52c0bb9d68a67eeab196fc8708b51116744e036"},"entries":[],"content":"# crossword\n\n### Overview\n- **Environment ID**: `crossword`\n- **Short description**: Gymnasium-style RL environment where an agent fills a crossword grid letter-by-letter given clues.\n- **Tags**: RL, games, grid, text, crossword, gymnasium\n\n\n### Datasets\n- **Primary dataset(s)**: NYT-style crossword clue–answer pairs (CSV)\n- **Local file**: `nytcrosswords.csv` (expected columns: `Date,Word,Clue`)\n- **Fallback**: If the CSV is unavailable or unsuitable, the loader falls back to `puzzles.json`, then to a tiny built-in sample.\n- **Split sizes**: N/A (episodic RL; puzzles are sampled per episode)\n\nNotes on loading:\n- The default loader prefers `nytcrosswords.csv` found [here](https://github.com/Michaelgathara/RL/tree/main/crossword/environments/crossword) alongside `crossword.py` and constructs lightweight 1×N across-only puzzles from alphabetic answers (length 3–8 by default).\n- Robust CSV decoding is used (tries UTF-8/UTF-8-SIG/CP1252/Latin-1; then UTF-8 with replacement) to tolerate varied encodings.\n\n### Task\n- **Type**: multi-turn\n- **Parser**: N/A (standard Gymnasium-like RL loop)\n- **Episode**: One puzzle per episode; the agent writes letters into grid cells.\n\n#### Observation\nThe observation is a dictionary:\n- `grid`: 2D list of strings, where each cell is `#` (block), `-` (empty), or an uppercase letter.\n- `clues`: `{ \"across\": {num -> {start, length, clue}}, \"down\": { ... } }` mirroring the puzzle specification.\n\nFor CSV-derived puzzles the grid is 1×N (across-only). JSON/fallback puzzles may have larger grids and down clues.\n\n#### Action\nTuple `(row, column, letter)` where:\n- `row`, `column`: zero-based indices\n- `letter`: either an integer 0–25 (A–Z) or a single-character string `\"A\"..\"Z\"`\n\n#### Rewards (per step)\n- `+1.0` for a correct letter placed in an empty, non-block cell\n- `-0.5` for an incorrect letter or attempting to overwrite a filled cell\n- `-1.0` for an out-of-bounds action (terminates the episode)\n- `-1.0` for attempting to write on a block (`#`) cell (episode continues)\n- `-0.01` step penalty each action\n- `+5.0` bonus when the puzzle is completely solved\n\nThe episode ends when the puzzle is solved, an out-of-bounds action occurs, or `max_steps` is reached (if set).\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval crossword\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval crossword \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"max_steps\": 400}'\n```\n\nLocal smoke test (Python):\n\n```python\nfrom crossword.environments.crossword.crossword import load_environment\n\nenv = load_environment(max_steps=200)\nobs = env.reset()\nprint(obs[\"grid\"])  # show initial grid\nenv.render()         # pretty print grid\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `max_steps` | int | `null` | Optional cap on episode length. Episode also ends on solve. |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Cumulative episode reward (step rewards + solve bonus). |\n| `letters_correct` | Count of correct letters placed (if tracked by evaluator). |\n| `completion` | Whether the grid was fully solved (boolean). |\n\n","encoding":"utf-8","truncated":false,"total_bytes":3207},"status":null}