{"data":{"kind":"file","path":"README.md","version_id":"b76qyefkw5r70dfcshbce62m","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3058,"modified_at":"2026-06-09T04:54:16.648000","content_hash":"bacf9daf80376bae34fed87b9e9d8146f6b2a7aeff24831067b6da88da6a41b6"},"entries":[],"content":"# fmt2fmt\n\n### Overview\n- **Environment ID**: `fmt2fmt`\n- **Short description**: Exact table format conversion through a `write_file` tool.\n- **Tags**: tool-use, tables, format-conversion, train, eval\n\n### Task\n- **Type**: single-attempt tool use\n- **Formats**: CSV, Markdown pipe tables, JSONL records\n- **Goal**: Convert or copy a generated table into the requested target format while preserving every header, row, column, value, ordering choice, empty cell, and escape.\n- **Model-facing tool**: `write_file(path: str, content: str) -> str`\n\nThe environment does not expose a converter tool. Deterministic serializers and strict parsers are internal benchmark machinery used to generate the oracle target and score the model's written output.\n\n### Quickstart\nRun a local evaluation with defaults:\n\n```bash\nprime eval run fmt2fmt --env-dir-path ./environments -n 5 -r 1\n```\n\nConfigure model and sampling:\n\n```bash\nprime eval run fmt2fmt \\\n  --env-dir-path ./environments \\\n  -m openai/gpt-4.1-mini \\\n  -n 20 -r 1 -t 4096 -T 0 \\\n  -a '{\"num_eval_examples\": 20, \"min_rows\": 2, \"max_rows\": 16, \"min_cols\": 2, \"max_cols\": 8}'\n```\n\n### Environment Arguments\nPass arguments through `-a` / `--env-args` as JSON.\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `seed` | int | `0` | Global deterministic task seed. |\n| `num_train_examples` | int | `20` | Number of generated train rows. |\n| `num_eval_examples` | int | `20` | Number of generated eval rows. |\n| `min_rows` | int | `2` | Minimum table body rows. |\n| `max_rows` | int | `8` | Maximum table body rows. |\n| `min_cols` | int | `2` | Minimum table columns. |\n| `max_cols` | int | `5` | Maximum table columns. |\n| `formats` | list[str] | `['csv', 'markdown', 'jsonl']` | Formats included in source/target pair sampling. |\n| `include_identity` | bool | `true` | Include copy-control tasks like CSV to CSV. |\n\nTask generation is split-aware and index-aware, so changing train size does not change eval examples.\n\n### Metrics\n| Metric | Meaning |\n| ------ | ------- |\n| `exact_table_match` | Main reward: strict parse plus exact truth equality. |\n| `parse_success` | Target artifact parsed under the target format contract. |\n| `header_match` | Parsed header exactly matches the reference header. |\n| `row_count_match` | Parsed row count matches the reference row count. |\n| `column_count_match` | Parsed column count matches the reference column count. |\n| `cell_accuracy` | Fraction of reference cells exactly matched at the same row/column position. |\n| `extra_data_detected` | Parser/scorer detected invalid extra structure. |\n| `truncation_detected` | Rollout was marked truncated before scoring. |\n| `source_chars` | Source artifact character count. |\n| `target_chars` | Oracle target artifact character count. |\n| `cells` | Reference body cell count. |\n\nEach rollout stores a `fmt2fmt_score` object in state with the dense metrics plus a primary `failure_class` such as `no tool call`, `wrong path`, `parse error`, `missing rows`, `extra columns`, or `value mismatch`.\n","encoding":"utf-8","truncated":false,"total_bytes":3058},"status":null}