{"data":{"kind":"file","path":"README.md","version_id":"zrdbaum5elbemokdrqjyvi4m","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1450,"modified_at":"2025-08-23T21:58:50.558000","content_hash":"cac429be76d454ee318726457e7cd788ca203214cedbac05918e24900c3bd649"},"entries":[],"content":"# operating-hours\r\n\r\n### Overview\r\n\r\n- **Environment ID**: `operating-hours`\r\n- **Short description**: The model must complete a large CSV file by parsing businesses' self-reported hours of operation.\r\n- **Tags**: nlp, tabular\r\n\r\n### Datasets\r\n\r\n- **Primary dataset(s)**: Custom dataset built in; one sample.\r\n- **Split sizes**: N/A\r\n\r\n### Task\r\n\r\n- **Type**: single-turn\r\n- **Parser**: XMLParser\r\n- **Rubric overview**: Symmetric Damerau-Levenshtein distance is used to evaluate the final table against the expected output.\r\n\r\n### Quickstart\r\n\r\nRun an evaluation with default settings:\r\n\r\n```bash\r\nuv run vf-eval operating-hours\r\n```\r\n\r\nConfigure model and sampling:\r\n\r\n```bash\r\nuv run vf-eval operating-hours   -m gpt-4.1-mini   -n 1 -r 3 -t 1024 -T 0.7\r\n```\r\n\r\nNotes:\r\n\r\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\r\n- Reports are written under `./environments/operating_hours/reports/` and auto-embedded below.\r\n\r\n### Metrics\r\n\r\nSummarize key metrics your rubric emits and how they're interpreted.\r\n\r\n| Metric | Meaning |\r\n| ------ | ------- |\r\n| `reward` | Main scalar reward (weighted sum of criteria) |\r\n| `accuracy` | Exact match on target answer |\r\n\r\n## Evaluation Reports\r\n\r\n<!-- Do not edit below this line. Content is auto-generated. -->\r\n<!-- vf:begin:reports -->\r\n<p>No reports found. Run <code>uv run vf-eval operating-hours -a '{\"key\": \"value\"}'</code> to generate one.</p>\r\n<!-- vf:end:reports -->\r\n","encoding":"utf-8","truncated":false,"total_bytes":1450},"status":null}