{"data":{"kind":"file","path":"README.md","version_id":"ayx467sr1ysdogiquqbx3ygj","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1531,"modified_at":"2025-10-12T01:10:18.462000","content_hash":"0e4ef42d0f4cc96de4b6a7af51b4ca3cfaa6f7994dbdfb84a24fdfd9d5dac50b"},"entries":[],"content":"# GoodSirMath8k\n\n\n### Overview\n- **Environment ID**: `GoodSirMath8k`\n- **Short description**: GSM8K with an added reward based on how Shakespearean the model's response is.\n- **Tags**: math, gsm8k, single-turn, think, old-english\n\n\n### Datasets\n- **Primary dataset(s)**: GSM8K train split (training) and test split (evaluation), loaded via `load_example_dataset`\n- **Source links**: Uses the example loader in `verifiers.utils.data_utils`\n- **Split sizes**: Configurable via args; defaults to the full train/test splits\n\n### Task\n- **Type**: single-turn\n- **Parser**: `XMLParser` with the tags `ponder` and `andswaru`\n- **Rubric overview**: Exact match on the parsed boxed answer, a format check, and a Classical English style reward.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval GoodSirMath8k\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval GoodSirMath8k -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7\n```\n\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `-1` | Limit training set size (use -1 for all) |\n| `num_eval_examples` | int | `-1` | Limit eval set size (use -1 for all) |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `correct_answer` | 1.0 if the parsed boxed answer equals the target, else 0.0 |\n| `format_reward` | Adherence to the `<ponder>` and `<andswaru>` tags |\n| `old_speak_reward` | Adherence to old-style English, scored by the fine-tuned DistilBERT-base-uncased classifier `notaphoenix/shakespeare_classifier_model` |\n\n","encoding":"utf-8","truncated":false,"total_bytes":1531},"status":null}