{"data":{"kind":"file","path":"README.md","version_id":"oa5myn84kd86uobn81diynkd","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":4673,"modified_at":"2026-05-18T10:11:40.420000","content_hash":"50338761de608e7ff4a24bb28021af6b80e6cfee48d17e7c118c69fde261d28d"},"entries":[],"content":"# openfarm-horse-grimace\n\n<p>\n  <a href=\"https://github.com/ob1-s/happy-farm/tree/main/environments/openfarm_horse_grimace\">\n    <img align=\"left\" hspace=\"4\" src=\"https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white\" alt=\"GitHub\">\n  </a>\n  <a href=\"https://app.primeintellect.ai/dashboard/environments/openfarm/openfarm-horse-grimace\">\n    <img align=\"left\" hspace=\"4\" src=\"https://img.shields.io/badge/Prime%20Intellect-Envs%20Hub-181717?style=for-the-badge&labelColor=181717&logoColor=white\" alt=\"Prime Intellect Environments Hub\">\n  </a>\n  <a href=\"https://huggingface.co/datasets/oliveirabruno01/openfarm-horse-grimace-region\">\n    <img align=\"left\" hspace=\"4\" src=\"https://img.shields.io/badge/Hugging%20Face-Dataset-181717?style=for-the-badge&logo=huggingface&logoColor=yellow&labelColor=181717\" alt=\"Hugging Face Dataset\">\n  </a>\n</p>\n<br clear=\"all\" />\n\n### Overview\n- **Environment ID**: `openfarm-horse-grimace`\n- **Short description**: Horse Grimace Scale-style facial-region scoring from equine face crops.\n- **Tags**: animal-welfare, horse, equine, facial-expression, grimace-scale, pain-assessment, vision, eval\n\n### Datasets\n- **Primary dataset**: [`oliveirabruno01/openfarm-horse-grimace-region`](https://huggingface.co/datasets/oliveirabruno01/openfarm-horse-grimace-region)\n- **Source**: Mendeley Data, \"Automatic Pain Assessment in Horses\" ([`10.17632/t8rtzcgwxm.3`](https://doi.org/10.17632/t8rtzcgwxm.3)), CC-BY-4.0.\n- **Related paper**: \"Pain assessment in horses using automatic facial expression recognition through deep learning-based modeling\" ([`10.1371/journal.pone.0258672`](https://doi.org/10.1371/journal.pone.0258672)).\n- **Default split**: `test`\n\n### Task\n- **Type**: single-turn multimodal image classification\n- **Default task**: `region_score`\n- **Supported tasks**:\n  - `region_score`: answer `0`, `1`, or `2` for an HGS-style facial-region cue.\n  - `binary_pain`: answer `0` or `1`, derived from source scores (`0` = no cue, `1/2` = cue present).\n- **Output format**: XML tags. Depending on `require_explanation`, models must output either `<answer>...</answer>` or both `<explanation>...</explanation>` and `<answer>...</answer>`.\n- **Rubric**: exact-match answer reward, optional XML format reward, and optional grounded reasoning judge against an equine grimace-scale cheatsheet.\n\n### Quickstart\n\n```bash\nprime eval run openfarm-horse-grimace\n```\n\nConfigure task and split:\n\n```bash\nprime eval run openfarm-horse-grimace \\\n  -m google/gemini-3-flash-preview \\\n  -n 20 -r 3 \\\n  -a '{\"task\": \"region_score\", \"test_split\": \"test\", \"require_explanation\": true}'\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_id` | str | `\"oliveirabruno01/openfarm-horse-grimace-region\"` | Hugging Face dataset ID. |\n| `dataset_revision` | str/null | `None` | Optional dataset revision. |\n| `test_split` | str | `\"test\"` | Split to evaluate. |\n| `task` | str | `\"region_score\"` | One of `region_score` or `binary_pain`. |\n| `max_examples` | int | `-1` | Limit examples after shuffling. |\n| `max_examples_per_task` | int/null | `None` | Compatibility alias for `max_examples`. |\n| `seed` | int | `42` | Dataset shuffle seed. |\n| `include_region_context` | bool | `true` | Include the source crop region in the prompt. |\n| `require_explanation` | bool | `true` | Require an explanation before the answer. |\n| `require_reasoning` | bool/null | `None` | Compatibility alias for `require_explanation`. |\n| `accuracy_reward_weight` | float | `1.0` | Exact-match reward weight. |\n| `judge_reward_weight` | float | `1.0` | Optional grounded-judge reward weight. |\n| `format_reward_weight` | float | `0.0` | XML format reward weight. |\n| `reward_weights` | dict/null | `None` | Optional aliases: `accuracy`, `judge`, `format` or full reward names. |\n| `judge_mode` | str | `\"none\"` | `none`, `self`, or `external`. |\n| `judge_model` | str | `\"gpt-4o-mini\"` | Model used when `judge_mode=\"external\"`. |\n| `system_prompt_override` | str/null | `None` | Replace the default system prompt. |\n| `user_prompt_override` | str/null | `None` | Replace the default user prompt template. |\n| `cheatsheet_version` | str | `\"full\"` | `full` or `short` for judge grounding. |\n| `cheatsheet_override` | str/null | `None` | Replace the default judge cheatsheet. |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Weighted scalar reward. |\n| `accuracy_reward` | Exact match after answer normalization. |\n| `format_reward` | XML format reward when enabled. |\n| `judge_reward` | Optional biological-reasoning score against the equine cheatsheet. |\n","encoding":"utf-8","truncated":false,"total_bytes":4673},"status":null}