# openfarm-zoo-arousal-eval

<p>
  <a href="https://github.com/ob1-s/happy-farm/tree/main/environments/openfarm_zoo_arousal_eval">
    <img align="left" hspace="4" src="https://img.shields.io/badge/GitHub-181717?style=for-the-badge&logo=github&logoColor=white" alt="GitHub">
  </a>
  <a href="https://app.primeintellect.ai/dashboard/environments/openfarm/openfarm-zoo-arousal-eval">
    <img align="left" hspace="4" src="https://img.shields.io/badge/Prime%20Intellect-Envs%20Hub-181717?style=for-the-badge&labelColor=181717&logoColor=white" alt="Prime Intellect Environments Hub">
  </a>
  <a href="https://huggingface.co/datasets/oliveirabruno01/openfarm-zoo-valence-arousal">
    <img align="left" hspace="4" src="https://img.shields.io/badge/Hugging%20Face-Dataset-181717?style=for-the-badge&logo=huggingface&logoColor=yellow&labelColor=181717" alt="Hugging Face Dataset">
  </a>
</p>
<br clear="all" />

## Overview

OpenFARM Zoo Arousal is a small (15-example) visual affect eval built from
expert-labeled zoo-animal video stimuli. It tests whether multimodal models can
classify expert-coded arousal and/or valence from visible behavior, posture,
movement, and facial/body cues.

- **Environment ID**: `openfarm-zoo-arousal-eval`
- **Type**: single-turn classification; an EnvGroup when multiple tasks are selected
- **Default modality**: `filmstrip` (one 3x3 montage image)
- **Other modalities**: `video`, `frames`, `text`
- **Output format**: XML `<answer>` tag, optionally preceded by an `<explanation>` tag
- **Primary metric**: exact-match reward on the normalized answer (`accuracy_reward`)

The headline benchmark is visual-only. The prepared dataset uses muted clips
because the source study frames the task as visual recognition from muted video
clips. Source audio was audited during dataset preparation and should not be
treated as the scientific signal for this environment.

## Dataset

- **Primary dataset**: [`oliveirabruno01/openfarm-zoo-valence-arousal`](https://huggingface.co/datasets/oliveirabruno01/openfarm-zoo-valence-arousal)
- **Source**: Figshare collection [`10.6084/m9.figshare.c.7807931`](https://doi.org/10.6084/m9.figshare.c.7807931)
- **Article**: Hiisivuori et al. (2025), *Human recognition of emotional valence and arousal of zoo animals*, [`10.1038/s41598-025-28646-7`](https://doi.org/10.1038/s41598-025-28646-7)
- **Split sizes**: 15 `test` examples
- **Species**: Barbary macaque, Siberian tiger, Turkmenian markhor

## Tasks

| Task | Labels |
| --- | --- |
| `arousal` | `low` / `high` |
| `valence` | `negative` / `neutral` / `positive` |
| `valence_arousal` | `negative_high` / `neutral_low` / `positive_low` / `positive_high` |

## Quickstart

```bash
prime eval run openfarm-zoo-arousal-eval \
  -a '{"task": "arousal", "modality": "filmstrip", "max_examples": 5}'
```

Run valence and arousal together as an EnvGroup:

```bash
prime eval run openfarm-zoo-arousal-eval \
  -a '{"task": ["arousal", "valence"], "modality": "filmstrip"}'
```

Give the model the nine sampled frames as separate image inputs instead of one montage:

```bash
prime eval run openfarm-zoo-arousal-eval \
  -a '{"task": "valence_arousal", "modality": "frames"}'
```

Send the muted prepared video directly to endpoints that support video input:

```bash
prime eval run openfarm-zoo-arousal-eval \
  -a '{"task": "valence_arousal", "modality": "video"}'
```

## Environment Arguments

| Arg | Type | Default | Description |
| --- | --- | --- | --- |
| `task` | str/list | `"arousal"` | `arousal`, `valence`, `valence_arousal`, a list of these, or `"all"`. |
| `dataset_id` | str | `"oliveirabruno01/openfarm-zoo-valence-arousal"` | Hugging Face dataset ID. |
| `dataset_revision` | str/null | `null` | Optional dataset revision. |
| `test_split` | str | `"test"` | Eval split. This dataset is eval-only. |
| `max_examples` | int | `-1` | Optional subsampling budget (`-1` uses every example). |
| `seed` | int | `42` | Seed for the shuffle applied before subsampling. |
| `modality` | str | `"filmstrip"` | `filmstrip`, `video`, `frames`, or `text`. |
| `include_species_context` | bool | `false` | Adds the species name to the prompt text. Off by default to keep the task purely visual. |
| `require_explanation` | bool | `false` | Requires an `<explanation>` field before `<answer>`. |
| `format_reward_weight` | float | `0.0` | Optional weight for the XML format reward. |

## Dataset Notes

- The dataset is intentionally eval-only; there is no meaningful train split.
- The public rows use opaque media filenames and omit source filenames, clip
  codes, segment timestamps, expert notes, pre-rendered messages, task IDs, and
  OpenFARM-specific row IDs.
- `video` mode sends the muted prepared MP4 clip from the embedded HF `Video`
  bytes. Support is endpoint-dependent, and there is intentionally no local-file
  fallback.
- `filmstrip` mode sends the 3x3 montage as one image. It is the most portable
  vision path across current multimodal endpoints.
- `frames` mode splits the same 3x3 filmstrip into nine separate image inputs,
  ordered left to right, top to bottom. This is useful for models, such as
  Gemma 4, that can attend to multiple input images.

## Metrics

| Metric | Meaning |
| --- | --- |
| `accuracy_reward` | 1.0 when the parsed answer matches the target label after normalization. |
| `format_reward` | Optional XML format reward, active when `format_reward_weight > 0`. |
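How answers are parsed and normalized is not spelled out here. The sketch below is one plausible reading of `accuracy_reward` and `format_reward`. It assumes the reply's last `<answer>` tag is the prediction and that normalization means trimming, lowercasing, and mapping spaces or hyphens to underscores, so `Negative-High` matches `negative_high`. The helper names `normalize` and `parse_answer`, and both regexes, are hypothetical, and the environment's real reward functions may behave differently.

```python
import re
from typing import Optional

# Assumed answer extraction: content of <answer>...</answer>, case-insensitive,
# allowed to span newlines. Not the environment's actual parser.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL | re.IGNORECASE)

# Assumed shape check for require_explanation=True: an <explanation> block
# followed (after optional whitespace) by an <answer> block.
EXPLAINED_RE = re.compile(
    r"<explanation>.*?</explanation>\s*<answer>.*?</answer>",
    re.DOTALL | re.IGNORECASE,
)


def normalize(label: str) -> str:
    """Hypothetical normalization: trim, lowercase, map runs of spaces/hyphens to '_'."""
    return re.sub(r"[\s\-]+", "_", label.strip().lower())


def parse_answer(completion: str) -> Optional[str]:
    """Return the normalized text of the last <answer> tag, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return normalize(matches[-1]) if matches else None


def accuracy_reward(completion: str, target: str) -> float:
    """1.0 when the parsed answer equals the normalized target label, else 0.0."""
    parsed = parse_answer(completion)
    return 1.0 if parsed is not None and parsed == normalize(target) else 0.0


def format_reward(completion: str, require_explanation: bool = False) -> float:
    """1.0 when the completion has the expected XML shape, else 0.0."""
    pattern = EXPLAINED_RE if require_explanation else ANSWER_RE
    return 1.0 if pattern.search(completion) else 0.0


if __name__ == "__main__":
    reply = "<explanation>Pacing, ears back.</explanation>\n<answer> Negative-High </answer>"
    print(accuracy_reward(reply, "negative_high"))  # 1.0
    print(format_reward(reply, require_explanation=True))  # 1.0
    print(accuracy_reward("probably high", "high"))  # 0.0 (no <answer> tag)
```

Taking the last `<answer>` tag is a choice made for this sketch: if a model restates its answer, the final one wins. A stricter parser might instead reject replies that contain more than one tag.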