{"data":{"kind":"file","path":"README.md","version_id":"dzh8qf9582jktgf57llmd0s9","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2393,"modified_at":"2025-09-07T22:24:06.072000","content_hash":"b23d05892834f1ce69a6bdc29feeb727c977bd0b5805321039533420b6f8c844"},"entries":[],"content":"# reddit-user-grouping\n\n### Overview\n- **Environment ID**: `reddit-user-grouping`\n- **Short description**: Group comments in a single Reddit comment thread by author. Usernames are redacted\n\n### Datasets\n- **Primary dataset(s)**: cosmicoptima/IFhXR5QAHNW9 - Reddit posts with comment threads and known user mappings\n- **Source links**: [HuggingFace Dataset](https://huggingface.co/datasets/cosmicoptima/IFhXR5QAHNW9)\n- **Preprocessing**: Filtered to examples ≤2076 tokens to fit model context window. This should be manually adjusted as necessary\n\n### Task\n- **Type**: single-turn\n- **Parser**: `UserDiffParser` - extracts grouped comment indices from answer inside &lt;answer&gt; tag\n- **Rubric overview**: F1 scoring between predicted and actual user groupings, with power=4 scaling\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval reddit-user-grouping\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval reddit-user-grouping   -m gpt-4.1-mini   -n 20 -r 3 -t 1024 -T 0.7   -a '{\"key\": \"value\"}'  # env-specific args as JSON\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- Reports are written under `./environments/vf_reddit_user_grouping/reports/` and auto-embedded below.\n\n### Environment Arguments\nStandard verifiers environment arguments supported:\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `max_examples` | int | `-1` | Limit on dataset size (use -1 for all) |\n\n### Metrics\nThe environment uses a sophisticated reward function with multiple validation steps:\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | F1-weighted group matching score with power=4 scaling (0.0-1.0) |\n| `format_reward` | Binary reward for properly formatted XML response (0.0 or 1.0) |\n\n**Reward calculation details:**\n- Validates all comments are assigned exactly once (no duplicates/missing)\n- Computes F1 score between each actual user group and best-matching predicted group\n- Weights each group's contribution by its size (comment count / total comments)\n- Applies power=4 scaling: `reward = weighted_f1_score^4`\n\n## Evaluation Reports\n\n<!-- Do not edit below this line. Content is auto-generated. -->\n<!-- vf:begin:reports -->\n<p>No reports found. Run <code>uv run vf-eval reddit-user-grouping -a '{\"key\": \"value\"}'</code> to generate one.</p>\n<!-- vf:end:reports -->","encoding":"utf-8","truncated":false,"total_bytes":2393},"status":null}