{"data":{"kind":"file","path":"README.md","version_id":"suatrehrovvewflxomcuuwnc","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2908,"modified_at":"2026-02-23T15:24:47.561000","content_hash":"634443730b8ea734de129187cddeddb2a315ac3c5a7426457063e2003144ef48"},"entries":[],"content":"# TOMAGPT\r\n\r\n### Overview\r\n- **Environment ID**: `smolclaims/TOMAGPT`\r\n- **Version**: 0.3.0\r\n- **Description**: Train small models to classify legal hearsay by decomposing it into three sub-elements: (1) an assertion, (2) made out of court, (3) offered to prove the truth of the matter asserted (TOMA).\r\n- **Tags**: law, single-turn, hearsay, legalbench, GRPO\r\n\r\n### Dataset\r\n- **Source**: [DoodDood/HearsayGRPOTrainingData2](https://huggingface.co/datasets/DoodDood/HearsayGRPOTrainingData2)\r\n- **Size**: 3,140 rows\r\n- **Columns**: `prompt`, `is_hearsay`, `an_assertion`, `made_out_of_court`, `is_for_toma`\r\n\r\n### Task\r\n- **Type**: Single-turn classification\r\n- **Output format**: Semicolon-separated key-value pairs\r\n  ```\r\n  is_hearsay: YES/NO; an_assertion: YES/NO; made_out_of_court: YES/NO; is_for_toma: YES/NO\r\n  ```\r\n- **Rule**: `is_hearsay = YES` if and only if all three sub-elements are YES.\r\n\r\n### Rubric\r\nSix reward functions with weights `[1.5, 1.0, 2.0, 1.0, 1.0, 1.0]`:\r\n\r\n| # | Function | Weight | Scoring | Description |\r\n|---|----------|--------|---------|-------------|\r\n| 1 | assertion_reward | 1.5 | +1 / -1 | Checks `an_assertion` against ground truth |\r\n| 2 | out_of_court_reward | 1.0 | +1 / -1 | Checks `made_out_of_court` (hearsay cases only) |\r\n| 3 | toma_reward | 2.0 | +1 / -1 | Checks `is_for_toma` (hearsay cases only) |\r\n| 4 | consistency_penalty | 1.0 | 0 / -0.5 | Penalizes when `is_hearsay` contradicts sub-elements |\r\n| 5 | format_compliance | 1.0 | 0 to -1.0 | -0.25 per missing output field |\r\n| 6 | constraint_penalty | 1.0 | 0 / -0.5 | Penalizes `assertion=NO` with downstream fields `YES` |\r\n\r\n### Quickstart\r\n```bash\r\nuv run vf-eval TOMAGPT\r\n```\r\n\r\n### Environment Arguments\r\n\r\n| Arg | Type | Default | Description |\r\n|-----|------|---------|-------------|\r\n| `max_examples` | int | `-1` | Limit dataset size (-1 = all 3,140 rows). Shuffles with seed=42 when set. |\r\n\r\n### Results\r\n\r\nTrained with GRPO on **Qwen3-4B-Instruct-2507** (Run 3: 500 steps, LR=1e-5, batch=128, rollouts=16).\r\n\r\nEvaluated on [LegalBench hearsay test set](https://huggingface.co/datasets/nguha/legalbench) (94 examples):\r\n\r\n| Metric | Base | TOMAGPT | Delta |\r\n|--------|------|---------|-------|\r\n| Overall accuracy | 71.3% | 77.7% | +6.4% |\r\n| TOMA sub-element | 78.0% | 95.1% | +17.1% |\r\n| Assertion sub-element | 90.2% | 95.1% | +4.9% |\r\n| Non-verbal hearsay | 33.3% | 83.3% | +50.0% |\r\n| Standard hearsay | 93.1% | 100.0% | +6.9% |\r\n| Non-assertive conduct | 89.5% | 100.0% | +10.5% |\r\n\r\nModel available on HuggingFace: [DoodDood/TOMAGPT](https://huggingface.co/DoodDood/TOMAGPT)\r\n\r\n### Metrics\r\n\r\n| Metric | Meaning |\r\n|--------|---------|\r\n| `reward` | Weighted sum of all 6 reward functions |\r\n| `assertion_reward` | Accuracy on the assertion sub-element |\r\n| `out_of_court_reward` | Accuracy on the out-of-court sub-element |\r\n| `toma_reward` | Accuracy on the TOMA sub-element |\r\n","encoding":"utf-8","truncated":false,"total_bytes":2908},"status":null}