# TaxCalcBench RLM

Evaluates RLM configurations on the TaxCalcBench dataset (51 hand-crafted U.S. tax return calculation problems for Tax Year 2024).

Dataset: https://github.com/column-tax/tax-calc-bench

## Dataset

Each problem provides taxpayer data as JSON (W-2s, filing status, income, credits, deductions). The model must calculate and produce a complete Form 1040 tax return. The ground truth is an IRS-compliant MeF (Modernized e-File) XML return.

Data is cloned from GitHub on first run and cached locally.

## Task

Given the taxpayer input data, the model uses the Python REPL to calculate a complete Form 1040 tax return. It outputs line-by-line amounts, which are compared against 19 key IRS form lines from the ground-truth XML.

## Metrics

| Metric | Description |
|---|---|
| strict_reward | correct_lines / 19, where a line counts as correct only on an exact dollar match |
| lenient_reward | correct_lines / 19, where a line counts as correct if it is within ±$5 of the ground truth |

## Quickstart

```bash
# Evaluate
prime eval run . -m PrimeIntellect/INTELLECT-3

# Train (default: recursion_depth=1)
prime rl run configs/taxcalcbench.toml
```

## Environment Arguments

| Argument | Default | Description |
|---|---|---|
| recursion_depth | 1 | 0 = REPL only, 1 = standard RLM, 2 or higher = recursive |
| nested_max_iterations | 10 | Max REPL turns for nested sub-LLMs |
| max_iterations | 20 | Max REPL turns for the root model |
| max_output_length | 8192 | Max length of code execution output |
| repl_language | python | REPL language |
| root_prompt_verbosity | heavy | System prompt verbosity |
| cache_dir | None | Directory in which to cache TaxCalcBench data |
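The dataset is cloned from GitHub on first run and cached locally, with `cache_dir` choosing where. The snippet below is a minimal sketch of that clone-once pattern, not the environment's actual loader: the `ensure_dataset` helper, the default cache location, and the shallow `git clone` are all hypothetical choices made for illustration.

```python
import subprocess
from pathlib import Path
from typing import Optional

REPO_URL = "https://github.com/column-tax/tax-calc-bench"


def ensure_dataset(cache_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: clone TaxCalcBench once, then reuse the local copy."""
    # Assumed default location when cache_dir is None; the real environment may differ.
    root = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "taxcalcbench"
    repo_dir = root / "tax-calc-bench"
    if not repo_dir.exists():
        # First run: shallow-clone the dataset repository into the cache.
        root.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["git", "clone", "--depth", "1", REPO_URL, str(repo_dir)],
            check=True,
        )
    # Later runs skip the network and return the cached checkout.
    return repo_dir
```

Checking for the directory instead of re-pulling keeps repeated evaluations offline. Note that this sketch does not detect an interrupted clone that leaves a partial directory behind.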
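The two reward metrics differ only in the per-line tolerance: $0 for `strict_reward` and $5 for `lenient_reward`. Here is a minimal sketch of that scoring. It assumes the predicted and ground-truth amounts have already been extracted into dicts keyed by form line. The function names, line keys, and dollar figures are hypothetical. In TaxCalcBench the ground-truth dict would hold the 19 compared lines, so `len(expected)` would be 19.

```python
def score_lines(predicted: dict[str, float], expected: dict[str, float], tolerance: float) -> float:
    """Fraction of ground-truth lines whose predicted amount is within `tolerance` dollars."""
    correct = 0
    for line, truth in expected.items():
        pred = predicted.get(line)  # a missing line counts as incorrect
        if pred is not None and abs(pred - truth) <= tolerance:
            correct += 1
    return correct / len(expected)


def strict_reward(predicted: dict[str, float], expected: dict[str, float]) -> float:
    # Exact dollar match required.
    return score_lines(predicted, expected, tolerance=0.0)


def lenient_reward(predicted: dict[str, float], expected: dict[str, float]) -> float:
    # Plus or minus $5 allowed per line.
    return score_lines(predicted, expected, tolerance=5.0)


# Hypothetical three-line example (the real benchmark compares 19 lines).
expected = {"L11": 52000.0, "L15": 37400.0, "L24": 4210.0}
predicted = {"L11": 52000.0, "L15": 37403.0, "L24": 4300.0}
print(strict_reward(predicted, expected))   # only L11 matches exactly
print(lenient_reward(predicted, expected))  # L11 and L15 are within $5
```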