{"data":{"kind":"file","path":"README.md","version_id":"zkp6etwx14jyxw1rvmgccfxp","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":6962,"modified_at":"2026-06-05T22:08:34.426000","content_hash":"8c86eb81de0377411cd5cd66de91a1f54cc5db0ce49a67bfcb426b7c5c0b035c"},"entries":[],"content":"# tax-calculation\n\nA single-turn, verifiable (RLVR) environment for US federal individual income tax.\nThe agent reads a synthetic taxpayer scenario and computes the balance due\n(positive) or refund (negative) for one illustrative tax year.\n\n## Overview and motivation\n\nPersonal income tax is a good reinforcement-learning target because the answer is\na single number produced by a fully specified, deterministic procedure, yet\ngetting it right requires multi-step, order-sensitive reasoning over many\ninteracting rules. A model can land on a wrong-but-plausible number in several\ndistinct ways: netting the deduction in the wrong place, taxing long-term capital\ngains at ordinary rates instead of the preferential schedule, forgetting the child\ntax credit phaseout, or omitting payroll (FICA) tax. Each of those mistakes is\nexactly the kind of error a verifiable reward can catch, so the signal is dense\nand learnable without a model judge.\n\n## Task specification\n\nEach example presents a taxpayer scenario: filing status, wages, other ordinary\nincome, long-term capital gains, above-the-line deductions, itemized-deduction\ntotal (with the standard deduction shown for comparison), number of dependents,\nand federal tax withheld. The agent must output the balance due or refund as JSON.\n\nThe ground truth is computed by the generator using the same constants embedded in\nthe module, so every example is exactly verifiable. The pipeline mirrors a\nsimplified Form 1040:\n\n1. gross income = wages + other income + long-term capital gains\n2. AGI = gross income - above-the-line adjustments\n3. 
taxable income = max(0, AGI - max(standard deduction for the status, itemized))\n4. ordinary taxable income = max(0, taxable income - long-term capital gains)\n5. ordinary tax = marginal ordinary brackets applied to ordinary taxable income\n6. capital-gains tax = the long-term gains stack on top of ordinary taxable income\n   and are taxed at the band's preferential 0 / 15 / 20 percent rate\n7. income tax before credits = ordinary tax + capital-gains tax\n8. child tax credit = $2,000 per dependent, reduced $50 per $1,000 (or fraction) of\n   AGI over the filing-status threshold, never below zero\n9. income tax after credits = max(0, before credits - credit)\n10. FICA (employee share) = 6.2% Social Security on wages up to the wage base, plus\n    1.45% Medicare on all wages\n11. total tax = income tax after credits + FICA\n12. balance = total tax - federal tax withheld (positive = owed, negative = refund)\n\n## Domain grounding\n\nThis encodes real US federal individual-income-tax mechanics by name:\n\n- Progressive marginal ordinary-income brackets and the difference between the\n  marginal rate (the rate on the next dollar) and the effective rate (total tax\n  divided by income). 
Each scenario carries its computed effective rate in the\n  answer metadata for grounding.\n- The four common filing statuses: single, married filing jointly, married filing\n  separately, and head of household, each with its own brackets and standard\n  deduction.\n- The standard-versus-itemized deduction choice (the larger of the two reduces\n  taxable income).\n- Above-the-line adjustments that bridge gross income to adjusted gross income\n  (AGI), which is the figure the credit phaseout keys off.\n- The separate preferential long-term capital-gains rate schedule (0 / 15 / 20\n  percent), where the gains are stacked on top of ordinary taxable income and taxed\n  at the rate of the band they occupy.\n- The child tax credit and its AGI phaseout (a $50 reduction per $1,000 of AGI over\n  $200,000, or $400,000 for married filing jointly).\n- The employee share of FICA: 6.2 percent Social Security up to the annual wage\n  base plus 1.45 percent Medicare with no cap.\n\nThe bracket figures, deduction amounts, capital-gains breakpoints, wage base, and\ncredit phaseout are a fixed, illustrative single-year set (2023-style ordinary\nbrackets). They are simplified, not tax advice, and live as module constants only\nso the generator and the reward agree on one source of truth.\n\n## Reward design rationale\n\nReward is numeric closeness to the ground-truth balance. It is 1.0 within about 1\npercent relative error and scales linearly to 0.0 by about 25 percent error. The\ndenominator has a $500 absolute floor so a near-zero ground truth (a taxpayer whose\nwithholding almost exactly matches the liability) does not blow up the relative\nerror on a few-dollar miss. 
A tolerance band rather than exact-match is deliberate:\nit rewards the correct procedure even when intermediate rounding differs by a\ndollar, while still punishing any structural error (wrong bracket, gains at\nordinary rates, missing FICA), which moves the answer well outside the band.\n\n## Edge cases handled\n\n- Withholding can exceed liability, so refunds (negative balances) are common\n  (roughly half of generated scenarios).\n- Long-term capital gains appear in about a quarter of scenarios and can be large\n  enough to cross multiple 0 / 15 / 20 breakpoints; gains are capped at taxable\n  income so the deduction is not double-counted under them.\n- The credit phaseout can drive the child tax credit to exactly zero at high AGI.\n- Itemized deductions below the standard deduction are correctly ignored.\n- Taxable income, ordinary tax, capital-gains tax, and the after-credit tax are\n  each floored at zero so no negative tax leaks through.\n- The answer parser accepts a bare number, a number with commas, or a `$`-prefixed\n  string, and rejects booleans.\n\n## Evaluation\n\nEVAL (gpt-4o-mini, n=20): mean reward 0.87 on real gpt-4o-mini rollouts, versus 0.048 for a naive baseline. The gap confirms that the reward discriminates competence from guessing.\n\nSeparation check (deterministic policies, n=300): a rules policy that runs the\ndocumented pipeline scores 1.0000; a naive baseline (flat 15 percent of gross\nincome) scores 0.0482, a gap of 0.9518.\n\n## Limitations and intended use\n\nThe constants are a single illustrative year, simplified, and not tax advice: there\nis no AMT, no itemized-deduction limits, no state tax, no self-employment tax, no\nAdditional Medicare Tax on high wages, and no phase-in of the refundable portion of\nthe child credit. The environment is intended as a reasoning and arithmetic RLVR\ntask, not a production tax engine. 
The generator and reward share one constant set,\nso the task tests whether an agent can apply a stated rule set correctly, not\nwhether it has memorized any particular real tax year.\n\n## Usage\n\n```bash\nuv run vf-install tax-calculation\nuv run vf-eval tax-calculation -m gpt-4.1-mini\n```\n\n`load_environment(num_examples=300, seed=7)` builds synthetic taxpayer scenarios\nwith known ground truth. The pure helpers (`compute_total_tax`,\n`compute_capital_gains_tax`, `compute_child_tax_credit`, `compute_fica`,\n`tax_reward`, `parse_predicted`) live above `load_environment` and are testable\nwithout importing verifiers.\n\n## Format\n\n```\n<think> reasoning </think>\n<answer>{\"tax_owed\": 8423}</answer>\n```\n","encoding":"utf-8","truncated":false,"total_bytes":6962},"status":null}