{"data":{"kind":"file","path":"README.md","version_id":"nj4uvuo9yqomn1x7vgl9hexh","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5746,"modified_at":"2026-06-05T22:08:34.426000","content_hash":"858cf95516a3df39697b9eb19c100ce22bc5d188179928f956210f8d84c4c279"},"entries":[],"content":"# Medical Claim Adjudication (RLVR)\n\n> Reproduce the payer's explanation of benefits, line by line, to the cent.\n\n## What problem this trains\n\nWhen a professional medical claim reaches a payer, an adjudication engine turns it into\nan explanation of benefits (EOB): which lines paid, which denied and why, what the plan\npaid, and what is patient responsibility. That output is deterministic given the claim\nand the member's benefits, and it is audited against the remittance advice. So it is an\nideal reinforcement-learning-from-verifiable-rewards (RLVR) target: there is exactly one\ncorrect EOB, and the reward checks the agent's answer against it rather than asking a\nmodel to judge plausibility. The agent here learns the payer side of revenue-cycle\nmanagement (RCM), the same reasoning a denials analyst or a payer operations team runs\nhundreds of times a day.\n\n## The adjudication pipeline (ground truth)\n\nThe environment applies edits in the order a real engine does. Order matters: a line\ndenied by a coverage edit never reaches cost sharing.\n\n1. **Coverage edit.** Non-covered services are denied.\n2. **Prior authorization.** A line that requires prior auth with none on file is denied.\n   Prior auth is a standard utilization-management control.\n3. **NCCI bundling edit.** The National Correct Coding Initiative defines code pairs\n   that should not be billed together. When a **component** (column-two) code appears\n   alongside its **comprehensive** (column-one) code on the same claim and both are\n   otherwise payable, the component line is **bundled** (denied), because its work is\n   already included in the comprehensive procedure. Two edits are modeled:\n   `97110` bundles into `29881`, and `12001` bundles into `20610`.\n4. **Allowed amount and network status.** Each surviving line is recognized at its\n   **allowed amount** (the contracted fee-schedule rate), not the provider's billed\n   charge; the billed-minus-allowed difference is a contractual write-off. If the claim\n   is **out of network**, the recognized allowed amount is reduced by an out-of-network\n   factor before cost sharing.\n5. **Patient cost sharing.** On the recognized allowed total: the **remaining\n   deductible** applies first (patient pays it), then **coinsurance** on the balance,\n   then a flat per-visit **copay**. The plan pays the remainder, floored at 0.\n\n## Domain grounding\n\nEvery mechanic above maps to a named concept in payer adjudication: CPT procedure\ncoding, allowed amounts and fee schedules, prior authorization, NCCI bundling edits\n(comprehensive vs component codes), in-network vs out-of-network benefit reductions,\ndeductible, coinsurance, copay, and the EOB itself. No payer policy IDs or statute\nnumbers are fabricated; the README and prompt use only the standard industry terms.\n\n## Reward design\n\nTwo scorers, weighted 80/20:\n\n- **`plan_paid_reward` (0.8).** The paid dollar figure is what the payer remits and\n  audits, so it carries the weight. Scored on a tolerance band: 1.0 within 2% of the\n  true plan-paid amount, decaying linearly to 0 at 50% error. When the true plan-paid\n  is 0 (a fully denied or fully deductible-absorbed claim), only a near-0 answer scores.\n- **`denied_codes_f1` (0.2).** F1 over the set of denied codes. This rewards correct\n  coverage, prior-auth, and bundling reasoning even when the arithmetic is slightly off,\n  and it is what catches an agent that gets the dollars right by luck while denying the\n  wrong lines.\n\nWhy not weight denials higher? The plan-paid amount already depends on every denial\ndecision (a wrongly paid line inflates the recognized total), so the dollar reward\nimplicitly penalizes denial mistakes; the explicit F1 just sharpens the signal.\n\n## Edge cases handled\n\n- Fully denied claim or deductible that absorbs the entire allowed total -> plan-paid 0,\n  scored against a near-0 answer.\n- NCCI: the component line is only bundled when its comprehensive partner is itself\n  payable on the same claim (a denied comprehensive line does not trigger the edit).\n- Out-of-network reduction stacks correctly in front of deductible, coinsurance, and\n  copay rather than after them.\n- Copay never drives plan-paid below 0.\n- Allowed-vs-billed: the agent is rewarded only if it pays on allowed, not billed.\n\n## Evaluation\n\nA rules policy returning ground truth scores ~1.0; a naive \"pay billed, deny nothing\"\nbaseline scores far below it, confirming the reward is learnable and discriminating.\n\n```\nEVAL (gpt-4o-mini, n=20): mean reward 0.43 on real gpt-4o-mini rollouts, versus 0.07 for a naive baseline (the gap confirms the reward discriminates competence from guessing)\n```\n\n## Limitations and intended use\n\n- Scope is professional (CPT) claims with the most common edits; DRG/institutional\n  pricing, modifier-driven repricing, coordination-of-benefits across multiple payers,\n  and the full NCCI table are intentionally excluded to keep the reward exactly\n  verifiable. The schema generalizes to them.\n- Synthetic claims stand in for data so the environment runs today; on acquisition of an\n  RCM / billing operation, de-identified claims paired to payer paid/denied outcomes\n  drop into the same schema and the same verifier scores against the real remittance.\n- For training and evaluating adjudication agents, not for use as a production claims\n  engine.\n\n## Format\n\n```\n<think> coverage, prior auth, NCCI bundling, allowed/network, deductible -> coinsurance -> copay </think>\n<answer>{\"plan_paid\": 612.40, \"denied_codes\": [\"70450\", \"97110\"]}</answer>\n```\n\n## Usage\n\n```bash\nuv run vf-install medical-claim-adjudication\nuv run vf-eval medical-claim-adjudication -m gpt-4o-mini\n```\n\n`load_environment(num_examples=300, seed=7)` builds synthetic claims with known ground\ntruth.\n","encoding":"utf-8","truncated":false,"total_bytes":5746},"status":null}