# credit-underwriting

A single-turn, verifiable-reward (RLVR) environment for consumer installment-loan
underwriting. The agent receives one loan application and must (1) approve or deny it
against a fixed, written credit policy and (2) when approving, compute the maximum
principal the policy will extend. Ground truth is produced by the same rule engine the
agent is asked to reproduce, so every reward can be checked exactly.

## Overview and motivation
Real lenders run applications through an automated underwriting system (AUS) that applies
a written credit policy: score floors, income-stability checks, debt-ratio caps,
residual-income overlays, collateral limits, and risk-based pricing. Given the inputs, the
decision and the maximum loan amount are deterministic. That determinism makes
underwriting a strong verifiable-RL target: each file has one correct decision and one
correct maximum amount, with no rater subjectivity. The environment trains an agent to
apply a multi-gate policy in the correct order and to solve for the binding constraint
when sizing the loan.

## Task spec
Input (per application): FICO score, applicant annual income, optional co-borrower annual
income, existing monthly debt payments, requested loan amount, loan term in months,
employment history in years, and whether the loan is secured (and, if so, the collateral
value).

Output: JSON `{"decision": "approve" | "deny", "max_amount": <number>}`, with
`max_amount` set to 0 on a denial.

The embedded policy, applied in order (thresholds are illustrative, not a real credit policy):
- Qualifying income = applicant income + co-borrower income. Monthly qualifying income is
  that sum divided by 12.
- Decline if FICO < 620.
- Decline if employment history < 2 years.
- Risk-based pricing sets the amortization APR by FICO band: >= 760 -> 7%, 720-759 -> 9%,
  680-719 -> 12%, 620-679 -> 16%.
- Housing payment = the amortized monthly payment on the requested loan at the priced APR
  over the term. When the loan is secured and LTV > 80%, add a PMI surcharge of 0.75% of
  principal per year, which is `0.0075 * principal / 12` per month.
- Front-end (housing) DTI = housing payment / monthly qualifying income; decline if > 0.31.
- Back-end (total) DTI = (existing monthly debts + housing payment) / monthly qualifying
  income; decline if > 0.43.
- Residual income = monthly qualifying income - existing monthly debts - housing payment;
  decline if < $800.
- For a secured loan, LTV = requested amount / collateral value; decline if > 0.90.
- Otherwise approve. The maximum approved amount is the largest principal that
  simultaneously satisfies the back-end DTI cap, the residual-income floor, and (if
  secured) the LTV cap, accounting for the PMI surcharge where it applies. The requested
  amount has already passed these gates, so the maximum is never below the requested
  amount.

## Domain grounding
The mechanics follow standard U.S. consumer-credit underwriting concepts. They are named
here only as real concepts, with no invented citations:
- **Front-end and back-end debt-to-income ratios**: the "housing" or top ratio, and the
  "total obligations" or bottom ratio.
- **Loan-to-value (LTV)** as the limit on secured collateral.
- **FICO score bands** as the credit-quality gate.
- **Risk-based pricing**: instead of offering one rate to every applicant, the lender
  prices weaker credit at a higher note rate. This raises the payment and tightens DTI.
- **Ability-to-repay (ATR)** style **residual-income** testing. This is in the spirit of
  the residual-income method used in VA underwriting, and of lender reserve overlays
  layered on top of DTI.
- **Private mortgage insurance (PMI)** triggered above 80% LTV, the conforming convention
  associated with **Fannie Mae and Freddie Mac conforming underwriting guidelines**. Here
  the PMI surcharge feeds back into the DTI test, so a high-LTV loan can fail DTI even when
  it clears the LTV cap.

Threshold values are intentionally simplified illustrations of how these gates compose,
not any institution's published policy.

## Reward design rationale
Two weighted, verifiable components:
- **Decision correctness** (weight 0.6): 1.0 if approve/deny matches ground truth, else 0.0.
  This is the high-stakes part of underwriting, where a wrong approve/deny is the costly
  error. It therefore carries the larger weight and is binary.
- **Max-amount closeness** (weight 0.4):
  - When ground truth is approve and the agent approves, this scores 1.0 within ~2%
    relative error and decreases linearly to 0.0 at ~30% error.
  - When ground truth is deny and the agent denies, no amount is needed and this scores 1.0.
  - A wrong decision scores 0.0 here as well.

  Graded closeness, rather than exact match, keeps the signal smooth for the continuous
  sizing sub-task. It still rewards the agent for finding the true binding constraint.

The components separate cleanly. An agent that applies the policy scores ~1.0. An "always
approve at the full requested amount" baseline scores far lower, for two reasons: it
mis-decides every file the gates should decline, and it never sizes an approvable loan up
to its maximum.

## Edge cases handled
- Co-borrower income raises qualifying income and can flip a file from decline to approve.
- The PMI surcharge depends on principal, because it switches on above 80% LTV. The
  maximum amount is therefore solved separately in the PMI-off and PMI-on regimes. The
  largest feasible principal is taken and then clipped to the hard 90% LTV cap. The solver
  is validated to return a feasible, binding principal on every approval.
- Unsecured loans skip LTV and PMI entirely.
- Zero-interest and zero-term degenerate cases are handled in the amortization helpers.
- A correctly matched denial requires no amount, so it is not penalized on the amount term.

## Evaluation (gpt-4o-mini, n=20)
On 20 real gpt-4o-mini rollouts, mean reward was 0.96, versus 0.205 for a naive baseline.
The gap suggests the reward separates competent policy application from guessing.

## Limitations and intended use
- The policy is a simplified, internally consistent approximation, not a production credit
  policy. Thresholds are illustrative and omit many real overlays: trended credit data,
  documentation tiers, fraud screens, and fair-lending controls.
- Synthetic applications are sampled independently. They do not model real population
  correlations among FICO, income, and collateral.
- Intended use is training and evaluating rule-application and constraint-solving behavior
  for credit-decisioning agents. The same schema and reward could be applied to real
  loan-origination data if such data were obtained from a lender or loan servicer. That
  data would pair applications with underwriter decisions and approved amounts. Not for
  use in actual credit decisions.

## Usage
```bash
uv run vf-install credit-underwriting
uv run vf-eval credit-underwriting -m gpt-4.1-mini
```
`load_environment(num_examples=300, seed=7)` builds synthetic applications with known
ground truth.

## Format
```
<think> reasoning </think>
<answer>{"decision": "approve", "max_amount": 41250.0}</answer>
```
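The approve/deny gates in the Task spec can be sketched in Python as below. This is a minimal sketch that assumes the illustrative thresholds listed there. The `Application` dataclass, its field names, and the helpers `priced_apr`, `amortized_payment`, `housing_payment`, and `decide` are hypothetical. They are not the environment's actual rule engine. The README does not say how a zero term is handled, so this sketch simply rejects it.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative thresholds copied from the Task spec (hypothetical constant names).
FICO_FLOOR = 620
MIN_EMPLOYMENT_YEARS = 2.0
FRONT_END_CAP = 0.31
BACK_END_CAP = 0.43
RESIDUAL_FLOOR = 800.0
PMI_TRIGGER_LTV = 0.80
LTV_CAP = 0.90
PMI_ANNUAL_RATE = 0.0075


@dataclass
class Application:
    """Hypothetical field names; not the environment's actual schema."""
    fico: int
    income: float             # applicant annual income
    co_income: float          # co-borrower annual income (0 if none)
    monthly_debts: float      # existing monthly debt payments
    requested: float          # requested principal
    term_months: int
    employment_years: float
    collateral_value: Optional[float] = None  # None = unsecured; assumed > 0 otherwise


def priced_apr(fico: int) -> float:
    """Risk-based pricing by FICO band (only reached after the 620 floor passes)."""
    if fico >= 760:
        return 0.07
    if fico >= 720:
        return 0.09
    if fico >= 680:
        return 0.12
    return 0.16


def amortized_payment(principal: float, apr: float, term_months: int) -> float:
    """Level monthly payment that fully amortizes `principal` over `term_months`."""
    if term_months <= 0:
        # Assumption: this sketch rejects a zero term; the real handling is undocumented.
        raise ValueError("term_months must be positive")
    r = apr / 12
    if r == 0:
        return principal / term_months  # zero-interest case: straight-line repayment
    return principal * r / (1 - (1 + r) ** -term_months)


def housing_payment(app: Application, principal: float) -> float:
    """Amortized payment plus the PMI surcharge when secured and LTV > 80%."""
    payment = amortized_payment(principal, priced_apr(app.fico), app.term_months)
    if app.collateral_value is not None and principal / app.collateral_value > PMI_TRIGGER_LTV:
        payment += PMI_ANNUAL_RATE * principal / 12
    return payment


def decide(app: Application) -> str:
    """Apply the gates in the order the policy lists them."""
    monthly_income = (app.income + app.co_income) / 12
    if app.fico < FICO_FLOOR or app.employment_years < MIN_EMPLOYMENT_YEARS:
        return "deny"
    if monthly_income <= 0:
        return "deny"  # assumption: guard against dividing by zero income
    housing = housing_payment(app, app.requested)
    if housing / monthly_income > FRONT_END_CAP:
        return "deny"
    if (app.monthly_debts + housing) / monthly_income > BACK_END_CAP:
        return "deny"
    if monthly_income - app.monthly_debts - housing < RESIDUAL_FLOOR:
        return "deny"
    if app.collateral_value is not None and app.requested / app.collateral_value > LTV_CAP:
        return "deny"
    return "approve"
```

Take an applicant with a 700 FICO, $60,000 income, $500 of monthly debts, and a $10,000, 36-month request. That file prices at 12% APR, about $332 per month, and is approved. Drop the income to $20,000 and the same file fails the back-end DTI cap. Adding a $20,000 co-borrower income flips it back to approve.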
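The sizing rule in the Task spec takes the largest principal that satisfies the back-end DTI cap, the residual-income floor and, if secured, the LTV cap. Within each PMI regime the monthly payment is linear in principal, so the rule can be solved in closed form. This is a minimal sketch under that reading, not the environment's validated solver. `max_principal` and `payment_per_dollar` are hypothetical names, and the sketch uses only the three sizing constraints the policy lists.

```python
from typing import Optional

# Hypothetical sketch of the sizing rule; constant names are illustrative.
BACK_END_CAP = 0.43
RESIDUAL_FLOOR = 800.0
PMI_TRIGGER_LTV = 0.80
LTV_CAP = 0.90
PMI_MONTHLY_PER_DOLLAR = 0.0075 / 12  # PMI surcharge per $1 of principal per month


def payment_per_dollar(apr: float, term_months: int) -> float:
    """Amortized monthly payment per $1 of principal (payment is linear in principal)."""
    r = apr / 12
    if r == 0:
        return 1.0 / term_months
    return r / (1 - (1 + r) ** -term_months)


def max_principal(monthly_income: float, monthly_debts: float, apr: float,
                  term_months: int, collateral_value: Optional[float] = None) -> float:
    """Largest principal meeting the back-end DTI cap, residual floor, and (if secured) LTV cap."""
    # Both income-based constraints reduce to a ceiling on the monthly payment.
    budget = min(BACK_END_CAP * monthly_income - monthly_debts,
                 monthly_income - monthly_debts - RESIDUAL_FLOOR)
    if budget <= 0:
        return 0.0
    factor = payment_per_dollar(apr, term_months)
    if collateral_value is None:
        return budget / factor  # unsecured: no LTV cap and no PMI
    pmi_boundary = PMI_TRIGGER_LTV * collateral_value
    # PMI-off regime: LTV <= 80%, so the payment is factor * principal.
    best_off = min(budget / factor, pmi_boundary)
    # PMI-on regime: 80% < LTV <= 90%, so the surcharge joins the per-dollar factor,
    # and the principal is clipped to the hard 90% LTV cap.
    best_on = min(budget / (factor + PMI_MONTHLY_PER_DOLLAR), LTV_CAP * collateral_value)
    if best_on <= pmi_boundary:
        best_on = 0.0  # no affordable principal lies above the PMI trigger
    return max(best_off, best_on)
```

Consider $5,000 of monthly income, $500 of debts, 12% APR, and a 36-month term. The payment ceiling is then $1,650.

- With $50,000 of collateral, the PMI-on regime wins and the result clips to the 90% LTV cap of $45,000.
- With $61,250 of collateral, the surcharge makes every principal above 80% LTV unaffordable. The answer therefore stops at the $49,000 PMI boundary.

Solving each regime with a single division avoids an iterative search.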
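The reward described under Reward design rationale can be written as a small pure function. The README gives the closeness band only approximately (~2% and ~30%), so the exact 0.02 and 0.30 cutoffs below are assumptions. `reward` and `amount_score` are hypothetical names, not the environment's rubric functions.

```python
# Assumed exact cutoffs for the README's approximate "~2%" and "~30%".
FULL_CREDIT_ERR = 0.02
ZERO_CREDIT_ERR = 0.30


def amount_score(pred: float, truth: float) -> float:
    """1.0 within 2% relative error, falling linearly to 0.0 at 30% (hypothetical helper)."""
    if truth <= 0:
        return 0.0
    rel = abs(pred - truth) / truth
    if rel <= FULL_CREDIT_ERR:
        return 1.0
    if rel >= ZERO_CREDIT_ERR:
        return 0.0
    return (ZERO_CREDIT_ERR - rel) / (ZERO_CREDIT_ERR - FULL_CREDIT_ERR)


def reward(pred_decision: str, pred_amount: float,
           true_decision: str, true_amount: float) -> float:
    """0.6 * decision correctness + 0.4 * amount closeness (hypothetical helper)."""
    if pred_decision != true_decision:
        return 0.0  # a wrong decision scores 0 on both components
    if true_decision == "deny":
        amount = 1.0  # a matched denial needs no amount
    else:
        amount = amount_score(pred_amount, true_amount)
    return 0.6 * 1.0 + 0.4 * amount
```

Take a true maximum of $50,000. An approval at $40,000 (20% error) scores 0.6 + 0.4 × 0.10 / 0.28 ≈ 0.743. An approval 30% or more away earns only the 0.6 decision credit.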
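A completion in the shape shown under Format could be parsed as below. This is a hypothetical parser built on the standard-library `re` and `json` modules, not the environment's own parser. Two behaviors are assumptions: it reads the last `<answer>` block, and it returns `None` on malformed output.

```python
import json
import re
from typing import Optional, Tuple

# Hypothetical parser (assumption): reads the last <answer>...</answer> block.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_answer(completion: str) -> Optional[Tuple[str, float]]:
    """Return (decision, max_amount), or None if the answer is missing or malformed."""
    blocks = ANSWER_RE.findall(completion)
    if not blocks:
        return None
    try:
        payload = json.loads(blocks[-1])
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    decision = payload.get("decision")
    if decision not in ("approve", "deny"):
        return None
    try:
        # A missing max_amount defaults to 0, matching the convention for denials.
        amount = float(payload.get("max_amount", 0))
    except (TypeError, ValueError):
        return None
    return decision, amount
```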