{"data":{"kind":"file","path":"README.md","version_id":"v8upbhpqo2m5bvjawqxos8az","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":5252,"modified_at":"2025-09-07T01:41:25.322000","content_hash":"ed298b887da1aed28a65835fade80ac382843ba3ddcd3dadae39e7b9e33f6876"},"entries":[],"content":"# Banking Model Risk Management (vf_mrm_mini)\n\nMinimal RL and evaluation environment for model risk management (MRM). It trains and evaluates models on short questions grounded in SR 11‑7 and the OCC Model Risk Management Comptroller’s Handbook. The environment follows the `verifiers` framework and runs locally for deterministic RL and judge‑based evaluation.\n\n## Task\n\nAgents receive a short MRM question and must reply in this strict XML format:\n\n```\n<think>…free reasoning…</think>\n<answer>…final answer, a few sentences…</answer>\n<citations>[SR11-7]; [OCC-Handbook]</citations>\n```\n\n- Only `<answer>` is judged for semantic correctness.\n- `<citations>` must contain bracket tokens defined in `vf_mrm_mini/data/sources.json`.\n\n## Dataset\n\nPath: `vf_mrm_mini/data/dataset.jsonl`. Each line is a JSON object:\n\n- prompt: instruction + question (includes tag requirements)\n- answer: concise canonical answer\n- info: metadata (includes `required_citations`, plus `tags` and `difficulty`)\n\nCovered sources:\n\n1. [SR11-7] — Supervisory Guidance on Model Risk Management (April 4, 2011). Describes purpose/scope, defines “model,” and sets principles for development, validation, and governance. Notes both direct development costs and indirect costs from reliance on incorrect or misused models. Defines model components (input, processing, reporting) and emphasizes that models are simplified representations with quality measured by precision, accuracy, robustness, stability, etc.\n2. [OCC-Handbook] — Model Risk Management, Comptroller’s Handbook (Version 1.0, August 2021). Provides examination procedures and practices. 
Describes sound governance (board/senior management oversight, policies/procedures, internal controls, inventory, documentation), the need for effective challenge, and roles of board and senior management.\n\nThe dataset includes factual, scenario‑based, and synthesis questions with the relevant required citation token.\n\n## Sources\n\n`vf_mrm_mini/data/sources.json` maps tokens to canonical details, e.g.:\n\n```json\n{\n  \"[SR11-7]\": {\n    \"title\": \"Supervisory Guidance on Model Risk Management\",\n    \"url\": \"https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf\",\n    \"publisher\": \"Board of Governors of the Federal Reserve System and Office of the Comptroller of the Currency\",\n    \"date\": \"2011-04-04\"\n  },\n  \"[OCC-Handbook]\": {\n    \"title\": \"Model Risk Management, Comptroller’s Handbook\",\n    \"url\": \"https://www.occ.treas.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/pub-ch-model-risk.pdf\",\n    \"publisher\": \"Office of the Comptroller of the Currency\",\n    \"date\": \"2021-08-01\"\n  }\n}\n```\n\n## Installation\n\nRequires Python 3.11+. Declared deps: `verifiers`, `datasets`.\n\n```bash\nuv pip install -e .\n```\n\n## Usage\n\n```python\nimport vf_mrm_mini as mrm\nenv = mrm.load_environment(use_judge=False)  # deterministic, rule-based RL\n```\n\nCLI sanity eval (rule + judge if configured):\n\n```bash\nuv run vf-eval vf-mrm-mini -n 12 -r 2\n```\n\n## Scoring\n\n- Format (0.2): XML tags present in order.\n- Citations required (0.5): counts only tokens inside `<citations>`; extra tokens lightly penalized.\n- Citations allowed‑only (0.2): `<citations>` must use whitelisted tokens.\n- Length (0.1): gentle shaping on `<answer>` only.\n\nGood output example\n\n```\n<think>Consider SR 11-7’s definition and two causes.</think>\n<answer>Model risk is the potential for adverse consequences from decisions based on incorrect or misused model outputs and reports. 
It arises when a model contains fundamental errors that yield inaccurate estimates for its design and intended use, and when otherwise sound models are used incorrectly or inappropriately, such as outside the environment for which they were designed.</answer>\n<citations>[SR11-7]</citations>\n```\n\nBad output example (will score poorly)\n\n```\n<think>Answer directly.</think>\n<answer>Model risk happens when models are wrong. Validate yearly.</answer>\n<citations>[Random-Source]</citations>\n```\n\nReasons: `[Random-Source]` is not an approved token; the blanket annual rule is not supported by SR 11‑7; the answer is vague.\n\n## Development Notes\n\n- Data and sources are bundled under `vf_mrm_mini/data`.\n- Validate dataset: `python scripts/validate_dataset.py`\n- Re‑annotate tags/difficulty after edits: `python scripts/annotate_dataset.py`\n\n## Quick Start (Local)\n\n- Install: `uv pip install -e .`\n- Lint data: `python scripts/validate_dataset.py`\n- Evaluate: `uv run vf-eval vf-mrm-mini -n 12 -r 2`\n- Baseline: `python scripts/baseline_policy.py \"What is model risk?\"`\n\nGeneration settings (for your model/policy):\n- Stop sequences: `</citations>`\n- Max output tokens: ~256\n- Temperature: 0.0–0.3; top_p: 1.0 (judge runs: temperature 0)\n- Always emit `<think>`, `<answer>`, `<citations>`; only cite `[SR11-7]`/`[OCC-Handbook]`\n\nJudge (for evaluation): freeze the judge model and its parameters (temperature 0, top_p 1) and cap concurrency. If no judge is configured, only deterministic scores are reported.\n\n## Disclaimer\n\nThis environment benchmarks model risk management knowledge grounded in SR 11‑7 and the OCC Model Risk Management Comptroller’s Handbook. It is for educational and evaluation purposes only and does not constitute supervisory guidance or advice.\n","encoding":"utf-8","truncated":false,"total_bytes":5252},"status":null}