{"data":{"kind":"file","path":"README.md","version_id":"jigzx1ag2fh2mqaibtjld9zd","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1609,"modified_at":"2026-06-02T23:05:40.639000","content_hash":"66d25efcfbb6aa5642761266960e9325d663a5a249fd33d329ee1c23c3a96096"},"entries":[],"content":"# meta-memory-state\n\n`meta-memory-state` is a small deterministic Verifiers environment for testing\nmulti-turn state tracking.\n\nEach example gives a pocket ledger with signed balances across a small set of\naccounts. The model receives 2-5 debit, credit, or transfer updates across user\nturns and must reply after each update with the full ledger inside a\n`<ledger>...</ledger>` JSON block.\n\nThe recommended first smoke uses only two accounts and one to two turns. Harder\nmulti-turn settings are available through `min_turns`, `max_turns`, and\n`account_count` once the smoke shows useful reward variance.\n\nThe reward is shaped and defensive:\n\n- per-account balance correctness\n- total correctness\n- schema adherence, including penalties for hallucinated accounts and extra\n  keys inside `balances`\n- format credit for one valid tagged JSON block\n- anti-stuffing penalty for multiple ledger candidates, repeated tags, long\n  outputs, or code fences\n- malformed, empty, or `None` outputs return low reward instead of raising\n\nThe environment also reports auxiliary metrics for parseability, exact one-ledger\nformat adherence, multiple-candidate outputs, code fences, candidate count, and\nmean assistant output length. These metrics are intended for diagnosing reward\nhacking without making held-out evals optimize a different objective.\n\nThe environment uses no tools, no sandbox, and no judge model.\n\n## Usage\n\n```python\nfrom verifiers import load_environment\n\nenv = load_environment(\n    \"meta-memory-state\",\n    seed=1337420,\n    num_examples=128,\n    min_turns=1,\n    max_turns=2,\n    account_count=2,\n)\n```\n","encoding":"utf-8","truncated":false,"total_bytes":1609},"status":null}