{"data":{"kind":"file","path":"README.md","version_id":"pe1b05rqium0wxr6z5c1nmzi","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2194,"modified_at":"2025-09-07T16:41:18.959000","content_hash":"146536feaa243b05d321d8a4c2195a95396170baf9d16d9b371aaa0fbd6b3369"},"entries":[],"content":"# Contract-NLI-Verifier\n\n### Overview\n- **Environment ID**: `contract-nli-env`\n- **Short description**: Single-turn natural language inference on legal contract clauses. The agent must determine if a hypothesis entails, contradicts, or is neutral to a given contract clause.\n- **Tags**: legal, nli, reasoning, single-turn, think, boxed-answer, train, eval\n\n### Dataset\n- **Source**: `kiddothe2b/contract-nli` (Hugging Face datasets)\n- **Splits**: Uses the `train` and `test` splits provided by the dataset JSONL.\n- **Fields used**:\n  - `premise`: The contract clause.\n  - `hypothesis`: The statement to be verified against the premise.\n  - `label`: The ground-truth label.\n\n### Prompting & Schema\n- **System message**: Instructs the agent to act as a legal analyst, reason inside `<think>`, and put the final choice in `\\boxed{...}` using one of: `ENTAILMENT`, `NEUTRAL`, `CONTRADICTION`.\n- **User message**: Built with the template: `Contract Clause (Premise): ...`, `Hypothesis: ...`, `Based on the premise, is the hypothesis an entailment, neutral, or a contradiction?`.\n- **Example schema per example**:\n  - `prompt`: A list of chat messages for the model.\n  - `premise`: The contract clause string.\n  - `hypothesis`: The hypothesis string.\n  - `answer`: The ground-truth label as a string (e.g., `\"ENTAILMENT\"`).\n\n### Parser & Rewards\n- **Parser**: `ThinkParser` with `extract_boxed_answer` to read the final label from `\\boxed{...}`.\n- **Rewards**:\n  - `correct_label_reward_func` (weight 1.0): 1.0 if the parsed label matches the ground-truth answer, else 0.0.\n  - `parser.get_format_reward_func()` (weight 0.0): Optional format adherence (not counted).\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `-1` | Limit training set size (`-1` for all) |\n| `num_eval_examples` | int | `-1` | Limit eval set size (`-1` for all) |\n| `use_few_shot` | bool | `false` | Add few-shot messages to improve accuracy |\n\n### Quickstart\n\nEvaluate the environment using the `vf-eval` command:\n```bash\nuv run vf-eval contract-nli-env \\\n  -a '{\"num_train_examples\":100, \"num_eval_examples\":20, \"use_few_shot\":true}'\n```","encoding":"utf-8","truncated":false,"total_bytes":2194},"status":null}