{"data":{"kind":"file","path":"README.md","version_id":"fkgva0xg6r957qui4sieny3i","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2872,"modified_at":"2025-08-24T23:09:17.722000","content_hash":"d78179965758368ee09361a837d2de4fa555f89ef35435aeec0e4aadad882004"},"entries":[],"content":"# OpenMed_MedMCQA\n\n### Overview\n- **Environment ID**: `OpenMed_MedMCQA`\n- **Short description**: Single-turn medical multiple-choice QA on MedMCQA with chain-of-thought and a final decision as a boxed letter `\\\\boxed{A|B|C|D}`.\n- **Tags**: openmed, medmcqa, multiple-choice, single-turn, think, boxed-letter, train, eval\n\n### Dataset\n- **Source**: `medmcqa` (HF datasets)\n- **Splits**: Uses provided `train`, `validation`, and `test` if present; otherwise creates a train/eval holdout from `train` via `train_test_split` (seed=42).\n- **Fields used**:\n  - `question`: the stem.\n  - `opa`, `opb`, `opc`, `opd`: four option strings, mapped to A–D.\n  - `cop`: correct option (typically `A|B|C|D`; numeric or text match also supported).\n  - `exp`, `choice_type`, `subject_name`, `topic_name`: kept as-is but not required by the env.\n\n### Prompting & Schema\n- **System message**: Instructs to reason inside `<think>...</think>` and put the final choice letter in `\\\\boxed{...}` using exactly one token from `{A,B,C,D}`.\n- **User message**: Built with the provided `doc_to_text`-style template: `Question: ...`, `Choices:` with `A. ...` to `D. ...`, ending with `Answer:`.\n- **Example schema per example**:\n  - `prompt`: list of messages `[{\"role\":\"system\",...}, {\"role\":\"user\",...}]`\n  - `options`: list of 4 option strings\n  - `answer_letter`: one of `A|B|C|D`\n  - `answer_idx`: integer index (0–3)\n  - `answer`: letter (e.g., `\"C\"`)\n\n### Parser & Rewards\n- **Parser**: `ThinkParser` with `extract_boxed_answer` to read the final letter from `\\\\boxed{...}`.\n- **Rewards**:\n  - `correct_letter_reward_func` (weight 1.0): 1.0 if parsed letter equals `answer_letter` (numeric `0–3` also accepted and mapped), else 0.0.\n  - `parser.get_format_reward_func()` (weight 0.0): optional format adherence (not counted).\n\n### Environment Arguments\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `-1` | Limit training set size (`-1` for all) |\n| `num_eval_examples` | int | `-1` | Limit eval set size (`-1` for all) |\n\n### Quickstart\n\nEvaluate with defaults (uses the env’s internal dataset handling):\n\n```bash\nuv run vf-eval OpenMed_MedMCQA \\\n  -a '{\"num_train_examples\":-1, \"num_eval_examples\":-1}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- Reports (if produced) will be placed under `./environments/OpenMed_MedMCQA/reports/`.\n - Choices are displayed in a deterministic randomized label order per example (seeded by `id`); the underlying option mapping (A→opa, B→opb, …) and targets remain unchanged.\n\n## Evaluation Reports\n\n<!-- Do not edit below this line. Content is auto-generated. -->\n<!-- vf:begin:reports -->\n<p>No reports found. Run <code>uv run vf-eval OpenMed_MedMCQA -a '{\"key\": \"value\"}'</code> to generate one.</p>\n<!-- vf:end:reports -->\n","encoding":"utf-8","truncated":false,"total_bytes":2872},"status":null}