{"data":{"kind":"file","path":"README.md","version_id":"fk0w1a3mdeo9l0vh5d8rl9sp","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3739,"modified_at":"2026-01-24T17:20:47.110000","content_hash":"e78626f491d61dab9201099b1e858077b9d139cf3cb07a8e0c9d3df58fd45dd7"},"entries":[],"content":"# M-ARC\n\n### Overview\n- **Environment ID**: `m-arc`\n- **Short description**: Medical MCQ training with long-tail evaluation\n- **Tags**: medical, clinical, single-turn, multiple-choice, training, evaluation\n\n### Datasets\n\n**Training** (mixed):\n| Dataset | Source | Split | Fraction | Count |\n| ------- | ------ | ----- | -------- | ----- |\n| MedMCQA | `openlifescienceai/medmcqa` | train | 70% | ~25k |\n| MedQA-USMLE | `GBaker/MedQA-USMLE-4-options` | train | 30% | ~10k |\n\n**Evaluation**:\n| Dataset | Source | Split | Count |\n| ------- | ------ | ----- | ----- |\n| M-ARC | `mkieffer/M-ARC` | test | **100** |\n\n**Few-shot**: `TIGER-Lab/MMLU-Pro` (health category, validation split)\n\n### Task\n- **Type**: single-turn (or multi-turn with self-correction)\n- **Parser**: `Parser` or `ThinkParser`, with `extract_fn=extract_boxed_answer`\n- **Rubric**: Binary accuracy + optional dense rewards for RL stability\n\n### Quickstart\n\nRun evaluation only:\n```bash\nuv run vf-eval m-arc -a '{\"eval_only\": true}'\n```\n\nRL training with full setup:\n```bash\nprime rl run configs/lab/m-arc-qwen3-thinking.toml\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_few_shot` | int | `5` | Few-shot examples (`0` for thinking models) |\n| `use_think` | bool | `False` | Use ThinkParser for CoT |\n| `shuffle_mode` | str | `\"none\"` | `\"none\"`, `\"fixed\"`, or `\"per_rollout\"` |\n| `shuffle_seed` | int | `1618` | Base seed for shuffling |\n| **Dense Rewards** |\n| `format_reward` | bool | `False` | Reward for `\\boxed{}` format |\n| `format_reward_weight` | float | `0.1` | Weight for format reward |\n| `reasoning_reward` | bool | `False` | Reward for medical reasoning quality |\n| `reasoning_reward_weight` | float | `0.15` | Weight for reasoning reward |\n| `confidence_reward` | bool | `False` | Calibration reward (penalizes overconfidence) |\n| `confidence_reward_weight` | float | `0.1` | Weight for confidence reward |\n| `partial_credit_reward` | bool | `False` | Partial credit for near-misses |\n| `partial_credit_weight` | float | `0.1` | Weight for partial credit |\n| `diversity_reward` | bool | `False` | Group diversity bonus |\n| `diversity_reward_weight` | float | `0.05` | Weight for diversity bonus |\n| **Environment Variant** |\n| `self_correction` | bool | `False` | Multi-turn self-correction mode |\n| **Data Options** |\n| `jitter_ages` | bool | `False` | Apply age jitter augmentation (off by default to avoid label noise) |\n| `realistic_age_jitter` | bool | `True` | If jittering ages, use integer year jitter (±1) |\n| `add_difficulty` | bool | `False` | Add heuristic difficulty (0=hard, 1=easy) for buffer/curriculum filtering |\n| `medmcqa_fraction` | float | `0.7` | Fraction of training from MedMCQA |\n| `medqa_fraction` | float | `0.3` | Fraction of training from MedQA |\n| `max_train_examples` | int | `-1` | Max training examples (-1 = auto) |\n| `eval_only` | bool | `False` | Skip training data, eval only |\n\n### RL Training Configuration\n\n```toml\n[[env]]\nid = \"nappenstance/m-arc\"\n\n[env.args]\nshuffle_mode = \"fixed\"\nuse_think = false\nnum_few_shot = 0\n# Data augmentation (recommended off for stability)\njitter_ages = false\n# Dense rewards\nformat_reward = true\nreasoning_reward = true\nconfidence_reward = true\n# Training mix\nmedmcqa_fraction = 0.7\nmedqa_fraction = 0.3\n```\n\n### Metrics\n\n| Metric | Description |\n| ------ | ----------- |\n| `accuracy` | 1.0 if correct, 0.0 otherwise |\n| `format_compliance` | 0.0-0.1 for `\\boxed{}` format |\n| `reasoning_quality` | 0.0-0.25 for medical reasoning |\n| `confidence_calibration` | -0.15 to 0.05 for calibration |\n| `partial_credit` | 0.0-0.05 for near-misses |\n| `diversity_bonus` | Disabled by default (requires group-level scoring) |\n","encoding":"utf-8","truncated":false,"total_bytes":3739},"status":null}