{"data":{"kind":"file","path":"README.md","version_id":"gowxmb23orbnyq12qtbxfp96","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2534,"modified_at":"2026-01-22T12:33:03.686000","content_hash":"cf2499465a547791b6b0ca06e340505f6b2367d4250e94f20eb463df4c6f85e4"},"entries":[],"content":"# OpenMed-PICO\n\n### Overview\n- **Environment ID**: `maziyar/OpenMed-PICO`\n- **Short description**: Extract PICO elements (Participants, Interventions, Outcomes) from clinical trial abstracts\n- **Tags**: medical, clinical-trials, pico, ner, evidence-based-medicine\n\n### Datasets\n- **Primary dataset(s)**: bigbio/pico_extraction - 423 clinical trial sentences annotated by 3 medical experts with majority voting\n- **Source links**: https://huggingface.co/datasets/bigbio/pico_extraction\n- **Split sizes**: ~338 train / ~85 eval (80/20 split)\n\n### Task\n- **Type**: single-turn\n- **Parser**: Raw output (JSON extraction with regex fallbacks)\n- **Rubric overview**:\n  - JSON format validity (25%)\n  - Entity recall - finding all PICO elements (30%)\n  - Entity precision - correct predictions (25%)\n  - Label accuracy - correct P/I/O classification (15%)\n  - Output consistency - no duplicates, valid labels (5%)\n\n### Expected Output Format\n```json\n[\n  {\"text\": \"44 patients with inclusion body myositis\", \"label\": \"participant\"},\n  {\"text\": \"methotrexate\", \"label\": \"intervention\"},\n  {\"text\": \"disease progression\", \"label\": \"outcome\"}\n]\n```\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval maziyar/OpenMed-PICO\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval maziyar/OpenMed-PICO \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7\n```\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_train_examples` | int | `-1` | Limit on training dataset size (-1 for all) |\n| `num_eval_examples` | int | `-1` | Limit on eval dataset size (-1 for all) |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Weighted sum: format (0.25) + recall (0.30) + precision (0.25) + label_acc (0.15) + consistency (0.05) |\n| `json_format` | Valid JSON array with proper entity structure |\n| `entity_recall` | Proportion of ground truth PICO elements found |\n| `entity_precision` | Proportion of predicted entities that are correct |\n| `label_accuracy` | Correct classification of matched entities as participant/intervention/outcome |\n\n### PICO Labels\n\n| Label | Description | Examples |\n| ----- | ----------- | -------- |\n| `participant` | Study population characteristics | \"44 patients with diabetes\", \"elderly women\" |\n| `intervention` | Treatments, medications, procedures | \"methotrexate\", \"physical therapy\", \"surgery\" |\n| `outcome` | Measured results or health outcomes | \"disease progression\", \"mortality rate\", \"pain reduction\" |\n","encoding":"utf-8","truncated":false,"total_bytes":2534},"status":null}