# medhallu

## Overview
- **Environment ID**: `medhallu`
- **Short description**: Medical hallucination detection benchmark that evaluates whether models can tell factual medical answers from hallucinated ones.
- **Tags**: hallucination-detection, medical, classification, single-turn

## Datasets Information

- **Paper**: [MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models](https://arxiv.org/abs/2502.14302)
- **Source links**: [UTAustin-AIHealth/MedHallu](https://huggingface.co/datasets/UTAustin-AIHealth/MedHallu)
- **Subset sizes**:
  - `pqa_labeled`: ~1k high-quality, human-labeled examples
  - `pqa_artificial`: ~9k synthetically generated examples

## Task
- **Type**: single-turn
- **Rubric overview**:
  - `+1.0` for a correct classification (prediction matches the target, `0` or `1`)
  - `+0.01` for abstaining with `2` (unsure); configurable via `unsure_reward`
  - `0.0` for an incorrect classification or a missing/malformed answer

The model is shown a medical question together with a candidate answer and must give its verdict as `\boxed{0}`, `\boxed{1}`, or `\boxed{2}`:
- `0` = the answer is factual
- `1` = the answer is hallucinated
- `2` = unsure (partial credit)

## Differences versus the MedHallu paper

This environment intentionally differs from the MedHallu paper’s evaluation protocol in two ways:

- **Both answers are evaluated for every item**: each dataset row yields two evaluation examples, one pairing the question with the **Ground Truth** answer (label `0`) and one pairing it with the **Hallucinated Answer** (label `1`). The paper’s implementation instead samples one of the two per row, so this environment scores twice as many examples per subset.
- **F1 is computed in postprocessing**: the paper reports F1 with hallucination (`1`) as the positive class. The environment does not emit F1 itself. Compute it by postprocessing the run’s `results.jsonl`, which drops `\boxed{2}` (unsure) predictions (see Postprocessing (F1) below).

## Quickstart
Run an evaluation with the default environment settings:

```bash
prime eval run medhallu -m "openai/gpt-5-mini" -n 5 -s
```

Configure the model, sampling, and environment arguments with `medarc-eval`:

```bash
medarc-eval medhallu -m "openai/gpt-5-mini" -n 20 --subset pqa_labeled
```

Notes:
- `medarc-eval` accepts environment arguments directly as flags (for example, `--subset pqa_artificial` or `--difficulty hard`). See Environment Arguments below for the full list.

## Environment Arguments

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `subset` | str | `"pqa_labeled"` | Dataset subset: `"pqa_labeled"` (~1k high-quality) or `"pqa_artificial"` (~9k generated) |
| `difficulty` | str | `"all"` | Difficulty filter: `"easy"`, `"medium"`, `"hard"`, or `"all"` |
| `use_knowledge` | bool | `False` | If `True`, includes the dataset's "Knowledge" field in the prompt as additional context |
| `unsure_reward` | float | `0.01` | Reward assigned when the model outputs `\boxed{2}` |

## Metrics

| Metric | Meaning |
| ------ | ------- |
| `reward` | Scalar reward used by Verifiers (see the rubric overview under Task) |
| `accuracy` | Exact match on the target label (`0` or `1`) |
| `precision`/`recall`/`f1` | Not produced by the environment directly; compute them in postprocessing (below) |

## Postprocessing (F1)

After a run finishes, compute paper-style F1 (positive label `1`) and update the run’s `metadata.json`:

```bash
python environments/medhallu/postprocess.py /path/to/results.jsonl
```

This script:
- extracts `\boxed{0|1|2}` from completions
- drops missing or malformed answers
- drops `\boxed{2}` (unsure) predictions
- computes `accuracy`, `precision`, `recall`, and `f1` on the remaining predictions, with `1` as the positive class

## Hallucination Types
Hallucinated answers in MedHallu fall into these categories:
- **Misinterpretation of Question**: Off-topic or irrelevant responses caused by misunderstanding the question
- **Incomplete Information**: Points out what is false without providing the correct information
- **Mechanism and Pathway Misattribution**: False attribution of biological mechanisms or disease processes
- **Methodological and Evidence Fabrication**: Invented research methods, statistics, or clinical outcomes
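The two-examples-per-row expansion described in the differences from the MedHallu paper can be sketched as below. This is a minimal illustration, not the environment's code. The helper `expand_row` and the output keys (`question`, `answer`, `label`) are hypothetical. Only the label convention comes from this README: `0` for the ground-truth answer and `1` for the hallucinated one.

```python
from typing import TypedDict


class Example(TypedDict):
    """One evaluation example (hypothetical layout, for illustration only)."""
    question: str
    answer: str
    label: int  # 0 = factual (Ground Truth), 1 = hallucinated


def expand_row(question: str, ground_truth: str, hallucinated_answer: str) -> list[Example]:
    """Turn one dataset row into two examples instead of sampling one of them."""
    return [
        {"question": question, "answer": ground_truth, "label": 0},
        {"question": question, "answer": hallucinated_answer, "label": 1},
    ]


# Placeholder strings, not taken from the dataset.
examples = expand_row(
    "What is the first-line treatment for condition X?",
    "Drug A, per current guidelines.",
    "Drug B, because it reverses the underlying pathway.",
)
```

Because every row contributes one factual and one hallucinated answer, the resulting evaluation set is balanced between the two labels by construction.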
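The scoring rule from the Task rubric can be sketched as follows. This sketch assumes the verdict is read from the last `\boxed{...}` in the completion. The regex, the last-match choice, and the names `extract_label` and `score` are all assumptions made for illustration. The environment's actual parser may differ; for example, it may treat several boxed answers as malformed.

```python
import re
from typing import Optional

# Assumed pattern: a literal "\boxed{" followed by 0, 1, or 2 and a closing brace.
BOXED_LABEL = re.compile(r"\\boxed\{\s*([012])\s*\}")


def extract_label(completion: str) -> Optional[int]:
    """Return the last boxed label in the completion, or None if there is none."""
    matches = BOXED_LABEL.findall(completion)
    return int(matches[-1]) if matches else None


def score(completion: str, target: int, unsure_reward: float = 0.01) -> float:
    """Reward rule: 1.0 if correct, unsure_reward for 2, otherwise 0.0."""
    label = extract_label(completion)
    if label is None:
        return 0.0  # missing or malformed answer
    if label == 2:
        return unsure_reward  # abstention earns partial credit
    return 1.0 if label == target else 0.0
```

Keeping `unsure_reward` small relative to `1.0` means abstaining never beats a correct answer. It still pays slightly more than a wrong one.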
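The metric logic described under Postprocessing (F1) can be sketched as follows. This is not the actual `postprocess.py`. It takes targets and already-extracted predictions directly, because the field layout of `results.jsonl` is not documented here. The name `paper_style_metrics` is hypothetical, and returning `0.0` when a denominator is zero is an assumption.

```python
from typing import Optional, Sequence


def paper_style_metrics(
    targets: Sequence[int], predictions: Sequence[Optional[int]]
) -> dict[str, float]:
    """Accuracy/precision/recall/F1 with 1 (hallucinated) as the positive class.

    Predictions that are None (missing/malformed) or 2 (unsure) are dropped first.
    """
    kept = [(t, p) for t, p in zip(targets, predictions) if p in (0, 1)]
    tp = sum(1 for t, p in kept if t == 1 and p == 1)  # hallucination caught
    fp = sum(1 for t, p in kept if t == 0 and p == 1)  # factual answer flagged
    fn = sum(1 for t, p in kept if t == 1 and p == 0)  # hallucination missed
    correct = sum(1 for t, p in kept if t == p)

    # Assumption: report 0.0 instead of raising when a denominator is zero.
    accuracy = correct / len(kept) if kept else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = paper_style_metrics([0, 1, 0, 1, 1, 0], [0, 1, 1, None, 2, 0])
```

In this example, the `None` and `2` predictions are dropped. That leaves one true positive and one false positive, which gives precision 0.5, recall 1.0, F1 ≈ 0.667, and accuracy 0.75.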