{"data":{"kind":"file","path":"README.md","version_id":"n9p7gcmvdsljggkisg6zm1y6","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2529,"modified_at":"2025-11-25T18:50:47.300000","content_hash":"afad226c224e41e96d36baeeda216520428aed3a38443738b749b339f0963507"},"entries":[],"content":"# omnidocbench\n\n### Overview\n- **Environment ID**: `omnidocbench`\n- **Short description**: Evaluates multimodal document parsing (OCR, layout, formulas, tables) by converting document images to structured Markdown. Ported from OmniDocBench.\n- **Tags**: multimodal, ocr, document-parsing, single-turn, vision, eval\n\n### Datasets\n- **Primary dataset(s)**: `opendatalab/OmniDocBench`\n- **Source links**: [Hugging Face](https://huggingface.co/datasets/opendatalab/OmniDocBench), [GitHub](https://github.com/opendatalab/OmniDocBench)\n- **Split sizes**: Full evaluation dataset loaded from `OmniDocBench.json`, configurable via `num_examples`.\n\n### Task\n- **Type**: single-turn\n- **Parser**: Custom logic (uses `md_tex_filter` from OmniDocBench repo) to separate text, formulas, and tables.\n- **Rubric overview**: A consolidated `End2EndRubric` that calculates `1 - EditDistance` for text blocks, formulas, and tables, plus a sequence distance metric for reading order.\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval omnidocbench\n```\n\nConfigure model and sampling (requires a vision-capable model):\n\n```bash\nuv run vf-eval omnidocbench \\\n  -m gpt-4o \\\n  -n 10 \\\n  -a '{\"num_examples\": 10, \"seed\": 42}'\n```\n\nNotes:\n- **First Run**: The environment will automatically clone the OmniDocBench evaluation repository and download the dataset from Hugging Face. 
This may take some time.\n- **OCR models and LaTeX tables**: If you want to evaluate OCR/non-general models that generate LaTeX tables, you will need to install `latexml` on your system and ensure you're using the correct `prompt` for the model.\n\n### Environment Arguments\nSupported arguments passed via `-a` / `--env-args`:\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `num_examples` | int | `-1` | Limit the number of document pages to evaluate (`-1` for all). |\n| `seed` | int | `42` | Random seed for dataset shuffling. `-1` for no shuffling. |\n| `prompt` | str | Default | Override prompt to use for the task. |\n\n### Metrics\nThe rubric emits the following metrics (all normalized 0.0 to 1.0, where 1.0 is perfect):\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Aggregated score of all sub-metrics. |\n| `text_block_reward` | Text content accuracy (1 - normalized edit distance). |\n| `reading_order_reward` | Accuracy of the sequence of detected text blocks. |\n| `display_formula_reward` | LaTeX formula recognition accuracy. |\n| `table_reward` | Table structure and content accuracy (HTML/LaTeX). |","encoding":"utf-8","truncated":false,"total_bytes":2529},"status":null}