{"data":{"kind":"file","path":"README.md","version_id":"r5zb9zq1zg57vnprzzodgehe","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3900,"modified_at":"2025-12-25T14:08:44","content_hash":"3be6d87ee98b1c4fab26c9af852718413d59ac85930c33e6506472740aec75f2"},"entries":[],"content":"# ag-q-rag\n\n### Overview\n- **Environment ID**: `ag-q-rag`\n- **Short description**: Quran RAG environment with semantic search over Quranic verses and judge-in-the-loop evaluation for Islamic knowledge QA.\n- **Tags**: `quran`, `islamic-qa`, `rag`, `tool-use`, `multilingual`, `arabic`, `semantic-search`\n\n### Datasets\n- **Primary dataset(s)**: \n  - **Quran Corpus**: `nazimali/quran` - Complete Quran with Arabic text, transliterations, and metadata for all 114 surahs\n  - **Islamic QA**: `dataset_simpleqa_islamic/simple_qa_islamic_v2.jsonl` - Question-answer pairs about Islam and the Quran\n- **Source links**: \n  - [nazimali/quran on HuggingFace](https://huggingface.co/datasets/nazimali/quran)\n- **Split sizes**: ~6,236 ayat indexed for search; QA dataset size varies\n\n### Task\n- **Type**: `multi-turn | tool use`\n- **Parser**: Default verifiers `Parser`\n- **Rubric overview**: \n  - `ToolRubric`: Validates proper tool usage for Quran search operations\n  - `JudgeRubric`: LLM-based evaluation of answer accuracy, Quranic citation quality, and alignment with Islamic teachings\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval ag-q-rag\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval ag-q-rag \\\n  -m gpt-4.1-mini \\\n  -n 20 -r 3 -t 1024 -T 0.7 \\\n  -a '{\"max_turns\": 10, \"judge_model\": \"gpt-4.1-mini\"}'\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n- The environment uses multilingual embeddings by default (no API key required for embeddings).\n\n### Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `max_turns` | int | `10` | Maximum number of tool interaction turns |\n| `judge_model` | str | `\"gpt-4.1-mini\"` | Model used for judging answer quality |\n| `judge_base_url` | str | `\"https://api.openai.com/v1\"` | Base URL for judge API |\n| `judge_api_key_var` | str | `\"OPENAI_API_KEY\"` | Environment variable for judge API key |\n| `embed_model` | str | `\"sentence-transformers/paraphrase-multilingual-mpnet-base-v2\"` | Multilingual embedding model for semantic search |\n| `quran_dataset` | str | `\"nazimali/quran\"` | HuggingFace dataset for Quran corpus |\n| `qa_dataset` | str | `\"dataset_simpleqa_islamic/simple_qa_islamic_v2.jsonl\"` | Path to Islamic QA dataset (JSONL) |\n| `chroma_db_dir` | str | `\".chroma_db_quran\"` | Directory for ChromaDB persistence |\n\n### Available Tools\n\n| Tool | Description | Example |\n| ---- | ----------- | ------- |\n| `search_quran(query, n_results)` | Semantic search over all Quran verses | `search_quran(\"الصبر\", 10)` |\n| `get_surah_info(surah_number)` | Get metadata for a specific surah (1-114) | `get_surah_info(2)` |\n| `read_ayah(surah, ayah)` | Read full text of a specific ayah | `read_ayah(2, 255)` |\n| `search_surah(surah_number, query, n_results)` | Search within a specific surah | `search_surah(2, \"الصلاة\", 5)` |\n| `list_surahs()` | List all 114 surahs with basic info | `list_surahs()` |\n\n### Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward (weighted sum of tool usage + judge evaluation) |\n| `judge_score` | Binary score (1.0 if judge approves, 0.0 otherwise) |\n| `tool_usage` | Proper invocation and parsing of Quran search tools |\n\n### Example Questions\nThe environment handles bilingual (Arabic/English) questions about Islam and the Quran:\n\n- ما هو معيار التفاضل بين المسلمين في الإسلام؟\n- ما هي آية الكرسي وما فضلها؟\n- What does the Quran say about patience (sabr)?\n- Which surah is known as the heart of the Quran?\n\n### Notes\n- **Embeddings**: Uses open-source `sentence-transformers` for multilingual embeddings (Arabic + English support)\n- **Persistence**: ChromaDB stores indexed ayat locally for fast subsequent runs\n- **First run**: Initial indexing of ~6,236 ayat may take a few minutes","encoding":"utf-8","truncated":false,"total_bytes":3900},"status":null}