{"data":{"kind":"file","path":"README.md","version_id":"zvq35gghvzlxesjqz0bv4s5u","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3991,"modified_at":"2025-08-29T09:07:09.674000","content_hash":"0bc3665c2210d7bd3e6a839ad374c83ff50fe524594ce06372728452851de29d"},"entries":[],"content":"# loong-seed-graph-discrete-math\n\n### Overview\n- **Environment ID**: `loong-seed-graph-discrete-math`\n- **Short description**: An environment containing 178 seed questions (53 training, 125 test) in the graph and discrete math domain. The seed data are real, human-vetted examples collected from the NetworkX library's documentation.\n- **Tags**: reasoning, graph theory\n- **Contributors**: [CAMEL-AI.org](https://www.camel-ai.org/)\n\n### Datasets\n- **Primary dataset(s)**: loong-seed-graph-discrete-math\n- **Source links**: https://huggingface.co/datasets/camel-ai/loong\n- **Split sizes**: 53 training, 125 test\n\n### Task\n- **Type**: single-turn\n- **Parser**: The extractor returns the text following the last occurrence of \"Final Answer:\" (case-insensitive) in the input string, or an empty string if none is found.\n- **Rubric overview**: The rubric awards a score of 1.0 if the model’s response and the reference answer are equivalent, and 0.0 otherwise. Equivalence is checked by recursively comparing numbers (with float tolerance), lists, tuples, and dictionaries.\n\n| Criterion                   | Description                                                                                                                                   | Reward                                                                |\n| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------- |\n| 
**Float comparison**        | If both response and answer are floats, they are considered correct if they are equal within a small tolerance (`1e-6`) using `math.isclose`. | 1.0 if within tolerance, else 0.0                                     |\n| **List/Tuple comparison**   | Both must be lists or tuples of the same length, with elements recursively compared.                                                          | 1.0 if all elements match, else 0.0                                   |\n| **Dictionary comparison**   | Both must be dicts with the same keys; values are recursively compared.                                                                       | 1.0 if all values match, else 0.0                                     |\n| **String/Other comparison** | All other types are compared directly with `==`.                                                                                              | 1.0 if equal, else 0.0                                                |\n| **Parsing**                 | Both response and answer are parsed via `ast.literal_eval` if possible. If parsing fails, raw string comparison is used.                      | Enables structured comparisons for numbers, lists, tuples, and dicts. |\n| **Recursive checking**      | Nested structures are compared recursively, so lists/dicts containing floats or other lists/dicts are handled.                                | Ensures correctness at all levels of the data structure.              |\n| **Final reward**            | The function outputs `1.0` if `deep_compare` returns `True`, else `0.0`.                                                                      | Numerical reward for model evaluation.                                
|\n\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval loong-seed-graph-discrete-math\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval loong-seed-graph-discrete-math   -m gpt-4.1-mini   -n 20 -r 3 -t 1024 -T 0.7   -a '{\"key\": \"value\"}'  # env-specific args as JSON\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Metrics\nThe rubric emits a single metric:\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Main scalar reward: `1.0` if the parsed response is equivalent to the reference answer, else `0.0` |\n\n","encoding":"utf-8","truncated":false,"total_bytes":3991},"status":null}