{"data":{"kind":"file","path":"README.md","version_id":"p56aal3glrguvmbd7ozkaalm","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":1818,"modified_at":"2026-05-28T19:40:16.444000","content_hash":"5924b16ca0864900c184fb541ba9b2506a30e2bb08e2680a43441794f40befd5"},"entries":[],"content":"# Scientific Math — CritPt Format\n\n304 supplementary problems from advanced mathematics, evaluated in the **exact CritPt two-step format**.\n\n## Format\n\nEvery problem follows the CritPt structure:\n\n1. **Free reasoning** — model thinks in natural language + LaTeX  \n2. **Strict code block** — model outputs a `def model_answer()` block:\n\n```python\nimport sympy as sp\nimport numpy as np\n\ndef model_answer() -> sp.Expr:\n    \"\"\"Return your answer as a SymPy expression.\"\"\"\n    # === YOUR SOLUTION CODE GOES HERE ===\n    return sp.simplify(...)\n```\n\n## Grading\n\nThe grader:\n1. Regex-extracts the last ` ```python def model_answer() ``` ` block\n2. Execs it in a sandbox (`__import__` allowed, no other builtins)\n3. Calls `model_answer()`\n4. Compares against the gold answer:\n   - **symbolic**: `sp.simplify(model - gold) == 0`\n   - **numeric**: `abs(complex(model - gold)) < 1e-6`\n   - **string**: normalized lowercase equality\n   - **tuple/dict**: element-wise\n\n## Adding new problems\n\nProblems in your JSONL must have these fields (from `supplementary-problems_qa_sympy_validated.jsonl`):\n\n```json\n{\n  \"id\": \"1.154.f\",\n  \"topic\": \"COMPLEX NUMBERS\",\n  \"prompt\": \"Perform the operation: ...\",\n  \"final_answer\": \"\\\\( \\\\frac{16}{5} - \\\\frac{2}{5}i \\\\)\",\n  \"answer_type\": \"symbolic\",\n  \"answer_fn\": \"Rational(16,5) - Rational(2,5)*I\"\n}\n```\n\nThe `answer_fn` field is the gold answer in `from sympy import *` notation — it's what the grader evals.\n\n## Setup & run\n\n```bash\n# Push to Prime registry (creates .prime/.env-metadata.json on success)\nprime env push --path ./scientific-math-critpt-env/\n\n# Smoke test (5 examples, 3 rollouts each)\nprime eval run sam-fraser/scientific-math-critpt \\\n  -m openai/gpt-4.1-mini -n 5 -r 3\n\n# Full eval\nprime eval run sam-fraser/scientific-math-critpt \\\n  -m <model-id> -n 50 -r 4\n```\n","encoding":"utf-8","truncated":false,"total_bytes":1818},"status":null}