# num-seq-env

### Overview
- **Environment ID**: `num-seq-env`
- **Short description**: Inductive reasoning over numeric sequences governed by variable-order linear recurrence relations.
- **Tags**: single-turn, math, reasoning, eval

### Datasets
- **Primary dataset(s)**: Programmatically generated. Each example contains consecutive terms from a sequence defined by an order-k linear recurrence `a(n) = c1*a(n-1) + c2*a(n-2) + ... + ck*a(n-k)`. The order k is sampled uniformly from {`min_k`, ..., `max_k`} ({2, 3, 4, 5} by default), and the coefficients and initial values are randomized.
- **Source links**: N/A (generated at load time)
- **Split sizes**: 500 examples by default (configurable via the `num_examples` argument)

### Task
- **Type**: single-turn
- **Parser**: `XMLParser` with `<reasoning>` and `<answer>` fields
- **Rubric overview**: A single `exact_match` reward function: 1.0 if the parsed `<answer>` matches the ground-truth integer, 0.0 otherwise.

The model sees consecutive terms from a known position in the sequence (e.g., "terms 10 through 20") and is asked to compute a specific term by its absolute position. The target term may lie **before or after** the shown window. A successful model will likely first identify the underlying recurrence relation, including its order, from the given terms. It can then use that relation to compute the requested term.

**Shown terms and identifiability.** `max_k` (default 5) is the maximum recurrence order. By default, `2*max_k + 1` terms are shown (11 for `max_k` = 5). An order-k recurrence has k unknown coefficients. Fitting it to L consecutive terms yields L-k equations (one for each term after the first k) in k unknowns. Showing at least 2k terms gives at least k equations, so the system is determined or over-determined. When its k × k coefficient matrix is also non-singular (which the genuine-order check below verifies), the coefficients are uniquely recoverable, and so is every future and past term. Showing `2*max_k + 1` terms ensures this for every k up to `max_k`, with at least one equation to spare.

**Generation-time checks.** Each generated sequence is validated before inclusion:
- **Genuine order check**: the k × k Hankel determinant of the shown terms is verified to be non-zero. This confirms that the sequence is genuinely order k: it satisfies no shorter recurrence, and its k coefficients are uniquely determined by the shown terms.
- **Periodicity handling**: if the characteristic polynomial `x^k - c1*x^(k-1) - ... - ck` has roots on the unit circle, the sequence is rejected. The motivating case is periodicity. The polynomial is monic with integer coefficients, so by Kronecker's theorem, if all of its roots lie on the unit circle they are roots of unity. When those roots are also distinct, the sequence is periodic.

### Quickstart

```bash
prime eval run num-seq-env
```

Configure model and sampling:

```bash
prime eval run num-seq-env -m gpt-4.1-mini -n 20 -r 3 -t 1024 -T 0.7
```

Pass environment-specific args:

```bash
prime eval run num-seq-env -a '{"num_examples": 100, "seed": 123, "min_k": 2, "max_k": 5}'
```

### Environment Arguments

| Arg | Type | Default | Description |
| --- | ---- | ------- | ----------- |
| `num_examples` | int | `500` | Number of dataset examples to generate |
| `seed` | int | `42` | Random seed for reproducible dataset generation |
| `min_k` | int | `2` | Minimum recurrence order |
| `max_k` | int | `5` | Maximum recurrence order |

### Baseline Results

| Model | Accuracy | Details |
| ----- | -------- | ------- |
| `gpt-4.1-mini` (default model) | 59.2% | `prime eval run num-seq-env` with no CLI overrides. The eval's `num_examples` (250) and `rollouts_per_example` (1) come from the environment's `pyproject.toml`, and all environment args use their defaults |

### Metrics

| Metric | Meaning |
| ------ | ------- |
| `exact_match` | 1.0 if parsed answer matches ground truth, 0.0 otherwise |
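**Sketch: generating a recurrence sequence.** The order-k recurrence described under Datasets can be illustrated with a short Python sketch. The helper names `sample_recurrence` and `generate_terms` are hypothetical, and so are the coefficient and initial-value ranges. None of them are taken from the environment's source.

```python
import random

# Hypothetical helpers: a sketch of the recurrence, not the environment's generator.


def sample_recurrence(rng, min_k=2, max_k=5):
    """Sample (coeffs, initial); the value ranges below are assumptions."""
    k = rng.randint(min_k, max_k)  # uniform over {min_k, ..., max_k}, bounds inclusive
    coeffs = [rng.randint(-3, 3) for _ in range(k - 1)]  # c1 .. c(k-1)
    # ck != 0 is necessary (though not sufficient) for the order to be genuinely k.
    coeffs.append(rng.choice([-3, -2, -1, 1, 2, 3]))
    initial = [rng.randint(-5, 5) for _ in range(k)]  # a(0) .. a(k-1)
    return coeffs, initial


def generate_terms(coeffs, initial, count):
    """Return a(0) .. a(count-1) for a(n) = c1*a(n-1) + ... + ck*a(n-k)."""
    if len(initial) != len(coeffs):
        raise ValueError("an order-k recurrence needs exactly k initial values")
    terms = list(initial)
    while len(terms) < count:
        # coeffs[i] is c(i+1), which multiplies a(n-1-i).
        terms.append(sum(c * terms[-1 - i] for i, c in enumerate(coeffs)))
    return terms[:count]


# Fibonacci is the order-2 case c1 = c2 = 1 with a(0) = 0, a(1) = 1.
print(generate_terms([1, 1], [0, 1], 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

coeffs, initial = sample_recurrence(random.Random(42))
print(coeffs, generate_terms(coeffs, initial, 11))
```

Seeding with `random.Random(42)` only mirrors the default `seed` argument. The sampled values are not expected to match the environment's dataset.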
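**Sketch: the `exact_match` check.** The rubric described under Task can be approximated without the environment's `XMLParser` by extracting the `<answer>` field with a regular expression. This is a standalone sketch, not the verifiers API. Two behaviors are assumptions: the first `<answer>` block is treated as the graded one, and non-integer text such as `12.0` earns no credit.

```python
import re

# Hypothetical stand-in for the parser + reward pair; not the XMLParser API.
# Non-greedy capture so the match stops at the first closing tag.
ANSWER_RE = re.compile(r"<answer>\s*(.*?)\s*</answer>", re.DOTALL)


def exact_match(completion, ground_truth):
    """Return 1.0 if the first <answer> field parses to ground_truth, else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0  # no answer field: no credit
    try:
        predicted = int(match.group(1))  # accepts "-42", rejects "12.0" or "1,000"
    except ValueError:
        return 0.0
    return 1.0 if predicted == ground_truth else 0.0


reply = "<reasoning>c1 = c2 = 1, so step forward to term 25.</reasoning>\n<answer>75025</answer>"
print(exact_match(reply, 75025))  # 1.0
print(exact_match("The answer is 75025.", 75025))  # 0.0
```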
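**Sketch: recovering the recurrence and a target term.** The reasoning under Task and *Shown terms and identifiability* can be made concrete in four steps. First, try orders k = 1, 2, ... up to `max_k`. For each order, solve the first k equations exactly with `fractions.Fraction`. Keep the lowest order whose coefficients reproduce every shown term. Finally, step forward past the window, or invert the recurrence to step backward before it. The function names `solve_coefficients`, `infer_recurrence`, and `term_at` are hypothetical. This is one possible solver, not the environment's reference implementation.

```python
from fractions import Fraction

# Hypothetical helpers: one way a solver could work, not the environment's code.


def solve_coefficients(window, k):
    """Solve window[k+j] = sum_i c(i+1) * window[k+j-1-i] for j = 0..k-1.

    Returns [c1, ..., ck] as Fractions, or None if there are too few terms
    or the k x k system is singular.
    """
    if len(window) < 2 * k:
        return None
    rows = [[Fraction(window[k + j - 1 - i]) for i in range(k)] + [Fraction(window[k + j])]
            for j in range(k)]
    for col in range(k):  # Gauss-Jordan elimination in exact arithmetic
        pivot = next((r for r in range(col, k) if rows[r][col] != 0), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(k):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][k] / rows[i][i] for i in range(k)]


def infer_recurrence(window, max_k):
    """Return coefficients for the lowest order that reproduces every shown term."""
    for k in range(1, max_k + 1):
        coeffs = solve_coefficients(window, k)
        if coeffs is not None and all(
            window[t] == sum(c * window[t - 1 - i] for i, c in enumerate(coeffs))
            for t in range(k, len(window))
        ):
            return coeffs
    return None


def term_at(window, start, coeffs, n):
    """Return a(n), given window[0] = a(start) and coefficients [c1, ..., ck]."""
    k = len(coeffs)
    terms = {start + i: Fraction(v) for i, v in enumerate(window)}
    lo, hi = start, start + len(window) - 1
    while hi < n:  # forward: a(m) = c1*a(m-1) + ... + ck*a(m-k)
        hi += 1
        terms[hi] = sum(c * terms[hi - 1 - i] for i, c in enumerate(coeffs))
    while lo > n:  # backward: solve a(p+k) = c1*a(p+k-1) + ... + ck*a(p) for a(p); needs ck != 0
        p = lo - 1
        rest = sum(coeffs[i] * terms[p + k - 1 - i] for i in range(k - 1))
        terms[p] = (terms[p + k] - rest) / coeffs[-1]
        lo = p
    value = terms[n]
    return int(value) if value.denominator == 1 else value


# Fibonacci terms 10 through 20, matching the "terms 10 through 20" example.
window = [55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765]
coeffs = infer_recurrence(window, max_k=5)
print(coeffs)  # [Fraction(1, 1), Fraction(1, 1)]
print(term_at(window, 10, coeffs, 25))  # 75025 (after the window)
print(term_at(window, 10, coeffs, 0))  # 0 (before the window)
```

Exact rational arithmetic avoids the rounding that floating-point least squares would introduce, at the cost of speed. For orders of at most 5, that cost is negligible.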
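**Sketch: the genuine-order check.** One way to compute the k × k Hankel determinant described under Generation-time checks is exact Gaussian elimination over `fractions.Fraction`. The text above does not specify which shown terms feed the matrix. This sketch assumes the first 2k - 1, and the function names `determinant`, `hankel`, and `is_genuine_order` are hypothetical.

```python
from fractions import Fraction

# Hypothetical helpers; a sketch of the check, not the environment's implementation.


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no non-zero pivot: the matrix is singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return det


def hankel(terms, k):
    """k x k Hankel matrix H[i][j] = terms[i + j], from terms[0 .. 2k-2] (an assumption)."""
    if len(terms) < 2 * k - 1:
        raise ValueError("need at least 2k - 1 terms")
    return [[terms[i + j] for j in range(k)] for i in range(k)]


def is_genuine_order(terms, k):
    """True if the k x k Hankel determinant of the shown terms is non-zero."""
    return determinant(hankel(terms, k)) != 0


window = [55, 89, 144, 233, 377, 610, 987]  # Fibonacci terms 10 through 16
print(determinant(hankel(window, 2)))  # -1, i.e. 55*144 - 89**2
print(is_genuine_order(window, 2))  # True
print(is_genuine_order(window, 3))  # False: each column is the sum of the previous two
```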
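**Sketch: the unit-circle check.** The periodicity rule under Generation-time checks needs the roots of the characteristic polynomial `x^k - c1*x^(k-1) - ... - ck`. This sketch finds them with the Durand-Kerner iteration, which is a swapped-in choice; any polynomial root finder would serve. It then flags any root whose modulus is within a tolerance of 1. The iteration count, the tolerance, and the function names `char_poly_roots` and `has_unit_circle_root` are assumptions, not values from the environment.

```python
# Hypothetical helpers; Durand-Kerner is an illustrative choice of root finder.


def char_poly_roots(coeffs, iterations=500):
    """Approximate all roots of x^k - c1*x^(k-1) - ... - ck by Durand-Kerner iteration."""
    k = len(coeffs)
    poly = [1] + [-c for c in coeffs]  # monic, highest degree first

    def evaluate(z):
        acc = 0j
        for a in poly:  # Horner's rule
            acc = acc * z + a
        return acc

    # Conventional distinct, non-symmetric starting points.
    roots = [(0.4 + 0.9j) ** i for i in range(k)]
    for _ in range(iterations):
        updated = []
        for i, z in enumerate(roots):
            denom = 1 + 0j
            for j, w in enumerate(roots):
                if j != i:
                    denom *= z - w
            updated.append(z - evaluate(z) / denom)
        roots = updated
    return roots


def has_unit_circle_root(coeffs, tol=1e-6):
    """True if any characteristic root has modulus within tol of 1 (tol is an assumption)."""
    return any(abs(abs(r) - 1) < tol for r in char_poly_roots(coeffs))


print(has_unit_circle_root([1, 1]))  # False: roots (1 +/- sqrt(5)) / 2
print(has_unit_circle_root([1, -1]))  # True: roots are primitive 6th roots of unity
print(has_unit_circle_root([0, -1]))  # True: x^2 + 1 has roots +/- i
```

For example, `[1, -1]` gives `a(n) = a(n-1) - a(n-2)`, which from `a(0) = 0, a(1) = 1` cycles through 0, 1, 1, 0, -1, -1 with period 6. Sequences like this are the ones the periodicity rule excludes.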