{"data":{"kind":"file","path":"README.md","version_id":"pot4lry8w4u47hw2ktyubn6m","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":3748,"modified_at":"2026-01-27T13:06:21.498000","content_hash":"a4d40206a5b262d301f370c2a25ec9a08ad478ef31bdc9dc74f23034b29a51a6"},"entries":[],"content":"# FluxEM RL Environment\n\nA [Prime Intellect Verifiers](https://docs.primeintellect.ai/verifiers/overview) environment for training LLMs to use deterministic computational tools via reinforcement learning.\n\n## Overview\n\nFluxEM RL teaches models **when to delegate** computation to deterministic tools versus when to use reasoning or estimation. The environment features:\n\n- **25 core tools** across 8 domains (arithmetic, number theory, combinatorics, finance, chemistry, physics, geometry, statistics)\n- **Curriculum learning** with 4 stages: foundation → judgment → orchestration → mastery\n- **Multi-component rewards**: 40% correctness + 35% tool appropriateness + 25% reasoning quality\n- **Adversarial problems** (20%) that test judgment (e.g., problems that look computational but require pure reasoning)\n\n## Installation\n\n```bash\n# Local installation\nprime env install fluxem-rl --path ./environments/fluxem_rl\n\n# Or from Prime Intellect Hub (after pushing)\nprime env install hunterbown/fluxem-rl\n```\n\n## Usage\n\n### Training\n\n```bash\n# Run training with curriculum\nprime rl run configs/lab/fluxem_rl.toml\n```\n\n### Evaluation\n\n```bash\n# Evaluate a model\nprime eval run fluxem-rl -m openai/gpt-4o-mini -n 50\n\n# Or evaluate from hub\nprime eval run hunterbown/fluxem-rl -m Qwen/Qwen3-4B-Instruct-2507\n```\n\n### Python API\n\n```python\nfrom fluxem_rl import load_environment\n\n# Load with curriculum stage\nenv = load_environment(\n    n_problems=500,\n    stage=\"mastery\",  # foundation, judgment, orchestration, mastery\n    seed=42,\n)\n\n# Access dataset\nprint(f\"Dataset size: {len(env.dataset)}\")\n\n# Test tool execution\nfrom fluxem_rl.tools import execute_tool\nresult = execute_tool(\"arithmetic.multiply\", a=12345, b=6789)\nprint(result)  # 83810205\n```\n\n## Curriculum Stages\n\n| Stage | Description | Problem Types |\n|-------|-------------|---------------|\n| `foundation` | Basic tool use | Medical dosing, stoichiometry, cross-domain |\n| `judgment` | Context-dependent decisions | Precision vs estimation, trap problems |\n| `orchestration` | Multi-tool planning | Loan payoff, tool chaining, sensitivity analysis |\n| `mastery` | All types with adversarial | 20% trap problems requiring pure reasoning |\n\n## Problem Categories\n\n- **A. Precision Ambiguity**: Infer precision needs from context (medical vs road trip estimates)\n- **B. Hidden Complexity**: Problems that look simple but need computation (exponential growth, iterative loans)\n- **C. Multi-Step Reasoning**: Interleaved reasoning and computation (stoichiometry chains)\n- **D. Domain Recognition**: Cross-domain analogies (user growth as compound interest)\n- **E. Judgment Under Uncertainty**: Missing information, implicit constraints\n\n## Reward Function\n\n```\nReward = 0.4 * correctness + 0.35 * tool_appropriateness + 0.25 * reasoning_quality\n```\n\n### Tool Appropriateness Scoring\n\n| Expected Use | Used Tools | Score |\n|--------------|------------|-------|\n| Required | Yes | 1.0 |\n| Required | No | 0.0 |\n| Beneficial | Yes | 1.0 |\n| Beneficial | No | 0.5 |\n| Discouraged | No | 1.0 |\n| Discouraged | Yes | 0.3 |\n| Pure Reasoning | No | 1.0 |\n| Pure Reasoning | Yes | 0.0 |\n\n## Tool Format\n\nModels use XML-based tool calls:\n\n```xml\n<tool_call>\n<name>arithmetic.multiply</name>\n<args>{\"a\": 75, \"b\": 2}</args>\n</tool_call>\n```\n\nFinal answers use:\n\n```xml\n<answer>150 mg</answer>\n```\n\n## Training Configuration\n\nExample TOML config for Prime Intellect hosted training:\n\n```toml\nmodel = \"Qwen/Qwen3-4B-Instruct-2507\"\nmax_steps = 500\nbatch_size = 64\nrollouts_per_example = 8\n\n[sampling]\nmax_tokens = 1024\n\n[[env]]\nid = \"hunterbown/fluxem-rl\"\nargs = { n_problems = 500, stage = \"mastery\", max_turns = 6 }\n\n[val]\nnum_examples = 50\nrollouts_per_example = 4\ninterval = 25\n```\n\n## License\n\nMIT\n","encoding":"utf-8","truncated":false,"total_bytes":3748},"status":null}