{"data":{"kind":"file","path":"README.md","version_id":"j8k467xg3lud24l6ahw43ug8","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":6263,"modified_at":"2026-02-13T13:47:51.239000","content_hash":"c30a6759282e485e07ee784b7584bb9bafa3ed5e5d3ad6a011bdfd4c44911220"},"entries":[],"content":"# npv-calculator\n\nSingle-turn **RL environment** for **Net Present Value (NPV)** analysis. This package is **only the environment** (dataset, prompt builder, rubric); training and evaluation are run by Prime Intellect (`prime eval run`, `prime rl run`) or other trainers.\n\n**Environment criteria:** **Dataset** — bundled `npv_dataset.json` (tasks with prompt + ground truth); train/eval split via `num_eval_examples`. **Harness** — single-turn chat (one prompt → one response; `vf.SingleTurnEnv`). **Rubric** — binary (1.0/0.0) or continuous (0–1) reward; optional state columns `binary_correct`, `reasoning_points`.\n\nNPV measures profitability by comparing the present value of cash inflows and outflows; positive NPV → accept (adds value), negative NPV → reject (destroys value). The task is:\n\n1. **Calculating NPV** from discount rate and cash flows  \n2. **Deciding ACCEPT or REJECT** based on NPV sign  \n3. **Explaining reasoning** in structured JSON  \n\n## Reward (0–1)\n\n**Default (Prime-aligned): binary reward.**  \n- Rubric returns **1.0** only when NPV is within a relative tolerance (default 1%) of the true NPV *and* the decision (ACCEPT/REJECT) is correct; otherwise **0.0**. Reasoning is **not** part of the reward (reduces reward hacking).  \n\n**Optional: continuous reward** (`use_binary_reward=False`).  \n- **NPV accuracy** (70% of raw): exponential decay with relative error  \n- **Correct decision** (20% of raw): ACCEPT/REJECT matches true NPV sign  \n- **Reasoning quality** (10% of raw): keyword heuristic (discount rate, value creation, etc.); for analysis only in binary mode  \n\nRaw score 0–100 is normalized to 0–1 for the Verifiers rubric.\n\n## Dataset\n\n**2,068 problems** in total (bundled as `npv_dataset.json`): **~96.7% synthetic**, **~2.7% real** (transcribed from published worked examples), and **~0.6% inspired** (scenario from a named source; NPV verified). Each problem has project name, discount rate, cash flows, and horizon.\n\n**Synthetic generation (rigorous and objective):** The ~2,000 synthetic problems are produced by the parent repo’s `dataset.py` with a **seeded RNG** (reproducible). For each problem, cash flows and discount rate are drawn from defined ranges; the **true NPV is computed deterministically** with the standard formula NPV = Σ_t CF_t / (1 + r)^t in code (`calculate_npv`). The correct decision (ACCEPT/REJECT) follows from the sign of that NPV. Because NPV is a mathematically well-defined problem, there is a single correct answer per input—no subjective grading. To regenerate or extend the dataset, run `python dataset.py` from the parent repo.\n\n## Prompt & answer schema\n\n- **User prompt**: Project name, discount rate (%), cash flows list, and instructions to return JSON.\n- **Expected answer**: Valid JSON with `npv` (number), `decision` (\"ACCEPT\" or \"REJECT\"), and `reasoning` (string). Markdown code blocks are accepted.\n\n## Quickstart\n\n```bash\n# Install (from repo root or lab)\nprime env install npv-calculator -p ./environments\n\n# Run evaluation (Prime Inference: no separate API key after prime login)\nprime eval run npv-calculator -m google/gemini-2.5-flash -n 5 -r 1\n```\n\nUse any [Prime Inference](https://docs.primeintellect.ai/tutorials-environments/evaluating) model (e.g. `google/gemini-2.5-flash`, `openai/gpt-4.1-mini`), or custom APIs via an endpoint registry (`-e ./configs`), or one-off `-b` / `-k`. See the main NVP_ENV README section \"Connecting any LLM\" for details.\n\nDefault eval uses **50 examples** (see `[tool.verifiers.eval] num_examples` in `pyproject.toml`). Override with `-n` or env args, e.g.:\n\n```bash\nprime eval run npv-calculator -m <model> -n 20 -a '{\"num_examples\": 20}'\n```\n\n```python\nfrom npv_calculator import load_environment\n\nenv = load_environment()  # or load_environment(dataset_path=\"...\", num_examples=100)\n# env is a vf.SingleTurnEnv\n```\n\n## Environment Arguments\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_path` | str or Path | `None` | Path to `npv_dataset.json`; if `None`, uses bundled dataset |\n| `num_examples` | int | `-1` | Max number of problems (when not using train/eval split) |\n| `num_train_examples` | int | `-1` | Max training examples; `-1` = use all (or all minus eval holdout) |\n| `num_eval_examples` | int | `-1` | If > 0, hold out this many for eval (train/eval split with `eval_seed`) |\n| `eval_seed` | int | `42` | Seed for train/eval split when `num_eval_examples` > 0 |\n| `use_binary_reward` | bool | `True` | If `True`, rubric returns 1.0 or 0.0 (Prime-aligned). If `False`, continuous 0–1. |\n| `npv_tolerance` | float | `0.01` | Relative tolerance for \"correct\" NPV in binary mode (e.g. 0.01 = 1%). |\n\n## Metrics\n\n| Metric | Meaning |\n| ------ | ------- |\n| `npv_reward` | Reward in [0, 1]. **Binary (default):** 1.0 if NPV within tolerance and decision correct, else 0.0. **Continuous:** NPV accuracy (70% of raw) + decision (20%) + reasoning (10%). |\n\n## State columns (saved evals)\n\nFor analysis, you can save extra fields with `prime eval run ... --save-results -C binary_correct,reasoning_points`:\n\n| Column | Meaning |\n| ------ | ------- |\n| `binary_correct` | Whether NPV and decision were correct (binary mode). |\n| `reasoning_points` | 0–10 keyword-based reasoning score (analysis only; not part of reward). |\n\n## Hosted Training tips\n\n- **Baseline:** Run a baseline eval first; Prime recommends baseline reward in the **10–80%** band. If 0%, try a Thinking model or relax `npv_tolerance`; if 80%+, consider harder examples.\n- **Thinking models:** For math/reasoning, use a Thinking variant (e.g. `Qwen/Qwen3-4B-Thinking-2507`) in your `prime rl run` config.\n\n## Requirements\n\n- Python ≥3.10  \n- `verifiers>=0.1.8`  \n- `datasets>=2.14.0`  \n\n## Changelog\n\n#### v0.1.0\n- Initial release: SingleTurnEnv with bundled NPV dataset, composite reward (NPV accuracy + decision + reasoning).\n- **Binary reward (default):** Rubric returns 1.0 or 0.0 by default (Prime Intellect–aligned); reasoning excluded from reward. Use `use_binary_reward=False` for continuous 0–1.\n\n## Links\n\n- [Prime Intellect Environments](https://docs.primeintellect.ai/tutorials-environments/environments)  \n- [Verifiers](https://github.com/PrimeIntellect-ai/verifiers)  \n","encoding":"utf-8","truncated":false,"total_bytes":6263},"status":null}