{"data":{"kind":"file","path":"README.md","version_id":"hzljdga8o929rrmawptpt5s7","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2152,"modified_at":"2025-09-06T16:14:23.859000","content_hash":"6796eab63bf12428d2e6cecbbe75fadec8d83b70d1a532dc03f18a2d597fab55"},"entries":[],"content":"# Learning to Reason for Long-Form Story Generation\n\n### Overview\n- **Methodology**: Verifiable Rewards via Completion Likelihood Improvement (VRCLI)\n- **Environment ID**: `vrcli-next-chapter-prediction`\n- **Short description**: Next chapter prediction for a story.\n- **Tags**: `next-chapter-prediction`, `long-story-generation`, `plan-generation`\n\n### Datasets\n- **Primary dataset(s)**: Synthetic data generated from raw books.\n- **Source links**: [Gutenberg](https://www.gutenberg.org/)\n- **Split sizes**: 80% train, 20% eval\n- **Synthetic Guide**: [Synthetic Guide](./prepare_data/README.md)\n\n### Task\n- **Type**: single-turn\n- **Parser**: ThinkParser, Custom Next Chapter Plan parser\n- **Rubric overview**: Verifiable Rewards via Completion Likelihood Improvement (VRCLI)\n\nTrain an RL model to generate a plan A for the next chapter. 
Advantages are computed from the perplexity improvement of generating the ground truth with and without plan A (the prompt includes parsed metadata from previous chapters).\nUsing a frozen model `PI` as the chapter generator, the soft reward is calculated as:\n\n```\nI_PI(x, y, a) = 100 * (1 - PPL(y|x, a) / PPL(y|x))\n```\n\nThe reward signal that the RL model receives is thresholded as follows:\n\n```\nR = 0 if I_PI(x, y, a) < 0.05\nR = 0.5 if 0.05 <= I_PI(x, y, a) < 1\nR = 0.9 if 1 <= I_PI(x, y, a) < 2\nR = 1.0 if I_PI(x, y, a) >= 2\n```\n\nHere `x` is the previous story information, `y` is the ground-truth next chapter content, and `a` is plan A generated by the RL model.\n\n### Quickstart\n\nInstall the `vrcli` environment:\n\n```bash\nvf-install vrcli\n```\n\nThe reward relies on an independent base model acting as the Next Chapter Predictor (NCP). It computes the perplexity of generating the next chapter from a prompt with and without the plan, where the plan is produced by the reasoning model being trained.\n\nLaunch a separate vLLM server to serve the NCP model:\n\n```bash\nexport VRCLI_NCP_MODEL_NAME=\"google/gemma-3-270m-it\"\nexport VRCLI_NCP_BASE_URL=\"http://localhost:8000/v1\"\nvllm serve \"$VRCLI_NCP_MODEL_NAME\"\n```\n\nRun an evaluation:\n\n```bash\nvf-eval vrcli \\\n-s # optional: save output locally to a JSON file for later review\n```\n\nTo view a locally saved evaluation:\n\n```bash\nvf-tui\n```","encoding":"utf-8","truncated":false,"total_bytes":2152},"status":null}