{"data":{"kind":"file","path":"README.md","version_id":"kio3c4kael4o8ta89euz9mnm","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":8472,"modified_at":"2026-01-06T07:02:46.406000","content_hash":"f15294e29924ab4c7539559e4fdb2e207f1c5703d45e0ac31ef003be86e80d54"},"entries":[],"content":"<p align=\"center\">\n</p>\n\n<p align=\"center\">\n  <img src=\"https://github.com/user-attachments/assets/40c36e38-c5bd-4c5a-9cb3-f7b902cd155d#gh-light-mode-only\" alt=\"Prime Intellect\" width=\"312\">\n  <img src=\"https://github.com/user-attachments/assets/6414bc9b-126b-41ca-9307-9e982430cde8#gh-dark-mode-only\"  alt=\"Prime Intellect\" width=\"312\">\n</p>\n\n---\n\n<h3 align=\"center\">\nPRIME-RL: Async RL Training at Scale\n</h3>\n\n---\n\n</br>\n<p align=\"center\">\n  <a href=\"https://github.com/PrimeIntellect-ai/prime-rl/actions/workflows/style.yaml\">\n    <img src=\"https://github.com/PrimeIntellect-ai/prime-rl/actions/workflows/style.yaml/badge.svg\" alt=\"Style\" />\n  </a>\n  <a href=\"https://github.com/PrimeIntellect-ai/prime-rl/actions/workflows/cpu_tests.yaml\">\n    <img src=\"https://github.com/PrimeIntellect-ai/prime-rl/actions/workflows/cpu_tests.yaml/badge.svg\" alt=\"Test\" />\n  </a>\n  <a href=\"https://github.com/PrimeIntellect-ai/prime-rl/actions/workflows/gpu_tests.yaml\">\n    <img src=\"https://github.com/PrimeIntellect-ai/prime-rl/actions/workflows/gpu_tests.yaml/badge.svg\" alt=\"Test\" />\n  </a>\n</p>\n\n## Overview\n\nPRIME-RL is a framework for large-scale asynchronous reinforcement learning. It is designed to be easy-to-use and hackable, yet capable of scaling to 1000+ GPUs. Beyond that, here is why we think you might like it:\n\n1. Integrates natively with [`verifiers`](https://github.com/PrimeIntellect-ai/verifiers) environments via the [Environments Hub](https://app.primeintellect.ai/dashboard/environments?ex_sort=most_stars)\n2. Supports end-to-end post-training, including SFT and RL training and evals\n3. 
Multi-node deployment with [FSDP2](https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html) training and [vLLM](https://github.com/vllm-project/vllm) inference backend\n4. Designed for asynchronous training in decentralized settings\n5. Hackable, modular and extensible by nature\n\n## Setup\n\n> *We develop and test on NVIDIA RTX 3090/4090/5090, A100, H100, H200, and B200. If your setup fails, please create an [issue](https://github.com/PrimeIntellect-ai/prime-rl/issues).*\n\n### Prerequisites\n\nCurrently, you **need at least one NVIDIA GPU to use PRIME-RL**. If you don't already have access to one, we recommend our [compute platform](https://app.primeintellect.ai) for everything from renting on-demand single GPUs for developing, debugging and small ablations, to [reserving 1000+ GPU clusters](https://app.primeintellect.ai/dashboard/quotes) for production-scale training.\n\n### Quick Setup\n\nSet up PRIME-RL in a single command.\n\n```bash\ncurl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/scripts/install.sh | bash\n```\n\n<details>\n<summary>\nManual Setup\n</summary>\n<br>\n\n1. Clone the repository\n\n```bash\ngit clone https://github.com/PrimeIntellect-ai/prime-rl.git\ncd prime-rl\n```\n\n2. Install [uv](https://docs.astral.sh/uv/)\n\n```bash\ncurl -LsSf https://astral.sh/uv/install.sh | sh\nsource $HOME/.local/bin/env\n```\n\n3. Install dependencies from the lock file\n\n```bash\nuv sync\n```\n\n3.1. 
Optional: Install Flash Attention 3 (Hopper GPUs only; needed for the `flash_attention_3` attention backend)\n\n> *NOTE*: This step will take a while because Flash Attention 3 has no prebuilt wheels, so the extension is built from source.\n>\n> *NOTE*: After this step, a plain `uv sync` or `uv run` will uninstall the package. To avoid this, use `uv sync --inexact` or `uv run --no-sync` instead.\n\n```bash\nuv pip install \"flash-attn-3 @ git+https://github.com/Dao-AILab/flash-attention.git@main#subdirectory=hopper\" --no-build-isolation\n```\n\n</details>\n\n<details>\n<summary>\nValidate your environment setup\n</summary>\n<br>\n\n1. Check that the environment uses Python 3.12\n\n```bash\nuv run python -V\n```\n\n2. Check that `flash-attn` is installed\n\n```bash\nuv run python -c \"import flash_attn\"\n```\n\n3. Check that you can run the SFT trainer (*this requires 1 GPU*)\n\n```bash\nuv run sft @ configs/debug/sft/train.toml\n```\n\n4. Check that you can run the RL trainer (*this requires 1 GPU*)\n\n```bash\nuv run trainer @ configs/debug/rl/train.toml\n```\n\n5. Check that you can run the inference server (*this requires 1 GPU*)\n\n```bash\nuv run inference @ configs/debug/infer.toml\n```\n\n*Keep the inference server running in the background for the next steps.*\n\n5.1. Check that you can run the orchestrator against the inference server\n\n```bash\nuv run orchestrator @ configs/debug/orch.toml\n```\n\n5.2. Check that you can run evals against the inference server\n\n```bash\nuv run eval @ configs/debug/eval.toml\n```\n\n</details>\n\n### Additional Setup\n\n1. If you want to log your runs to [W&B](https://wandb.ai), log in\n\n```bash\nuv run wandb login\n# Or set `export WANDB_API_KEY=...`\n```\n\n2. 
If you require gated or private models or datasets from [HuggingFace](https://huggingface.co), log in\n\n```bash\nuv run hf auth login\n# Or set `export HF_TOKEN=...`\n```\n\n## Training Examples\nWe provide end-to-end training examples in the [`examples`](examples) directory to highlight features of the framework and guide you through the process of training your own models.\n1. [**Reverse Text**](examples/reverse_text/README.md): Train `Qwen3-0.6B` to reverse a small chunk of text. Demonstrates tiny-scale single-turn SFT and RL training. Can be trained on a single consumer GPU in a few minutes, and is ideal for getting started.\n2. [**Wordle**](examples/wordle/README.md): Train `Qwen3-1.7B` to play Wordle. A fun example of multi-turn SFT and RL training. Can be trained on 2-4 H100 GPUs in a few hours. Ideal for exploring the multi-turn training capabilities of the framework.\n3. [**Alphabet Sort**](examples/alphabet_sort/README.md): Train `Qwen3-4B-Instruct-2507` to sort names alphabetically. Demonstrates multi-turn RL training via LoRA without SFT warmup. Can be trained on a single H100 GPU in just over an hour. Ideal for exploring LoRA-based training.\n4. 
*More to come...*\n\n## Docs\n\nCheck out the [docs](docs) directory for in-depth guides on how to use PRIME-RL.\n\n- [**Entrypoints**](docs/entrypoints.md) - Overview of the main components (orchestrator, trainer, inference) and how to run SFT, RL, and evals\n- [**Configs**](docs/configs.md) - Configuration system using TOML files, CLI arguments, and environment variables\n- [**Environments**](docs/environments.md) - Installing and using verifiers environments from the Environments Hub\n- [**Async Training**](docs/async.md) - Understanding asynchronous off-policy training and step semantics\n- [**Logging**](docs/logging.md) - Logging with loguru, torchrun, and Weights & Biases\n- [**Checkpointing**](docs/checkpointing.md) - Saving and resuming training from checkpoints\n- [**Benchmarking**](docs/benchmarking.md) - Performance benchmarking and throughput measurement\n- [**Deployment**](docs/deployment.md) - Training deployment on single-GPU, multi-GPU, and multi-node clusters\n- [**Troubleshooting**](docs/troubleshooting.md) - Common issues and their solutions\n\n## Contributing\n\nWe warmly welcome community contributions! We use [issues](https://github.com/PrimeIntellect-ai/prime-rl/issues) to track bugs and feature requests and to share our internal roadmap. If you encounter bugs, have pain points during development, or have ideas for new features, please open an issue.\n\nContributions are welcome via PR. Please follow these guidelines:\n1. Install the [pre-commit hooks](#pre-commit-hooks) to ensure your code is formatted correctly.\n2. Please keep your PR in \"Draft\" until it is ready for review.\n3. If your PR resolves an issue, please link the issue in the PR description.\n4. 
If you can, try running the [test suite](#tests) locally to ensure your changes are working as expected.\n\n### Pre-Commit Hooks\n\nPlease install the [pre-commit](https://pre-commit.com) hooks to ensure your code is formatted correctly.\n\n```bash\nuv run pre-commit install\n```\n\n### Tests\n\nTo run the full test suite, run\n\n```bash\nuv run pytest -v\n```\n\nTo run unit tests, run\n\n```bash\nuv run pytest tests/unit -v\n```\n\nTo run integration tests, run\n\n```bash\nuv run pytest tests/integration -v\n```\n\nTo run CPU-only tests, use the inverse of the `gpu` marker:\n\n```bash\nuv run pytest -v -m \"not gpu\"\n```\n\n## License\n\nThis project is licensed under the Apache 2.0 license, as found in the [License](LICENSE) file.\n\n## Citation\n\nIf you find our work useful, feel free to cite it using\n\n```tex\n@misc{primeintellect2025prime-rl,\n  author = {Prime Intellect},\n  title = {PRIME-RL},\n  url = {https://github.com/PrimeIntellect-ai/prime-rl},\n  year = {2025}\n}\n```\n","encoding":"utf-8","truncated":false,"total_bytes":8472},"status":null}