{"data":{"kind":"file","path":"README.md","version_id":"hruhod4neyldvuwohkrf415x","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":13416,"modified_at":"2025-11-16T10:33:41.859000","content_hash":"fff9666db25f15a98548cd7f69b76eae08908be529dfdc772f594fd72802bbc0"},"entries":[],"content":"# C2S Age Prediction Environment\n\nA verifiers-compatible reinforcement learning environment for training language models to predict donor age from cell gene expression data.\n\n**🌐 Available on Prime Intellect Hub:** `your-desired-username/c2s-age-env`\n\n```bash\nprime env install your-desired-username/c2s-age-env\n```\n\n## Overview\n\nThis environment uses cell-to-sentence (C2S) representations of gene expression data along with donor metadata to train models for age prediction. The reward function is based on Mean Absolute Error (MAE), encouraging accurate age predictions.\n\n## Dataset\n\nUses the AIDA Asian PBMC cell age-related dataset:\n- Dataset: `transhumanist-already-exists/aida-asian-pbmc-cell-age-related-cell-sentence-balanced-120k`\n- Contains gene expression data converted to sentences\n- Includes metadata: sex, tissue, cell type, ethnicity, disease status, smoking status\n\n## Features\n\n- **MAE-based reward**: Normalized reward in [0, 1] range based on prediction accuracy\n- **Flexible dataset size**: Configure number of samples for quick testing or full training\n- **Gene count control**: Limit number of genes in prompt to manage token usage\n- **Rich metadata**: Incorporates multiple biological and demographic features\n- **Single-turn completion**: Optimized for efficient training with completion API\n\n## Installation\n\n### Prerequisites\n\n- Python 3.11+\n- uv package manager\n\n### Setup\n\n1. Install uv (if not already installed):\n```bash\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n```\n\n2. Install the environment:\n```bash\ncd age\nuv pip install -e .\n```\n\n3. 
Install prime CLI (for deployment):\n```bash\nuv tool install prime\nprime login\n```\n\n## Usage\n\n### Quick Test\n\nTest the environment loads correctly:\n\n```bash\npython test_env.py\n```\n\nOr test programmatically:\n\n```bash\npython -c \"from age_predict import load_environment; env = load_environment(ds_size=10); print(env)\"\n```\n\n### Training with Prime-RL\n\nPrime-RL provides asynchronous, distributed RL training at scale. It uses three separate TOML configs:\n- **orchestrator.toml**: Environment and rollout configuration\n- **trainer.toml**: PPO training parameters and FSDP settings\n- **inference.toml**: vLLM inference server configuration\n\n#### 1. Install Prime-RL\n\n```bash\ncd ../..  # Go to parent of rl_envs\ncurl -sSL https://raw.githubusercontent.com/PrimeIntellect-ai/prime-rl/main/scripts/install.sh | bash\n```\n\n#### 2. Install Environment from Hub\n\n```bash\ncd prime-rl\nprime env install your-desired-username/c2s-age-env\n```\n\nThis downloads the environment from the Hub. No local setup needed!\n\n#### 3. Development Training (Quick Test)\n\nUse the `-dev` configs for fast iteration with small dataset:\n\n```bash\ncd prime-rl\nuv run rl \\\n  --trainer @ ../rl_envs/age/prime-rl/trainer-dev.toml \\\n  --orchestrator @ ../rl_envs/age/prime-rl/orchestrator-dev.toml \\\n  --inference @ ../rl_envs/age/prime-rl/inference-dev.toml \\\n  --trainer-gpus 1 --inference-gpus 1\n```\n\nThis runs on 100 samples with reduced settings for quick testing.\n\n#### 4. 
Full-Scale Training\n\nUse the production configs for full training:\n\n```bash\ncd prime-rl\nuv run rl \\\n  --trainer @ ../rl_envs/age/prime-rl/trainer.toml \\\n  --orchestrator @ ../rl_envs/age/prime-rl/orchestrator.toml \\\n  --inference @ ../rl_envs/age/prime-rl/inference.toml \\\n  --trainer-gpus 2 --inference-gpus 6\n```\n\n**GPU Allocation:**\n- `--trainer-gpus`: GPUs for FSDP training (2+ recommended for 27B model)\n- `--inference-gpus`: GPUs for vLLM rollout generation (6+ recommended)\n\n#### 5. Optional: Weights & Biases Logging\n\nAdd W&B tracking:\n\n```bash\nuv run rl \\\n  --trainer @ ../rl_envs/age/prime-rl/trainer.toml \\\n  --orchestrator @ ../rl_envs/age/prime-rl/orchestrator.toml \\\n  --inference @ ../rl_envs/age/prime-rl/inference.toml \\\n  --trainer-gpus 2 --inference-gpus 6 \\\n  --wandb.project age-prediction-rl \\\n  --wandb.name my-experiment\n```\n\n#### 6. Monitor Training\n\nPrime-RL provides:\n- **TensorBoard**: `tensorboard --logdir checkpoints/age-rl`\n- **Checkpoints**: Saved in `checkpoints/age-rl/` (configured in trainer.toml)\n- **Console logs**: Real-time training metrics\n\n### Alternative: Prime CLI (Simple)\n\nFor simpler single-command training (without prime-rl):\n\n```bash\nprime env init c2s-age-env\nprime env test --config configs/dev.toml\nprime train --config configs/train.toml\n```\n\n### Share Your Environment\n\nPush to the Environments Hub:\n\n```bash\nprime env push\n```\n\nOr keep it private:\n\n```bash\nprime env push --visibility PRIVATE\n```\n\n## Configuration\n\n### Simple Configs (`configs/`)\n\nFor use with Prime CLI or basic testing:\n\n- `dev.toml`: Quick testing with 100 samples, 500 genes\n- `train.toml`: Full training configuration\n- `eval.toml`: Evaluation with 1000 samples\n\n### Prime-RL Configs (`prime-rl/`)\n\nFor distributed RL training with prime-rl:\n\n**Production:**\n- `orchestrator.toml`: Full dataset, 8 rollouts per example\n- `trainer.toml`: PPO training with FSDP, full fine-tuning\n- 
`inference.toml`: vLLM server with 2-GPU tensor parallelism\n\n**Development:**\n- `orchestrator-dev.toml`: 100 samples, 4 rollouts (fast testing)\n- `trainer-dev.toml`: 100 training steps, frequent checkpointing\n- `inference-dev.toml`: Single GPU, smaller batches\n\n### Key Environment Parameters\n\n- `max_genes_in_prompt`: Maximum genes to include (default: 1000)\n- `ds_size`: Dataset size, null for full dataset\n- `max_age_error`: Maximum age error for reward normalization (default: 60.0)\n- `split`: Dataset split to use (train/test)\n- `rollouts_per_example`: Number of model samples per prompt (RL exploration)\n\n## Reward Function\n\nThe MAE reward function provides normalized rewards:\n\n```\nreward = 1 - min(|predicted_age - true_age|, max_age_error) / max_age_error\n```\n\n- Perfect prediction: reward = 1.0\n- Error >= max_age_error: reward = 0.0\n- Invalid/unparseable prediction: reward = 0.0\n\n## Model\n\nRecommended starting model:\n- `transhumanist-already-exists/C2S-Scale-Gemma-2-27B-age-prediction-fullft`\n- Uses full fine-tuning (no LoRA as per requirements)\n\n## Project Structure\n\n```\nage/\n├── age_predict.py          # Main environment implementation\n├── __init__.py             # Package initialization\n├── pyproject.toml          # UV project dependencies\n├── test_env.py             # Test suite for environment\n├── README.md               # This file\n├── .gitignore              # Git ignore rules\n├── configs/                # Simple configs (Prime CLI)\n│   ├── dev.toml           # Development config\n│   ├── train.toml         # Training config\n│   └── eval.toml          # Evaluation config\n├── prime-rl/               # Prime-RL configs (distributed RL)\n│   ├── orchestrator.toml      # Production orchestrator\n│   ├── trainer.toml           # Production trainer\n│   ├── inference.toml         # Production inference\n│   ├── orchestrator-dev.toml  # Dev orchestrator\n│   ├── trainer-dev.toml       # Dev trainer\n│   └── 
inference-dev.toml     # Dev inference\n├── AGENTS.md              # Original requirements\n└── Create_and_upload_an_environment.md\n```\n\n## Implementation Details\n\n### Dataset Processing\n\nEach example is converted to:\n- **prompt**: Formatted text with gene sentence and metadata\n- **answer**: Ground truth age as string\n- **info**: Dictionary with numeric age and metadata for reward calculation\n\n### Prompt Format\n\n```\nThe following is a list of {num_genes} gene aging related gene names\nfrom Open Genes database ordered by descending expression level\nin a {organism} cell.\n\nSex: {sex}\nSmoking status: {smoking_status}\nTissue: {tissue}\nCell type: {cell_type}\nAging related cell sentence: {gene_sentence}\nPredict the Age of the donor from whom these cells were taken.\nAnswer only with age value in years:\n```\n\n### System Prompt\n\n```\nYou are an expert aging biologist.\nGiven gene expression summaries and metadata,\npredict the donor's age in years as a number only.\n```\n\n## Development\n\n### Running Tests\n\n```bash\nuv pip install -e \".[dev]\"\npytest\n```\n\n### Code Formatting\n\n```bash\nblack age_predict.py\nruff check age_predict.py\n```\n\n## Prime-RL Training Details\n\n### Architecture\n\nPrime-RL uses an asynchronous architecture:\n\n1. **Inference Server (vLLM)**: Generates rollouts at high throughput\n2. **Trainer (FSDP)**: Updates model with PPO algorithm\n3. **Orchestrator**: Coordinates between inference and training\n\n### Why Prime-RL?\n\n- **Scale**: Train on 1000+ GPUs with distributed FSDP\n- **Speed**: Asynchronous rollout generation with vLLM\n- **Flexibility**: Full control over PPO hyperparameters\n- **Checkpointing**: Resume from any training step\n\n### Training Process\n\n1. Orchestrator samples prompts from environment\n2. vLLM inference server generates multiple completions per prompt\n3. Environment evaluates completions and returns rewards\n4. Trainer updates model using PPO on collected rollouts\n5. 
Repeat\n\n## Checkpoint Management\n\n### Space-Efficient Checkpoints\n\nPrime-RL checkpoints are saved in HuggingFace format (compatible with vLLM). To save disk space:\n\n**Current settings:**\n- `save_total_limit = 1`: Keeps only the last checkpoint\n- `save_safetensors = true`: Uses efficient safetensors format\n- `save_optimizer = true/false`: Toggle optimizer state saving\n\n**Disk usage per checkpoint (27B model):**\n- Model weights only: ~54 GB (safetensors)\n- With optimizer states: ~108 GB\n- With scheduler: ~110 GB\n\n**To save more space:**\n\nEdit `prime-rl/trainer.toml`:\n```toml\n[checkpointing]\nsave_total_limit = 1  # Only last checkpoint\nsave_safetensors = true  # Efficient format\nsave_optimizer = false  # Skip optimizer (can't resume training!)\n```\n\n⚠️ If `save_optimizer = false`, you can do inference but NOT resume training.\n\n### Checkpoint Location\n\nCheckpoints are saved to:\n- Production: `../../prime-rl/checkpoints/age-rl/checkpoint-{step}/`\n- Development: `../../prime-rl/checkpoints/age-rl-dev/checkpoint-{step}/`\n\n### Finding Latest Checkpoint\n\n```bash\n# Production\nls -lt ../../prime-rl/checkpoints/age-rl/ | head\n\n# Development\nls -lt ../../prime-rl/checkpoints/age-rl-dev/ | head\n```\n\n## Evaluating Checkpoints\n\n### Quick Evaluation with vLLM\n\nUse the provided evaluation script for fast, batched inference:\n\n```bash\npython eval_checkpoint.py \\\n  --checkpoint ../../prime-rl/checkpoints/age-rl/checkpoint-1500 \\\n  --num-samples 1000 \\\n  --batch-size 32\n```\n\n**Options:**\n- `--checkpoint`: Path to checkpoint directory\n- `--num-samples`: Number of test samples (default: 1000)\n- `--batch-size`: Inference batch size (default: 32)\n- `--tensor-parallel-size`: GPUs for model parallelism (default: 1)\n- `--temperature`: Sampling temperature (default: 0.0 for deterministic)\n- `--output`: Save metrics to JSON file\n\n**Example with multi-GPU:**\n```bash\npython eval_checkpoint.py \\\n  --checkpoint 
../../prime-rl/checkpoints/age-rl/checkpoint-2000 \\\n  --num-samples 5000 \\\n  --batch-size 64 \\\n  --tensor-parallel-size 2 \\\n  --output results/eval-checkpoint-2000.json\n```\n\n### Evaluation Metrics\n\nThe script reports:\n- **Mean/Median Reward**: Normalized reward (0-1 scale)\n- **MAE (Mean Absolute Error)**: Average age prediction error in years\n- **Invalid Rate**: Percentage of unparseable predictions\n- **Example predictions**: First 10 predictions with errors\n\n### Loading Checkpoints for Other Uses\n\nCheckpoints are standard HuggingFace format and work with:\n\n**vLLM:**\n```python\nfrom vllm import LLM\n\nllm = LLM(\n    model=\"../../prime-rl/checkpoints/age-rl/checkpoint-1500\",\n    tensor_parallel_size=2,\n    dtype=\"bfloat16\"\n)\n```\n\n**Transformers:**\n```python\nfrom transformers import AutoModelForCausalLM\n\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"../../prime-rl/checkpoints/age-rl/checkpoint-1500\",\n    torch_dtype=\"bfloat16\"\n)\n```\n\n**Push to HuggingFace Hub:**\n```python\nfrom transformers import AutoModelForCausalLM\n\nmodel = AutoModelForCausalLM.from_pretrained(\"./checkpoints/age-rl/checkpoint-1500\")\nmodel.push_to_hub(\"your-username/age-prediction-rl-v1\")\n```\n\n## Troubleshooting\n\n### Environment Not Found\n\nIf prime-rl can't find the environment:\n```bash\n# Reinstall from Hub\nprime env install your-desired-username/c2s-age-env\n```\n\n### GPU Memory Issues\n\nReduce batch sizes in configs:\n- `trainer.toml`: Decrease `per_device_train_batch_size`\n- `inference.toml`: Lower `gpu_memory_utilization` to 0.7\n\n### Slow Rollout Generation\n\nIncrease inference GPUs or reduce:\n- `orchestrator.toml`: Lower `max_concurrent`\n- `orchestrator.toml`: Reduce `rollouts_per_example`\n\n### Disk Space Issues\n\n1. Set `save_total_limit = 1` in trainer.toml\n2. Set `save_optimizer = false` (inference only)\n3. Delete old checkpoints: `rm -rf checkpoints/age-rl/checkpoint-*`\n4. 
Use checkpoint-specific directories and mount them on a separate disk\n\n## References\n\n- [Verifiers Documentation](https://docs.primeintellect.ai/verifiers)\n- [Prime-RL Documentation](https://docs.primeintellect.ai/verifiers/training)\n- [Prime-RL GitHub](https://github.com/PrimeIntellect-ai/prime-rl)\n- [Verifiers GitHub](https://github.com/PrimeIntellect-ai/verifiers)\n- [Dataset on HuggingFace](https://huggingface.co/datasets/transhumanist-already-exists/aida-asian-pbmc-cell-age-related-cell-sentence-balanced-120k)\n- [Model on HuggingFace](https://huggingface.co/transhumanist-already-exists/C2S-Scale-Gemma-2-27B-age-prediction-fullft)\n\n## License\n\nAdd your license information here.\n\n## Contributing\n\nAdd contribution guidelines here.\n","encoding":"utf-8","truncated":false,"total_bytes":13416},"status":null}