{"data":{"kind":"file","path":"README.md","version_id":"xg5dwt2uid4zsxtwqbwd0xuk","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2512,"modified_at":"2025-09-14T15:10:01.136000","content_hash":"4d53bb3f90d0da602d8c4043ffb4687986aa17788436877d011fd32356871801"},"entries":[],"content":"# finalyser-nighttime-lights\n\n### Overview\n- **Environment ID**: `finalyser-nighttime-lights`\n- **Short description**: Predicts national GDP from NASA nighttime satellite light imagery features using LLM-based reasoning.\n- **Tags**: economics, remote-sensing, gdp-prediction, reinforcement-learning, llm-evaluation\n\n### Datasets\n- **Primary dataset(s)**: [Country Nightlight Dataset (Kaggle)](https://www.kaggle.com/datasets/abhijeetdtu/country-nightlight-dataset)  \n  Contains yearly nighttime light satellite imagery for multiple countries, along with GDP values.\n- **Source links**:  \n  - [NASA Earth at Night](https://earthobservatory.nasa.gov/features/NightLights)  \n  - [Kaggle Dataset](https://www.kaggle.com/datasets/abhijeetdtu/country-nightlight-dataset)\n- **Split sizes**: Depends on user’s configuration. By default, an 80/20 split is applied between train and evaluation sets.\n\n### Task\n- **Type**: `single-turn`\n- **Parser**: `NightlightGDPParser` (custom)  \n  Extracts a numeric GDP prediction from raw model outputs.\n- **Rubric overview**:  \n  The rubric evaluates:  \n  - **Accuracy**: Relative error between predicted GDP and true GDP (scaled reward `1 - error`).  \n  - **Format**: Ensures the model produces a valid numeric output.  \n\n  Weighted reward: `reward = accuracy * 1.0 + format * 0.2`\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval finalyser-nighttime-lights\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval finalyser-nighttime-lights   -m gpt-4.1-mini   -n 20 -r 3 -t 1024 -T 0.7   -a '{\"key\": \"value\"}'  # env-specific args as JSON\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\n\n| Arg          | Type | Default   | Description |\n| ------------ | ---- | --------- | ----------- |\n| `root_path`  | str  | required  | Path to directory containing the nighttime light images. |\n| `max_examples` | int  | -1        | Limit on dataset size (use -1 for all). Useful for testing quickly. |\n| `test_size`  | float | 0.2       | Fraction of examples to allocate to the evaluation set. |\n| `seed`       | int  | 42        | Random seed for reproducible train/test splits. |\n\n### Metrics\n\n| Metric    | Meaning |\n| --------- | ------- |\n| `reward`  | Weighted scalar reward (combines accuracy and format). |\n| `accuracy`| Scaled score `1 - relative_error` between predicted GDP and ground truth. |\n| `format`  | Binary score (1 if output is numeric, 0 otherwise). |\n\n\n","encoding":"utf-8","truncated":false,"total_bytes":2512},"status":null}