{"data":{"kind":"file","path":"README.md","version_id":"wf0445ume0aagqtqp9otj1q2","entry":{"name":"README.md","path":"README.md","is_directory":false,"size":2847,"modified_at":"2025-08-28T04:51:14.192000","content_hash":"1bb89a627e02730abfdea3c94a094b7d661edfca0b4f0b1b75fed8d4c0da7899"},"entries":[],"content":"# check-creativity\n\n### Overview\n- **Environment ID**: `check-creativity`\n- **Short description**: Evaluates the creative quality of text responses across multiple dimensions of creativity\n- **Tags**: creativity, writing, nlp, text-generation, single-turn, eval\n\n### Datasets\n- **Primary dataset(s)**: Creative writing prompts across 6 categories (storytelling, descriptive, poetry, philosophical, experimental, imaginative)\n- **Source links**: Built-in prompt dataset with ~48 unique creative prompts\n- **Split sizes**: Train: ~38 prompts, Eval: ~10 prompts\n\n### Task\n- **Type**: single-turn\n- **Parser**: CreativityParser (custom parser for extracting creative text)\n- **Rubric overview**: \n  - Lexical diversity (type-token ratio, unique words)\n  - Vocabulary richness (sophisticated word usage)\n  - Structural variety (sentence variation, punctuation)\n  - Originality (avoiding clichés, unique expressions)\n  - Coherence (relevance to prompt, appropriate length)\n  - Imagery (sensory details, descriptive language)\n  - Format compliance (meeting word count requirements)\n\n### Quickstart\nRun an evaluation with default settings:\n\n```bash\nuv run vf-eval check-creativity\n```\n\nConfigure model and sampling:\n\n```bash\nuv run vf-eval check-creativity   -m gpt-4.1-mini   -n 20 -r 3 -t 1024 -T 0.7   -a '{\"key\": \"value\"}'  # env-specific args as JSON\n```\n\nNotes:\n- Use `-a` / `--env-args` to pass environment-specific configuration as a JSON object.\n\n### Environment Arguments\nConfiguration options for customizing the creativity evaluation:\n\n| Arg | Type | Default | Description |\n| --- | ---- | ------- | ----------- |\n| `dataset_split` | str | `\"train\"` | Which dataset split to use (train/eval) |\n| `reward_weights` | dict | See below | Custom weights for creativity dimensions |\n\nDefault reward weights:\n- `w_entropy`: 1.0 - Word distribution entropy\n- `w_distinct`: 1.0 - Distinct word ratio\n- `w_uncommon`: 1.0 - Uncommon word usage\n- `w_bigrams`: 1.0 - Bigram diversity\n- `w_sentence_len_var`: 1.0 - Sentence length variance\n- `w_word_len_var`: 1.0 - Word length variance\n- `w_sentence_end_var`: 1.0 - Sentence ending diversity\n- `format`: 0.5 - Format compliance\n\n### Metrics\nKey metrics for evaluating creative text quality:\n\n| Metric | Meaning |\n| ------ | ------- |\n| `reward` | Weighted sum of all creativity dimensions |\n| `entropy` | Shannon entropy of word distribution |\n| `distinct_ratio` | Ratio of unique words to total words |\n| `uncommon_words` | Proportion of words not in common word list |\n| `bigram_diversity` | Ratio of unique bigrams to total bigrams |\n| `sentence_length_var` | Standard deviation of sentence lengths |\n| `word_length_var` | Standard deviation of word lengths |\n| `sentence_ending_var` | Entropy of sentence ending punctuation |\n| `format_compliance` | Meeting word count requirements (100-300 words) |\n\n","encoding":"utf-8","truncated":false,"total_bytes":2847},"status":null}