# Slack Offline Environment for Chromium-Based WootzApp APK

This package runs Slack-style browser tasks in a Chromium-based APK, customized for WootzApp offline-page behavior, on an Android emulator. It is self-contained for local evaluation: Android World is included in `android_world/`, the offline Slack workspace files in `wootzapp_files/`, and the eval tasks in `datasets/`.

## Layout

```text
android_world/                         Android emulator, customized Chromium APK, CDP bridge, noVNC
wootzapp_files/offline_slack_workspace Offline Slack workspace files loaded by the APK
datasets/slack_offline_tasks.jsonl     Evaluation tasks
scripts/setup_env.sh                   Create the Python environment and install this package
scripts/build_browser.sh               Build the local Android emulator/browser image
scripts/start_browser.sh               Start the Android emulator and browser APK from the local image
scripts/run_eval.sh                    Run the Slack offline eval tasks
scripts/stop_browser.sh                Stop the emulator container
outputs/                               Local eval artifacts and task traces
```

## Output Directory Format

Each task gets its own output folder directly under `outputs/`. Inside that folder, browser traces and Prime eval artifacts are kept in separate subfolders.

```text
outputs/
  slack_northstar_retrieval_001_arzooo_replies/
    task/
      step_00_initial_get_agent_observation.json
      step_00_initial_env_observation.json
      step_01_after_action_get_agent_observation.json
      step_01_after_action_env_observation.json
      step_*_get_agent_observation.json
      step_*_env_observation.json
    evals/
      slack-offline-env--gpt-4.1-mini/
        <run_id>/
          metadata.json
          results.jsonl
          env_server.log
          env_worker_0.log
```

`task/` contains two observation logs for each step:

- `*_get_agent_observation.json`: the raw `ChromiumRL.getAgentObservation` output from the browser.
- `*_env_observation.json`: the environment observation passed to the agent, including Slack message grouping and Slack pane scroll state.

`evals/` contains Prime's eval output for the same task: metadata, result rows, and server/worker logs.

The eval script keeps Prime upload enabled. Prime first writes its standard local output folder:

```text
outputs/evals/<env-model>/<run_id>/
```

After Prime uploads from that default folder, the script copies the same eval artifacts into the matching task folder:

```text
outputs/<task_id>/evals/<env-model>/<run_id>/
```

If you run `prime eval run` manually instead of through the provided scripts, the artifacts stay in Prime's default output path and are not copied into the per-task folders. Use `scripts/run_eval.sh` to get the task-grouped layout above.

## Requirements

- Docker with Compose v2
- `/dev/kvm` available on the host
- Python environment with `prime` installed
- `OPENAI_API_KEY` set for model calls and optional LLM judge calls

## Quick Start

```bash
./scripts/setup_env.sh
source .venv/bin/activate
cp .env.example .env
# edit .env and set OPENAI_API_KEY
./scripts/build_browser.sh
./scripts/run_eval.sh
```

`build_browser.sh` builds the local Android World image. Run it once, and rerun it only after changing files under `android_world/`.

`run_eval.sh` starts the Chromium-based WootzApp APK in the emulator, prepares the packaged offline Slack workspace for the APK, and runs the tasks one at a time. It does not build the Docker image.

## Runtime Configuration

The browser runtime is configured in `docker-compose.slack.yml`:

```text
BROWSER=wootzapp
CDP_SOCKET_NAME=wootzapp_world_devtools
WOOTZ_AUTO_OPEN_OFFLINE_PAGES=1
EMULATOR_PROFILE=mobile
EMULATOR_HEADLESS=0
```

Important options:

- `BROWSER=wootzapp` selects the Chromium-based WootzApp APK as the browser inside Android.
- `WOOTZ_AUTO_OPEN_OFFLINE_PAGES=1` enables the APK's offline Slack URL handling.
- `EMULATOR_PROFILE=mobile` runs the emulator with the mobile profile. For a tablet layout, set `EMULATOR_PROFILE=tablet` in `docker-compose.slack.yml` instead.
- `EMULATOR_HEADLESS=0` keeps the emulator display visible through VNC/noVNC.

## Browser Runtime

Build the local browser/emulator image:

```bash
./scripts/build_browser.sh
```

Start only the browser/emulator:

```bash
./scripts/start_browser.sh
```

`start_browser.sh` requires the local image `slack-android-world:local` to already exist. It starts the container with `docker compose up --no-build --no-recreate`, so starting the browser never rebuilds the Android image.

Useful endpoints:

```text
noVNC: http://localhost:6080
CDP:   ws://localhost:39224
API:   http://localhost:45000/health
ADB:   localhost:5555
```

Access noVNC from the same machine:

```text
http://localhost:6080
```

To access noVNC from your laptop when this runs on a remote server, forward the port over SSH:

```bash
ssh -N -L 6080:localhost:6080 ubuntu@YOUR_SERVER_IP
```

Then open this on your laptop:

```text
http://localhost:6080
```

Stop the browser/emulator:

```bash
./scripts/stop_browser.sh
```

## How The Offline Browser Uses Slack Files

The browser is a Chromium-based APK with WootzApp-specific offline-page handling, running inside Android World. The packaged offline Slack workspace lives in:

```text
wootzapp_files/offline_slack_workspace
```

The scripts prepare those files automatically. With `WOOTZ_AUTO_OPEN_OFFLINE_PAGES=1`, the APK then resolves Slack URLs such as:

```text
https://app.slack.com/client/TFE44LJJU/CSKLF1W6P
```

to the packaged offline workspace instead of the live Slack site. The agent observes and controls the page over CDP through the environment tools.

## Evaluation Flow

```text
scripts/run_eval.sh
  -> scripts/start_browser.sh
  -> docker compose runs android_world from slack-android-world:local (no build)
  -> offline Slack workspace is prepared automatically
  -> prime eval run slack-offline-env
  -> slack_offline_env.py loads tasks
  -> browser_env.py creates the browser environment
  -> browser_mode.py controls the Chromium-based APK through CDP
  -> rubric.py scores the final answer and Slack-side actions
  -> outputs/<task_id>/task/ stores browser traces
  -> outputs/<task_id>/evals/<env-model>/<run_id>/ stores Prime eval artifacts
```

Run a subset of tasks:

```bash
START_TASK=1 END_TASK=3 ./scripts/run_eval.sh
```

Change the model or the turn limit:

```bash
MODEL=gpt-4.1-mini MAX_TURNS=20 ./scripts/run_eval.sh
```

Reuse the running browser instead of restarting it:

```bash
START_BROWSER=false ./scripts/run_eval.sh
```

## Files That Matter

- `slack_offline_env.py`: loads the dataset and builds the verifier environment.
- `browser_env.py`: defines the browser-agent prompt and connects tool state to the environment.
- `browser_mode.py`: manages CDP, browser actions, observations, offline workspace setup, and browser APK control.
- `rubric.py`: scores retrieval answers, Slack actions, safety behavior, and optional LLM judge checks.
- `datasets/slack_offline_tasks.jsonl`: task definitions and hidden scoring metadata.