Take A Coffee

Overnight agent testing

Do not hand an agent the whole night without a harness.

Agentic apps are harder than web E2E: streaming output, fuzzy correctness, tool calls, state drift, and a host that may sleep while the run is active.

Scope

One behavior slice

Give the agent a narrow requirement, explicit write set, and a stop rule if the fix needs more files. Prevent overnight scope creep.

Signals

Transcript assertions

For streaming chat, assert state transitions, tool calls, guardrails, final receipt fields, and representative user turns instead of only snapshots.

Host

Heartbeat proof

Record host and worker heartbeats. If the laptop slept, network dropped, or the terminal died, the run should say that instead of pretending success.

Host heartbeat

Catch the boring failure before it wastes the night.

Run this next to the overnight test so sleep/wake, lid close, and network interruptions show up as missing timestamps.

while true; do date >> ~/overnight-agent-host.log; sleep 30; done

Stop rules for a useful morning.

Want Codex to handle the local host mode?

Take A Coffee gives Codex temporary awake mode, host checks, restore step, and a receipt for local macOS and Windows overnight runs.

Run it with Codex