Startup Failures
Server won’t start: “Missing client for …”
Fix: The provider named in the error has no client configured. Add its settings to .env or config.toml (see Configuration Guide).
Server won’t start: “JWT_SECRET must be set”
Cause: Authentication is enabled (AUTH_USE_AUTH=true) but no JWT secret was provided.
Fix: Generate a secret and set it, or disable authentication:
AUTH_USE_AUTH=false
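To generate a secret for the fix above, any cryptographically random string will do. The sketch below uses Python’s standard secrets module. The function name and the 32-byte length are illustrative choices, not values prescribed by Honcho; how the secret is then set depends on your configuration (see Configuration Guide).

```python
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return a random hex string (two hex characters per byte)."""
    # secrets.token_hex draws from the OS's cryptographic random source.
    return secrets.token_hex(num_bytes)
```

Calling generate_jwt_secret() returns a 64-character hex string that can be pasted into your configuration.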
Runtime Errors
API returns “An unexpected error occurred” on every request
Cause: This is almost always a database issue. The health endpoint (/health) will return {"status": "ok"} even when the database is unreachable because it doesn’t check the database connection. The actual error appears in the server logs.
Common causes and fixes:
- Database is unreachable: Check that PostgreSQL is running and that DB_CONNECTION_URI is correct.
- Migrations haven’t been run: The server starts successfully without tables, but every API call will fail. Run:
  uv run alembic upgrade head
  In Docker:
  docker compose exec api uv run alembic upgrade head
- pgvector extension not installed: The vector extension must be enabled in your database:
  CREATE EXTENSION IF NOT EXISTS vector;
Errors to look for in the server logs:
- sqlalchemy.exc.OperationalError: database connection issue
- sqlalchemy.exc.ProgrammingError with “relation does not exist”: migrations not run
- psycopg.OperationalError: connection refused or authentication failed
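The log signatures above can be turned into a small triage helper. This is a minimal sketch, not part of Honcho: diagnose_log_line is a hypothetical name, and it matches plain substrings, so it assumes the exception names appear verbatim in your log output.

```python
from typing import Optional


def diagnose_log_line(line: str) -> Optional[str]:
    """Map a server log line to the likely cause listed in this guide."""
    # Missing tables: ProgrammingError mentioning a relation that does not exist.
    if (
        "sqlalchemy.exc.ProgrammingError" in line
        and "relation" in line
        and "does not exist" in line
    ):
        return "migrations not run"
    # Check the driver-level error first: SQLAlchemy often wraps it,
    # so both names can appear on the same line.
    if "psycopg.OperationalError" in line:
        return "connection refused or authentication failed"
    if "sqlalchemy.exc.OperationalError" in line:
        return "database connection issue"
    return None
```

Feeding each line of the server log through diagnose_log_line and printing the non-None results gives a quick first pass before reading the full traceback.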
Health check passes but API calls fail
The /health endpoint is a lightweight check that confirms the server process is running. It does not verify:
- Database connectivity
- That migrations have been run
- That LLM providers are reachable
Deriver not processing messages
Messages are stored but no observations, summaries, or representations are being generated. Common causes:
- Deriver isn’t running: In manual setup, the deriver is a separate process and must be started alongside the API server. In Docker, it starts automatically via docker compose up.
- Deriver can’t reach the database: Check deriver logs for connection errors. The deriver uses the same DB_CONNECTION_URI as the API server.
- Missing LLM API key for deriver provider: By default the deriver uses Google Gemini (LLM_GEMINI_API_KEY). Check deriver logs for API errors.
- Processing backlog: With DERIVER_WORKERS=1 (default), high message volume can cause a backlog. Increase DERIVER_WORKERS to process more work in parallel.
- Representation Batch Max: By default the deriver buffers representation work until a session has accumulated enough tokens, a threshold set via DERIVER_REPRESENTATION_BATCH_MAX_TOKENS. Sub-threshold tails become eligible after DERIVER_REPRESENTATION_BATCH_MAX_AGE_SECONDS (default 1800 seconds), so quiet sessions eventually flush without disabling batching globally. Set the age to 0 for legacy behavior where sub-threshold tails wait indefinitely. See token batching for more details.
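The batching rule described above can be sketched as a single predicate. This is an illustration of the documented behavior, not Honcho’s implementation. The function and parameter names are hypothetical, and it assumes that “enough tokens” means reaching the threshold and that age is measured from the oldest pending message.

```python
def representation_work_is_eligible(
    pending_tokens: int,
    oldest_pending_age_seconds: float,
    batch_max_tokens: int,
    batch_max_age_seconds: float = 1800,
) -> bool:
    """Return True when buffered representation work should be processed."""
    # Enough tokens have accumulated: process regardless of age.
    if pending_tokens >= batch_max_tokens:
        return True
    # Age of 0 is the legacy setting: sub-threshold tails wait indefinitely.
    if batch_max_age_seconds <= 0:
        return False
    # Otherwise a quiet session flushes once its tail is old enough.
    return oldest_pending_age_seconds >= batch_max_age_seconds
```

If a session looks stuck, comparing its pending token count and the age of its oldest unprocessed message against these two settings shows whether it is simply waiting for a threshold.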
Alternative Provider Issues
OpenRouter / custom provider not working
If calls to an OpenAI-compatible proxy fail:
- Verify the endpoint and key are set. Use transport = "openai" with a base URL override.
- Check model names match the provider’s format. OpenRouter uses vendor/model format (e.g., anthropic/claude-haiku-4-5), not the raw model ID.
- Ensure your model supports tool calling. The deriver, dialectic, and dream agents require tool use. Check the provider’s model page for tool calling support.
- Check server logs for the actual error. API errors from the upstream provider will appear in Honcho’s logs with the HTTP status code and message body.
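As a quick sanity check for the model-name item above, the sketch below tests whether a configured name has the vendor/model shape. The function name is hypothetical, and the check only looks at the shape of the string; it cannot tell whether the model actually exists on the provider.

```python
import re

# Exactly one slash, with non-empty, whitespace-free text on both sides.
_VENDOR_MODEL = re.compile(r"^[^/\s]+/[^/\s]+$")


def looks_like_vendor_model(name: str) -> bool:
    """Return True if name has the vendor/model shape OpenRouter expects."""
    return bool(_VENDOR_MODEL.match(name))
```

With this check, anthropic/claude-haiku-4-5 passes, while the raw ID claude-haiku-4-5 is flagged as missing its vendor prefix.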
vLLM / Ollama not responding
- Verify the model server is running and accessible from the Honcho process (or container).
- In Docker, localhost inside a container doesn’t reach the host. Use host.docker.internal (macOS/Windows) or the host’s network IP.
- Structured output failures: vLLM’s structured output support is limited to certain response formats. If you see JSON parsing errors, check the deriver/dream logs for the raw response. See Deriver produces no observations below.
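The localhost pitfall above can be illustrated with a small URL rewrite. This is a sketch under assumptions: rewrite_for_container is a hypothetical helper, and host.docker.internal is used as the default alias as described for macOS/Windows. On other hosts, pass the host’s network IP instead.

```python
from urllib.parse import urlsplit, urlunsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def rewrite_for_container(url: str, host_alias: str = "host.docker.internal") -> str:
    """Replace a loopback host in url with an address reachable from a container."""
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url  # already points somewhere other than the container itself
    netloc = host_alias
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    # Preserve any user:password@ prefix from the original URL.
    userinfo = parts.netloc.rpartition("@")[0]
    if userinfo:
        netloc = f"{userinfo}@{netloc}"
    return urlunsplit(parts._replace(netloc=netloc))
```

For example, a base URL of http://localhost:11434/v1 (the port is illustrative) becomes http://host.docker.internal:11434/v1, while URLs that already name another host are returned unchanged.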
Deriver produces no observations
If messages are processed (the queue drains, no errors in logs) but peers never accumulate observations — and you’re using an OpenAI-compatible provider — the likely cause is that the provider doesn’t support OpenAI Structured Outputs (json_schema). The OpenAI backend requests json_schema by default; providers like Z.AI GLM and some Ollama/vLLM deployments either reject it or silently ignore it and return prose, which the deriver can’t parse into observations.
Fix: set STRUCTURED_OUTPUT_MODE=json_object on the deriver’s model config to request loose JSON mode, which injects the schema into the prompt instead. The same setting exists on other components’ model configs (e.g., DREAM_DEDUCTION_MODEL_CONFIG__STRUCTURED_OUTPUT_MODE).
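To see why a prose reply yields no observations, consider what happens when a caller expects JSON. The sketch below is not Honcho’s parser. parse_json_reply is a hypothetical helper and the "observations" key in the usage note is an assumed field name, used only to show the failure mode.

```python
import json
from typing import Optional


def parse_json_reply(raw: str) -> Optional[dict]:
    """Return the reply as a dict, or None if it is not a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the provider answered in prose or malformed JSON
    return data if isinstance(data, dict) else None
```

A reply such as {"observations": []} parses cleanly, while a reply that opens with a sentence of explanation returns None. Running the raw response from the deriver logs through a check like this shows quickly whether the provider is honoring the requested format.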
Thinking budget errors with non-Anthropic providers
If you see errors like thinking budget not supported, invalid parameter, or silent failures where agents produce no output, one of your per-component *_MODEL_CONFIG__THINKING_BUDGET_TOKENS overrides is likely set to a value > 0 with a provider that doesn’t support Anthropic-style extended thinking. The built-in defaults do not set thinking budgets, so this only applies if you added those overrides yourself.
Fix: Set *_MODEL_CONFIG__THINKING_BUDGET_TOKENS=0 for every component when using models that don’t support thinking. For models that take an effort level rather than a token budget, use *_MODEL_CONFIG__THINKING_EFFORT instead of *_MODEL_CONFIG__THINKING_BUDGET_TOKENS.
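To find which overrides are set, the environment can be scanned for the suffix named above. This is a minimal sketch with a hypothetical function name. It only inspects environment variables, so overrides set in config.toml would not be detected, and the component name in the usage note is illustrative.

```python
import os
from typing import Dict, Mapping, Optional

SUFFIX = "_MODEL_CONFIG__THINKING_BUDGET_TOKENS"


def nonzero_thinking_budgets(env: Optional[Mapping[str, str]] = None) -> Dict[str, int]:
    """Return thinking-budget overrides whose value is greater than 0."""
    env = os.environ if env is None else env
    found: Dict[str, int] = {}
    for name, value in env.items():
        if not name.endswith(SUFFIX):
            continue
        try:
            tokens = int(value)
        except ValueError:
            continue  # not an integer; skip rather than guess
        if tokens > 0:
            found[name] = tokens
    return found
```

Called with no arguments it reads the current process environment. Any entry it returns, for example a hypothetical DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS set to 1024, is a candidate for resetting to 0.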
Database Issues
Connection string format
The connection URI must use the postgresql+psycopg prefix (postgresql+psycopg://...), not a bare postgresql:// or postgres:// prefix.
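The prefix requirement is easy to check programmatically. The sketch below is illustrative: both function names are hypothetical, and the user, password, host, and database shown in the usage note are placeholders.

```python
from urllib.parse import urlsplit

REQUIRED_SCHEME = "postgresql+psycopg"


def has_required_prefix(uri: str) -> bool:
    """Return True if uri starts with the postgresql+psycopg scheme."""
    return urlsplit(uri).scheme == REQUIRED_SCHEME


def with_required_prefix(uri: str) -> str:
    """Swap a plain postgres:// or postgresql:// scheme for the required one."""
    scheme, sep, rest = uri.partition("://")
    if sep and scheme in {"postgres", "postgresql"}:
        return f"{REQUIRED_SCHEME}://{rest}"
    return uri  # already correct, or not a PostgreSQL URI at all
```

For instance, with_required_prefix("postgresql://user:pass@localhost:5432/db") returns a URI that begins with postgresql+psycopg:// and leaves everything after the scheme unchanged.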
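Checking migration status
Run uv run alembic current to show the revision applied to the database, and uv run alembic history to list all migrations.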
Cache & Redis
Redis is optional
Redis is used for caching when CACHE_ENABLED=true (default: false). If Redis is unreachable, Honcho gracefully falls back to in-memory caching and logs a warning. This means:
- The server and deriver will still start and function normally
- Performance may be reduced under high load without Redis
- You do not need Redis for local development or testing
Redis connection issues
If you see Redis connection warnings in logs but CACHE_ENABLED=false, they can be safely ignored. If you want caching, set CACHE_ENABLED=true and confirm that Redis is reachable from Honcho.
Docker Issues
Docker build fails with permission errors
The Honcho Dockerfile uses BuildKit mount syntax and creates a non-root app user. Common build failures:
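1. BuildKit not enabled
The Dockerfile uses RUN --mount=type=cache, which requires Docker BuildKit. If you see syntax errors during build, enable BuildKit for that build by setting DOCKER_BUILDKIT=1, or enable it permanently in the Docker daemon config (/etc/docker/daemon.json).
2. SELinux volume labels
On hosts that enforce SELinux, permission errors can appear during COPY, RUN, or when the container tries to access mounted volumes. Add :z to volume mounts in docker-compose.yml.
3. Volume ownership mismatch
The Dockerfile creates a non-root app user, but docker-compose.yml.example mounts .:/app which overlays the container filesystem with host-owned files. The app user inside the container may not have permission to read them.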
If you see permission errors at runtime (not build time), you can either:
- Run without the source mount (remove - .:/app from volumes; the image already contains the code)
- Or fix ownership: sudo chown -R 100:101 . (matches the app user inside the container)
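Before or after changing ownership, it can help to list which paths do not belong to the container user. The sketch below is a hypothetical helper for POSIX hosts. The defaults 100 and 101 come from the chown command above, and the limit parameter is an illustrative safeguard for large trees.

```python
import os
from typing import List


def paths_not_owned_by(root: str, uid: int = 100, gid: int = 101, limit: int = 20) -> List[str]:
    """Return up to limit paths under root whose owner is not uid:gid."""
    mismatched: List[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: report on symlinks themselves
            if (st.st_uid, st.st_gid) != (uid, gid):
                mismatched.append(path)
                if len(mismatched) >= limit:
                    return mismatched
    return mismatched
```

An empty result for the project directory means every entry already matches the app user. Note that the root directory itself is not checked, only its contents.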
Containers start but API fails
- Check container status: docker compose ps
- Check API logs: docker compose logs api
- Check database logs: docker compose logs database
- Ensure migrations ran: docker compose exec api uv run alembic upgrade head
Port conflicts
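To confirm that a port conflict is the actual problem, you can test whether something is already listening on the port. This is a minimal sketch with a hypothetical function name. It checks the loopback interface by default and treats any successful TCP connection as “in use”.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(8000) before starting the stack tells you whether another process already holds the port.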
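If port 8000 is already in use, map the API to a different host port in docker-compose.yml.
Rebuilding after code changes
Rebuild the images with docker compose build, then restart the stack with docker compose up.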
Getting Help
If your issue isn’t covered here:
- Check the logs: most issues are diagnosed from server or deriver logs
- GitHub Issues: Report bugs
- Discord: Join our community
- Configuration: See the Configuration Guide for all available settings