Health Check
Warning
Claude Code Auth
OAuth valid, 23h remaining
Remediation (Related Failure Group)
QdrantContainer restart → docker restart qdrant-server
🟢Docker Enginerunning12 containers up
🟢database-servicerunninguptime 14d
🟢workflow-engine-1runninguptime 14d
🟢SSH Connectionrunninghost OK
🟢API Gatewayrunninglocal endpoint
🟢workflow-engine Workflowrunninglocal endpoint
🟢Dashboardrunninglocal endpoint
🟢Supervisor Agent Standbyrunning14 modules
🟢Ollamarunning3 models loaded
🟢VOICEVOXrunninglocal endpoint
🟢Whisperrunninglocal endpoint
🟢Connectionrunning248MB / 14 tables
🟢pg_dumprunninglast: 04:00 161MB
🟢Qdrant Serverrunninglocal endpoint
🔴Qdrant-reindexdownIndex rebuild stopped
🟢Internal SSDrunning31% used
🟡Swap usagewarning0.3GB / threshold 0.5GB
🟢External SSDrunning31% used (661GB free)
Last Check: 2026/04/05 12:34:56
Recent Incidents
✓C2-3Qdrant connection timeout12sAuto2026/04/05 08:12
✓A1-2database-service OOM restart45sAuto2026/04/04 23:41
✗B2-1Ollama inference timeout(30s exceeded)0sManual2026/04/04 15:22
✓D1-1tunnel-wfe connection down8sAuto2026/04/03 19:55
✓C1-1PostgreSQL connection pool exhaustion3sAuto2026/04/03 07:14
✓A1-3MinIO health check failed15sAuto2026/04/02 11:30
Knowledge Base
| Code | Incident | Recovery | Result | Duration | Reuse | Auto |
|---|---|---|---|---|---|---|
| C2-3 | Qdrant connection timeout | docker restart qdrant-server | success | 12s | 5 | |
| A1-2 | database-service OOM restart | docker restart database-service | success | 45s | 3 | |
| B2-1 | Ollama inference timeout | ollama serve restart | failed | 0s | 0 | |
| D1-1 | tunnel-wfe connection down | cloudflared restart | success | 8s | 7 | |
| C1-1 | PostgreSQL connection pool exhaustion | pg_terminate idle conn | success | 3s | 2 | |
| A1-3 | MinIO health check failed | docker restart minio | success | 15s | 1 |
Thresholds
system_resources
service_health
Incident Log
6
Auto Recovery
health_monitor
container_restart
2026/04/05 08:12:34
Qdrant connection timeout detected. Container restart performed.
recovery: docker restart qdrant-server (12sec)
Auto Recovery
health_monitor
oom_restart
2026/04/04 23:41:18
database-service OOM Killed detected. Auto restart performed.
recovery: docker restart database-service (45sec)
Manual
ollama_monitor
inference_timeout
2026/04/04 15:22:05
Ollama inference 30sec timeout. Manual intervention needed.