Observability
Use health, readiness, metrics, stats, audit, and wiretap together to understand node health and enforcement behavior.
Neuwerk exposes several different observability surfaces because no single endpoint answers every operational question.
The main ones are:
- /health for process liveness
- /ready for operational readiness
- /metrics for Prometheus metrics
- /api/v1/stats for a compact runtime snapshot
- /api/v1/audit/findings for structured deny history
- /api/v1/wiretap/stream for live traffic observation
- process logs for runtime detail
Audit and wiretap depend on performance mode. If performance mode is disabled, those API surfaces return 503 until it is enabled again.
Start Here First
When you need a quick health check, use this order:
1. GET /health
2. GET /ready
3. GET /metrics
4. GET /api/v1/stats
That sequence separates “is the process up” from “is the node actually ready to enforce traffic”.
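A minimal sketch of that triage order in Python, using only the standard library. The fetcher is injected so the ordering logic can be exercised without a live node. `make_fetcher`, `first_failing_step`, and the base URL you pass in are illustrative names, not part of Neuwerk.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Endpoints in the triage order listed above.
TRIAGE_ORDER = ["/health", "/ready", "/metrics", "/api/v1/stats"]


def make_fetcher(base_url: str, timeout: float = 5.0) -> Callable[[str], int]:
    """Build a callable that GETs base_url + path and returns the HTTP status.

    Returns 0 when the node cannot be reached at all (for example connection
    refused or a DNS failure, which urllib raises as URLError).
    """
    base = base_url.rstrip("/")

    def fetch_status(path: str) -> int:
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            # Non-2xx responses (for example 503 from /ready) arrive as HTTPError.
            return err.code
        except urllib.error.URLError:
            return 0

    return fetch_status


def first_failing_step(fetch_status: Callable[[str], int]) -> Optional[str]:
    """Return the first endpoint in TRIAGE_ORDER that is not 200, else None.

    "/health" failing means the process is down; "/ready" failing means the
    process is up but the node is not ready to enforce traffic.
    """
    for path in TRIAGE_ORDER:
        if fetch_status(path) != 200:
            return path
    return None
```

For example, `first_failing_step(make_fetcher("http://<node-management-address>"))` returning `"/ready"` means the process is alive but the node is not yet ready to enforce traffic.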
Liveness And Readiness
/health answers whether the management process is alive.
/ready answers whether the node is ready to do useful work. It returns 200 only when the dataplane, policy state, DNS proxy, service plane, and any cluster-specific checks are in a usable state.
Important readiness checks include:
- dataplane_running
- dataplane_config
- policy_ready
- dns_allowlist
- service_plane
- draining
- cluster
- policy_replication
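If you collect readiness results per check, a small helper can report which ones failed, in the order listed above. This sketch assumes you already have a mapping from check name to pass/fail. This page does not document the /ready response body, so how you build that mapping is an assumption, and `failing_checks` is a hypothetical helper.

```python
from typing import Dict, List

# Check names as listed on this page; order is kept for stable output.
READINESS_CHECKS = [
    "dataplane_running", "dataplane_config", "policy_ready", "dns_allowlist",
    "service_plane", "draining", "cluster", "policy_replication",
]


def failing_checks(results: Dict[str, bool]) -> List[str]:
    """Return names of readiness checks that did not pass.

    Assumes `results` maps check name -> True when that check passed.
    Checks absent from `results` are skipped, since cluster-specific checks
    may not apply to every node. Unknown failing checks are appended sorted.
    """
    known = [name for name in READINESS_CHECKS
             if name in results and not results[name]]
    extra = sorted(name for name, ok in results.items()
                   if name not in READINESS_CHECKS and not ok)
    return known + extra
```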
Metrics
/metrics is the raw Prometheus surface. Use it for:
- dashboards
- alerting
- capacity trend analysis
- error-rate monitoring
Metrics are best for trends and saturation. They are not always the fastest way to answer an incident question, which is why /ready and /api/v1/stats are better first checks.
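For ad-hoc inspection during an incident, a single scrape of /metrics can be read with a few lines of Python. This is a deliberately minimal reader of the Prometheus text format, assuming one sample per line: it keys each series by its raw `name{labels}` text and ignores `# HELP`/`# TYPE` comments and optional timestamps. No Neuwerk metric names are assumed; you pass the name you care about to `total`. For dashboards and alerts, let Prometheus compute rates over time rather than reading raw cumulative counters.

```python
from typing import Dict


def parse_samples(text: str) -> Dict[str, float]:
    """Parse Prometheus text exposition into {"name{labels}": value}.

    Minimal on purpose: comments, blank lines, and optional timestamps are
    ignored, and label sets are kept as raw text rather than parsed.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or # HELP / # TYPE metadata
        if "}" in line:
            # Labelled series: everything up to the last "}" is the series key.
            series, rest = line.rsplit("}", 1)
            series += "}"
        else:
            series, rest = line.split(None, 1)
        # The first token after the series is the value; a timestamp may follow.
        samples[series] = float(rest.split()[0])
    return samples


def total(samples: Dict[str, float], metric: str) -> float:
    """Sum every series of one metric across all label combinations."""
    return sum(
        value for series, value in samples.items()
        if series == metric or series.startswith(metric + "{")
    )
```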
Runtime Snapshot
/api/v1/stats is the fastest compact view of the running node.
Use it when you want:
- dataplane counters
- DNS state summary
- TLS and service-plane state
- cluster catch-up context
During incident response it is easier to read than the full Prometheus surface.
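Because /api/v1/stats is a compact snapshot, taking two snapshots a few seconds apart and diffing the numeric fields is a quick way to see which counters are moving. The sketch below flattens arbitrary JSON into dotted keys, so it makes no assumption about the stats schema. `flatten` and `numeric_deltas` are hypothetical helpers, and any field names you see in examples are illustrative only.

```python
import json
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON (dicts and lists) into {"a.b.0": leaf} form."""
    out: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            out.update(flatten(value, f"{prefix}{index}."))
    else:
        out[prefix[:-1]] = obj  # drop the trailing "."
    return out


def _is_number(value: Any) -> bool:
    # JSON booleans load as bool, which is a subclass of int; exclude them.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def numeric_deltas(before_json: str, after_json: str) -> Dict[str, float]:
    """Return after - before for every numeric field that changed."""
    before = flatten(json.loads(before_json))
    after = flatten(json.loads(after_json))
    deltas: Dict[str, float] = {}
    for key, new in after.items():
        old = before.get(key)
        if _is_number(new) and _is_number(old) and new != old:
            deltas[key] = new - old
    return deltas
```

For example, with two hypothetical snapshots where `dataplane.packets` went from 10 to 25 and nothing else numeric changed, `numeric_deltas` returns `{"dataplane.packets": 15}`.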
Audit And Wiretap
Audit and wiretap answer different questions:
- audit shows persisted structured findings about denies and selected auth events
- wiretap shows live traffic observation
Use audit when you need evidence of repeated policy outcomes. Use wiretap when you need to confirm what is happening right now on a live path.
Both surfaces are intentionally gated by performance mode. Read Performance Mode if those workflows are unavailable.
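A sketch for working with audit output, assuming you have already fetched findings as a list of JSON objects. `gate_hint` treats a 503 as the performance-mode gate described above. `repeated_findings` groups findings by whichever fields you choose to surface repeated deny outcomes; the findings schema is not documented on this page, so the `src`/`dst` field names in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def gate_hint(status: int) -> Optional[str]:
    """Explain a 503 from the audit or wiretap API; None for other statuses."""
    if status == 503:
        return ("audit/wiretap unavailable: performance mode is likely disabled; "
                "re-enable it before relying on these surfaces")
    return None


def repeated_findings(
    findings: Iterable[Dict[str, str]],
    keys: Tuple[str, ...],
    min_count: int = 2,
) -> List[Tuple[Tuple[str, ...], int]]:
    """Group findings by the chosen fields and keep groups seen >= min_count times.

    Missing fields are grouped under "?" so malformed entries stay visible.
    Results are ordered from most to least frequent.
    """
    counts = Counter(tuple(str(f.get(k, "?")) for k in keys) for f in findings)
    return [(group, n) for group, n in counts.most_common() if n >= min_count]
```

For example, two findings sharing the same hypothetical `src` and `dst` values come back as a single group with a count of 2, while one-off findings are filtered out.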
Logs
Logs remain the best source for startup failures, component crashes, and explicit runtime errors.
Use JSON logs when you need to ship events into an external logging system. Use compact logs when you need local readability.
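With JSON logs, pulling error-level events out of a log stream takes only a few lines. This sketch assumes one JSON object per line and a `level` field. The field name is an assumption, so adjust `level_field` to what your build actually emits. Non-JSON (compact) lines are skipped rather than parsed.

```python
import json
from typing import Any, Dict, Iterable, List


def error_events(lines: Iterable[str], level_field: str = "level") -> List[Dict[str, Any]]:
    """Return parsed log events whose level field equals "error" (any case)."""
    events: List[Dict[str, Any]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # compact (human-readable) lines are not JSON; skip them
        if isinstance(event, dict) and str(event.get(level_field, "")).upper() == "ERROR":
            events.append(event)
    return events
```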
Practical Rule
Start with readiness, confirm with stats, then use metrics, audit, wiretap, and logs to narrow the problem. That order usually leads you to the affected part of the runtime faster than jumping straight into raw counters.