Alerts

Build alerts from readiness first, then add cluster, capacity, and service-plane error signals.

Neuwerk exposes signals suitable for alerting, but it does not ship a bundled alert rule pack. Build alerts from the runtime surfaces it already provides: the /health and /ready endpoints, its metrics, and the stats and audit APIs.

First Alert To Create

If you only create one urgent alert per node, start with:

  • /ready returning 503

Readiness is the best first-line signal because it already combines several conditions that matter to operators.
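The sketch below shows the readiness check as a small Python probe. It assumes a ready node answers /ready with a 2xx status. The base URL `NODE_URL` and the helper names `classify_ready_status` and `probe_ready` are hypothetical; point it at the node's real management address.

```python
import urllib.error
import urllib.request

# Hypothetical management address; replace with the node's real one.
NODE_URL = "http://127.0.0.1:8080"


def classify_ready_status(status: int) -> str:
    """Map an HTTP status from /ready to an alert state (assumes 2xx means ready)."""
    if 200 <= status < 300:
        return "ok"
    if status == 503:
        return "page"  # node reports itself not ready: the urgent case
    return "unknown"  # unexpected status: investigate rather than page


def probe_ready(base_url: str = NODE_URL, timeout: float = 5.0) -> str:
    """Fetch /ready once and classify the result."""
    try:
        with urllib.request.urlopen(f"{base_url}/ready", timeout=timeout) as resp:
            return classify_ready_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including 503.
        return classify_ready_status(err.code)
    except OSError:
        # Connection refused, DNS failure, or timeout: no answer at all.
        return "unreachable"
```

An unreachable node gives no readiness answer at all, so most setups will want to page on "unreachable" as well as on "page".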

Alert Classes

The most useful alert groups are:

  • node availability: /health and /ready
  • cluster health: cluster, policy_replication, and Raft-related metrics
  • dataplane capacity: active flows and NAT port utilization
  • service-plane failures: TLS interception and fail-closed counters
  • integration failures: integration error counters when cloud or other integrations are active

Better For Alerts

High-signal alert inputs include:

  • readiness failures
  • sustained cluster peer errors
  • sustained NAT port saturation
  • sustained svc_fail_closed_total
  • integration error counters
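"Sustained" is what keeps these inputs high-signal: a condition should hold across several consecutive evaluations before it pages, much like the `for` duration on a Prometheus alerting rule. The sketch below assumes NAT port utilization is available as a 0 to 1 ratio and that svc_fail_closed_total is a monotonic counter sampled at a fixed interval. The 90% threshold, the three-sample window, and all function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: treat more than 90% NAT port utilization as saturated.
NAT_SATURATION_THRESHOLD = 0.90


@dataclass
class SustainedCondition:
    """Fires only after a condition has held for `required` consecutive samples."""

    required: int
    streak: int = 0

    def observe(self, breached: bool) -> bool:
        # Any healthy sample resets the streak, so brief spikes never fire.
        self.streak = self.streak + 1 if breached else 0
        return self.streak >= self.required


def counter_increases(readings: list[float]) -> list[float]:
    """Per-interval increases of a monotonic counter such as svc_fail_closed_total.

    A drop between readings is treated as a counter reset (for example a
    restart), so the post-reset value is counted as the increase.
    """
    return [
        curr if curr < prev else curr - prev
        for prev, curr in zip(readings, readings[1:])
    ]


def nat_saturation_alerts(utilization: list[float], required: int = 3) -> list[bool]:
    """Alert state after each utilization sample."""
    cond = SustainedCondition(required)
    return [cond.observe(u > NAT_SATURATION_THRESHOLD) for u in utilization]


def fail_closed_alerts(readings: list[float], required: int = 3) -> list[bool]:
    """Alert state after each interval in which the fail-closed counter grew."""
    cond = SustainedCondition(required)
    return [cond.observe(inc > 0) for inc in counter_increases(readings)]
```

The same `SustainedCondition` wrapper applies to cluster peer errors and integration error counters; only the breach test changes.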

Better For Dashboards Or Investigation

These are usually better as dashboards or trend alerts than immediate pages:

  • total DNS query volume
  • total deny volume
  • general HTTP request volume
  • other workload-shaped counters

A spike in deny counters may mean a problem, or it may mean policy is working exactly as designed.
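For workload-shaped counters like deny volume, a trend check that compares a recent window against a trailing baseline fits better than a fixed threshold. The sketch below assumes per-minute deny counts are already available; the window sizes, the 3x factor, the baseline floor, and the function name are hypothetical. Route the result to a dashboard annotation or a low-urgency ticket rather than a page.

```python
from statistics import mean


def deny_trend_flag(
    per_minute_denies: list[float],
    baseline_minutes: int = 60,
    recent_minutes: int = 5,
    factor: float = 3.0,
    min_baseline: float = 1.0,
) -> bool:
    """Return True when the recent deny rate is well above its trailing baseline."""
    needed = baseline_minutes + recent_minutes
    if len(per_minute_denies) < needed:
        return False  # not enough history to judge a trend
    window = per_minute_denies[-needed:]
    baseline = mean(window[:baseline_minutes])
    recent = mean(window[baseline_minutes:])
    # The floor stops a near-zero baseline from flagging every small bump.
    return recent > factor * max(baseline, min_baseline)
```

A flag here only says the shape of traffic changed; whether that reflects a fault or a policy doing its job is a question for the stats and audit surfaces.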

Use Stats And Audit After The Alert Fires

After an alert fires, move to:

  • GET /api/v1/stats
  • GET /api/v1/audit/findings

Those two surfaces help distinguish a component failure from a deliberate policy outcome.

If the audit findings endpoint is unavailable, check the node's performance mode before assuming the audit subsystem is broken.
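A triage helper can pull both surfaces in one step when an alert fires. This sketch assumes both endpoints return JSON over plain HTTP and does not handle authentication, which the real API may require. `API_BASE` and the function names are hypothetical, and the fetch function is injectable so the helper can be exercised without a live node.

```python
import json
import urllib.request
from typing import Any, Callable

# Hypothetical management address; the real API may also require auth.
API_BASE = "http://127.0.0.1:8080"

TRIAGE_PATHS = {
    "stats": "/api/v1/stats",
    "audit_findings": "/api/v1/audit/findings",
}


def fetch_json(url: str, timeout: float = 5.0) -> Any:
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def collect_triage_context(
    base_url: str = API_BASE,
    fetch: Callable[[str], Any] = fetch_json,
) -> dict:
    """Fetch stats and audit findings, recording errors instead of raising."""
    context = {}
    for name, path in TRIAGE_PATHS.items():
        try:
            context[name] = fetch(base_url + path)
        except (OSError, ValueError) as err:
            # HTTP errors, network failures, and bad JSON all land here. An
            # audit error is kept visible so the operator can check performance
            # mode before treating it as a broken subsystem.
            context[name] = {"error": str(err)}
    return context
```

Keeping each surface's error separate means a missing audit response never hides the stats that are still available.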