Alert Configuration

This guide covers how to configure alerts effectively so you get notified about real problems without being overwhelmed by noise.

Set meaningful thresholds

For crons, the alert_threshold determines how many consecutive failures trigger a failure alert:

Threshold   Best for
1           Critical health checks where any failure requires immediate attention
2           Important crons where a single transient failure is acceptable
3           Non-critical crons where intermittent failures are expected

A threshold of 3 means you only get a failure alert when something is persistently broken (three consecutive failed runs), not when a single request times out due to a network blip.
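To make the threshold behavior concrete, here is a minimal Python sketch of consecutive-failure counting. It is not Recuro's implementation: the function name failure_alert_indices is hypothetical, and the sketch assumes the counter resets on any success and that a failure alert fires once per failure streak, at the moment the streak reaches the threshold.

```python
def failure_alert_indices(outcomes, threshold):
    """Return the run indices at which a failure alert would fire.

    outcomes: list of booleans, True for a successful run, False for a failure.
    threshold: number of consecutive failures required (alert_threshold).

    Assumptions (not confirmed by the docs): the counter resets on success,
    and each failure streak produces at most one failure alert.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")

    alerts = []
    consecutive = 0
    for index, succeeded in enumerate(outcomes):
        if succeeded:
            consecutive = 0  # a success ends the current failure streak
            continue
        consecutive += 1
        if consecutive == threshold:
            alerts.append(index)
    return alerts


# One isolated failure (index 1), then a persistent outage (indices 3 to 6).
runs = [True, False, True, False, False, False, False, True]

alerts_at_1 = failure_alert_indices(runs, threshold=1)
alerts_at_3 = failure_alert_indices(runs, threshold=3)
```

Under these assumptions, a threshold of 1 alerts on both the isolated blip and the outage, while a threshold of 3 alerts only once, on the third failed run of the outage.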

Avoid over-alerting

Common mistakes that lead to alert fatigue:

  • Threshold of 1 on every cron — You get alerted on every transient failure. Reserve threshold 1 for truly critical monitors.
  • Alerts enabled on every queue — If you have high-volume queues, individual job failures may be normal. Enable alerts only on queues where failures are unexpected.
  • No maintenance windows — If you deploy frequently, temporary failures during deployment trigger alerts. Create maintenance windows for planned downtime.

Use warning alerts as early signals

Recuro sends a warning alert on the first failure in a sequence (before the threshold is reached). Use these as early signals in your dashboard without configuring them to notify externally. Review the Alerts page periodically to spot emerging patterns.
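The relationship between warning and failure alerts can be sketched as a small Python function. This is an illustration only: the name alert_kind is hypothetical, and the behavior for failures between the first one and the threshold (no additional alert) is an assumption, not something the docs state.

```python
from typing import Optional


def alert_kind(consecutive_failures: int, threshold: int) -> Optional[str]:
    """Classify the alert for the Nth consecutive failure of a cron.

    Returns "failure" when the streak reaches the threshold, "warning" for
    the first failure of a streak that has not yet reached it, and None
    otherwise (assumed: no extra alerts in between or after).
    """
    if consecutive_failures < 1 or threshold < 1:
        raise ValueError("both arguments must be at least 1")
    if consecutive_failures == threshold:
        return "failure"
    if consecutive_failures == 1:
        return "warning"
    return None


# With alert_threshold 3: warning on the 1st failure, failure alert on the 3rd.
kinds = [alert_kind(n, threshold=3) for n in (1, 2, 3)]
```

With a threshold of 1, the first failure already reaches the threshold, so this sketch reports it as a failure alert rather than a warning.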

Combine alerts with success assertions

Instead of just alerting on HTTP errors, add success assertions to catch subtle failures:

{
  "success_assertions": [
    { "type": "status_code_equals", "value": "200" },
    { "type": "body_contains", "value": "\"healthy\":true" },
    { "type": "response_time_under_ms", "value": "5000" }
  ],
  "alert_threshold": 2
}

This catches cases where your endpoint returns 200 but the response indicates a degraded state or slow performance.
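To check locally what a set of assertions like the one above would accept or reject, a rough Python sketch follows. It only mirrors the three assertion types shown in the configuration. The Response class and both function names are hypothetical, and the exact matching rules (substring match on the raw body, strict less-than on response time) are assumptions rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical container for the parts of a response the assertions inspect."""
    status_code: int
    body: str
    elapsed_ms: float


def assertion_passes(assertion: dict, response: Response) -> bool:
    """Evaluate one assertion; values are strings, as in the config above."""
    kind = assertion["type"]
    value = assertion["value"]
    if kind == "status_code_equals":
        return response.status_code == int(value)
    if kind == "body_contains":
        return value in response.body  # assumed: plain substring match
    if kind == "response_time_under_ms":
        return response.elapsed_ms < float(value)  # assumed: strictly under
    raise ValueError(f"unknown assertion type: {kind}")


def run_succeeded(assertions: list, response: Response) -> bool:
    """A run counts as successful only if every assertion passes."""
    return all(assertion_passes(a, response) for a in assertions)


ASSERTIONS = [
    {"type": "status_code_equals", "value": "200"},
    {"type": "body_contains", "value": "\"healthy\":true"},
    {"type": "response_time_under_ms", "value": "5000"},
]

healthy = Response(status_code=200, body='{"healthy":true}', elapsed_ms=120.0)
degraded = Response(status_code=200, body='{"healthy":false}', elapsed_ms=120.0)
slow = Response(status_code=200, body='{"healthy":true}', elapsed_ms=7000.0)
```

Here the degraded and slow responses both return 200 but fail the run, which is the kind of subtle failure a status-code check alone would miss. Note that a substring match is sensitive to whitespace, so a body of {"healthy": true} would not match under this assumption.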

Separate notification channels by severity

If you have both critical and non-critical crons, consider placing them in separate teams, each with its own notification channel:

  • Production-critical team → Slack alerts to #oncall-alerts
  • Background tasks team → Email alerts (checked during business hours)

This prevents non-critical alerts from drowning out critical ones.

Use maintenance windows

Before planned downtime:

  1. Go to Maintenance Windows
  2. Create a window covering the deployment or maintenance period
  3. Scope it to the affected cron or queue (or leave unscoped for all resources)

This suppresses alerts during the window, preventing false alarms.
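The scoping rule from the steps above can be sketched in Python as follows. This is a simplified model, not Recuro's implementation: the MaintenanceWindow class, the is_suppressed function, and the resource names are hypothetical, and treating the window end as exclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MaintenanceWindow:
    start: datetime
    end: datetime
    # None models an unscoped window that applies to all resources.
    resource: Optional[str] = None


def is_suppressed(resource: str, at: datetime, windows: list) -> bool:
    """Return True if an alert for `resource` at time `at` falls in any window."""
    return any(
        w.start <= at < w.end and (w.resource is None or w.resource == resource)
        for w in windows
    )


utc = timezone.utc
windows = [
    # Scoped window: only the (hypothetical) nightly-backup cron is covered.
    MaintenanceWindow(
        start=datetime(2024, 1, 10, 2, 0, tzinfo=utc),
        end=datetime(2024, 1, 10, 3, 0, tzinfo=utc),
        resource="nightly-backup",
    ),
    # Unscoped window: every resource is covered.
    MaintenanceWindow(
        start=datetime(2024, 1, 11, 2, 0, tzinfo=utc),
        end=datetime(2024, 1, 11, 3, 0, tzinfo=utc),
    ),
]

during_scoped = datetime(2024, 1, 10, 2, 30, tzinfo=utc)
during_unscoped = datetime(2024, 1, 11, 2, 30, tzinfo=utc)
```

In this model, a failure of a different cron during the scoped window still alerts, which is why scoping a window narrowly avoids hiding unrelated problems during a deployment.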

Next steps