Monitoring and Observability

This guide covers how to use Recuro’s built-in monitoring features to track the health of your crons and queues.

Dashboard overview

The Recuro dashboard (Dashboard in the sidebar) provides an at-a-glance view of your scheduling infrastructure:

Cron stats — Total crons, active vs. paused, recent execution counts
Queue stats — Total queues, active vs. inactive, job counts by status
Alert summary — Unread alerts count and recent alert activity
Activity chart — Executions and job runs over the last 7, 14, or 30 days
Response time stats — Average and p95 response times across crons and queues
Usage stats — Current plan, requests used vs. limit, projected monthly usage
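To illustrate why the dashboard reports p95 alongside the average, here is a minimal sketch using the nearest-rank percentile method. The sample data and the p95 helper are hypothetical, and this is not necessarily how Recuro computes its figures.

```python
from statistics import mean


def p95(values):
    """Nearest-rank 95th percentile: the smallest sample with at least
    95% of all samples at or below it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: 90 fast calls, 10 slow ones.
samples_ms = [100] * 90 + [2000] * 10

print("average:", mean(samples_ms))
print("p95:", p95(samples_ms))
```

With this data the average stays fairly low while the p95 lands on the slow calls, which is why a rising p95 is often the earlier sign of a degrading endpoint.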

Tracking success rates

Cron success rates

On the cron detail page, review the execution history:

Look for patterns in failures (time of day, day of week)
Check the last_status field — completed or failed
Monitor consecutive_failures — if it keeps climbing, something is persistently wrong
Set alert_threshold so you are notified early, before failures accumulate
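The checks above can be sketched in a few lines once you have the execution history in hand. The record shape here (started_at and status keys) is an assumption for illustration, and the real history may use different field names. Only the completed and failed status values come from this guide.

```python
from collections import Counter
from datetime import datetime

# Hypothetical execution history, oldest first.
executions = [
    {"started_at": "2024-05-01T01:00:00", "status": "completed"},
    {"started_at": "2024-05-01T02:00:00", "status": "failed"},
    {"started_at": "2024-05-01T03:00:00", "status": "completed"},
    {"started_at": "2024-05-02T02:00:00", "status": "failed"},
    {"started_at": "2024-05-03T02:00:00", "status": "failed"},
]


def success_rate(records):
    """Fraction of executions that completed."""
    done = sum(1 for r in records if r["status"] == "completed")
    return done / len(records)


def failures_by_hour(records):
    """Count failed executions per hour of day to expose time patterns."""
    hours = Counter()
    for r in records:
        if r["status"] == "failed":
            hours[datetime.fromisoformat(r["started_at"]).hour] += 1
    return hours


def trailing_failures(records):
    """Number of failures at the end of the history, with no success after."""
    count = 0
    for r in reversed(records):
        if r["status"] != "failed":
            break
        count += 1
    return count


print(success_rate(executions))
print(failures_by_hour(executions))
print(trailing_failures(executions))
```

In this sample every failure falls in the same hour, which would point toward something scheduled at that time (a backup or a deploy, for instance) rather than a random fault.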

Queue success rates

On the queue detail page, review job statistics:

Job counts: total, completed, failed, pending, and dead-lettered
Response time trends (sparklines on the queue list page)
DLQ depth — a growing DLQ indicates unresolved failures
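One way to decide whether the DLQ is "growing" is to compare depth readings taken at regular intervals. The sketch below assumes you record those readings yourself. The function name and the three-sample minimum are illustrative choices, not Recuro features.

```python
def dlq_is_growing(depths, min_samples=3):
    """Return True when the readings never decrease and the latest
    reading is higher than the first one."""
    if len(depths) < min_samples:
        return False  # too few readings to call it a trend
    never_decreases = all(b >= a for a, b in zip(depths, depths[1:]))
    return never_decreases and depths[-1] > depths[0]


# Hypothetical daily DLQ depth readings, oldest first.
print(dlq_is_growing([2, 2, 5, 9]))   # steadily accumulating
print(dlq_is_growing([9, 4, 0]))      # being drained
```

A stricter or looser rule may suit you better. The point is to alert on the trend rather than on a single non-zero reading.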

Using completion callbacks for external monitoring

If you use an external monitoring system (such as Datadog, Grafana, or PagerDuty), forward execution results to it with completion callbacks by setting callback_url on the cron:

curl -X POST https://app.recurohq.com/api/crons \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Payment Sync",
    "url": "https://api.yourapp.com/sync",
    "cron_expression": "0 * * * *",
    "callback_url": "https://monitoring.yourapp.com/recuro-webhook"
  }'

Your callback endpoint receives a POST with the execution status, duration, and failure reason. Parse the payload to populate your monitoring dashboards.
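A minimal receiver for such a callback might look like the sketch below, built on Python's standard library. The JSON field names (status, duration_ms, failure_reason) are assumptions made for illustration. Check an actual callback payload for the real keys before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(body):
    """Extract the fields of interest from a callback body.
    Key names are assumed, not taken from the Recuro API reference."""
    payload = json.loads(body)
    status = payload.get("status", "unknown")
    return {
        "ok": status == "completed",
        "status": status,
        "duration_ms": payload.get("duration_ms"),
        "failure_reason": payload.get("failure_reason"),
    }


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            summary = summarize_callback(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Body was not valid JSON, or not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        print(summary)  # replace with a call to your monitoring client
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Block and handle callbacks until interrupted."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling serve() starts the listener. In production you would put it behind HTTPS and forward the summary to your monitoring system instead of printing it.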

Key metrics to watch

Metric	Where to find it	What it tells you
Consecutive failures	Cron detail page	Persistent endpoint issues
DLQ depth	Dead Letter Queue page	Unresolved job failures
Response time (p95)	Dashboard, cron and queue detail pages	Endpoint performance degradation
Usage percentage	Usage page	How close you are to your plan limit
Unread alerts	Alerts page	Unaddressed issues
Projected usage	Usage page	Whether you will hit your limit this month
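To make the two usage metrics concrete, the sketch below computes a usage percentage and a simple linear projection for the month. How Recuro actually derives its projected figure is not stated here, so treat the extrapolation as an assumption. The numbers are made up.

```python
import calendar
from datetime import date


def usage_percentage(used, limit):
    """Share of the plan limit consumed so far, in percent."""
    return 100 * used / limit


def projected_monthly_usage(used, today):
    """Linear extrapolation: assume the daily rate so far holds
    for the rest of the calendar month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return used / today.day * days_in_month


# Hypothetical figures: 6,000 requests used by June 10 on a 15,000 plan.
used, limit = 6000, 15000
print(usage_percentage(used, limit))
print(projected_monthly_usage(used, date(2024, 6, 10)))
```

Here the usage percentage still looks comfortable while the projection exceeds the limit, which is the situation the Projected usage row is meant to catch.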

Setting up proactive monitoring

Set alert thresholds on all critical crons (a threshold of 1 or 2)
Enable queue alerts on queues that must succeed
Configure notification channels (Slack for real-time, email for digest)
Create maintenance windows for planned deployments
Review the DLQ weekly, then replay or purge stale jobs
Check the Usage page periodically to avoid hitting hard limits

Next steps

Setting Up Alerts — Configure alerting
Viewing and Debugging Runs — Inspect execution details
Alert Configuration — Avoid alert fatigue