Layer 8 · Autonomous Response

CyGuru

Alert in, remediation out. With dedup, approval gates, and full audit.

CyGuru is the alert-driven self-heal pipeline. It has two detection inputs (an Alertmanager webhook and 60-second SelfMonitorLoop probes), one dedup layer (60-minute window), one approval gate (auto-execution is off by default for destructive actions), and one executor. Handlers ship for pod crashloops, MongoDB rolling restarts, node memory pressure cordoning, and stale OVN chassis recovery. Every action is audited, and every action either has a rollback plan or is non-destructive.
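To make the dedup layer concrete, here is a minimal sketch of a 60-minute window in Java. The class name `DedupWindow`, the method `shouldProcess`, and the string dedup key are all assumptions made for illustration; CyGuru's actual implementation is not shown on this page and may differ.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: names are illustrative, not CyGuru's actual classes.
final class DedupWindow {
    private final Duration window;
    // Last time a remediation was accepted for each dedup key.
    private final Map<String, Instant> lastAccepted = new ConcurrentHashMap<>();

    DedupWindow(Duration window) {
        this.window = window;
    }

    // Returns true if no remediation for this key was accepted within the window.
    boolean shouldProcess(String key, Instant now) {
        boolean[] accepted = {false};
        // compute() runs atomically per key, so concurrent alerts cannot both pass.
        lastAccepted.compute(key, (k, previous) -> {
            if (previous == null || !now.isBefore(previous.plus(window))) {
                accepted[0] = true;
                return now;      // start a new window
            }
            return previous;     // still inside the window: suppress
        });
        return accepted[0];
    }
}
```

With `new DedupWindow(Duration.ofMinutes(60))`, a second alert for the same key 30 minutes after the first is suppressed, while an alert for a different key passes. In this sketch suppressed alerts do not extend the window; that is a design choice of the example, not a documented CyGuru behavior.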

Use Cases

Where CyGuru wins.

1

Pod crashloop auto-recovery

More than 3 restarts in 5 minutes → CyGuru proposes a pod restart with audit context. An admin approves, the executor runs, and a success metric is exported to Prometheus.

2

MongoDB rolling restart on outage

A Mongo replica becomes unreachable → CyGuru proposes a safe rolling restart. Avoids 3am pages for the on-call SRE.

3

Node memory pressure auto-cordon

Node memory above 90% → CyGuru cordons the node and drains non-critical pods. New pods are scheduled elsewhere. The operator is notified, not paged.

4

OVN stale chassis cleanup

An OVN chassis registered by UUID instead of hostname is a known production bug → CyGuru auto-fixes it via a node-agent restart.

5

Custom handlers for your stack

Implement the RemediationHandler interface. CyGuru handles dedup, approval, audit, and metric emission. You write the fix logic.

6

Compliance-driven approval workflows

Different remediations route to different approvers: PHI-touching remediations go to the compliance officer, network remediations go to security, and so on.
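The approver routing in the last use case could be modeled roughly as follows. This is a sketch under stated assumptions: `RemediationCategory`, `ApprovalRouter`, and the role strings are hypothetical names invented for the example, not CyGuru configuration.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical categories, for illustration only.
enum RemediationCategory { PHI, NETWORK, GENERAL }

// Hypothetical router: maps a remediation category to an approver role.
final class ApprovalRouter {
    private final Map<RemediationCategory, String> approvers =
            new EnumMap<>(RemediationCategory.class);
    private final String fallbackApprover;

    ApprovalRouter(String fallbackApprover) {
        this.fallbackApprover = fallbackApprover;
    }

    // Registers an approver role for a category; returns this for chaining.
    ApprovalRouter route(RemediationCategory category, String approverRole) {
        approvers.put(category, approverRole);
        return this;
    }

    // Categories without an explicit route go to the fallback approver.
    String approverFor(RemediationCategory category) {
        return approvers.getOrDefault(category, fallbackApprover);
    }
}
```

For example, `new ApprovalRouter("platform-admin").route(RemediationCategory.PHI, "compliance-officer").route(RemediationCategory.NETWORK, "security")` sends PHI-touching remediations to the compliance officer and leaves everything unrouted with the fallback role.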

Key Capabilities

What's inside.

🔁

60-min dedup window

Prevents alert storms from generating remediation storms.

Approval-first default

Default policy: a human approves before execution. Auto-execution is earned over time and limited to a narrow LOW_RISK allowlist.

📈

Full Prometheus metrics

Counters for proposed, deduped, approved, and executed remediations; duration histograms; auto-dashboards in Grafana.

🤖

Two detection inputs

Alertmanager webhook (push) + 60s SelfMonitorLoop (pull). Different remediation triggers, same pipeline.

📜

Reversible by design

Every executor action has a rollback plan or is non-destructive. Audit log captures both.

🔧

Pluggable handlers

Implement the Java RemediationHandler interface; CyGuru loads it. No fork required.
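As an illustration of the handler model, the sketch below shows what a custom handler and the surrounding approval gate might look like. Everything in it is an assumption: this page does not publish the real `RemediationHandler` signature, so the methods `canHandle`, `destructive`, and `execute`, the `Pipeline` class, and the alert name `PodCrashLooping` are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of a handler; the real RemediationHandler signature may differ.
interface RemediationHandler {
    boolean canHandle(String alertName);
    boolean destructive();
    String execute(String target);
}

// Example handler: you write only the fix logic.
final class PodRestartHandler implements RemediationHandler {
    public boolean canHandle(String alertName) { return alertName.equals("PodCrashLooping"); }
    public boolean destructive() { return true; }
    public String execute(String target) { return "restarted " + target; }
}

// Hypothetical pipeline: applies the approval gate and records an audit trail.
final class Pipeline {
    private final List<RemediationHandler> handlers = new ArrayList<>();
    private final List<String> auditLog = new ArrayList<>();

    void register(RemediationHandler handler) { handlers.add(handler); }

    String handle(String alertName, String target, boolean approved) {
        for (RemediationHandler h : handlers) {
            if (!h.canHandle(alertName)) continue;
            auditLog.add("proposed " + alertName + " on " + target);
            // Destructive actions wait for a human unless already approved.
            if (h.destructive() && !approved) {
                auditLog.add("pending approval");
                return "PENDING_APPROVAL";
            }
            auditLog.add("executed: " + h.execute(target));
            return "EXECUTED";
        }
        return "NO_HANDLER";
    }

    List<String> auditLog() { return List.copyOf(auditLog); }
}
```

In this sketch the handler stays small because the cross-cutting concerns live in the pipeline. Dedup and metric emission are omitted here to keep the example short.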

vs competition: Replaces PagerDuty toil + custom remediation scripts. Not a replacement for capacity planning or root-cause analysis.
Full battle card →

Ready to see CyGuru in action?

30-day proof of concept on two idle servers. We bring the SE. You bring the use case.