
[Feature]: Honor severity label for firing alerts and include annotations in output #112

@Inderdeep01

Describe the feature request

Description

Currently, check_prometheus determines the Icinga exit status based solely on the alert state:

  • firing → CRITICAL (exit 2)
  • pending → WARNING (exit 1)
  • inactive → OK (exit 0)

This ignores the severity label that Prometheus/Alertmanager/vmalert rules commonly use to indicate the intended severity level. As a result, firing alerts with severity=warning are reported as CRITICAL in monitoring systems like Icinga, causing unnecessary noise and incorrect escalations.

Additionally, the output only includes labels but not annotations (summary, description), which are often essential for on-call engineers to understand and triage alerts quickly.

Problem

Example current output:

[CRITICAL] - 17 Alerts: 17 Firing - 0 Pending - 0 Inactive
_ [CRITICAL] [XXXXXServiceLowHealthyRatio] is firing - value: 0.00 - {...,"severity":"warning"}

The alert has severity=warning in its labels, but the plugin returns CRITICAL because the alert is firing.

Impact:

  • Warning-level alerts create CRITICAL noise in monitoring systems
  • Paging and escalation paths don't align with actual alert severity
  • Alert annotations (summary/description) are not visible in the output

Proposed Solution

1. Map severity label to exit status for firing alerts

For firing alerts, check the severity label and map it to the appropriate exit code:

Severity Label        Exit Status
--------------------  ------------
critical (default)    CRITICAL (2)
warning, warn         WARNING (1)
info, informational   OK (0)

The lookup should check alert-level labels first, then fall back to rule-level labels.

Proposed change to internal/alert/alert.go (the snippet uses the standard strings package):

func (a *Rule) GetStatus() (status int) {
    state := a.AlertingRule.State

    switch state {
    case string(v1.AlertStateFiring):
        status = check.Critical
    case string(v1.AlertStatePending):
        status = check.Warning
    case string(v1.AlertStateInactive):
        status = check.OK
    default:
        status = check.Unknown
    }

    // Honor the severity label for firing alerts.
    if state == string(v1.AlertStateFiring) {
        // Prefer the label on the individual alert instance ...
        severity := ""
        if a.Alert != nil {
            if v, ok := a.Alert.Labels["severity"]; ok {
                severity = strings.ToLower(string(v))
            }
        }
        // ... and fall back to the labels defined on the rule itself.
        if severity == "" {
            if v, ok := a.AlertingRule.Labels["severity"]; ok {
                severity = strings.ToLower(string(v))
            }
        }
        // A missing or unrecognized severity keeps the default CRITICAL.
        switch severity {
        case "warning", "warn":
            return check.Warning
        case "info", "informational":
            return check.OK
        case "critical":
            return check.Critical
        }
    }

    return status
}
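
For reference, a minimal standalone sketch of the severity mapping from the table above. The constants and the severityToStatus helper are illustrative only (not part of check_prometheus or go-check); they just make the intended mapping easy to verify in isolation:

package main

import (
    "fmt"
    "strings"
)

// Icinga/Nagios exit codes, mirroring go-check's OK/Warning/Critical.
const (
    OK       = 0
    Warning  = 1
    Critical = 2
)

// severityToStatus maps the severity label value of a firing alert to an
// exit code; an empty or unrecognized value keeps today's CRITICAL behaviour.
func severityToStatus(severity string) int {
    switch strings.ToLower(severity) {
    case "warning", "warn":
        return Warning
    case "info", "informational":
        return OK
    default: // "critical", "", or anything unexpected
        return Critical
    }
}

func main() {
    for _, s := range []string{"critical", "warning", "warn", "info", "informational", ""} {
        fmt.Printf("severity=%q -> exit %d\n", s, severityToStatus(s))
    }
}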

2. Include annotations in output

Append summary and description annotations to the alert output for better context:

// In GetOutput(), after writing labels:

// Append common annotations for clarity in downstream UIs.
// Guard against a nil Alert, mirroring the nil check in GetStatus().
if a.Alert != nil {
    if summary, ok := a.Alert.Annotations["summary"]; ok {
        out.WriteString(fmt.Sprintf(" - summary: %s", strings.ReplaceAll(string(summary), "\n", " ")))
    }
    if description, ok := a.Alert.Annotations["description"]; ok {
        out.WriteString(fmt.Sprintf(" - description: %s", strings.ReplaceAll(string(description), "\n", " ")))
    }
}
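
A self-contained sketch of what this formatting produces, using hypothetical annotation values (the strings.Builder stands in for the out variable used in GetOutput()):

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Hypothetical annotations as they might come back from the API.
    annotations := map[string]string{
        "summary":     "Healthy ratio below threshold",
        "description": "Service healthy ratio is 0%\nCheck the upstream dependency.",
    }

    var out strings.Builder
    out.WriteString("[WARNING] [XXXXXServiceLowHealthyRatio] is firing - value: 0.00")

    // Flatten newlines so each annotation stays on one plugin output line.
    if summary, ok := annotations["summary"]; ok {
        out.WriteString(fmt.Sprintf(" - summary: %s", strings.ReplaceAll(summary, "\n", " ")))
    }
    if description, ok := annotations["description"]; ok {
        out.WriteString(fmt.Sprintf(" - description: %s", strings.ReplaceAll(description, "\n", " ")))
    }

    fmt.Println(out.String())
}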

Expected Output After Change

[WARNING] - 17 Alerts: 17 Firing - 0 Pending - 0 Inactive
_ [WARNING] [XXXXXServiceLowHealthyRatio] is firing - value: 0.00 - {...,"severity":"warning"} - summary: Healthy ratio below threshold - description: Service healthy ratio is 0%

Alternatives Considered

  1. CLI flag to enable severity mapping - a --honor-severity flag could make the new behavior opt-in for backwards compatibility (see the sketch below)
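
If the opt-in route is preferred, a rough sketch of how such a flag could look. The flag name is taken from the item above; registering it via the standard flag package is purely illustrative, since the plugin has its own CLI wiring:

package main

import (
    "flag"
    "fmt"
)

func main() {
    // Hypothetical opt-in switch; the real plugin would register this
    // through its own CLI layer rather than the standard flag package.
    honorSeverity := flag.Bool("honor-severity", false,
        "map the severity label of firing alerts to the exit status")
    flag.Parse()

    fmt.Println("honor-severity enabled:", *honorSeverity)
    // When enabled, GetStatus() would apply the severity mapping shown above;
    // when disabled, the current state-only behaviour is preserved.
}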

Use Case

We use vmalert to evaluate Prometheus-style alerting rules and surface them in Icinga using check_prometheus. Our alerting rules use severity=warning for non-critical alerts that should not page on-call engineers. Without this change, all firing alerts appear as CRITICAL in Icinga regardless of their intended severity.

Additional Context

This follows the common pattern used by Alertmanager and most Prometheus alerting setups where the severity label indicates the intended alert level. Supporting this label would make check_prometheus more compatible with standard Prometheus/Alertmanager workflows.

I have a working implementation of this and can submit a PR if you're interested.
