Contact Points & Alert Routing

Our alerting architecture uses label-based routing. Instead of hardcoding the destination inside each alert definition, Grafana uses a central notification policy to route alerts to the correct contact points based on their labels (team, environment, severity).

See the Register Team guide to configure your team settings. This ensures your contact points are provisioned automatically.

Benefits:

  • Decoupled: Alert definitions are clean and solely focused on the metric condition.
  • Scalable: Changing a team's MS Teams channel updates all their alert routing instantly without requiring a redeployment of hundreds of alert rules.
  • Predictable: Alert routing follows a predefined framework, so alerts are handled consistently across teams.
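
To make the decoupling concrete, here is a minimal Python sketch. It is not Grafana configuration: the rule names, the contact point names, and the `TEAM_CONTACT_POINTS` mapping are hypothetical stand-ins for the provisioned alert rules and contact points. Only the label names and the `cloud_platform_team` value come from this page.

```python
# Hypothetical alert rules: each carries labels only, never a destination.
ALERT_RULES = [
    {"name": "HighCpuUsage",
     "labels": {"team": "cloud_platform_team", "environment": "prd", "severity": "S2"}},
    {"name": "DiskAlmostFull",
     "labels": {"team": "cloud_platform_team", "environment": "acc", "severity": "S3"}},
]

# Hypothetical central mapping, standing in for the provisioned contact points.
TEAM_CONTACT_POINTS = {"cloud_platform_team": "teams-cloud-platform"}


def contact_point_for(rule: dict) -> str:
    """Resolve the destination from the rule's team label at routing time."""
    return TEAM_CONTACT_POINTS[rule["labels"]["team"]]


def destinations() -> list[str]:
    """Return the resolved destination for every rule, in rule order."""
    return [contact_point_for(rule) for rule in ALERT_RULES]
```

Because the rules never name a destination, re-pointing the single mapping entry changes where every rule for that team is delivered, and the rules themselves stay untouched.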

How the Routing Works Under the Hood

Warning

Experimental! Currently, only routing based on the team label parent tree is implemented. This section explains the intended routing policy we are working towards.

Grafana processes incoming alerts using a Notification Policy. The policy is a tree that evaluates alerts using a parent/child hierarchy:

flowchart TD
alert
team
sev[Severity]
bh["Blackhole:<br> No (further) alerting"]
env[Which env?]
prd[Production]

chan_acc[#Acceptance Alerts Channel]
chan_prd[#Production Alerts Channel]
chan_oncall[Page On-call engineer]

alert --> team[Always send to<br> team channel] --> sev
sev --S1 / S2--> env
sev --S3 / S4--> bh
env -- Acceptance --> chan_acc
env -- Dev/Test --> bh
env --> prd
prd --> chan_prd
prd --> chan_oncall

classDef blackhole fill:red,stroke:darkred,color:black,stroke-width:3px;
class bh blackhole;

  • The Team Match (Parent): The system looks at the team label. If it matches cloud_platform_team, the alert enters the Cloud Platform branch. By default, the alert is sent to the team's default contact point.

  • The Environment Override (Child): Before sending, the environment label is checked. If the environment is dev, tst or acc, and the team has configured a DTA channel, Grafana overrides the destination and sends the alert to the team's non-prod channel instead.

  • The Severity Match (Parent): The severity label is evaluated next. If the severity is S3 or S4, no further alerting is performed. If the severity is S1 or S2, the alert continues to the child policies.

  • The Environment Override (Child): Before sending, the environment label is checked. If the environment is dev or tst, no further alerting is performed. Alerts for acc are forwarded to the general #Acceptance Alerts Channel. Production alerts (prd) are forwarded to the general #Production Alerts Channel and to OpsGenie, which pages the on-call engineer.
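
The four steps above can be read as one decision function. The following Python sketch models the intended policy only; it is not how Grafana evaluates its policy tree internally. The `TeamContactPoints` type, the `route` function, and the `opsgenie-oncall` name are hypothetical, and the sketch assumes that an alert with an unknown team label gets no team destination, which this page does not specify.

```python
from dataclasses import dataclass
from typing import Optional

NON_PROD_ENVS = {"dev", "tst", "acc"}
PAGING_SEVERITIES = {"S1", "S2"}


@dataclass
class TeamContactPoints:
    default: str               # the team's default contact point
    dta: Optional[str] = None  # optional non-prod (DTA) channel


def route(labels: dict[str, str], teams: dict[str, TeamContactPoints]) -> list[str]:
    """Return the destinations an alert would reach under the intended policy."""
    destinations: list[str] = []
    env = labels.get("environment")

    # Team match, with the environment override for non-prod alerts.
    team = teams.get(labels.get("team", ""))
    if team is not None:  # assumption: unknown teams get no team destination
        if env in NON_PROD_ENVS and team.dta:
            destinations.append(team.dta)
        else:
            destinations.append(team.default)

    # Severity match: S3/S4 (or anything else) stops here.
    if labels.get("severity") not in PAGING_SEVERITIES:
        return destinations

    # Environment step for S1/S2: dev and tst are dropped.
    if env == "acc":
        destinations.append("#Acceptance Alerts Channel")
    elif env == "prd":
        destinations.append("#Production Alerts Channel")
        destinations.append("opsgenie-oncall")  # hypothetical paging contact point
    return destinations
```

For example, with `teams = {"cloud_platform_team": TeamContactPoints("cp-default", "cp-dta")}`, an S1 alert labeled `environment=prd` reaches the team's default contact point, the production channel, and the paging contact point, while an S3 alert labeled `environment=dev` reaches only the team's DTA channel.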