Contact Points & Alert Routing
Our alerting architecture uses label-based routing. Instead of hardcoding the destination
inside each alert definition, Grafana uses a central policy to route alerts to the correct contact points
based on their labels (team, environment, severity).
See the Register Team guide to configure your team settings. This ensures your contact points are provisioned automatically.
Benefits:
- Decoupled: Alert definitions are clean and solely focused on the metric condition.
- Scalable: Changing a team's MS Teams channel updates all their alert routing instantly without requiring a redeployment of hundreds of alert rules.
- Predictable: Alert routing follows a pre-defined framework, so alerts are routed consistently across teams.
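To make the decoupling concrete, here is a minimal Python sketch, not the actual Grafana configuration. The rule names, the `contact_points` table and the channel names are hypothetical and only illustrate the idea: alert rules carry labels, and a single central table decides where a team's alerts go.

```python
# Alert rules carry labels only; none of them names a destination.
# (Rule names are hypothetical, for illustration.)
alert_rules = [
    {"name": "HighCPU", "labels": {"team": "cloud_platform_team", "severity": "S2"}},
    {"name": "DiskFull", "labels": {"team": "cloud_platform_team", "severity": "S1"}},
]

# Central routing table: team label -> contact point (hypothetical channel names).
contact_points = {"cloud_platform_team": "teams-channel-old"}


def destination(rule: dict) -> str:
    """Look up the contact point for a rule from its team label."""
    return contact_points[rule["labels"]["team"]]


before = [destination(rule) for rule in alert_rules]

# Moving the team to a new channel is one change in the central table;
# the alert rules themselves are untouched.
contact_points["cloud_platform_team"] = "teams-channel-new"
after = [destination(rule) for rule in alert_rules]

print(before)
print(after)
```

Because the rules never mention a channel, the single assignment changes the destination of every rule that carries the same team label.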
How the Routing Works Under the Hood
Warning
Experimental! Currently, only routing based on the team label parent tree is
implemented. This section explains the intended routing policy we are working towards.
Grafana processes incoming alerts using a Notification Policy. The policy tree evaluates each alert using a Parent/Child hierarchy:
flowchart TD
alert
team
sev[Severity]
bh["Blackhole:<br> No (further) alerting"]
env[Which env?]
prd[Production]
chan_acc[#Acceptance Alerts Channel]
chan_prd[#Production Alerts Channel]
chan_oncall[Page On-call engineer]
alert --> team[Always send to<br> team channel] --> sev
sev --S1 / S2--> env
sev --S3 / S4--> bh
env -- Acceptance --> chan_acc
env -- Dev/Test --> bh
env --> prd
prd --> chan_prd
prd --> chan_oncall
classDef blackhole fill:red,stroke:darkred,color:black,stroke-width:3px;
class bh blackhole;
- The Team Match (Parent): The system looks at the team label. If it matches cloud_platform_team, it drops the alert into the Cloud Platform branch. By default, the alert is sent to the team's default contact point.
- The Environment Override (Child): Before sending, the environment label is checked. If the environment is dev, tst or acc, and the team has configured a DTA channel, Grafana overrides the destination with the non-prod channel.
- The Severity Match (Parent): The severity label is evaluated. If severity matches S3/S4, no further alerting is performed. If severity matches S1/S2, the alert continues to the child policies.
- The Environment Override (Child): Before sending, the environment label is checked. If the environment is dev or tst, no further alerting is performed. Alerts for acc are forwarded to a general #Acceptance Alerts channel. Production alerts (prd) are forwarded to the general #Production Alerts Channel and to OpsGenie to page an on-call engineer.
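The intended decision flow described above can be sketched in plain Python. This is an illustrative model only, not how Grafana evaluates its notification policy tree. The `TeamContacts` type, the `route` function and the contact point names are hypothetical. The sketch also assumes that an alert with an unknown team label skips the team step, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TeamContacts:
    default: str                # the team's default contact point
    dta: Optional[str] = None   # optional non-prod (DTA) channel


# Hypothetical registry; in practice this comes from the team registration.
TEAMS = {
    "cloud_platform_team": TeamContacts(
        default="cloud-platform-default", dta="cloud-platform-dta"
    ),
}

ACCEPTANCE_CHANNEL = "#Acceptance Alerts Channel"
PRODUCTION_CHANNEL = "#Production Alerts Channel"
ONCALL = "OpsGenie on-call"


def route(labels: dict[str, str]) -> list[str]:
    """Return the destinations for an alert, following the intended policy."""
    destinations: list[str] = []
    env = labels.get("environment")

    # Team match (parent) with environment override (child).
    team = TEAMS.get(labels.get("team", ""))
    if team is not None:
        if env in {"dev", "tst", "acc"} and team.dta:
            destinations.append(team.dta)
        else:
            destinations.append(team.default)

    # Severity match (parent): S3/S4 stop here, S1/S2 continue.
    if labels.get("severity") in {"S1", "S2"}:
        # Environment override (child): dev/tst add nothing further.
        if env == "acc":
            destinations.append(ACCEPTANCE_CHANNEL)
        elif env == "prd":
            destinations.extend([PRODUCTION_CHANNEL, ONCALL])

    return destinations


print(route({"team": "cloud_platform_team", "environment": "prd", "severity": "S1"}))
```

In this model an S1 production alert reaches the team's default contact point, the general production channel and the on-call page, while an S4 alert in the same environment only reaches the team's contact point.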