Azure Alerts

Before you go live with resources in any production environment, it is essential that proper alerting is defined and implemented.

In this blog post we will cover basic Azure alerting fundamentals, along with how to define alerts in code and deploy them in an automated fashion using an Azure DevOps pipeline.

Fundamentals

Azure Alerts allow you to configure alerts that notify people via e-mail, SMS, Slack, Teams, etc. They can also run an automated workflow (Logic App) to perform remediation when a certain alert is triggered.

There are some common alerting elements that you need to understand to be successful. I will cover them here.

Action Groups

Action Groups define who and what will be notified when an alert is triggered. You can create as many unique Action Groups as you need.

You can use e-mail, SMS, webhooks and more. You can also notify based on Azure Resource Manager Role, which will e-mail people assigned to a specific role at the subscription level (e.g. Monitoring Reader).
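
As a rough illustration of how these receiver types look in an ARM template, here is a minimal sketch of an Action Group resource with an e-mail, an SMS and an Azure Resource Manager Role receiver. The resource name, the receiver names and every value in angle brackets are hypothetical placeholders. The apiVersion is an assumption too: I am using 2019-06-01 because the armRoleReceivers property is not available in the oldest API versions. Look up the GUID of the role you want (for example Monitoring Reader) in the Azure built-in roles documentation rather than trusting a copied value.

```json
{
    "comments": "Illustrative sketch only: names and <placeholders> are hypothetical, apiVersion is an assumption.",
    "type": "Microsoft.Insights/actionGroups",
    "apiVersion": "2019-06-01",
    "name": "exampleActionGroup",
    "location": "Global",
    "properties": {
        "groupShortName": "exampleAG",
        "enabled": true,
        "emailReceivers": [
            {
                "name": "onCallEmail",
                "emailAddress": "<receiverEmail>",
                "useCommonAlertSchema": true
            }
        ],
        "smsReceivers": [
            {
                "name": "onCallSms",
                "countryCode": "<countryCode>",
                "phoneNumber": "<phoneNumber>"
            }
        ],
        "armRoleReceivers": [
            {
                "name": "monitoringReaders",
                "roleId": "<roleDefinitionGuid>",
                "useCommonAlertSchema": true
            }
        ]
    }
}
```

Note that groupShortName is limited to 12 characters, which is why the short name differs from the resource name here.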

Alert Rules

Alert Rules are the actual alerts that you build with the criteria you define. They will notify the Action Group(s) associated with the Alert Rule. There are several different types of alert rules:

  • Metric
  • Log search (Kusto / KQL)
  • Activity log

Some alerts are stateless and some are stateful. A stateless alert fires each time its condition is met, and you will need to manually mark it as resolved after it fires. A stateful alert fires once and then resolves itself after a specific interval, assuming the alert condition is no longer met during that period.
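
For metric alerts, this behaviour is controlled by the autoMitigate property. The sketch below is a hypothetical single-resource metric alert (the name, threshold and the placeholders in angle brackets are made up for illustration). With autoMitigate set to true the alert should resolve on its own once the condition clears; setting it to false keeps the alert fired until someone closes it manually.

```json
{
    "comments": "Illustrative sketch only: name, threshold and <placeholders> are hypothetical.",
    "type": "Microsoft.Insights/metricAlerts",
    "apiVersion": "2018-03-01",
    "name": "exampleStatefulCpuAlert",
    "location": "Global",
    "properties": {
        "description": "Example stateful metric alert on a single Virtual machine",
        "severity": 3,
        "enabled": true,
        "scopes": [
            "<vmResourceId>"
        ],
        "evaluationFrequency": "PT5M",
        "windowSize": "PT15M",
        "criteria": {
            "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
            "allOf": [
                {
                    "name": "HighCpu",
                    "metricName": "Percentage CPU",
                    "operator": "GreaterThan",
                    "threshold": 90,
                    "timeAggregation": "Average",
                    "criterionType": "StaticThresholdCriterion"
                }
            ]
        },
        "autoMitigate": true,
        "actions": [
            {
                "actionGroupId": "<actionGroupResourceId>"
            }
        ]
    }
}
```

Activity log alerts have no equivalent switch because they are always stateless.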

Create a Basic Alert

You will need an Action Group before you can create an Alert Rule. Create a test one before continuing.

  1. Go to the Azure portal and search for Alerts in the global search bar.

  2. Click on + New alert rule.

  3. Under Scope, select the resource you’d like to target. In my example I am using a Storage account.

  4. Add a condition. In my example I am trying to get an alert when the specific Storage account is deleted.

  5. Select your Action Group(s).

  6. Give the Alert Rule a name and description, and select the Resource group where you want to store the alert.

  7. Click Create alert rule.

By default, Alert Rules are hidden resources in Azure. When you go to the Resource group, you will need to check Show hidden types to see them.

Inspect ARM Template

After you create an Alert Rule it is helpful to view the ARM template that is generated to understand the deployment and to assist with automating it in the future.

  1. Go to the Azure portal and search for Alerts in the global search bar.

  2. Click Manage alert rules.

  3. Find the rule you just created and click it.

  4. When it opens click Properties.

  5. Go to Export template and view the structure of the Alert.
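
For an activity log alert like the Storage account deletion example from the previous section, the exported resource should look roughly like the following sketch. The alert name and the two placeholder resource IDs are hypothetical, and a real export may use a different apiVersion or include additional properties.

```json
{
    "comments": "Illustrative sketch only: name and <placeholders> are hypothetical.",
    "type": "Microsoft.Insights/activityLogAlerts",
    "apiVersion": "2017-04-01",
    "name": "storageAccountDeleted",
    "location": "Global",
    "properties": {
        "enabled": true,
        "description": "Storage account deleted",
        "scopes": [
            "<storageAccountResourceId>"
        ],
        "condition": {
            "allOf": [
                {
                    "field": "category",
                    "equals": "Administrative"
                },
                {
                    "field": "operationName",
                    "equals": "Microsoft.Storage/storageAccounts/delete"
                }
            ]
        },
        "actions": {
            "actionGroups": [
                {
                    "actionGroupId": "<actionGroupResourceId>"
                }
            ]
        }
    }
}
```

The parts worth noting are the scopes array (what the alert watches), the condition block (which activity log events match) and the actions block (which Action Group is notified).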

Automating Alerts End to End

Using what you learned above, you can create your own ARM template that can be used to deploy the necessary Action Group(s) and Alert Rules.

I’ve created an example to illustrate this. Edit it to your liking; you can also source it from my GitHub repo. Be sure to update the receiverEmail in the template, and the subscriptionId, resourceGroupName and logAnalyticsWorkspaceName in the parameters file.

This example will show you how to deploy an Action Group and all 3 types of Alert Rules described above. The deployment will create the following:

  • An Action Group named monitorTeam with a single e-mail receiver.

  • A Log search Alert called vmLowDiskSpace that will notify the Action Group when a Virtual machine's free disk space is less than 5%.

  • A Log search Alert called vmLowMemory that will notify the Action Group when a Virtual machine has less than 250 MB of memory free.

  • A Metric Alert called vmCpuUtilization that will notify the Action Group when Average CPU utilization is greater than 75% for 15 minutes. It will auto resolve when CPU utilization returns to normal.

  • A Metric Alert called vmOsDiskQueueLength that will notify the Action Group when Virtual machine disk queue length is a concern. It will auto resolve when Disk queue length returns to normal.

  • An Activity log Alert called vmDeletionSucceeded that will notify the Action Group when a Virtual machine is deleted successfully.

ARM Template

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "actionGroupName": {
            "type": "string",
            "defaultValue": "monitorTeam",
            "minLength": 1,
            "metadata": {
                "description": "Unique name (within the Resource Group) for the Action group."
            }
        },
        "actionGroupShortName": {
            "type": "string",
            "defaultValue": "monitorTeam",
            "minLength": 1,
            "maxLength": 12,
            "metadata": {
                "description": "Short name for the Action group."
            }
        },
        "logAnalyticsWorkspaceId": {
            "type": "string",
            "metadata": {
                "displayName": "Log analytics workspace",
                "description": "Resource ID of the Log Analytics workspace that the log search alerts query.",
                "strongType": "omsWorkspace"
            }
        },
        "location": {
            "type": "string",
            "defaultValue": "[resourceGroup().location]",
            "metadata": {
                "description": "Location for all resources."
            }
        }
    },
    "variables": {},
    "resources":[ 
    {
        "type": "Microsoft.Insights/actionGroups",
        "apiVersion": "2018-03-01",
        "name": "[parameters('actionGroupName')]",
        "location": "Global",
        "properties": {
            "groupShortName": "[parameters('actionGroupShortName')]",
            "enabled": true,
            "smsReceivers": [],
            "emailReceivers": [
                {
                    "name": "emailReceiver",
                    "emailAddress": "<receiverEmail>"
                }
            ]
        }
    },
    {
        "name":"vmLowDiskSpace",
        "type":"Microsoft.Insights/scheduledQueryRules",
        "apiVersion": "2018-04-16",
        "location": "[parameters('location')]",
        "dependsOn": [
            "[parameters('actionGroupName')]"
        ],
        "properties": {
            "description": "Virtual Machine Low Disk Space - Less Than 5%",
            "enabled": "true",
            "source": {
                "query": "Perf | where CounterName == \"% Free Space\" | summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 15m), Computer, InstanceName | summarize AggregatedValue = arg_min(AggregatedValue, InstanceName, Computer) by Computer | project Computer, DriveLetter=InstanceName, FreeSpacePercent=AggregatedValue | where FreeSpacePercent < 5 | order by FreeSpacePercent asc",
                "dataSourceId": "[parameters('logAnalyticsWorkspaceId')]",
                "queryType": "ResultCount"
            },
            "schedule": {
                "frequencyInMinutes": 15,
                "timeWindowInMinutes": 15
            },
            "action": {
                "odata.type": "Microsoft.WindowsAzure.Management.Monitoring.Alerts.Models.Microsoft.AppInsights.Nexus.DataContracts.Resources.ScheduledQueryRules.AlertingAction",
                "severity": 2,
                "aznsAction": {
                    "actionGroup": "[array(resourceId('Microsoft.Insights/actionGroups', parameters('actionGroupName')))]",
                    "emailSubject": "Virtual Machine Low Disk Space - Less Than 5%"
                },
                "trigger": {
                    "thresholdOperator": "GreaterThanOrEqual",
                    "threshold": 1
                }
            }
        }
    },
    {
        "name":"vmLowMemory",
        "type":"Microsoft.Insights/scheduledQueryRules",
        "apiVersion": "2018-04-16",
        "location": "[parameters('location')]",
        "dependsOn": [
            "[parameters('actionGroupName')]"
        ],
        "properties": {
            "description": "Virtual Machine Low Memory - Less Than 250 MB Free",
            "enabled": "true",
            "source": {
                "query": "Perf | where CounterName == \"Available MBytes\" | summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 15m), Computer | summarize arg_min(AggregatedValue, Computer) by Computer | project Computer, MemoryGBFree=AggregatedValue/1024 | where MemoryGBFree < 0.25 | order by MemoryGBFree asc",
                "dataSourceId": "[parameters('logAnalyticsWorkspaceId')]",
                "queryType": "ResultCount"
            },
            "schedule": {
                "frequencyInMinutes": 15,
                "timeWindowInMinutes": 15
            },
            "action": {
                "odata.type": "Microsoft.WindowsAzure.Management.Monitoring.Alerts.Models.Microsoft.AppInsights.Nexus.DataContracts.Resources.ScheduledQueryRules.AlertingAction",
                "severity": 2,
                "aznsAction": {
                    "actionGroup": "[array(resourceId('Microsoft.Insights/actionGroups', parameters('actionGroupName')))]",
                    "emailSubject": "Virtual Machine Low Memory - Less Than 250 MB Free"
                },
                "trigger": {
                    "thresholdOperator": "GreaterThanOrEqual",
                    "threshold": 1
                }
            }
        }
    },
    {
        "name":"vmCpuUtilization",
        "type":"Microsoft.Insights/metricAlerts",
        "apiVersion": "2018-03-01",
        "location": "Global",
        "dependsOn": [
            "[parameters('actionGroupName')]"
        ],
        "properties": {
            "severity": 2,
            "enabled": true,
            "scopes": [
                "[subscription().Id]"
            ],
            "evaluationFrequency": "PT15M",
            "windowSize": "PT15M",
            "criteria": {
                "allOf": [
                    {
                        "threshold": 75,
                        "name": "Metric1",
                        "metricNamespace": "Microsoft.Compute/virtualMachines",
                        "metricName": "Percentage CPU",
                        "dimensions": [],
                        "operator": "GreaterThan",
                        "timeAggregation": "Average",
                        "criterionType": "StaticThresholdCriterion"
                    }
                ],
                "odata.type": "Microsoft.Azure.Monitor.MultipleResourceMultipleMetricCriteria"
            },
            "autoMitigate": true,
            "targetResourceType": "Microsoft.Compute/virtualMachines",
            "targetResourceRegion": "[parameters('location')]",
            "actions": [
                {
                    "actionGroupId": "[resourceId('Microsoft.Insights/actionGroups', parameters('actionGroupName'))]",
                    "webHookProperties": {}
                }
            ],
            "description": "Average CPU Utilization Greater Than 75%"
        }
    },
    {
        "name":"vmOsDiskQueueLength",
        "type":"Microsoft.Insights/metricAlerts",
        "apiVersion": "2018-03-01",
        "location": "Global",
        "dependsOn": [
            "[parameters('actionGroupName')]"
        ],
        "properties": {
            "description": "O/S Disk Queue Length",
            "severity": 2,
            "enabled": true,
            "scopes": [
                "[subscription().Id]"
            ],
            "evaluationFrequency": "PT5M",
            "windowSize": "PT5M",
            "criteria": {
                "allOf": [
                    {
                        "alertSensitivity": "Low",
                        "failingPeriods": {
                            "numberOfEvaluationPeriods": 4,
                            "minFailingPeriodsToAlert": 4
                        },
                        "name": "Metric1",
                        "metricNamespace": "Microsoft.Compute/virtualMachines",
                        "metricName": "OS Disk Queue Depth",
                        "dimensions": [],
                        "operator": "GreaterThan",
                        "timeAggregation": "Average",
                        "criterionType": "DynamicThresholdCriterion"
                    }
                ],
                "odata.type": "Microsoft.Azure.Monitor.MultipleResourceMultipleMetricCriteria"
            },
            "autoMitigate": true,
            "targetResourceType": "Microsoft.Compute/virtualMachines",
            "targetResourceRegion": "[parameters('location')]",
            "actions": [
                {
                    "actionGroupId": "[resourceId('Microsoft.Insights/actionGroups', parameters('actionGroupName'))]",
                    "webHookProperties": {}
                }
            ]
        }
    },
    {
        "name":"vmDeletionSucceeded",
        "type": "Microsoft.Insights/activityLogAlerts",
        "apiVersion": "2017-04-01",
        "location": "Global",
        "dependsOn": [
            "[parameters('actionGroupName')]"
        ],
        "properties": {
            "scopes": [
                "[subscription().Id]"
            ],
            "condition": {
                "allOf": [
                    {
                        "field": "category",
                        "equals": "Administrative"
                    },
                    {
                        "field": "resourceType",
                        "equals": "Microsoft.Compute/virtualMachines"
                    },
                    {
                        "field": "operationName",
                        "equals": "Microsoft.Compute/virtualMachines/delete"
                    },
                    {
                        "field": "status",
                        "equals": "succeeded"
                    }
                ]
            },
            "actions": {
                "actionGroups": [
                    {
                        "actionGroupId": "[resourceId('Microsoft.Insights/actionGroups', parameters('actionGroupName'))]",
                        "webhookProperties": {}
                    }
                ]
            },
            "enabled": true,
            "description": "Virtual Machine Deleted Successfully"
        }
    }
    ]
}

ARM Template Parameters

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "actionGroupName": {
      "value": "monitorTeam"
    },
    "actionGroupShortName": {
      "value": "monitorTeam"
    },
    "logAnalyticsWorkspaceId": {
      "value": "/subscriptions/<subscriptionId>/resourceGroups/<resourceGroupName>/providers/Microsoft.OperationalInsights/workspaces/<logAnalyticsWorkspaceName>"
    }
  }
}
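
If you would rather not hard-code the e-mail address in the template, one option is to promote it to a parameter. The sketch below is a cut-down version of the template above that contains only the Action Group. Note that receiverEmail here is a hypothetical parameter that does not exist in the original template, so you would also need to add a matching receiverEmail value to the parameters file.

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "actionGroupName": {
            "type": "string",
            "defaultValue": "monitorTeam"
        },
        "actionGroupShortName": {
            "type": "string",
            "defaultValue": "monitorTeam",
            "maxLength": 12
        },
        "receiverEmail": {
            "type": "string",
            "metadata": {
                "description": "Hypothetical parameter (not in the original template): e-mail address that receives alert notifications."
            }
        }
    },
    "resources": [
        {
            "type": "Microsoft.Insights/actionGroups",
            "apiVersion": "2018-03-01",
            "name": "[parameters('actionGroupName')]",
            "location": "Global",
            "properties": {
                "groupShortName": "[parameters('actionGroupShortName')]",
                "enabled": true,
                "emailReceivers": [
                    {
                        "name": "emailReceiver",
                        "emailAddress": "[parameters('receiverEmail')]"
                    }
                ]
            }
        }
    ]
}
```

This keeps environment-specific values out of the template, so the same file can be reused for different teams or environments by swapping the parameters file.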

Manual Deployment

It’s always a good idea to deploy manually first to make sure everything is in order. You can deploy the previous example using the following PowerShell command.

New-AzResourceGroupDeployment -TemplateFile .\azuredeploy.json -TemplateParameterFile .\azuredeploy.parameters.json -ResourceGroupName "<resourceGroupName>" -Verbose

Automated Deployment With DevOps Pipeline

  1. Go to Azure Active Directory and create a new Application Registration.

  2. Generate a new secret for the Application Registration and copy the secret to the clipboard.

  3. Assign the Application Registration the Contributor role at the Resource group level, and the Monitoring Reader role at the Subscription level.

  4. From Azure DevOps, go to Project Settings and create a new Service Connection using the Application Registration details and the new secret. Verify and save.

  5. Source the Azure DevOps YAML pipeline below. You can source this YAML pipeline from my GitHub repository as well.

This pipeline runs when there is a commit to the master branch. Be sure to replace serviceConnectionName, subscriptionId and resourceGroupName in the YAML pipeline with the values you require.

Once complete, execute the pipeline. Your Action Group and Alerts will be deployed automatically.

trigger:
- master

pool:
  vmImage: 'windows-latest'

steps:
- task: AzureResourceManagerTemplateDeployment@3
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: '<serviceConnectionName>'
    subscriptionId: '<subscriptionId>'
    action: 'Create Or Update Resource Group'
    resourceGroupName: '<resourceGroupName>'
    location: 'Canada Central'
    templateLocation: 'Linked artifact'
    csmFile: '$(System.DefaultWorkingDirectory)/azuredeploy.json'
    csmParametersFile: '$(System.DefaultWorkingDirectory)/azuredeploy.parameters.json'
    deploymentMode: 'Incremental'

Here is the output from a successful pipeline execution.

If you go to the Azure portal, you will see the deployed Action Group and Alerts in the Resource group you deployed them to.