When businesses migrate infrastructure to Azure, application performance monitoring tops their list of priorities. They want to know how to take advantage of Azure’s wealth of logs and metrics to monitor the performance of their application, the VMs and services it depends on, and the underlying infrastructure platform. Azure users approach VIAcode with three fundamental questions.
- What do you monitor in Azure?
- What are the key monitoring primitives?
- How is Azure monitoring put together?
Underlying these questions is a concern about the division of responsibility for application performance monitoring. What does Azure monitor, and what is the user responsible for monitoring? To understand how DevOps teams can use Azure’s performance monitoring tools to identify and address application performance issues, it’s helpful to have a mental model of the telemetry that is available and the services Azure provides to make use of the raw data.
Azure’s monitoring tools are complex and multi-layered, and some services appear to have overlapping functionality. In this article, we’ll discuss the basic components that constitute Azure’s application performance monitoring services.
Azure Resource Manager
Azure Resource Manager (ARM) is the “control center” for Azure resources. ARM is used to deploy, manage, and delete resources. It orchestrates the lifecycle of all Azure services. Users interact with ARM via several interfaces, including the web portal, the Azure CLI, and various APIs.
ARM controls two categories of infrastructure: compute and the other services an application uses. Compute includes VMs, Functions, App Service, and Kubernetes, among others. Services your application might depend on include Storage and Event Hubs.
All of these components — ARM, compute, and services — generate telemetry that can be leveraged for application performance monitoring. The telemetry is made available to users via Azure Monitor.
What Telemetry Does Azure Make Available?
It is essential to understand which telemetry Azure is responsible for monitoring, and which falls to the user to monitor. In a nutshell, Azure monitors and responds to telemetry at the infrastructure layer. For example, Azure cares what is happening to the infrastructure on which your VMs run and the VM itself. But it is the user’s responsibility to respond to telemetry at the level of the guest operating system and the application — although Azure provides telemetry and tools for performance monitoring at all of those levels.
Service Health allows users to monitor the health of Azure infrastructure. This is where users go to find out about recommendations from Microsoft, scheduled infrastructure maintenance plans, and potential performance degradation at the hardware, network, and data center layer.
The Activity Log is a log of activities that occur in the Azure Resource Manager. It includes events such as the creation or deletion of resources, service health incidents, and resource health alerts. The Activity Log is the “source of truth” for events affecting your infrastructure at the level of individual resources.
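To make the idea of a per-resource "source of truth" concrete, here is a minimal sketch in Python. It is not the Activity Log schema or an Azure SDK call: the `ActivityRecord` type, its field names, the event labels, and the resource names are all hypothetical, chosen only to mirror the kinds of events described above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    """Hypothetical, simplified stand-in for one Activity Log entry."""
    resource: str  # illustrative resource name
    kind: str      # illustrative event label, not a real Azure category


# Hand-written sample records, listed in the order they occurred.
records = [
    ActivityRecord("vm-web-01", "resource_created"),
    ActivityRecord("vm-web-01", "resource_health_alert"),
    ActivityRecord("storage-logs", "resource_deleted"),
    ActivityRecord("vm-web-01", "service_health_incident"),
]


def history_for(all_records, resource):
    """Return the event kinds recorded for one resource, in log order."""
    return [r.kind for r in all_records if r.resource == resource]


vm_history = history_for(records, "vm-web-01")
kind_counts = Counter(r.kind for r in records)

print(vm_history)
print(kind_counts.most_common())
```

Filtering by resource answers "what happened to this VM?", while counting by kind gives a quick view of what has been happening across the whole subscription.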
Infrastructure logs record events related to every Azure resource. They can be compared to the Windows Event Log or Syslog on Linux. Infrastructure logs are collected and centrally stored within the Log Analytics service, which can be used to analyze and correlate log data across resources.
Application logs record what is happening at the application level: what individual applications know about their own state, including actions and exceptions. Azure application logs are comparable to trace logs and are extremely helpful when diagnosing application performance issues, allowing users to correlate application events with infrastructure events.
In addition to logs, every service exposes its health through metrics. Metrics are time-series data used to generate alerts and build charts that provide users with an insight into service performance.
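As a rough illustration of how time-series metrics can drive an alert, the sketch below averages the most recent data points and compares the result with a threshold. It uses plain Python rather than any Azure API; the `MetricPoint` type, the `breaches_threshold` helper, the sample CPU values, and the 90 percent threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class MetricPoint:
    """One sample in a time series (hypothetical type for this sketch)."""
    timestamp: datetime
    value: float


def breaches_threshold(points, threshold, window):
    """Return True if the average of the samples inside the most recent
    `window` is above `threshold`. An empty series never breaches."""
    if not points:
        return False
    latest = max(p.timestamp for p in points)
    recent = [p.value for p in points if p.timestamp > latest - window]
    return mean(recent) > threshold


# Made-up CPU utilisation samples, one per minute.
start = datetime(2024, 1, 1, 12, 0)
cpu_percent = [
    MetricPoint(start + timedelta(minutes=i), v)
    for i, v in enumerate([35.0, 40.0, 38.0, 92.0, 95.0, 97.0])
]

# Evaluate the rule over the last three minutes of data.
alert = breaches_threshold(cpu_percent, threshold=90.0, window=timedelta(minutes=3))
print("alert fired:", alert)
```

Averaging over a window rather than reacting to a single sample is one common way to avoid alerting on short spikes; the window length and threshold are tuning choices.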
Azure Monitor, a service available via the Azure Portal, is the endpoint that ties together the logs and metrics generated by Azure compute resources and services. Via the Azure Monitor service, Azure users access monitoring primitives such as metrics and logs, service health, and custom alerts and dashboards for monitoring the performance of their application.
On top of these monitoring services, Azure also provides a range of domain-specific monitoring tools called Solutions and Insights. Security Center contains everything related to security, and Application Insights consolidates data relevant to application performance. These solutions understand telemetry types and provide tailored experiences on top of the raw telemetry: they are aware of what the data means in context and can infer insights from it. For example, Insights generate network diagrams and application topologies based on logs and metrics.
We have discussed three levels of monitoring services that Azure provides: comprehensive logs and metrics from all services, core monitoring primitives that give users access to the data in multiple formats, and insights and solutions that provide domain-specific monitoring experiences.
Is All This Telemetry Necessary?
Azure generates massive quantities of data, which has to be stored and processed. Do Azure users need all of that telemetry? In short, yes. A comprehensive array of logs and analytics from all levels of the stack empowers Azure users to establish correlations and carry out root cause analyses quickly and efficiently.
As we discussed earlier, Azure generates telemetry from ARM and infrastructure (what Azure knows), from within VMs via the Azure VM agent (what the guest OS knows), and from within applications via the SDK (what the application knows). If telemetry from any of these layers is missing, it is challenging, and potentially impossible, to identify the root cause of application performance issues.
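The value of having all three layers can be sketched with a small correlation example: given an application-level symptom, look for events from the other layers that occurred close to it in time. This is an illustrative sketch in plain Python, not how Log Analytics performs correlation; the `TelemetryEvent` type, the layer names, the `events_near` helper, and the sample events are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TelemetryEvent:
    """Hypothetical event record; `layer` is one of
    "platform", "guest_os", or "application"."""
    timestamp: datetime
    layer: str
    message: str


def events_near(events, anchor, window):
    """Return events from layers other than the anchor's that fall
    within `window` of the anchor, ordered oldest first."""
    nearby = [
        e for e in events
        if e.layer != anchor.layer
        and abs(e.timestamp - anchor.timestamp) <= window
    ]
    return sorted(nearby, key=lambda e: e.timestamp)


# Made-up events from the three layers.
t0 = datetime(2024, 1, 1, 9, 0, 0)
events = [
    TelemetryEvent(t0, "platform", "Host maintenance started"),
    TelemetryEvent(t0 + timedelta(seconds=40), "guest_os", "Disk latency above normal"),
    TelemetryEvent(t0 + timedelta(seconds=55), "application", "Request timeout"),
    TelemetryEvent(t0 + timedelta(hours=2), "guest_os", "Scheduled task completed"),
]

symptom = events[2]  # the application-level timeout
candidates = events_near(events, symptom, timedelta(minutes=2))
for e in candidates:
    print(e.timestamp.isoformat(), e.layer, e.message)

# With the platform layer missing, the earliest link in the chain disappears.
without_platform = [e for e in events if e.layer != "platform"]
partial = events_near(without_platform, symptom, timedelta(minutes=2))
```

In the full data set the timeout can be traced back through the guest OS to a platform event; in the partial set the trail stops at the disk latency warning, which is a symptom rather than a cause.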