AWS CloudWatch: How To Monitor Application Performance In AWS



Here are best practices for monitoring application servers, database servers and network performance using Amazon's CloudWatch service for AWS.

There will be times your cloud application does not perform as expected. Database queries will take longer to return than you would like. Networks will experience periods of high latency. Business logic will take longer to run than expected. Your first indication of a problem may be a user contacting you to inform you of poor performance. This “wait and see what happens” approach to monitoring makes your users responsible for detecting performance problems. It also leaves you with little information to work with to identify the root cause of the performance problem. AWS enables a more effective way to collect data on your application performance with the CloudWatch service.

An important aspect of any monitoring tool is the ability to capture a variety of measures about different parts of your application. This is especially important for distributed systems, which can suffer slow overall performance for several different reasons. Consider the user interface of a management reporting tool that takes over ten seconds to refresh when it should refresh in under two seconds. The cause could be high CPU utilization prolonging computation, a poor indexing strategy that leads to reading an unnecessarily large number of disk blocks, or insufficient RAM resulting in high rates of swapping. Because so many problems can manifest themselves with similar symptoms, we need additional information to diagnose the root cause of a problem. AWS CloudWatch serves this purpose by providing a wide range of metrics across multiple services.

What CloudWatch Is Capable Of
CloudWatch collects and stores data about performance details of instances and services. For example, a systems administrator can collect data on EC2 instances including: CPU utilization, disk read and write operations, and number of bytes received and sent over the network. The specific data collected are known as metrics and each AWS service has a specific set. The Relational Database Service (RDS), for instance, has metrics on read and write latency, number of I/O operations per second, and number of database connections.

The metrics are organized by dimensions, or attributes of a service, that allow you to track data from multiple instances or services. EC2 metrics may be organized and grouped by image ID, instance ID, instance type or auto scaling group. RDS metrics are grouped by database instance, database class, or database engine (e.g. MySQL, Oracle or MS SQL). Metrics also have timestamps associated with them. This provides the equivalent of a time dimension to metrics across CloudWatch.

Metrics have units of measure associated with them. Network I/O is measured in bytes. CPU utilization is measured as a percentage. The number of database connections to an RDS database is a count.
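To make the vocabulary above concrete, here is a minimal sketch of how a metric observation could be modeled: a name, a set of dimensions, a timestamp, a value, and a unit. The Datapoint class and the average_by_dimension helper are illustrative names invented for this example, not AWS types, and the instance IDs and values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import mean


@dataclass(frozen=True)
class Datapoint:
    """One observation of a metric (illustrative model, not an AWS type)."""
    metric_name: str     # e.g. "CPUUtilization"
    dimensions: tuple    # (name, value) pairs, e.g. (("InstanceId", "i-aaa"),)
    timestamp: datetime  # acts as the time dimension
    value: float
    unit: str            # e.g. "Percent", "Bytes", "Count"


def average_by_dimension(points, dimension_name):
    """Group datapoints by one dimension's value and average each group."""
    groups = defaultdict(list)
    for point in points:
        dims = dict(point.dimensions)
        if dimension_name in dims:
            groups[dims[dimension_name]].append(point.value)
    return {key: mean(values) for key, values in groups.items()}


# Made-up sample data: three instances of two instance types.
now = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
points = [
    Datapoint("CPUUtilization", (("InstanceId", "i-aaa"), ("InstanceType", "m3.medium")), now, 40.0, "Percent"),
    Datapoint("CPUUtilization", (("InstanceId", "i-bbb"), ("InstanceType", "m3.medium")), now, 60.0, "Percent"),
    Datapoint("CPUUtilization", (("InstanceId", "i-ccc"), ("InstanceType", "m3.large")), now, 20.0, "Percent"),
]
averages = average_by_dimension(points, "InstanceType")
```

Grouping by "InstanceType" here mirrors what selecting that dimension does in CloudWatch: the same metric is summarized per instance type rather than per individual instance.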

Metrics may be viewed in the management console, using the command line, or by making API calls. You can also graph performance data using CloudWatch visualization features. One or more metrics can be graphed at any time by searching for the instance by region, resource, metric name, or any other metadata. After you select the metrics you want shown, a graph appears at the bottom of the window. In addition to selecting specific metrics, admins can change the time range and the statistic used for aggregation with the filter drop-downs above the graph. A graph can be saved by copying its URL, which can later be pasted into a browser to reopen the same view.
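As a rough sketch of the API route, the following builds the arguments for a CloudWatch GetMetricStatistics request for one instance's CPU utilization and, separately, sends it with the boto3 library. The helper names, the instance ID, the region, and the one-hour window are assumptions chosen for illustration; fetch_datapoints needs boto3 and AWS credentials and is not called here.

```python
from datetime import datetime, timedelta, timezone


def cpu_query_params(instance_id, end_time, hours=1, period_seconds=300):
    """Build keyword arguments for a GetMetricStatistics request."""
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # The dimension narrows the metric to a single instance.
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period_seconds,           # aggregation window in seconds
        "Statistics": ["Average", "Maximum"],
        "Unit": "Percent",
    }


def fetch_datapoints(params, region="us-east-1"):
    """Send the query with boto3 (requires AWS credentials; not run here)."""
    import boto3  # imported lazily so the sketch loads without boto3

    client = boto3.client("cloudwatch", region_name=region)
    response = client.get_metric_statistics(**params)
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])


# Hypothetical instance ID and end time, used only to show the request shape.
end = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
params = cpu_query_params("i-0123456789abcdef0", end)
```

Keeping the parameter construction separate from the network call makes the request easy to inspect or log before it is sent.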

In addition to viewing performance metrics, you can use these measures to perform actions under specific conditions. CloudWatch includes the ability to set alarms on metrics. First, a threshold must be set for a metric, along with the time period over which the metric is observed. When an alarm changes state, a notification is sent out. An action is invoked only after the metric has stayed on the wrong side of its threshold for a specified amount of time.
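The threshold-plus-duration behavior can be sketched as a small simulation. This is a simplified model, not CloudWatch's actual evaluation logic: it assumes one aggregated value per period and raises the alarm only when the most recent evaluation_periods values all exceed the threshold. The function names are hypothetical; the state names OK, ALARM and INSUFFICIENT_DATA are the ones CloudWatch uses.

```python
def evaluate_alarm(values, threshold, evaluation_periods):
    """Return the alarm state after each period (simplified model)."""
    states = []
    for i in range(len(values)):
        # Look at the most recent `evaluation_periods` values, if available.
        window = values[max(0, i + 1 - evaluation_periods): i + 1]
        if len(window) < evaluation_periods:
            states.append("INSUFFICIENT_DATA")
        elif all(v > threshold for v in window):
            states.append("ALARM")
        else:
            states.append("OK")
    return states


def transitions(states):
    """Return (index, old_state, new_state) for every state change."""
    return [
        (i, states[i - 1], states[i])
        for i in range(1, len(states))
        if states[i] != states[i - 1]
    ]


# Made-up average CPU values, one per period.
cpu_averages = [50, 85, 90, 95, 60]
states = evaluate_alarm(cpu_averages, threshold=80, evaluation_periods=3)
changes = transitions(states)
```

In this sample, a single period above 80 percent does not trigger anything; the state becomes ALARM only once three consecutive periods exceed the threshold, and each entry in changes marks a point where a notification would be sent.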

For example, let’s assume you want to set an alarm based on CPU utilization. After graphing the appropriate metric, in this case CPU utilization, you define a period from the drop-down list and, most likely, select Average as the statistic. Next, enter a name for the alarm, such as HighCPUAlarm, and a description of what the alarm tracks. Finally, set the exact threshold using the appropriate drop-down; the metric is tested against it once per period.
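The same alarm can also be defined through the API. The sketch below assembles arguments for CloudWatch's PutMetricAlarm operation as exposed by boto3's put_metric_alarm. The 80 percent threshold, the five-minute period, the two evaluation periods, the description text, and the instance ID are assumptions picked for illustration; create_alarm needs boto3 and AWS credentials and is not called here.

```python
def high_cpu_alarm_params(instance_id, threshold=80.0,
                          period_seconds=300, evaluation_periods=2):
    """Build keyword arguments for a PutMetricAlarm request."""
    return {
        "AlarmName": "HighCPUAlarm",
        "AlarmDescription": "Average CPU utilization is above the threshold",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",                   # the statistical aggregation
        "Period": period_seconds,                 # length of one period
        "EvaluationPeriods": evaluation_periods,  # periods that must breach
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(params, region="us-east-1"):
    """Create the alarm with boto3 (requires AWS credentials; not run here)."""
    import boto3  # imported lazily so the sketch loads without boto3

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)


# Hypothetical instance ID, used only to show the request shape.
alarm = high_cpu_alarm_params("i-0123456789abcdef0")
```

Each field corresponds to a choice made in the console walkthrough: the statistic, the period, the alarm name and description, and the threshold.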

In addition to sending messages, alarms are useful for invoking certain actions, such as shutting down unused instances. During alarm setup there is a check box for “Take the action” with the options Stop or Terminate. If the metric stays beyond its threshold for too long, this action is taken in addition to a notification being sent.
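Through the API, the Stop and Terminate options are expressed as alarm actions identified by special ARNs. The sketch below builds such an ARN and attaches it to an alarm definition for an idle instance. The helper names, the 5 percent threshold, the one-hour period, and the four evaluation periods are assumptions for illustration only; the resulting dictionary is the kind of input that boto3's put_metric_alarm accepts.

```python
def ec2_action_arn(region, action):
    """Return the ARN for an EC2 alarm action ("stop" or "terminate")."""
    if action not in ("stop", "terminate"):
        raise ValueError("action must be 'stop' or 'terminate'")
    return f"arn:aws:automate:{region}:ec2:{action}"


def idle_instance_alarm_params(instance_id, region, action="stop"):
    """Build alarm arguments that stop or terminate an idle instance."""
    return {
        "AlarmName": f"IdleInstance-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 3600,            # one hour per period (assumed)
        "EvaluationPeriods": 4,    # idle for four hours in a row (assumed)
        "Threshold": 5.0,          # below 5 percent CPU counts as idle (assumed)
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [ec2_action_arn(region, action)],
    }


idle_alarm = idle_instance_alarm_params("i-0123456789abcdef0", "us-east-1")
```

Stop is the safer default for a sketch like this, since a stopped instance can be restarted while a terminated one cannot.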

Free Vs Paid Tiers Of Service
AWS CloudWatch provides both free and paid tiers of service. Basic monitoring services for EC2, EBS Volumes, Elastic Load Balancers and RDS DB instances all fall within the free tier. Additionally, each customer gets 10 metrics, 10 alarms and 1 million API requests per month at no charge. These 10 metrics differ from the basic monitoring metrics in the amount of customization that can be applied to them.

As for paid pricing, detailed EC2 monitoring costs $3.50 per instance per month, CloudWatch custom metrics cost $0.50 per metric per month, CloudWatch alarms cost $0.10 per alarm per month, API requests cost $0.01 per 1,000 requests, and CloudWatch Logs cost $0.50 per GB ingested and $0.03 per GB archived per month.
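To see how these figures add up, here is a rough monthly cost estimate using only the prices and free allowances quoted in this article. It assumes the free 10 metrics, 10 alarms and 1 million API requests are simply subtracted from usage before billing; the function name and the sample usage numbers are made up, and current AWS pricing may differ.

```python
from decimal import Decimal

# Prices as quoted in this article (US dollars).
DETAILED_MONITORING_PER_INSTANCE = Decimal("3.50")
CUSTOM_METRIC = Decimal("0.50")
ALARM = Decimal("0.10")
API_PER_1000_REQUESTS = Decimal("0.01")
LOGS_PER_GB_INGESTED = Decimal("0.50")
LOGS_PER_GB_ARCHIVED = Decimal("0.03")

# Free allowances as quoted in this article.
FREE_METRICS = 10
FREE_ALARMS = 10
FREE_API_REQUESTS = 1_000_000


def monthly_cost(instances, metrics, alarms, api_requests,
                 gb_ingested, gb_archived):
    """Estimate a monthly CloudWatch bill (assumes allowances are subtracted)."""
    billable_metrics = max(0, metrics - FREE_METRICS)
    billable_alarms = max(0, alarms - FREE_ALARMS)
    billable_requests = max(0, api_requests - FREE_API_REQUESTS)
    return (
        DETAILED_MONITORING_PER_INSTANCE * instances
        + CUSTOM_METRIC * billable_metrics
        + ALARM * billable_alarms
        + API_PER_1000_REQUESTS * Decimal(billable_requests) / 1000
        + LOGS_PER_GB_INGESTED * gb_ingested
        + LOGS_PER_GB_ARCHIVED * gb_archived
    )


# Made-up usage: 4 instances, 25 custom metrics, 30 alarms,
# 3 million API requests, 10 GB of logs ingested and archived.
total = monthly_cost(4, 25, 30, 3_000_000, 10, 10)
print(f"${total:.2f}")
```

For this sample usage the parts are $14.00 for detailed monitoring, $7.50 for metrics, $2.00 for alarms, $20.00 for API requests, and $5.30 for logs, so API requests dominate the bill.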

Monitoring distributed applications, which are typical in the cloud, presents more challenges than monitoring a single application. AWS CloudWatch compensates for some of those additional challenges by providing a single, unified monitoring framework for both custom applications running on VM instances and AWS services.