Hello again and welcome back to my series of blog posts on #100DaysofAWS, in which I talk about the most interesting and exciting services AWS has to offer.

CloudWatch is on the agenda today!

CloudWatch is AWS’s principal real-time monitoring and observability service. It is designed to make the lives of DevOps engineers, SysOps engineers and IT managers much easier by showing them exactly what is going on in their AWS infrastructure.

So how does it work?

CloudWatch integrates with the vast majority of AWS services to give you actionable and reliable data, so you can gain a system-wide view of performance changes, resource utilization and the operational health of your applications.

The most efficient way to understand how CloudWatch works is to think of it as being a collection of multiple services:

  1. CloudWatch Logs
  2. CloudWatch Metrics
  3. Amazon EventBridge (formerly CloudWatch Events)
  4. CloudWatch Alarms
  5. CloudWatch Dashboards

Let’s break down these services and have a look first at CloudWatch Logs.

CloudWatch Logs is one of the fundamental parts of CloudWatch, and several of the other features build on it. Simply put, CloudWatch Logs is used to monitor, store and access the log files of your applications.

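It is important to know that log data is organised into Log Groups and Log Streams. A Log Stream is a sequence of log events from a single source, such as one instance of your application, and a Log Group is a collection of Log Streams that share the same retention and access settings. Interestingly, logs never expire by default. This way you don’t ever have to worry about misplacing logs or losing access to them, and you can be extremely diligent in your service monitoring. If you would rather not store everything forever, you can set a retention period on each Log Group.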
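To make the group, stream and event hierarchy a little more concrete, here is a minimal sketch in Python using boto3. The group name, stream name and 30-day retention are my own illustrative choices, and the AWS calls are wrapped in a function so nothing touches your account until you call it with valid credentials.

```python
import time


def build_log_events(messages, now_ms=None):
    """Turn plain strings into the event dicts CloudWatch Logs expects.

    Each event needs a millisecond timestamp and a message.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Offset each event by 1 ms so the batch stays in chronological order.
    return [{"timestamp": now_ms + i, "message": m} for i, m in enumerate(messages)]


def write_logs(group, stream, messages, retention_days=30):
    """Create a log group and stream, then write a batch of events.

    Assumes boto3 is installed and credentials are configured.
    create_log_group raises an error if the group already exists.
    """
    import boto3  # imported lazily so the helper above runs without AWS access

    logs = boto3.client("logs")
    logs.create_log_group(logGroupName=group)
    # Without this call the group keeps its default "never expire" retention.
    logs.put_retention_policy(logGroupName=group, retentionInDays=retention_days)
    logs.create_log_stream(logGroupName=group, logStreamName=stream)
    logs.put_log_events(
        logGroupName=group,
        logStreamName=stream,
        logEvents=build_log_events(messages),
    )


# Hypothetical names, for illustration only:
# write_logs("/my-app/production", "web-server-1", ["started", "ready"])
```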

Whilst CloudWatch Logs is automatically integrated with many AWS services, it is useful to know that in some cases you have to grant IAM permissions before a service can write to Logs.

With CloudWatch Logs, you can use a powerful feature known as CloudWatch Logs Insights to gain a deeper comprehension of your operations. Insights can be used to search through your log data interactively and glean an understanding of how your services are being used. Insights actually has its own query language, in which you can run commands so you can respond to operational issues more efficiently and effectively.

Don’t Worry!

You don’t have to write these queries from scratch, as AWS helps you with auto-completion of queries and gives you access to a bucket load of sample queries so you can start running them right away on your logs.
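If you would like to run an Insights query from code rather than the console, a rough sketch with boto3 might look like the one below. The query itself is in the style of the AWS sample queries; the log group name and the one-hour window are assumptions of mine.

```python
import time

# Find the 20 most recent log events that contain the word ERROR.
ERROR_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 20"
)


def build_query_request(log_group, minutes_back=60, now=None):
    """Build the arguments for start_query (times are epoch seconds)."""
    now = int(time.time()) if now is None else now
    return {
        "logGroupName": log_group,
        "startTime": now - minutes_back * 60,
        "endTime": now,
        "queryString": ERROR_QUERY,
    }


def run_query(log_group):
    """Start the query and poll until it finishes.

    Assumes boto3 is installed and credentials are configured.
    """
    import boto3  # imported lazily so the builder above runs without AWS access

    logs = boto3.client("logs")
    query_id = logs.start_query(**build_query_request(log_group))["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            return response["results"]
        time.sleep(1)  # queries run asynchronously, so wait and check again


# Hypothetical log group, for illustration only:
# rows = run_query("/my-app/production")
```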

The next aspect of CloudWatch we will dive into is CloudWatch Metrics.

A CloudWatch metric is a time-ordered set of data points, which CloudWatch can plot on a graph for you to use. AWS services publish their metrics to CloudWatch directly, and you can also extract metrics from the data in CloudWatch Logs by using metric filters. By default, most services publish a set of basic metrics at no extra charge.

You can also create your own Custom Metrics using the CLI or an SDK, which lets you publish any kind of numeric data you like to CloudWatch Metrics. If you want your data to paint a picture in extremely high resolution (sub-one-minute tracking of metrics), Custom Metrics can report as often as every second. This can be super valuable if you have an extremely fast-moving workflow which you need to watch very closely.
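Here is a small sketch of what publishing a Custom Metric could look like with boto3. The namespace, metric name and dimension are made up for the example; the part to notice is StorageResolution, where 1 asks for high-resolution (one-second) storage and 60 is the standard one-minute resolution.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", high_resolution=False, dimensions=None):
    """Build one entry for the MetricData list of put_metric_data."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in (dimensions or {}).items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 1 = high resolution (per second), 60 = standard resolution (per minute)
        "StorageResolution": 1 if high_resolution else 60,
    }


def publish(namespace, datum):
    """Send the datum to CloudWatch.

    Assumes boto3 is installed and credentials are configured.
    """
    import boto3  # imported lazily so the builder above runs without AWS access

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=[datum])


# Hypothetical namespace and metric, for illustration only:
# datum = build_metric_datum("OrdersPlaced", 3, high_resolution=True,
#                            dimensions={"Service": "checkout"})
# publish("MyApp", datum)
```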

If you happen to be interested, the AWS documentation has a list of all the services that publish data to CloudWatch Metrics.

The next stage in our understanding of CloudWatch is getting to grips with Amazon EventBridge.

Amazon EventBridge is a serverless event bus that receives event data and allows you to define rules which route matching events to targets, which in turn consume them.

Let’s break this down a bit more.

EventBridge can connect applications together using these events. An event, in this instance, is a notification that the state of a system has changed, and it is emitted by a producer. There are actually three kinds of event bus you need to know about:

  1. Default Event Bus: Each AWS account has an automatically configured default event bus, which receives events from AWS services.
  2. Custom Event Bus: This is an event bus you create yourself for your own applications’ events. It can also hold events from multiple AWS accounts, for example across an AWS Organization.
  3. SaaS Event Bus: This is designed to integrate with third-party SaaS providers.

When we talk about a producer, we are normally talking about some kind of AWS service that emits events, and when we talk about events, we are talking about JSON objects that are moving through the event bus, travelling in what is called a stream.

Partner Sources are third-party apps that can also emit events to an event bus. They are used with the SaaS Event Bus, e.g. Auth0, Datadog etc.

The Rules we mentioned earlier define which events in particular to capture and pass on to targets. You can have up to 100 rules per bus by default (this is a service quota, so check the current value for your account), which helps you manage your data in as insightful a way as possible.

Targets are AWS services which in turn consume events, and it’s worth knowing that you can have a maximum of 5 per rule.

The diagram below breaks down this process very simply.

Diagram of an Event Producer pushing an event to the EventBridge event bus, then being directed by the rule to be consumed by the target AWS services (Lambda, Kinesis Data Firehose and SNS).
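To get a feel for how a rule decides which events reach its targets, here is a deliberately simplified matcher written in plain Python. It is only a sketch of the idea and not how EventBridge is implemented: real event patterns support far more (prefix matching, numeric ranges, array values and so on), and the rule name and target names below are hypothetical.

```python
def matches(pattern, event):
    """Return True if the event satisfies the pattern.

    Simplified rules: every key in the pattern must exist in the event,
    a list means "the value must be one of these", and a nested dict is
    matched recursively.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def targets_for(event, rules):
    """Collect the targets of every rule whose pattern matches the event."""
    found = []
    for rule in rules:
        if matches(rule["pattern"], event):
            found.extend(rule["targets"])
    return found


# A pattern in the same shape EventBridge uses for EC2 state changes.
stopped_instances = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

# Hypothetical rule with two hypothetical targets (the real limit is 5 per rule).
rules = [
    {
        "name": "notify-on-stop",
        "pattern": stopped_instances,
        "targets": ["lambda:cleanup", "sns:ops-alerts"],
    }
]

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

print(targets_for(event, rules))
```

An event with "state": "running" would not match the pattern, so the rule would pass nothing on to its targets.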

Next, let’s talk CloudWatch Alarms.

CloudWatch Alarms are designed to alert you when some kind of threshold is breached. A notification is triggered based on the specific metrics and thresholds that you define for your use case.

There are three states an alarm can be in:

OK means the threshold hasn’t been breached.

ALARM means the threshold has been breached.

INSUFFICIENT_DATA means the data needed to make the decision is missing or incomplete, so the alarm transitions to this state.
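The three states (OK, ALARM and INSUFFICIENT_DATA) can be sketched with a few lines of Python. This is a simplification of my own: a real alarm also lets you configure how many datapoints must breach and how missing data is treated, neither of which is modelled here.

```python
def evaluate_alarm(datapoints, threshold):
    """Decide an alarm state from recent datapoints.

    datapoints is a list of numbers, with None standing in for a period
    where no data arrived. Simplified rule: every available datapoint
    must be above the threshold for the alarm to fire.
    """
    present = [p for p in datapoints if p is not None]
    if not present:
        return "INSUFFICIENT_DATA"
    if all(p > threshold for p in present):
        return "ALARM"
    return "OK"


# CPU percentage samples checked against an 80% threshold (illustrative values).
print(evaluate_alarm([90, 95, 92], threshold=80))
print(evaluate_alarm([50, 95, 60], threshold=80))
print(evaluate_alarm([None, None, None], threshold=80))
```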

The first example of CloudWatch Alarms that you may bump into when getting stuck into your AWS account is through Billing Alarms. These highly configurable alarms can help you mitigate disasters, save money and streamline your application on AWS to be running as smoothly as possible.

You can use alarms in two main ways in order to be notified of important happenings within your account:

  1. Static Threshold Alarms

With Static Threshold alarms, the alarm goes into the ALARM state when the metric breaches the threshold that you have defined. An example of this can be if an EBS volume is receiving too many IOPS, or if an RDS database is receiving too many reads and writes.

  2. Anomaly Detection Alarms

Anomaly Detection is interesting as it learns the typical behaviour of your metric by going back over historic data. When it detects something out of the ordinary, the alarm goes into the ALARM state.
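The difference between the two approaches is easier to see side by side. In the sketch below the "anomaly" check is a plain mean and standard deviation band that I am using as a stand-in; CloudWatch’s own anomaly detection model is more sophisticated than this, and the numbers are invented.

```python
from statistics import mean, stdev


def static_breach(value, threshold):
    """Static threshold: fire when the value goes above a fixed number."""
    return value > threshold


def anomaly_breach(value, history, band_width=2.0):
    """Stand-in for anomaly detection: fire when the value falls outside
    a band of band_width standard deviations around the historic mean.
    """
    centre = mean(history)
    spread = stdev(history)
    return abs(value - centre) > band_width * spread


# Hypothetical IOPS readings that normally hover around 100.
history = [100, 102, 98, 101, 99]

# A jump to 110 is nowhere near a static threshold of 500...
print(static_breach(110, threshold=500))
# ...but it is well outside what the history says is normal.
print(anomaly_breach(110, history))
```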

Another interesting way CloudWatch Alarms can be used is through Composite Alarms. These alarms are designed to watch the state of other alarms, and only go into the ALARM state if a number of conditions are met. For example, an alarm to trigger an Auto Scaling Group may only be triggered if an alarm for a certain number of IOPS on an EBS volume goes into the ALARM state at the same time as an alarm for reads and writes being too high on a database. This fine-grained control of alarming gives you a fantastically high degree of control over how your services are provisioned.
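A Composite Alarm is defined by a rule expression that refers to other alarms by name. Below is a minimal sketch that builds such an expression and the arguments for boto3’s put_composite_alarm. The alarm names and the SNS topic ARN are hypothetical placeholders of mine.

```python
def all_in_alarm(alarm_names):
    """Build a rule that is true only when every named alarm is in ALARM."""
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)


def build_composite_alarm(name, child_alarms, action_arns):
    """Build the arguments for put_composite_alarm."""
    return {
        "AlarmName": name,
        "AlarmRule": all_in_alarm(child_alarms),
        "AlarmActions": list(action_arns),
    }


def create(kwargs):
    """Create the composite alarm.

    Assumes boto3 is installed, credentials are configured and the
    child alarms already exist.
    """
    import boto3  # imported lazily so the builders above run without AWS access

    boto3.client("cloudwatch").put_composite_alarm(**kwargs)


# Hypothetical alarm names and SNS topic ARN, for illustration only.
kwargs = build_composite_alarm(
    "storage-and-database-under-pressure",
    ["ebs-iops-high", "rds-read-write-high"],
    ["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)
print(kwargs["AlarmRule"])
# create(kwargs)
```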

Lastly, we will talk about CloudWatch Dashboards.

CloudWatch Dashboards allow you to either use the automatic dashboards that AWS provides, or build your own custom dashboards based on CloudWatch Metrics. Below is a visual glimpse of how a Dashboard looks:

Visual of CloudWatch Dashboards with multiple widgets

This service allows you to see exactly what is going on with the key metrics in your account, and you can use charts, graphs, numbers and percentages to visualise your metrics in an easy-to-digest way. You can either limit access to users of the AWS account or share dashboards with third parties who may have an interest in the data.
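Custom dashboards are described by a JSON document made up of widgets, which means you can build them from code too. Here is a rough sketch with a single metric widget; the dashboard name, instance ID and region are placeholders I have picked for the example.

```python
import json


def metric_widget(title, namespace, metric, dim_name, dim_value,
                  region="us-east-1", x=0, y=0):
    """Describe one graph widget showing the average of a single metric."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "metrics": [[namespace, metric, dim_name, dim_value]],
            "stat": "Average",
            "period": 300,  # seconds per datapoint on the graph
            "region": region,
        },
    }


def build_dashboard_body(widgets):
    """Serialise the widgets into the JSON string that put_dashboard expects."""
    return json.dumps({"widgets": widgets})


def publish(name, body):
    """Create or update the dashboard.

    Assumes boto3 is installed and credentials are configured.
    """
    import boto3  # imported lazily so the builders above run without AWS access

    boto3.client("cloudwatch").put_dashboard(DashboardName=name, DashboardBody=body)


# Hypothetical instance ID and dashboard name, for illustration only.
body = build_dashboard_body([
    metric_widget("Web server CPU", "AWS/EC2", "CPUUtilization",
                  "InstanceId", "i-0123456789abcdef0"),
])
# publish("my-app-overview", body)
```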

That brings us to the end of our talk about CloudWatch. Let me know if you have any questions, and as always keep on building!

Jack Lavelle

I am obsessed with AWS, the cloud and all things related.