Last Updated:

ECS - Monitor container-level CPU and memory using Docker stats and CloudWatch metrics

Jenade Moodley
Jenade Moodley ECS

When using ECS, you run containers in the form of an ECS task, where each task can contain one or more container definitions, up to a maximum of 10. If you want to monitor the performance of your tasks, you can monitor the containers at the task level. This does offer good insight into the working of your task, but it doesn't allow you to view the performance of each container. In a delicate environment where you need to monitor each and every aspect of the system, monitoring each container would be extremely beneficial. There is no CloudWatch metric which monitors container-level resource usage, but it is possible to build our own CloudWatch metric using the container-level resource usage statistics provided by the Docker daemon on the EC2 host. These statistics are obtained by the more widely known docker stats command.

The docker stats command returns a live data stream for running containers. With this command, you would be able to obtain the CPU, memory, network I/O, as well as disk I/O usage of each container. For this post, I will only be covering CPU and memory usage. You can take a more detailed look at docker stats from Docker's documentation.

To provide a short overview of the steps we will be deploying, we will first create a script which obtains the resource usage statistics using docker stats, and upload these values to CloudWatch metrics. The statistics obtained will need to be easily obtained and parsed, hence the statistics will be obtained as a JSON, and be parsed using jq (more on this below). This script needs to be run at set intervals in order to gain a good understanding of the container resource usage patterns. To do this, we will create a cron job which will run this script at a set frequency.

Installing the dependencies

As we will be sending the metrics to CloudWatch, and parsing through the JSON statistics using jq, we need to install the AWS CLI, as well as the jq binary. As you are using ECS, you would most likely be using the ECS-optimized AMI, which is based on Amazon Linux, or Amazon Linux 2. As such, the commands you would use to install these dependencies would be as below:

sudo yum install python3 python3-pip jq -y
sudo pip3 install -U awscli

Creating the script

Let's first look at the commands which will be included in the script.

We first need to obtain the resource usage statistics using docker stats. By default,docker stats is presented as text, which is not very easy to handle when trying to filter for specific values. As such, we can format the statistics into a JSON using the --format parameter. This will be based on a publicly available example docker-stats-json. The statistics will then be formed into a valid JSON object using jq.

STATS=$(docker stats --no-stream --format "{\"container\": \"{{ .Container }}\",\"name\": \"{{ .Name }}\", \"memory\": { \"raw\": \"{{ .MemUsage }}\", \"percent\": \"{{ .MemPerc }}\"}, \"cpu\": \"{{ .CPUPerc }}\"}" | jq '.' -s -c )

As we will be uploading the data to CloudWatch metrics, CloudWatch is region-specific. As such we also need to include the region in our script. We can obtain the region programmatically from the metadata service.

REGION=$(curl http://169.254.169.254/latest/meta-data/placement/region 2> /dev/null)

We also need a way to identify and structure our metrics in CloudWatch. One good way to manage this is to use the EC2 instance ID as a dimension for the metric, as well as the Container Name, and Container ID. This will allow us to identify the exact container which provided the metric. As such, we first need the EC2 instance ID, which can also be obtained from the instance metadata.

INSTANCE_ID=$(curl http://169.254.169.254/latest/meta-data/instance-id 2> /dev/null)

Finally, we need to parse through the JSON to obtain the Container Name, Container ID, CPU, and Memory usage of each container. To do this we first need to obtain the total number of containers.

NUM_CONTAINERS=$(echo "$STATS" | jq '. | length')

We then iterate through the JSON to obtain the values we need.

for (( i=0; i<$NUM_CONTAINERS; i++ )) 
do CPU=$(echo "$STATS" | jq -r .[$i].cpu | sed 's/%//')
MEMORY=$(echo "$STATS" | jq -r .[$i].memory.percent | sed 's/%//')
CONTAINER=$(echo $STATS | jq -r .[$i].container)
CONTAINER_NAME=$(echo $STATS | jq -r .[$i].name)

And then upload these values to CloudWatch metrics using the AWS CLI.

/usr/local/bin/aws cloudwatch put-metric-data --metric-name CPU --namespace DockerStats --unit Percent --value $CPU --dimensions InstanceId=$INSTANCE_ID,ContainerId=$CONTAINER,ContainerName=$CONTAINER_NAME --region $REGION
/usr/local/bin/aws cloudwatch put-metric-data --metric-name Memory --namespace DockerStats --unit Percent --value $MEMORY --dimensions InstanceId=$INSTANCE_ID,ContainerId=$CONTAINER,ContainerName=$CONTAINER_NAME --region $REGION
done

Putting it all together, the completed script should look as below.

#!/bin/bash

REGION=$(curl http://169.254.169.254/latest/meta-data/placement/region 2> /dev/null)
INSTANCE_ID=$(curl http://169.254.169.254/latest/meta-data/instance-id 2> /dev/null)
STATS=$(docker stats --no-stream --format "{\"container\": \"{{ .Container }}\",\"name\": \"{{ .Name }}\", \"memory\": { \"raw\": \"{{ .MemUsage }}\", \"percent\": \"{{ .MemPerc }}\"}, \"cpu\": \"{{ .CPUPerc }}\"}" | jq '.' -s -c )
NUM_CONTAINERS=$(echo "$STATS" | jq '. | length')

for (( i=0; i<$NUM_CONTAINERS; i++ )) 
do CPU=$(echo "$STATS" | jq -r .[$i].cpu | sed 's/%//')
MEMORY=$(echo "$STATS" | jq -r .[$i].memory.percent | sed 's/%//')
CONTAINER=$(echo $STATS | jq -r .[$i].container)
CONTAINER_NAME=$(echo $STATS | jq -r .[$i].name)

/usr/local/bin/aws cloudwatch put-metric-data --metric-name CPU --namespace DockerStats --unit Percent --value $CPU --dimensions InstanceId=$INSTANCE_ID,ContainerId=$CONTAINER,ContainerName=$CONTAINER_NAME --region $REGION
/usr/local/bin/aws cloudwatch put-metric-data --metric-name Memory --namespace DockerStats --unit Percent --value $MEMORY --dimensions InstanceId=$INSTANCE_ID,ContainerId=$CONTAINER,ContainerName=$CONTAINER_NAME --region $REGION

done

Creating the Cron job

For the sake of simplicity, I have placed the above script at /opt/docker-stats-monitor/monitor.sh in my EC2 instance.

We first need to make the above script executable.

chmod +x /opt/docker-stats-monitor/monitor.sh

We then need to create the cron job by creating a file in /etc/cron.d.

echo '* * * * * root bash /opt/docker-stats-monitor/monitor.sh' > /etc/cron.d/docker-stats

The above example creates a cron job which obtains and uploads the resource-usage statistics every minute. You can adjust the frequency by changing the cron schedule expression.

The Results

Automating the Procedure

When using ECS, you would hardly access the EC2 host, if at all. You would also most likely be incorporating Auto Scaling or Capacity Providers to launch the EC2 instance, hence you cannot SSH into each instance to configure this setup. To make this a plausible solution, you can implement the above steps using a User Data script.

#!/bin/bash

yum install python3 python3-pip jq -y
pip3 install -U awscli

mkdir -p /opt/docker-stats-monitor

cat > /opt/docker-stats-monitor/monitor.sh << 'EOF'
#!/bin/bash

REGION=$(curl http://169.254.169.254/latest/meta-data/placement/region 2> /dev/null)

INSTANCE_ID=$(curl http://169.254.169.254/latest/meta-data/instance-id 2> /dev/null)

STATS=$(docker stats --no-stream --format "{\"container\": \"{{ .Container }}\",\"name\": \"{{ .Name }}\", \"memory\": { \"raw\": \"{{ .MemUsage }}\", \"percent\": \"{{ .MemPerc }}\"}, \"cpu\": \"{{ .CPUPerc }}\"}" | jq '.' -s -c )

NUM_CONTAINERS=$(echo "$STATS" | jq '. | length')

for (( i=0; i<$NUM_CONTAINERS; i++ )) 
do CPU=$(echo "$STATS" | jq -r .[$i].cpu | sed 's/%//')
MEMORY=$(echo "$STATS" | jq -r .[$i].memory.percent | sed 's/%//')
CONTAINER=$(echo $STATS | jq -r .[$i].container)
CONTAINER_NAME=$(echo $STATS | jq -r .[$i].name)

/usr/local/bin/aws cloudwatch put-metric-data --metric-name CPU --namespace DockerStats --unit Percent --value $CPU --dimensions InstanceId=$INSTANCE_ID,ContainerId=$CONTAINER,ContainerName=$CONTAINER_NAME --region $REGION

/usr/local/bin/aws cloudwatch put-metric-data --metric-name Memory --namespace DockerStats --unit Percent --value $MEMORY --dimensions InstanceId=$INSTANCE_ID,ContainerId=$CONTAINER,ContainerName=$CONTAINER_NAME --region $REGION

done
EOF

chmod +x /opt/docker-stats-monitor/monitor.sh

echo '* * * * * root bash /opt/docker-stats-monitor/monitor.sh' > /etc/cron.d/docker-stats

Conclusion

This is a feasible solution for monitoring Docker containers, however one big caveat which you would have noticed is that this solution requires running commands on the EC2 host used for running the containers. This means that this solution can only work for tasks using the EC2 Launch Type, and will not work for Fargate tasks.

And that's all there is to it. Let me know your thoughts in the comments below, and feel free to adjust these steps to meet your use-case, and share with the rest of the community.

Comments