
How to use ARM-based GPU EC2 instances as ECS container instances

Jenade Moodley

With new EC2 instance types such as G5g, you can now use GPUs with ARM on EC2. While ECS offers the ECS-optimized AMI as a way to quickly set up your EC2 instances for ECS workloads, you will find that the GPU-optimized AMI is available only for the x86 platform; there is no support for ARM-based GPU instances. The reason for this is quite simple: NVIDIA GPU container workloads require the nvidia-docker container runtime, and support is not yet available for Amazon Linux. Support is only available for CentOS 8, RHEL 8, RHEL 9, and Ubuntu 18.04 and later. You can review the supported distributions in the nvidia-docker documentation.

Given that CentOS 8 has reached end of life and Red Hat has dropped support for the Docker runtime engine, that leaves us only with Ubuntu. You could try to set up Docker on CentOS or RHEL, but given the lack of support, this would not be advisable.

To set up an Ubuntu ARM-based GPU instance for ECS, we need to follow these steps:

  1. Install NVIDIA drivers.
  2. Install Docker and NVIDIA container runtime.
  3. Install ECS agent.

In my testing I used Ubuntu 20.04 on a g5g.xlarge instance, but these steps should work on any supported Ubuntu version.

Install NVIDIA Drivers

G5g instances currently support Tesla driver version 470.82.01 or later. You can download the appropriate driver directly from NVIDIA.

You can use the below specifications to find the driver, straight from the AWS documentation:

Instance   Product Type   Product Series   Product
G5g        Tesla          T-Series         NVIDIA T4G

In this example, I am using version “510.73.08” of the driver, the latest version of the driver at the time of writing this article. This driver can be installed with the below steps:

  1. Download the driver.
$ wget https://us.download.nvidia.com/tesla/510.73.08/NVIDIA-Linux-aarch64-510.73.08.run
  2. Install gcc, make, and the kernel headers.
$ sudo apt-get update -y
$ sudo apt-get install gcc make linux-headers-$(uname -r) -y
  3. Run the installer.
$ chmod +x NVIDIA-Linux-aarch64-510.73.08.run
$ sudo sh ./NVIDIA-Linux-aarch64-510.73.08.run --disable-nouveau --silent
  4. (Optional) Test whether the GPU is detected.
$ nvidia-smi
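If you prefer a terser check than the full nvidia-smi table, you can query just the fields of interest (these query flags are part of standard nvidia-smi):

```shell
# Print only the GPU model and the installed driver version.
nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
```

On a G5g instance this should report an NVIDIA T4G along with the driver version installed above.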

Install Docker and NVIDIA container runtime

  1. Download and install Docker.
$ curl https://get.docker.com | sh && sudo systemctl --now enable docker
  2. Set up the NVIDIA repository information.
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  3. Install the NVIDIA container runtime.
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2 nvidia-container-runtime
$ sudo systemctl restart docker
  4. Confirm that the NVIDIA runtime is registered with Docker.
$ sudo docker info --format '{{json .Runtimes.nvidia}}'

You should receive the below output.

{"path":"nvidia-container-runtime"}
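Before moving on to the ECS agent, it is worth confirming end to end that a container can actually see the GPU. This sketch uses the same CUDA base image as the task definition later in this post:

```shell
# Run nvidia-smi inside a container using the NVIDIA runtime.
sudo docker run --rm --runtime=nvidia nvidia/cuda:11.4.0-base-ubuntu20.04 nvidia-smi
```

If this prints the same nvidia-smi table you saw on the host, the runtime is wired up correctly.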

Install ECS agent

  1. Download the ARM-based ECS agent for Ubuntu.
$ curl -O https://s3.us-east-1.amazonaws.com/amazon-ecs-agent-us-east-1/amazon-ecs-init-latest.arm64.deb
  2. Install the ECS agent.
$ sudo dpkg -i amazon-ecs-init-latest.arm64.deb
  3. Configure GPU support for the agent.
$ sudo mkdir -p /etc/ecs/
$ sudo touch /etc/ecs/ecs.config
$ echo "ECS_ENABLE_GPU_SUPPORT=true" | sudo tee -a /etc/ecs/ecs.config

At this stage, you can add any additional configuration to the ecs.config file such as setting the ECS cluster.
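For example, a complete /etc/ecs/ecs.config might look like the fragment below (the cluster name here is hypothetical; substitute your own):

```
ECS_CLUSTER=my-gpu-cluster
ECS_ENABLE_GPU_SUPPORT=true
```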

  4. Start the agent.
$ sudo systemctl enable ecs
$ sudo systemctl start ecs
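Once the agent is running, you can query its local introspection endpoint to confirm that it has registered with your cluster (this endpoint is served by the agent itself on port 51678):

```shell
# Returns the cluster name and container instance ARN once registration succeeds.
curl -s http://localhost:51678/v1/metadata
```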

And that’s it! You should now see the instance in the ECS console, registered as a container instance. You can begin assigning GPUs to your containers and scheduling them on these instances. If you would like to test a sample application, you can use the below ECS task definition to simply check the NVIDIA GPU with nvidia-smi.

    {
        "containerDefinitions": [
            {
                "memory": 80,
                "essential": true,
                "name": "gpu",
                "image": "nvidia/cuda:11.4.0-base-ubuntu20.04",
                "resourceRequirements": [
                    {
                        "type": "GPU",
                        "value": "1"
                    }
                ],
                "command": [
                    "sh",
                    "-c",
                    "nvidia-smi"
                ],
                "cpu": 100
            }
        ],
        "family": "example-ecs-gpu"
    }
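Before registering the task definition, a quick local sanity check can catch a missing GPU requirement. The sketch below is not an official tool; it simply embeds the JSON above as a Python dict and sums the declared GPU requirements per container:

```python
# The task definition from above, embedded as a dict for illustration.
task_def = {
    "containerDefinitions": [
        {
            "memory": 80,
            "essential": True,
            "name": "gpu",
            "image": "nvidia/cuda:11.4.0-base-ubuntu20.04",
            "resourceRequirements": [{"type": "GPU", "value": "1"}],
            "command": ["sh", "-c", "nvidia-smi"],
            "cpu": 100,
        }
    ],
    "family": "example-ecs-gpu",
}

# Each container that needs a GPU must declare a GPU resource requirement;
# ECS uses it to place the task on a container instance with a free GPU.
for container in task_def["containerDefinitions"]:
    gpu_reqs = [
        r for r in container.get("resourceRequirements", [])
        if r["type"] == "GPU"
    ]
    total = sum(int(r["value"]) for r in gpu_reqs)
    print(f"{container['name']}: requests {total} GPU(s)")
```

If the check prints the expected count, you can register the definition with the AWS CLI, for example `aws ecs register-task-definition --cli-input-json file://example-ecs-gpu.json` (the file name is hypothetical), and then launch it with `aws ecs run-task` using the EC2 launch type.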

Conclusion

These steps are necessary to set up Ubuntu for ECS workloads on G5g instances. Other ARM-based GPU instance types may be added, but G5g appears to be the only one for now. We can expect support for Amazon Linux once NVIDIA adds it, though there is no confirmation of when, or even if, that will happen. It would also be nice to see support for additional operating systems such as Rocky Linux to allow for more variety, but that ultimately depends on NVIDIA. Only time will tell what else we can use. For now, this is a working solution you can use to set up your ECS workloads on ARM-based GPU EC2 instances.
