Custom Docker Image
Kubernetes containers typically run as a non-root user to maintain a high degree of security. As a result, installing OS software packages in a running pod is not permitted, so careful planning is needed to ensure that the OS software and associated drivers built into the image meet the requirements of the application. This article describes how to build a custom Docker image by installing a compatible version of the CUDA toolkit and cuDNN in the CML (Cloudera Machine Learning) engine image so that the CML session pod can use the GPU card.
Prepare the Dockerfile as shown in the following example. The filename is cudatoolkit_for_tf.Dockerfile.
FROM docker.repository.cloudera.com/cloudera/cdsw/engine:16-cml-2022.01-2
USER root
ARG OS=ubuntu1804
ARG cudnn_version=8.1.0.77
ARG cuda_version=cuda11.2
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/cuda-${OS}.pin
RUN mv cuda-${OS}.pin /etc/apt/preferences.d/cuda-repository-pin-600
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/3bf863cc.pub
RUN add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS}/x86_64/ /"
RUN apt-get update
RUN apt-get install -y cuda-toolkit-11-2
RUN apt-get install -y libcudnn8=${cudnn_version}-1+${cuda_version}
RUN apt-get install -y libcudnn8-dev=${cudnn_version}-1+${cuda_version}
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN python3 -m pip install tensorflow
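The two cuDNN install lines pin an exact apt version assembled from the `cudnn_version` and `cuda_version` build arguments. As a quick illustration of how those values combine, the following Python sketch reproduces the resulting package specifiers. The helper name `cudnn_apt_pins` is hypothetical and is not part of the Dockerfile.

```python
from typing import List


def cudnn_apt_pins(cudnn_version: str = "8.1.0.77",
                   cuda_version: str = "cuda11.2") -> List[str]:
    """Return the apt package specifiers that the Dockerfile's cuDNN lines expand to."""
    # Mirrors the Dockerfile pattern: <package>=${cudnn_version}-1+${cuda_version}
    suffix = f"{cudnn_version}-1+{cuda_version}"
    return [f"libcudnn8={suffix}", f"libcudnn8-dev={suffix}"]


if __name__ == "__main__":
    for pin in cudnn_apt_pins():
        print(pin)
```

If either build argument is overridden with `--build-arg`, both specifiers change together, so the runtime and development cuDNN packages stay on the same version.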
Build the docker image.
docker build --network host -t nexus.cdpkvm.cldr:9999/cdppvcds140/cudatoolkit_tf:11.3 . -f cudatoolkit_for_tf.Dockerfile
After a successful build, push the image to the registry. In this example, a Nexus Docker registry hosts the container image.
docker image push nexus.cdpkvm.cldr:9999/cdppvcds140/cudatoolkit_tf:11.3
In CML, navigate to Site Administration > Runtime/Engine and add the new Docker image under the Engine Images section.
Assuming a CML project has already been created, navigate to Project Settings > Runtime/Engine of the project. Select the custom image.
Add the environment variable to inform the OS about the location of the CUDA library for this project.
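On Linux, the dynamic loader commonly finds shared libraries through `LD_LIBRARY_PATH`, and `/usr/local/cuda/lib64` is a typical location for the CUDA toolkit libraries. Both the variable name and the path are assumptions here, not values taken from this article. Under those assumptions, a short Python check run inside the session can confirm that the setting is visible to the process. The helper name `cuda_entries` is hypothetical.

```python
import os
from typing import List, Mapping


def cuda_entries(env: Mapping[str, str],
                 var: str = "LD_LIBRARY_PATH") -> List[str]:
    """Return the entries of a colon-separated path variable that mention 'cuda'."""
    raw = env.get(var, "")
    # Skip empty segments and keep only directories whose path contains "cuda".
    return [p for p in raw.split(":") if p and "cuda" in p.lower()]


if __name__ == "__main__":
    found = cuda_entries(os.environ)
    if found:
        print("CUDA library directories on the path:", found)
    else:
        print("No CUDA directory found in LD_LIBRARY_PATH")
```

An empty result suggests the project-level variable has not been picked up by the session, in which case the session may need to be restarted after the setting is saved.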
Create a CML session with GPU in the resource profile.
Run the following TensorFlow code to detect the presence of the GPU card.
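A minimal check along these lines, assuming TensorFlow 2.x is installed in the image (as the `pip install tensorflow` step in the Dockerfile does), could look like the sketch below. The function name `detect_gpus` is hypothetical; `tf.config.list_physical_devices` is the TensorFlow call that does the detection.

```python
from typing import List, Optional


def detect_gpus() -> Optional[List[str]]:
    """Return the names of GPUs visible to TensorFlow, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices("GPU") returns one PhysicalDevice per visible GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = detect_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif gpus:
        print(f"GPUs visible to TensorFlow: {gpus}")
    else:
        print("No GPU detected")
```

An empty list in a session that was started with a GPU resource profile usually points to a mismatch between the CUDA or cuDNN libraries in the image and the versions that the installed TensorFlow build expects, or to the library path not being set.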