Nvidia GPU Dashboard
One might wonder why the system takes a long time to train a model using GPU. One of the possible reasons could be GPU card is not fully utilized to its maximum capability due to how the script/code is written. Question is how does one know if the GPU card is firing its cylinders to its full potential? Let’s do an experiment using Cloudera Machine Learning (CML) on Kubernetes platform powered by Openshift 4.8. CML is embedded with workbench
and Jupyterlab
notebook IDE for data scientist to do coding, EDA, etc. The experiment makes use of Jupyterlab
notebook to explore the performance output, e.g. whether the code is fully/under utilizing the allocated GPU resource.
In this case, Jupyterlab-nvdashboard is to be installed to give an overview of the GPU utilization when running the code in the Jupyterlab
notebook.
Create a CML Jupyterlab session with 2 CPU/16 GiB memory and 1 GPU profile.
Open a
Terminal Access
box of the CML session and install theJupyterlab-nvdashboard
Python module.pip3 install jupyterlab-nvdashboard
Upon successful installation, you can open and view the Nvidia real-time GPU dashboard. Run a sample code to train a model using GPU card and monitor the graph.
To validate the dashboard, you may also install and run the Python module
gpustat
to check the actual utilization percentage at any point of time.