How to Autoscale Kubernetes Pods Based on GPU

May 31, 2022

There are several resources available on the internet about scaling Kubernetes pods based on CPU, but when it comes to scaling pods based on GPU, it's hard to find a concise breakdown that outlines each step and how to test it. In this article, we outline the steps to scale Kubernetes pods based on GPU metrics. These steps are performed on AKS (Azure Kubernetes Service), but they work with most cloud service providers, as well as with self-managed clusters.

For this tutorial, you will need an API key. Contact us to download yours.


Step 0: Prerequisites

Kubernetes cluster

You’ll need to have a Kubernetes cluster up and running for this tutorial. To set up an AKS cluster, see this guide from Azure.

Note: Your cluster should have at least 2 GPU-enabled nodes.
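If you're on AKS and still need GPU nodes, you can add a GPU node pool along these lines (a minimal sketch, assuming a resource group named myResourceGroup and a cluster named myAKSCluster; pick a GPU VM size available in your region):

az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name gpunp \
  --node-count 2 \
  --node-vm-size Standard_NC6s_v3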

Kubectl

To manage Kubernetes resources, set up the kubectl command line client.

Here is a guide to install kubectl if you haven’t installed it already.
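If kubectl is already installed, a quick client version check confirms it's on your PATH:

kubectl version --client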

Helm

Helm is used to manage the packaging, configuration, and deployment of resources to the Kubernetes cluster. We'll make use of it throughout this tutorial.

Use this guide and follow your OS-specific installation instructions.
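For example, on Linux or macOS you can use Helm's official install script (this is the snippet from the Helm docs; always review a script before running it):

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh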

Step 1: Install metrics server

Now that we have the prerequisites installed and set up, we'll move ahead with installing the Kubernetes plugins and tools needed to set up autoscaling based on GPU metrics.

Metrics Server collects resource metrics from the kubelet and exposes them via the Kubernetes Metrics API. Most cloud distributions of Kubernetes (e.g., AKS), as well as local ones, already have metrics-server installed. If you're not sure, follow the instructions below to check and install it.

  1. To check if you have metrics-server running:



kubectl get pods -A | grep metrics-server


If the metrics-server is installed, you should see an output like this.



kube-system metrics-server-774f99dbf4-tjw6l


  2. In case you don't have it installed, use the following command to install it:



kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
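Once the metrics-server pod is up, you can confirm the Metrics API is serving by querying node metrics (it may take a minute after install):

kubectl top nodes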


Step 2: Install the NVIDIA device plugin

The NVIDIA device plugin for Kubernetes is a DaemonSet that allows you to run GPU-enabled containers in your cluster.

Install it using the following command:



kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.11.0/nvidia-device-plugin.yml
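Once the plugin's DaemonSet pods are running, you can verify that your nodes advertise GPU resources:

kubectl describe nodes | grep nvidia.com/gpu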

To learn more about the NVIDIA device plugin, see this resource.

Step 3: Install dcgm-exporter

DCGM-Exporter collects GPU telemetry using Go bindings for NVIDIA DCGM (Data Center GPU Manager) and allows you to monitor the health and utilization of your GPUs. It exposes an easy-to-consume HTTP endpoint (/metrics) for monitoring tools like Prometheus.

Run the following command to install dcgm-exporter:



kubectl create -f https://raw.githubusercontent.com/NVIDIA/dcgm-exporter/master/dcgm-exporter.yaml


Once it is running, you can try to query its /metrics endpoint.

First, forward port 9400 of the dcgm-exporter service. (Run this command in a separate terminal.)



kubectl port-forward svc/dcgm-exporter 9400:9400


Then query the /metrics endpoint:



curl localhost:9400/metrics
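The response is Prometheus-formatted text; the GPU utilization gauge used later in this tutorial will look roughly like this (labels vary by cluster and GPU):

# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-...",device="nvidia0"} 0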


Step 4: Install kube-prometheus-stack

Next, install the Prometheus stack using a customized kube-prometheus-stack.values file. This values file contains some changes suggested by NVIDIA (to make Prometheus reachable from your local machine) and an additionalScrapeConfigs entry, which creates a job to scrape the metrics exported by dcgm-exporter.

Find the kube-prometheus-stack.values file below.

Add and update the Helm repo:



helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update


Once we have the helm repo set up, inspect the helm chart and modify the settings.



helm inspect values prometheus-community/kube-prometheus-stack > /tmp/kube-prometheus-stack.values


In the Prometheus instance section of the chart, update the service type from ClusterIP to NodePort. This change will make the Prometheus server available on your local machine at port 30090.



From:

  ## Port to expose on each node
  ## Only used if service.type is 'NodePort'
  ##
  nodePort: 30090
  ## Loadbalancer IP
  ## Only use if service.type is "loadbalancer"
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  ## Service type
  ##
  type: ClusterIP

To:

  ## Port to expose on each node
  ## Only used if service.type is 'NodePort'
  ##
  nodePort: 30090
  ## Loadbalancer IP
  ## Only use if service.type is "loadbalancer"
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  ## Service type
  ##
  type: NodePort


Update the value of serviceMonitorSelectorNilUsesHelmValues to false.



## If true, a nil or {} value for prometheus.prometheusSpec.serviceMonitorSelector
## will cause the prometheus resource to be created with selectors based on
## values in the helm deployment, which will also match the servicemonitors created
##
serviceMonitorSelectorNilUsesHelmValues: false


Add the following scrape config to the additionalScrapeConfigs section of the chart:



additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node


Once you have your values file ready, install kube-prometheus-stack via Helm:



helm install prometheus-community/kube-prometheus-stack \
  --create-namespace --namespace prometheus \
  --generate-name \
  --values /tmp/kube-prometheus-stack.values


After installation is finished, your output should look like this.



NAME: kube-prometheus-stack-1652691100
LAST DEPLOYED: Mon May 16 14:22:12 2022
NAMESPACE: prometheus
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace prometheus get pods -l "release=kube-prometheus-stack-1652691100"
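Thanks to the NodePort change, the Prometheus UI should now be reachable on port 30090 of your nodes. To find a node IP (depending on your network setup, you may need the node's public IP or a tunnel):

kubectl get nodes -o wide

Then open http://<node-ip>:30090/targets and confirm the gpu-metrics job is up.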


Step 5: Install prometheus-adapter

Now we’ll install the prometheus-adapter. The adapter gathers available metrics from Prometheus at a regular interval and exposes them through the custom metrics API, which the HorizontalPodAutoscaler can consume.



prometheus_service=$(kubectl get svc -nprometheus -lapp=kube-prometheus-stack-prometheus -ojsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
helm upgrade \
  --install prometheus-adapter prometheus-community/prometheus-adapter \
  --set rbac.create=true,prometheus.url=http://${prometheus_service}.prometheus.svc.cluster.local,prometheus.port=9090


This will take a moment to set up. Once it's up, you should be able to query the custom metrics API.
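For example, you can list the metrics the adapter exposes and check that the DCGM GPU metrics are among them (a quick check; assumes jq is installed):

kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq -r '.resources[].name' | grep -i dcgm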

Step 6: Create an HPA which scales based on GPU

Now that all the pieces are in place, create a HorizontalPodAutoscaler and configure it to scale based on the GPU utilization metric (DCGM_FI_DEV_GPU_UTIL):



apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-gpu-app
spec:
  maxReplicas: 3 # Update this accordingly
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-gpu-app
  metrics:
  - type: Pods # scale based on GPU utilization
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL
      target:
        type: AverageValue
        averageValue: 80

There are other GPU metrics available besides DCGM_FI_DEV_GPU_UTIL. Find the complete list of available metrics in the DCGM-Exporter docs.
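You can also query the metric directly through the custom metrics API to confirm values are flowing before the HPA acts on them (a quick check; replace default with your app's namespace):

kubectl get --raw '/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/DCGM_FI_DEV_GPU_UTIL' | jq .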

Step 7: Create a LoadBalancer service (Optional)

This is an optional step to expose your app to the web. If you are setting up your cluster with a cloud service provider, there's a good chance it will allocate a public IP address that you can use to interact with your application. Alternatively, you can create a service of type NodePort and access your app through it.



apiVersion: v1
kind: Service
metadata:
  name: app-ip
  labels:
    component: app
spec:
  type: LoadBalancer
  selector:
    component: app
  ports:
  - name: http
    port: 80
    targetPort: 8080

In this configuration, we assume that our app listens on port 8080 inside the container and map it to port 80 of the service.

Step 8: Putting it all together

Now that we have all the external pieces that we need, let’s create a Kubernetes manifest file and save it as autoscaling-demo.yml.

For demonstration, we'll use the container image of the deid application, Private AI's container-based de-identification system. You can use any GPU-based application of your choice.



apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-gpu-app
spec:
  replicas: 1
  selector:
    matchLabels:
      component: app
  template:
    metadata:
      labels:
        component: app
    spec:
      containers:
      - name: app
        securityContext:
          capabilities: # SYS_ADMIN capability needed for DCGM Exporter
            add:
            - SYS_ADMIN
        resources:
          limits:
            nvidia.com/gpu: 1
        image: privateai/deid:2.11full_gpu # You can use any GPU based image
---
apiVersion: v1
kind: Service
metadata:
  name: app-ip
  labels:
    component: app
spec:
  type: LoadBalancer
  selector:
    component: app
  ports:
  - name: http
    port: 80
    targetPort: 8080 # The port might be different for your application
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-gpu-app
spec:
  maxReplicas: 2 # Update this according to your desired number of replicas
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-gpu-app
  metrics:
  - type: Pods
    pods:
      metric:
        name: DCGM_FI_DEV_GPU_UTIL
      target:
        type: AverageValue
        averageValue: 30


Step 9: Create a deployment

Run the kubectl create command to create your deployment:



kubectl create -f autoscaling-demo.yml


Once your deployment is complete, you should be able to see the running status of pods and our HorizontalPodAutoscaler, which will scale based on GPU utilization.

To check the status of the pods:



$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
dcgm-exporter-6bjn8                  1/1     Running   0          3h37m
dcgm-exporter-xmn74                  1/1     Running   0          3h37m
my-gpu-app-675b967d56-q7swb          1/1     Running   0          12m
prometheus-adapter-6696b6d76-g2csx   1/1     Running   0          104m


To check the status of the Horizontal Pod Autoscaler:



$ kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-gpu-app   Deployment/my-gpu-app   0/30      1         2         1          2m15s


Getting your public/external IP:



$ kubectl get svc
NAME                 TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
app-ip               LoadBalancer   10.0.208.227   20.233.60.124   80:31074/TCP   15s
dcgm-exporter        ClusterIP      10.0.116.180   <none>          9400/TCP       3h55m
kubernetes           ClusterIP      10.0.0.1       <none>          443/TCP        4h26m
prometheus-adapter   ClusterIP      10.0.12.96     <none>          443/TCP        122m


In this example, 20.233.60.124 is your external IP.

Step 10: Test autoscaling

Increase the GPU utilization by making requests to the application. When the average GPU utilization (the target) crosses 30, the maximum average utilization we set, you'll observe the application scale up and spin up another pod.

Making a request to your app

Here we are making requests to the /deidentify_text endpoint of our deid container. You can make requests to any resource which utilizes the GPU.



for ((i=1;i<=10;i++)); do
  curl -X POST http://20.233.60.124/deidentify_text \
    -H 'content-type: application/json' \
    -d '{"text": ["My name is John and my friend is Grace", "I live in Berlin"], "unique_pii_markers": false, "key": ""}' &
done


Need an API key? Contact us to download yours.

Meanwhile, keep observing the status of the Horizontal Pod Autoscaler. When the GPU utilization (target) crosses 30, the system will automatically spin up another pod.
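To follow the HPA continuously instead of re-running the command, you can watch it:

kubectl get hpa my-gpu-app --watch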



$ kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
my-gpu-app   Deployment/my-gpu-app   40/30     1         2         2          30m


Check the status of the pods; you'll notice that we now have another my-gpu-app pod spun up by our autoscaler.



$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
dcgm-exporter-6bjn8                  1/1     Running   0          3h37m
dcgm-exporter-xmn74                  1/1     Running   0          3h37m
my-gpu-app-675b967d56-q7swb          1/1     Running   0          30m
my-gpu-app-572f924e36-q7swb          1/1     Running   0          5m
prometheus-adapter-6696b6d76-g2csx   1/1     Running   0          104m


Additional resources for Kubernetes GPU deployment

Interested in receiving more tech tips like autoscaling Kubernetes pods based on GPU? Sign up for Private AI’s mailing list to get notified about the latest information on machine learning deployment, privacy, and more.

