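A Simple Way to Implement Nvidia GPU Time-Slicing on K8S
Nvidia GPUs are currently the mainstream choice in the market. In the cloud, most usage scenarios are on demand, so K8S has gradually become a powerful tool for managing cloud resources, and consuming GPU resources from Kubernetes has become correspondingly more common. This post summarizes the approaches to operating Nvidia GPUs that can currently be found online, and introduces a simple configuration for implementing GPU Time-Slicing.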
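Ways to Use GPUs on Kubernetes
Because 檸檬爸 (the author) mainly works in Azure, the starting point here is the document Use GPUs for compute-intensive workloads on Azure Kubernetes Service (AKS). From it, there are two ways to use GPUs on K8S:
- The complex, resource-hungry approach: Nvidia GPU Operator
- The simple but more limited approach: deploy the Nvidia Device Plugin directly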
Nvidia GPU Operator
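Nvidia GPU Operator is a fairly comprehensive management suite for deploying GPU-related software on K8S. It goes beyond the Nvidia Device Plugin alone: as needed, GPU Operator can install the following components on the cluster, in this order: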
- Nvidia Driver Installer
- Nvidia Container Toolkit Installer
- Nvidia Device Plugin
- DCGM Exporter
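To illustrate what "as needed" can look like, here is a minimal sketch of a Helm values file for the gpu-operator chart that toggles the four components listed above. The file name values.yaml is an assumption, and the choice of which components to enable is purely illustrative, not a recommendation from the original article.

```yaml
# values.yaml (hypothetical file name) for the gpu-operator Helm chart.
# Each block maps to one component in the list above.
driver:
  enabled: true        # Nvidia Driver Installer
toolkit:
  enabled: true        # Nvidia Container Toolkit Installer
devicePlugin:
  enabled: true        # Nvidia Device Plugin
dcgmExporter:
  enabled: true        # DCGM Exporter (GPU metrics)
```

Setting a component to `enabled: false` skips it, which may be useful when, for example, the node image already ships a driver; whether that applies depends on your cluster.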
According to the website, once the installation is complete you should see the following list of Pods.
root@test:~# kubectl -n gpu-operator get pods
NAME                                                           READY   STATUS      RESTARTS      AGE
gpu-feature-discovery-jdqpb                                    1/1     Running     0             35d
gpu-operator-67f8b59c9b-k989m                                  1/1     Running     6 (35d ago)   35d
nfd-node-feature-discovery-gc-5644575d55-957rp                 1/1     Running     6 (35d ago)   35d
nfd-node-feature-discovery-master-5bd568cf5c-c6t9s             1/1     Running     6 (35d ago)   35d
nfd-node-feature-discovery-worker-sqb7x                        1/1     Running     6 (35d ago)   35d
nvidia-container-toolkit-daemonset-rqgtv                       1/1     Running     0             35d
nvidia-cuda-validator-9kqnf                                    0/1     Completed   0             35d
nvidia-dcgm-exporter-8mb6v                                     1/1     Running     0             35d
nvidia-device-plugin-daemonset-7nkjw                           1/1     Running     0             35d
nvidia-driver-daemonset-5.15.0-105-generic-ubuntu22.04-g5dgx   1/1     Running     5 (35d ago)   35d
nvidia-operator-validator-6mqlm                                1/1     Running     0             35d
Installing Only the Nvidia Device Plugin
On AKS you can choose to install only the Nvidia Device Plugin. Below is the officially provided nvidia-device-plugin.yaml; running kubectl apply -f nvidia-device-plugin.yaml deploys the Device Plugin DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.15.0
        name: nvidia-device-plugin-ctr
        env:
        - name: FAIL_ON_INIT_ERROR
          value: "false"
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
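Once the Device Plugin DaemonSet is running, GPU nodes advertise the `nvidia.com/gpu` resource, and a workload claims a GPU by listing it under `resources.limits`. The Pod below is a minimal sketch of such a request. The Pod name and the container image tag are assumptions chosen for illustration, and the toleration simply mirrors the `sku=gpu` toleration of the DaemonSet above, which only matters if your GPU node pool carries that taint.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test              # hypothetical name
spec:
  restartPolicy: Never
  tolerations:
  - key: "sku"                      # assumes the GPU node pool is tainted sku=gpu:NoSchedule
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image tag; any CUDA base image should do
    command: ["nvidia-smi"]         # prints the GPU visible to the container, then exits
    resources:
      limits:
        nvidia.com/gpu: 1           # request one GPU from the device plugin
```

If the plugin is working, the Pod should be scheduled onto a GPU node and its log should show the `nvidia-smi` device table.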
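If you also want to set up Time-Slicing, you can use the time-slicing.yaml file below. It creates a ConfigMap and, following the referenced link, slightly modifies the nvidia-device-plugin.yaml above to produce a new DaemonSet deployment configuration.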
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: nvidia.com/gpu
        operator: Exists
        effect: NoSchedule
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      containers:
      - image: nvcr.io/nvidia/k8s-device-plugin:v0.14.0
        name: nvidia-device-plugin-ctr
        env:
        - name: CONFIG_FILE
          value: "/opt/config/config.yaml"
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: config
          mountPath: "/opt/config"
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: config
        configMap:
          name: nvidia-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-config
  namespace: kube-system
  labels:
    app: nvidia
data:
  config.yaml: |-
    version: v1
    flags:
      migStrategy: "none"
      failOnInitError: false
      nvidiaDriverRoot: "/"
      plugin:
        passDeviceSpecs: true
    sharing:
      timeSlicing:
        resources:
        - name: nvidia.com/gpu
          replicas: 10
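With `replicas: 10`, each physical GPU is advertised as ten schedulable `nvidia.com/gpu` units, so several Pods that each request one unit can land on the same card and take turns on it. The Deployment below is a minimal sketch of that pattern. The Deployment name, labels, replica count, and image tag are all assumptions for illustration; add tolerations matching whatever taints your GPU nodes carry.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: timeslice-demo              # hypothetical name
spec:
  replicas: 4                       # four Pods; with replicas: 10 they can fit on one physical GPU
  selector:
    matchLabels:
      app: timeslice-demo
  template:
    metadata:
      labels:
        app: timeslice-demo
    spec:
      containers:
      - name: cuda
        image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed image tag
        # List the visible GPU, then stay alive so the Pods overlap in time
        command: ["sh", "-c", "nvidia-smi -L && sleep 3600"]
        resources:
          limits:
            nvidia.com/gpu: 1       # one time-sliced unit, not a whole physical GPU
```

Note that time-slicing only interleaves execution: the Pods sharing a GPU get no memory or fault isolation from one another, so each workload has to stay within the card's total memory on its own.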