
Installing a Kubernetes GPU Server (CUDA, cuDNN, Docker, k8s)

김 정출 2021. 11. 17. 10:23


Kubernetes Master

Docker

$ sudo apt install apt-transport-https ca-certificates curl software-properties-common

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

$ apt-key fingerprint 0EBFCD88

$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

$ sudo apt update

$ sudo apt install docker-ce

$ docker -v

$ sudo usermod -aG docker dev

Docker systemd configuration

$ sudo vi /lib/systemd/system/docker.service

Set the ExecStart line so that Docker uses the systemd cgroup driver:

ExecStart=/usr/bin/dockerd --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd

$ sudo systemctl daemon-reload

$ sudo systemctl restart docker

$ sudo docker info | grep -i cgroup

$ sudo swapoff -a

$ sudo sed -i '/swap.img/s/^/#/' /etc/fstab
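
The sed command above only matches a swap file named swap.img, and running it twice adds a second # to the same line. As a rough sketch of a more general variant, the hypothetical helper below (comment_out_swap is not part of any tool) comments out every active swap entry in an fstab-style file and leaves already commented lines alone. The demonstration runs against a temporary file, not the real /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: print an fstab-style file with every active swap entry commented out.
comment_out_swap() {
    # $1: path to an fstab-style file; lines that already start with # are skipped
    sed -E '/^[[:space:]]*#/!s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$1"
}

# Demonstration on a temporary stand-in file (assumed sample entries)
demo_fstab=$(mktemp)
printf '%s\n' \
    'UUID=1111-2222 / ext4 defaults 0 1' \
    '/swap.img none swap sw 0 0' \
    '#/old.swap none swap sw 0 0' > "$demo_fstab"
comment_out_swap "$demo_fstab"
rm -f "$demo_fstab"
```

The helper only prints the result. To apply it to a real system, the output would have to be reviewed and written back to /etc/fstab by hand.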

Kubernetes

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

$ echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list

$ sudo apt-get update

$ sudo apt-get install -y kubelet kubeadm kubectl kubernetes-cni

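Initialize the k8s master (control plane)

$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16

/etc/kubernetes/admin.conf is created by kubeadm init, so copy the kubeconfig only after initialization has finished:

$ mkdir -p $HOME/.kube

$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubeadm init also prints a join command for the worker nodes. Keep it for later:

kubeadm join 192.168.1.76:6443 --token otsgef.wuowl1gpss0yklu7 --discovery-token-ca-cert-hash sha256:7373b196d1f447098e4b411cad73bf372308c0e2b5e98a905bbce06c069d4e9d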

 

Kubernetes Worker

NVIDIA Driver

$ sudo apt install ubuntu-drivers-common

$ sudo ubuntu-drivers devices

$ sudo apt-get install nvidia-driver-470

$ sudo reboot

$ nvidia-smi
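
The CUDA installer used later in this post carries driver version 470.42.01 in its file name, so it can help to confirm that the installed driver is at least from the 470 series before continuing. The sketch below is only an illustration: driver_major_at_least is a hypothetical helper, and the version string is a sample value, not output captured from a real node.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the major part of a driver version is >= a minimum.
driver_major_at_least() {
    # $1: version string such as "470.42.01", $2: minimum major version such as 470
    major=${1%%.*}
    [ "$major" -ge "$2" ]
}

# Sample value; on a real node it could be read with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
sample_version="470.42.01"
if driver_major_at_least "$sample_version" 470; then
    echo "driver $sample_version is new enough"
else
    echo "driver $sample_version is too old"
fi
```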

CUDA

$ dpkg -l | grep nvidia

$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin

$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600

$ wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda-repo-ubuntu1804-11-4-local_11.4.0-470.42.01-1_amd64.deb

$ sudo dpkg -i cuda-repo-ubuntu1804-11-4-local_11.4.0-470.42.01-1_amd64.deb

$ sudo apt-key add /var/cuda-repo-ubuntu1804-11-4-local/7fa2af80.pub

$ sudo apt-get update

$ sudo apt-get -y install cuda

nvidia-cuda-toolkit

$ sudo apt install nvidia-cuda-toolkit

$ nvcc --version

cuDNN

$ tar -xvf cudnn-11.4-linux-x64-v8.2.4.15.tgz

$ sudo cp ./cuda/include/* /usr/local/cuda-11.4/include

$ sudo cp -P ./cuda/lib64/* /usr/local/cuda-11.4/lib64 # -P preserves symbolic links

$ sudo chmod a+r /usr/local/cuda-11.4/lib64/libcudnn*
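
One way to confirm which cuDNN version ended up in the CUDA directory is to read the version macros from cudnn_version.h. The sketch below assumes the header follows the usual cuDNN 8 layout (CUDNN_MAJOR, CUDNN_MINOR, CUDNN_PATCHLEVEL), and cudnn_version_from_header is a hypothetical helper name. The demonstration uses a stand-in header with sample values.

```shell
#!/bin/sh
# Hypothetical helper: print MAJOR.MINOR.PATCH from a cudnn_version.h-style header.
cudnn_version_from_header() {
    # $1: path to the header file
    awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
         $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
         $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
         END { printf "%s.%s.%s\n", major, minor, patch }' "$1"
}

# Stand-in header with sample values; on the worker the real file would be
# /usr/local/cuda-11.4/include/cudnn_version.h (assuming the copy step succeeded)
demo_header=$(mktemp)
printf '%s\n' \
    '#define CUDNN_MAJOR 8' \
    '#define CUDNN_MINOR 2' \
    '#define CUDNN_PATCHLEVEL 4' > "$demo_header"
cudnn_version_from_header "$demo_header"
rm -f "$demo_header"
```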

Docker

$ sudo apt install apt-transport-https ca-certificates curl software-properties-common

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

$ apt-key fingerprint 0EBFCD88

$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

$ sudo apt update

$ sudo apt install docker-ce

Docker systemd configuration

$ sudo vi /lib/systemd/system/docker.service

Set the ExecStart line so that Docker uses the systemd cgroup driver:

ExecStart=/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd

$ sudo systemctl daemon-reload

$ sudo systemctl restart docker

$ sudo swapoff -a

$ sudo sed -i '/swap.img/s/^/#/' /etc/fstab

$ docker info | grep -i cgroup
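
The grep above is checked by eye. As a small sketch, the same check can be scripted by extracting the value of the "Cgroup Driver" line from docker info output. cgroup_driver_from_info is a hypothetical helper, and the input here is sample text, not output from a real daemon.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` text on stdin and print the cgroup driver name.
cgroup_driver_from_info() {
    awk -F': *' '/Cgroup Driver/ { print $2; exit }'
}

# Sample input; on a real node this would be: docker info 2>/dev/null | cgroup_driver_from_info
printf '%s\n' \
    ' Logging Driver: json-file' \
    ' Cgroup Driver: systemd' \
    ' Cgroup Version: 1' | cgroup_driver_from_info
```

If the printed value is not systemd, the ExecStart change has probably not taken effect, and the daemon-reload and restart steps are worth repeating.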

 

NVIDIA Docker

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

 

 

$ sudo apt-get install -y nvidia-docker2

$ sudo systemctl restart docker

$ docker run --gpus all --rm nvidia/cuda:11.4.2-base-ubuntu18.04 nvidia-smi

Kubernetes

$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

$ echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list

$ sudo apt-get update

$ sudo apt-get install -y kubelet kubeadm kubectl kubernetes-cni

 

 

Join the cluster as a k8s worker

$ sudo kubeadm join 192.168.1.76:6443 --token otsgef.wuowl1gpss0yklu7 --discovery-token-ca-cert-hash sha256:7373b196d1f447098e4b411cad73bf372308c0e2b5e98a905bbce06c069d4e9d 

If the existing token has expired, create a new token on the master and recompute the CA certificate hash:

$ kubeadm token create
p2em1w.bgblttk85m6z006j

$ openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
7373b196d1f447098e4b411cad73bf372308c0e2b5e98a905bbce06c069d4e9d  
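
The new token and the hash then have to be combined into a join command for the worker. A minimal sketch of that assembly follows. build_join_command and valid_bootstrap_token are hypothetical helper names, and the values are the ones shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: check the kubeadm bootstrap token shape (6 chars, a dot, 16 chars).
valid_bootstrap_token() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# Hypothetical helper: print the join command for a worker node.
build_join_command() {
    # $1: API server endpoint (host:port), $2: bootstrap token, $3: hex SHA-256 of the CA public key
    valid_bootstrap_token "$2" || { echo "invalid token: $2" >&2; return 1; }
    printf 'sudo kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' "$1" "$2" "$3"
}

build_join_command 192.168.1.76:6443 p2em1w.bgblttk85m6z006j \
    7373b196d1f447098e4b411cad73bf372308c0e2b5e98a905bbce06c069d4e9d
```

kubeadm can also do both steps at once: running kubeadm token create --print-join-command on the master prints a complete join command.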

 

Check the nodes from the k8s master. Nodes stay NotReady until a pod network add-on such as flannel is applied.

$ kubectl get nodes

$ kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

$ kubectl get nodes

k8s NVIDIA Device Plugin

On the GPU worker node, set nvidia as Docker's default runtime:

$ sudo vi /etc/docker/daemon.json

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

$ sudo systemctl restart docker

$ docker info
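
docker info prints a lot of text, and the line that matters here is "Default Runtime". As a sketch, the hypothetical helper default_runtime_from_info below extracts that value so the result can be checked in a script. The input is sample text rather than output from a real daemon.

```shell
#!/bin/sh
# Hypothetical helper: read `docker info` text on stdin and print the default runtime name.
default_runtime_from_info() {
    awk -F': *' '/Default Runtime/ { print $2; exit }'
}

# Sample input; on the worker this would be: docker info 2>/dev/null | default_runtime_from_info
printf '%s\n' \
    ' Runtimes: io.containerd.runc.v2 nvidia runc' \
    ' Default Runtime: nvidia' | default_runtime_from_info
```

The device plugin relies on nvidia being the default runtime, so a value of runc here suggests that daemon.json was not picked up.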

Enabling GPU Support 

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.10.0/nvidia-device-plugin.yml

$ kubectl get pods --all-namespaces
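
Once the device plugin pod is running, the GPU should appear as an allocatable resource named nvidia.com/gpu on the worker node. The sketch below pulls that number out of kubectl describe node style text. allocatable_gpus is a hypothetical helper, and the input is a shortened sample, not real cluster output.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl describe node` text on stdin and print the
# nvidia.com/gpu count listed under "Allocatable:" (prints 0 when none is listed).
allocatable_gpus() {
    awk '/^Allocatable:/ { in_alloc = 1; next }
         /^[^ ]/         { in_alloc = 0 }
         in_alloc && $1 == "nvidia.com/gpu:" { print $2; found = 1 }
         END { if (!found) print 0 }'
}

# Shortened sample; on the master this would be:
#   kubectl describe node <node-name> | allocatable_gpus
printf '%s\n' \
    'Capacity:' \
    '  cpu:             8' \
    '  nvidia.com/gpu:  1' \
    'Allocatable:' \
    '  cpu:             8' \
    '  nvidia.com/gpu:  1' \
    'System Info:' | allocatable_gpus
```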

GPU Pod Test - Vector Addition using CUDA Device

$ cd /home/dev/source/

$ mkdir k8s-pod-gpu-example && cd k8s-pod-gpu-example

$ vi gpu-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "nvidia/samples:vectoradd-cuda10.2"
    resources:
      limits:
        nvidia.com/gpu: 1

$ kubectl apply -f /home/dev/source/k8s-pod-gpu-example/gpu-pod.yaml
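
After the pod completes, its logs show whether the vector addition actually ran on the GPU. The CUDA vectorAdd sample normally ends with a "Test PASSED" line. The sketch below checks for that line. vectoradd_passed is a hypothetical helper, and the log text is an illustrative sample.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the log text on stdin contains the pass marker.
vectoradd_passed() {
    grep -q 'Test PASSED'
}

# Illustrative sample log; on the master this would be:
#   kubectl logs gpu-operator-test | vectoradd_passed
if printf '%s\n' \
    '[Vector addition of 50000 elements]' \
    'Test PASSED' \
    'Done' | vectoradd_passed; then
    echo "GPU test passed"
else
    echo "GPU test did not pass"
fi
```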

 

Thank you.