Installing a Kubernetes GPU Server (CUDA, cuDNN, Docker, k8s)
Kubernetes Master
Docker
$ sudo apt install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ apt-key fingerprint 0EBFCD88
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt update
$ sudo apt install docker-ce
$ docker -v
$ sudo usermod -aG docker dev
Docker systemd configuration
$ sudo vi /lib/systemd/system/docker.service
ExecStart=/usr/bin/dockerd --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
$ sudo docker info | grep -i cgroup
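The grep above is there because the kubelet and Docker need to agree on the cgroup driver, and the ExecStart line sets it to systemd. A minimal sketch for extracting just the driver name from `docker info` output; `cgroup_driver` is a hypothetical helper, and it assumes the output contains a line of the form `Cgroup Driver: <name>`.

```shell
# Hypothetical helper: read `docker info` output on stdin and print the
# value of the "Cgroup Driver" line (for example "systemd" or "cgroupfs").
cgroup_driver() {
  awk -F': *' '/Cgroup Driver/ { print $2 }'
}

# Usage on a Docker host (not run here):
#   sudo docker info 2>/dev/null | cgroup_driver
```

If this prints anything other than systemd, the ExecStart change has probably not taken effect yet.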
$ sudo swapoff -a
$ sudo sed -i '/swap.img/s/^/#/' /etc/fstab
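To see what the sed expression does before touching /etc/fstab, here is a small sketch that runs the same expression on a sample file. The sample contents and the `comment_out_swap` helper are assumptions for illustration; a real fstab will look different.

```shell
# Build a sample fstab (hypothetical contents) in a temporary file.
sample_fstab=$(mktemp)
printf '%s\n' \
  'UUID=1234-abcd / ext4 defaults 0 1' \
  '/swap.img none swap sw 0 0' > "$sample_fstab"

# Same expression as above, without -i: prefix every line that
# contains "swap.img" with "#" and print the result to stdout.
comment_out_swap() {
  sed '/swap.img/s/^/#/' "$1"
}

comment_out_swap "$sample_fstab"
```

Only the swap line comes out prefixed with `#`; the root filesystem line is unchanged. The pattern assumes the swap entry is the /swap.img file, so an fstab that uses a swap partition would need a different pattern.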
Kubernetes
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
$ sudo apt-get update
$ sudo apt-get install -y kubelet kubeadm kubectl kubernetes-cni
k8s Master control plane initialization
$ sudo kubeadm init --pod-network-cidr=10.244.0.0/16
kubeadm init prints a join command for the workers at the end of its output:
kubeadm join 192.168.1.76:6443 --token otsgef.wuowl1gpss0yklu7 --discovery-token-ca-cert-hash sha256:7373b196d1f447098e4b411cad73bf372308c0e2b5e98a905bbce06c069d4e9d
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
Kubernetes Worker
NVIDIA Driver
$ sudo apt install ubuntu-drivers-common
$ sudo ubuntu-drivers devices
$ sudo apt-get install nvidia-driver-470
$ sudo reboot
$ nvidia-smi
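The CUDA package used in the next section is named 11.4.0-470.42.01, so a 470-series driver is what this guide expects. A rough sketch for checking the driver's major version; `driver_major_at_least` is a hypothetical helper, and the nvidia-smi query options in the usage comment are assumed to be available in your driver version.

```shell
# Hypothetical helper: succeed if the version string $1 (e.g. "470.42.01")
# has a major component greater than or equal to $2.
driver_major_at_least() {
  major=${1%%.*}          # drop everything from the first dot onward
  [ "$major" -ge "$2" ]
}

# Usage on a GPU node (not run here):
#   ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_major_at_least "$ver" 470 && echo "driver OK: $ver"
```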
CUDA
$ dpkg -l | grep nvidia
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
$ sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo dpkg -i cuda-repo-ubuntu1804-11-4-local_11.4.0-470.42.01-1_amd64.deb
$ sudo apt-key add /var/cuda-repo-ubuntu1804-11-4-local/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get -y install cuda
nvidia-cuda-toolkit
$ sudo apt install nvidia-cuda-toolkit
$ nvcc --version
cuDNN
$ tar -xvf cudnn-11.4-linux-x64-v8.2.4.15.tgz
$ sudo cp ./cuda/include/* /usr/local/cuda-11.4/include
$ sudo cp -P ./cuda/lib64/* /usr/local/cuda-11.4/lib64 # -P preserves symbolic links
$ sudo chmod a+r /usr/local/cuda-11.4/lib64/libcudnn*
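After copying the files, it can help to confirm which cuDNN version actually landed in the CUDA directory. The sketch below assumes the cuDNN 8 archive ships a cudnn_version.h that defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL; `cudnn_version_from_header` is a hypothetical helper.

```shell
# Hypothetical helper: print "MAJOR.MINOR.PATCH" from a cudnn_version.h file.
# Only the three plain #define lines are read; other macros are ignored.
cudnn_version_from_header() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { print major "." minor "." patch }
  ' "$1"
}

# Usage after the copy step (not run here):
#   cudnn_version_from_header /usr/local/cuda-11.4/include/cudnn_version.h
```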
Docker
$ sudo apt install apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ apt-key fingerprint 0EBFCD88
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt update
$ sudo apt install docker-ce
systemd
$ sudo vi /lib/systemd/system/docker.service
ExecStart=/usr/bin/dockerd --exec-opt native.cgroupdriver=systemd
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
$ sudo swapoff -a
$ sudo sed -i '/swap.img/s/^/#/' /etc/fstab
$ docker info | grep -i cgroup
NVIDIA Docker
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo apt-get install -y nvidia-docker2
$ sudo systemctl restart docker
$ docker run --gpus all --rm nvidia/cuda:11.4.2-base-ubuntu18.04 nvidia-smi
Kubernetes
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
$ sudo apt-get update
$ sudo apt-get install -y kubelet kubeadm kubectl kubernetes-cni
Join the cluster as a k8s Worker
$ sudo kubeadm join 192.168.1.76:6443 --token otsgef.wuowl1gpss0yklu7 --discovery-token-ca-cert-hash sha256:7373b196d1f447098e4b411cad73bf372308c0e2b5e98a905bbce06c069d4e9d
If the existing token has expired
$ kubeadm token create
p2em1w.bgblttk85m6z006j
$ openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
7373b196d1f447098e4b411cad73bf372308c0e2b5e98a905bbce06c069d4e9d
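The new token and the CA certificate hash are the two values that go back into the join command. A minimal sketch that puts them together; `build_join_command` is a hypothetical helper, and the values are the ones shown above.

```shell
# Hypothetical helper: assemble a kubeadm join command from its three parts.
build_join_command() {
  endpoint=$1   # API server address, e.g. 192.168.1.76:6443
  token=$2      # output of `kubeadm token create`
  ca_hash=$3    # hex digest printed by the openssl pipeline
  printf 'sudo kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
    "$endpoint" "$token" "$ca_hash"
}

build_join_command 192.168.1.76:6443 p2em1w.bgblttk85m6z006j \
  7373b196d1f447098e4b411cad73bf372308c0e2b5e98a905bbce06c069d4e9d
```

As an alternative, `kubeadm token create` also accepts a `--print-join-command` option that should print the complete command in one step.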
Check the nodes from the k8s Master
$ kubectl get nodes
$ kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
$ kubectl get nodes
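Nodes usually stay NotReady until a pod network add-on such as flannel is running, which is why `kubectl get nodes` is run both before and after applying the manifest. A small sketch for listing only the nodes that are not ready yet; `not_ready_nodes` is a hypothetical helper that expects the `--no-headers` output format.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS column is not exactly "Ready".
# Note: a cordoned node ("Ready,SchedulingDisabled") is also listed.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage on the master (not run here):
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node reports Ready.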
k8s NVIDIA Device Plugin
worker-gpu-node $ sudo vi /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
$ sudo systemctl restart docker
$ docker info
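The device plugin relies on nvidia being the default Docker runtime on the GPU worker, so it is worth confirming that the setting is really in daemon.json. The sketch below is a plain pattern match rather than a JSON parser, and `default_runtime_is_nvidia` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the daemon.json at $1 sets
# "default-runtime" to "nvidia" (simple pattern match, not a JSON parser).
default_runtime_is_nvidia() {
  grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}

# Usage on the GPU worker (not run here):
#   default_runtime_is_nvidia /etc/docker/daemon.json && echo "nvidia is the default runtime"
```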
Enabling GPU Support
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.10.0/nvidia-device-plugin.yml
$ kubectl get pods --all-namespaces
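Once the device plugin pod is running, the GPU worker should advertise an `nvidia.com/gpu` resource. A sketch for picking out the nodes that do; `nodes_with_gpu` is a hypothetical helper, and the custom-columns expression in the usage comment is an assumption about how to address the dotted resource name, so adjust it if your kubectl version rejects it.

```shell
# Hypothetical helper: read "NAME GPU" rows on stdin and print the nodes
# that advertise at least one nvidia.com/gpu. Rows showing "<none>" are skipped.
nodes_with_gpu() {
  awk '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 }'
}

# Usage on the master (not run here):
#   kubectl get nodes --no-headers \
#     -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' \
#     | nodes_with_gpu
```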
GPU Pod Test - Vector Addition using CUDA Device
$ cd /home/dev/source/
$ mkdir k8s-pod-gpu-example && cd k8s-pod-gpu-example
$ vi gpu-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-operator-test
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      image: "nvidia/samples:vectoradd-cuda10.2"
      resources:
        limits:
          nvidia.com/gpu: 1
$ kubectl apply -f /home/dev/source/k8s-pod-gpu-example/gpu-pod.yaml
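To confirm the pod actually used the GPU, the pod logs can be checked once it completes. This sketch assumes the image prints "Test PASSED" on success, as the CUDA vectorAdd sample does; `vectoradd_passed` is a hypothetical helper.

```shell
# Hypothetical helper: read pod logs on stdin and succeed if they contain
# the success marker assumed to be printed by the vector-add sample.
vectoradd_passed() {
  grep -q 'Test PASSED'
}

# Usage on the master (not run here):
#   kubectl logs gpu-operator-test | vectoradd_passed && echo "GPU pod test passed"
```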
Thank you.