Setting Up a TensorFlow GPU Development Environment on CentOS 7

December 13, 2018

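I installed CentOS 7 partly because it supports CUDA 10.0. I had heard about TensorFlow and GPU programming for a long time but never had the chance to try them in practice.
So let's try setting up a TensorFlow environment with GPU support.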

1. Install Docker

Uninstall old versions

sudo yum remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-selinux \
                  docker-engine-selinux \
                  docker-engine 

sudo yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2 
sudo yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce
sudo systemctl start docker
sudo docker run hello-world
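
Before moving on, it can help to confirm that the Docker client is on the PATH and that the daemon is running. The following is only a minimal sketch and not part of the original steps; the helper name `require_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named command is on the PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if require_cmd docker; then
    # Print the version of the installed Docker client.
    docker --version
    # "systemctl is-active" prints "active" when the service is running.
    if require_cmd systemctl; then
        systemctl is-active docker || echo "docker daemon is not active"
    fi
else
    echo "docker is not installed or not on PATH"
fi
```

If the daemon should also start at boot, `sudo systemctl enable docker` is the usual systemd way to arrange that.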

Manage Docker as a non-root user

# Create the docker group.
sudo groupadd docker
# Add your user to the docker group.
sudo usermod -aG docker $USER
# Log out and log back in so that your group membership is re-evaluated.
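
Group changes only take effect in a new login session, so a quick check of the current session's groups can save some confusion. This is a rough sketch under that assumption; `has_group` is a hypothetical helper, not a Docker command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if group $2 appears in the
# space-separated group list $1.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# "id -nG" prints the group names of the current session.
current_groups=$(id -nG)
if has_group "$current_groups" docker; then
    echo "docker group is active; docker should work without sudo"
else
    echo "docker group is not active yet; log out and log back in"
fi
```

Once the group is active, `docker run hello-world` without `sudo` is a reasonable final check.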

2. Install CUDA (in practice, installing only the CUDA driver is enough)

2.1 Download the CUDA rpm

2.2 Verify You Have a CUDA-Capable GPU

lspci | grep -i nvidia

2.3 Verify You Have a Supported Version of Linux

uname -m && cat /etc/*release
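
The `cat /etc/*release` output is fairly noisy. The two fields that matter most here are `ID` and `VERSION_ID` from /etc/os-release, which the nvidia-docker step in section 3 concatenates into a repository name. A minimal sketch for printing just those, where `distro_tag` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print ID followed by VERSION_ID from an
# os-release style file given as $1.
distro_tag() {
    # Source the file in a subshell so its variables do not leak out.
    ( . "$1" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

if [ -r /etc/os-release ]; then
    echo "architecture: $(uname -m)"
    echo "distribution: $(distro_tag /etc/os-release)"
fi
```

On a stock CentOS 7 install the distribution line should read `centos7`.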

2.4 Verify the System Has gcc Installed

gcc --version

2.5 Verify the System has the Correct Kernel Headers and Development Packages Installed

uname -r
sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
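
The version suffix matters because the driver's kernel module is built against the headers of the kernel that is actually running. The sketch below checks that both packages match `uname -r`; it assumes an rpm-based system, and `kernel_pkg` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the versioned package name for a
# package base name ($1) and a kernel release ($2).
kernel_pkg() {
    printf '%s-%s\n' "$1" "$2"
}

running=$(uname -r)
for name in kernel-devel kernel-headers; do
    pkg=$(kernel_pkg "$name" "$running")
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not found; cannot check $pkg"
    elif rpm -q "$pkg" >/dev/null 2>&1; then
        echo "ok: $pkg is installed"
    else
        echo "missing: $pkg"
    fi
done
```

If a package is reported missing after a kernel update, rebooting into the new kernel and rerunning the install command is one likely fix.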

2.6 Download the NVIDIA CUDA Toolkit

The NVIDIA CUDA Toolkit is available at http://developer.nvidia.com/cuda-downloads.

# Install repository meta-data
sudo rpm --install cuda-repo-<distro>-<version>.<architecture>.rpm
# Clean Yum repository cache
sudo yum clean expire-cache
# Install CUDA
sudo yum install cuda
# Add the CUDA 10.0 binaries to PATH
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
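
The `${PATH:+:${PATH}}` part of the export may look odd. It expands to a colon plus the old value only when `PATH` is already non-empty, so the result never ends in a stray colon. A small sketch of the same expansion, using a hypothetical `prepend_path` helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend directory $1 to the colon-separated
# list $2 and print the result.
prepend_path() {
    local dir=$1
    local list=$2
    # ${list:+:${list}} expands to ":$list" only if list is non-empty.
    printf '%s\n' "${dir}${list:+:${list}}"
}

# Existing list: the old entries follow the new directory.
prepend_path /usr/local/cuda-10.0/bin "/usr/bin:/bin"
# Empty list: no trailing colon is produced.
prepend_path /usr/local/cuda-10.0/bin ""
```

The export only lasts for the current shell. Adding the same line to `~/.bashrc` is a common way to make it permanent, and `nvcc --version` should then find the toolkit.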

3. Install nvidia-docker

# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
  sudo tee /etc/yum.repos.d/nvidia-docker.repo

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo yum install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test GPU access from Docker
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
docker run --runtime=nvidia -it --rm tensorflow/tensorflow:latest-gpu \
   python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
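
If the `--runtime=nvidia` commands fail, one thing worth checking is whether the Docker daemon actually picked up the `nvidia` runtime after the reload. The sketch below looks for it in the `Runtimes:` line of `docker info`; `has_nvidia_runtime` is a hypothetical helper, and the exact output layout is assumed and may differ between Docker versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "docker info" style text on stdin and
# succeed if the Runtimes line lists "nvidia".
has_nvidia_runtime() {
    grep -i 'runtimes' | grep -w nvidia >/dev/null
}

if command -v docker >/dev/null 2>&1; then
    if docker info 2>/dev/null | has_nvidia_runtime; then
        echo "nvidia runtime is registered with the Docker daemon"
    else
        echo "nvidia runtime not found; try: sudo systemctl restart docker"
    fi
else
    echo "docker is not installed or not on PATH"
fi
```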

References

Docker tutorial http://www.runoob.com/docker/docker-tutorial.html
Tensorflow https://tensorflow.google.cn/
nvidia-docker https://github.com/NVIDIA/nvidia-docker