Why build it yourself
- Because the official packages do not work on Ubuntu 22.04 (the system gcc and glibc versions are too new). Importing paddle fails with:
ImportError: /home/ubuntu/anaconda3/envs/paddle/lib/python3.9/site-packages/paddle/fluid/core_avx.so: undefined symbol: _dl_sym, version GLIBC_PRIVATE
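A quick way to tell whether a machine hits this problem: the sketch below pulls the library, symbol and symbol version out of an "undefined symbol" message and reports the running glibc version through os.confstr. It only diagnoses the problem. The helper names are hypothetical, and the sketch assumes a glibc-based Linux.

```python
import os
import re

# Matches messages like:
# "ImportError: /path/core_avx.so: undefined symbol: _dl_sym, version GLIBC_PRIVATE"
_UNDEF_RE = re.compile(
    r"ImportError: (?P<lib>.+?): undefined symbol: (?P<sym>\w+)"
    r"(?:, version (?P<ver>[\w.]+))?"
)


def parse_undefined_symbol(message):
    """Return (library, symbol, version) from an 'undefined symbol' error, or None."""
    m = _UNDEF_RE.search(message)
    if m is None:
        return None
    return m.group("lib"), m.group("sym"), m.group("ver")


def running_glibc():
    """Return the glibc version string (e.g. 'glibc 2.35'), or None if unavailable."""
    try:
        return os.confstr("CS_GNU_LIBC_VERSION")
    except (ValueError, OSError, AttributeError):
        return None


if __name__ == "__main__":
    print("glibc of this system:", running_glibc())
```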
Environment
- System Python (it doesn't really matter for the build; shown for reference only)
$ /usr/bin/python3 --version
Python 3.10.4
- CMake (a fairly recent version is recommended; it seems 3.19 or newer is needed)
$ cmake --version
cmake version 3.22.1
CMake suite maintained and supported by Kitware (kitware.com/cmake).
$ gcc --version
gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04 LTS"
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0
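These version checks can also be scripted. The sketch below parses the cmake and nvcc outputs shown above. MIN_CMAKE encodes the author's 3.19 estimate rather than an official Paddle requirement. The CUDA release it reports should match the "cuda11.6" in the cuDNN/NCCL archive names used below.

```python
import re
import shutil
import subprocess

MIN_CMAKE = (3, 19)  # the author's estimate from above, not an official requirement


def parse_cmake_version(text):
    """'cmake version 3.22.1' -> (3, 22, 1), or None."""
    m = re.search(r"cmake version (\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(x) for x in m.groups()) if m else None


def parse_nvcc_release(text):
    """'Cuda compilation tools, release 11.6, V11.6.124' -> (11, 6), or None."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def tool_output(*cmd):
    """Run a command and return its stdout, or '' if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    cmake = parse_cmake_version(tool_output("cmake", "--version"))
    cuda = parse_nvcc_release(tool_output("nvcc", "-V"))
    ok = cmake is not None and cmake[:2] >= MIN_CMAKE
    print("cmake:", cmake, "OK" if ok else "too old or missing")
    print("CUDA toolkit:", cuda)
```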
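- cuDNN
- The gcc installed by conda does not search the system C/C++ include directories, so cuDNN has to be installed from the tar archive (copied into /usr/local/cuda).
- The tar.xz package used here: cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive.tar.xz
- Quick installation steps: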
tar -xvf cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive.tar.xz
cd cudnn-linux-x86_64-8.4.1.50_cuda11.6-archive
sudo cp -r include/* /usr/local/cuda/include/
sudo cp -r lib/* /usr/local/cuda/lib64
# Refresh the linker cache and check the installed libraries
sudo ldconfig -v | grep libcudnn
# Output: cuDNN 8.4.1 is now recognized
/sbin/ldconfig.real: Path `/usr/lib' given more than once
(from <builtin>:0 and <builtin>:0)
libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.4.1
libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.4.1
libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.4.1
libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.4.1
libcudnn.so.8 -> libcudnn.so.8.4.1
libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.4.1
libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.4.1
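The same check can be made stricter with a script. The sketch below confirms that every cuDNN sub-library is present and that all of them resolve to the same version. It reads "libX.so.8 -> libX.so.8.4.1" lines like the ones above, and the expected library names are copied from that output. The script name check_cudnn.py is hypothetical; usage would be: sudo ldconfig -v 2>/dev/null | grep libcudnn | python3 check_cudnn.py /dev/stdin

```python
import re
import sys

# Sub-libraries listed in the ldconfig output above (cuDNN 8.x split layout)
EXPECTED = {
    "libcudnn", "libcudnn_ops_infer", "libcudnn_ops_train",
    "libcudnn_cnn_infer", "libcudnn_cnn_train",
    "libcudnn_adv_infer", "libcudnn_adv_train",
}
_LINK_RE = re.compile(r"(\w+)\.so\.\d+\s*->\s*\w+\.so\.(\d+(?:\.\d+)*)")


def cudnn_versions(lines):
    """Map library name -> full version from 'libX.so.8 -> libX.so.8.4.1' lines."""
    found = {}
    for line in lines:
        m = _LINK_RE.search(line)
        if m and m.group(1).startswith("libcudnn"):
            found[m.group(1)] = m.group(2)
    return found


def check(found):
    """Return a list of problems (missing libraries or mixed versions)."""
    problems = [f"missing {name}" for name in sorted(EXPECTED - set(found))]
    versions = sorted(set(found.values()))
    if len(versions) > 1:
        problems.append(f"mixed versions: {versions}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        found = cudnn_versions(fh)
    print(found)
    print(check(found) or "cuDNN libraries look consistent")
```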
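- NCCL
- Prebuilt package (note that this archive is built for CUDA 11.7, while the toolkit here is 11.6)
tar -xvf nccl_2.13.4-1+cuda11.7_x86_64.txz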
cd nccl_2.13.4-1+cuda11.7_x86_64
sudo cp -r include/* /usr/local/cuda/include/
sudo cp -r lib/* /usr/local/cuda/lib64
- Build from source (recommended, since a build against your own CUDA toolkit should be more compatible)
git clone https://github.com/NVIDIA/nccl.git
cd nccl
git checkout v2.13.4-1
make pkg.txz.build -j12
# If you get lots of sm_35 deprecation warnings, you can delete -gencode=arch=compute_35,code=sm_35 from makefiles/common.mk. Leaving it in is also fine.
# Before
CUDA8_GENCODE = -gencode=arch=compute_35,code=sm_35 \
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61
# After
CUDA8_GENCODE = -gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61
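The same makefile edit can be scripted. The sketch below removes the compute_35 entry and the backslash continuation that follows it. It assumes the entry looks exactly as in the "before" snippet, and it keeps a .bak copy of the file. The script name fix_gencode.py is hypothetical. As an alternative, NCCL's README describes overriding NVCC_GENCODE on the make command line so that only your own GPU architecture is built, which also shortens the build. For example, compute_86,code=sm_86 would cover the compute capability 8.6 that Paddle reports further down.

```python
import sys
from pathlib import Path
import re

# Matches the sm_35 entry plus the backslash-newline continuation after it
SM35_RE = re.compile(r"-gencode=arch=compute_35,code=sm_35\s*\\\n\s*")


def drop_sm35(makefile_text):
    """Return the makefile text with the sm_35 gencode entry removed."""
    return SM35_RE.sub("", makefile_text)


if __name__ == "__main__" and len(sys.argv) > 1:
    path = Path(sys.argv[1])  # e.g. makefiles/common.mk in the nccl checkout
    original = path.read_text()
    patched = drop_sm35(original)
    if patched != original:
        Path(str(path) + ".bak").write_text(original)  # keep a backup copy
        path.write_text(patched)
    print("patched" if patched != original else "no sm_35 entry found")
```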
# The build takes about 20 minutes.
cd build/pkg/txz
tar -xvf nccl_2.13.4-1+cuda11.6_x86_64.txz
cd nccl_2.13.4-1+cuda11.6_x86_64
sudo cp -r include/* /usr/local/cuda/include/
sudo cp -r lib/* /usr/local/cuda/lib64
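Both the prebuilt archive and the self-built package copy their headers into /usr/local/cuda/include. It is therefore worth confirming which NCCL version ended up there. The sketch below reads the NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH macros from nccl.h. The default path is the destination used in the cp commands above.

```python
import re
import sys
from pathlib import Path

DEFAULT_HEADER = "/usr/local/cuda/include/nccl.h"  # destination used above


def nccl_version(header_text):
    """Return (major, minor, patch) from the #define lines in nccl.h, or None."""
    parts = []
    for macro in ("NCCL_MAJOR", "NCCL_MINOR", "NCCL_PATCH"):
        m = re.search(rf"#define\s+{macro}\s+(\d+)", header_text)
        if m is None:
            return None
        parts.append(int(m.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_HEADER)
    if path.exists():
        print(path, "->", nccl_version(path.read_text()))
    else:
        print(path, "not found")
```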
Preparation
conda create -n paddle python==3.9.12
conda activate paddle
find `dirname $(dirname $(which python3))` -name "libpython3.so" > /tmp/temp1 && export PYTHON_LIBRARY=$(cat /tmp/temp1 | xargs -L 1)
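# PATH takes directories, not a library file: add the env's bin directory (already on PATH once the env is activated)
export PATH=$(dirname $(which python3)):$PATH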
find `dirname $(dirname $(which python3))`/include -name "python3.9" > /tmp/temp2 && export PYTHON_INCLUDE_DIRS=$(cat /tmp/temp2 | xargs -L 1)
export PYTHON3_EXECUTABLE=$(which python3)
echo PYTHON_LIBRARY=${PYTHON_LIBRARY}
echo PYTHON_INCLUDE_DIRS=${PYTHON_INCLUDE_DIRS}
echo PYTHON3_EXECUTABLE=${PYTHON3_EXECUTABLE}
# Output:
PYTHON_LIBRARY=/home/tlntin/anaconda3/envs/paddle/lib/libpython3.so
PYTHON_INCLUDE_DIRS=/home/tlntin/anaconda3/envs/paddle/include/python3.9
PYTHON3_EXECUTABLE=/home/tlntin/anaconda3/envs/paddle/bin/python3
pip install numpy
export PYTHON3_NUMPY_INCLUDE_DIRS=`python -c "import numpy as np; print(np.__path__[0] + '/core/include')"`
echo PYTHON3_NUMPY_INCLUDE_DIRS=$PYTHON3_NUMPY_INCLUDE_DIRS
pip install protobuf==3.20.0
pip install patchelf
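The find/whereis pipeline above is somewhat fragile. As an alternative, Python's own sysconfig module reports the same locations for whichever interpreter runs it. The sketch below prints export lines under the variable names used above. It assumes libpython3.so sits in LIBDIR, which holds for the conda layout in the output above. The script name print_paths.py is hypothetical; usage would be: eval "$(python print_paths.py)" with the paddle env activated.

```python
import shlex
import sys
import sysconfig
from pathlib import Path


def python_build_vars():
    """Collect, for the running interpreter, the paths exported above."""
    libdir = Path(sysconfig.get_config_var("LIBDIR") or "")
    values = {
        # Assumption: the shared libpython3.so lives in LIBDIR (conda layout)
        "PYTHON_LIBRARY": str(libdir / "libpython3.so"),
        "PYTHON_INCLUDE_DIRS": sysconfig.get_paths()["include"],
        "PYTHON3_EXECUTABLE": sys.executable,
    }
    try:
        import numpy  # only available after "pip install numpy"
        values["PYTHON3_NUMPY_INCLUDE_DIRS"] = numpy.get_include()
    except ImportError:
        pass
    return values


if __name__ == "__main__":
    for name, value in python_build_vars().items():
        print(f"export {name}={shlex.quote(value)}")
```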
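- Install gcc 8, g++ 8 and glibc 2.17 (the protobuf version used by Paddle only supports compilers up to gcc 8)
# A proxy is recommended, otherwise the download is slow
# Set the proxy
conda config --set proxy_servers.http http://xxxx
# Install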
conda install -c conda-forge gcc=8 gxx=8 sysroot_linux-64=2.17
- Check your gcc/g++ versions again (this only affects the virtual environment, not the system)
$ gcc --version
gcc (conda-forge gcc 8.5.0-16) 8.5.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ g++ --version
g++ (conda-forge gcc 8.5.0-16) 8.5.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
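The conda compilers only take effect inside the activated environment. A build script can therefore check, before calling cmake, that the gcc/g++ on PATH live under the env prefix and have major version 8. The sketch below assumes the first line of "--version" ends with the version number, as in the outputs above. The helper names are hypothetical.

```python
import os
import re
import shutil
import subprocess
import sys


def gcc_major(version_line):
    """'gcc (conda-forge gcc 8.5.0-16) 8.5.0' -> 8, read from the trailing version."""
    m = re.search(r"(\d+)\.\d+\.\d+$", version_line.strip())
    return int(m.group(1)) if m else None


def check_compiler(name="gcc", want_major=8):
    """Return a list of problems with the compiler that PATH resolves to."""
    path = shutil.which(name)
    if path is None:
        return [f"{name} not found on PATH"]
    problems = []
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    if not path.startswith(os.path.join(prefix, "")):
        problems.append(f"{name} at {path} is outside the environment {prefix}")
    out = subprocess.run([path, "--version"], capture_output=True, text=True).stdout
    lines = out.splitlines()
    major = gcc_major(lines[0]) if lines else None
    if major != want_major:
        problems.append(f"{name} major version is {major}, expected {want_major}")
    return problems


if __name__ == "__main__":
    for tool in ("gcc", "g++"):
        print(tool, check_compiler(tool) or "OK")
```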
- Install PyYAML (the build complained that the yaml module could not be found)
pip install pyyaml
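Missing Python modules such as yaml only show up partway through the build, so checking for them up front can save time. The sketch below uses importlib.util.find_spec. The module list mirrors the pip installs in this section: PyYAML is imported as yaml and protobuf as google.protobuf. patchelf is a command-line tool, so it is looked up on PATH instead.

```python
import importlib.util
import shutil

# Import names for the packages installed with pip in this section
REQUIRED = ["numpy", "google.protobuf", "yaml"]


def missing_modules(names):
    """Return the names that cannot be imported by the current interpreter."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:  # a parent package (e.g. google) is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(REQUIRED) or "all build-time Python modules found")
    if shutil.which("patchelf") is None:
        print("patchelf not found on PATH")
```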
Build
- Clone the source and switch to the latest release branch (release/2.3 at the time of writing)
git clone https://github.com/PaddlePaddle/Paddle.git
cd Paddle
git checkout release/2.3
- Create and enter the build directory
mkdir build && cd build
- Set the target Paddle version
export PADDLE_VERSION="2.3.1"
- Configure (TensorRT disabled)
cmake .. \
-DWITH_CONTRIB=OFF \
-DWITH_MKL=ON \
-DWITH_MKLDNN=ON \
-DWITH_TESTING=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DWITH_INFERENCE_API_TEST=OFF \
-DWITH_GPU=ON \
-DCUDNN_ROOT=/usr/local/cuda \
-DON_INFER=ON \
-DWITH_PYTHON=ON \
-D PYTHON3_EXECUTABLE=${PYTHON3_EXECUTABLE} \
-D PYTHON3_INCLUDE_DIR=${PYTHON_INCLUDE_DIRS} \
-D PYTHON3_LIBRARY=${PYTHON_LIBRARY} \
-D PYTHON3_NUMPY_INCLUDE_DIRS=${PYTHON3_NUMPY_INCLUDE_DIRS} \
-D WITH_TENSORRT=OFF
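A misspelled or unset shell variable silently expands to an empty string inside the cmake command. One safeguard is to assemble the arguments in a script that reports empty values. The sketch below only prints the command (via shlex.join) and never runs cmake. The flag names are copied from the command above. Mapping each flag to the environment variables exported in the Preparation section is an assumption of this sketch.

```python
import os
import shlex

# cmake option -> environment variable exported in the preparation step
FROM_ENV = {
    "PYTHON3_EXECUTABLE": "PYTHON3_EXECUTABLE",
    "PYTHON3_INCLUDE_DIR": "PYTHON_INCLUDE_DIRS",
    "PYTHON3_LIBRARY": "PYTHON_LIBRARY",
    "PYTHON3_NUMPY_INCLUDE_DIRS": "PYTHON3_NUMPY_INCLUDE_DIRS",
}
# Fixed options, each listed once
FIXED = {
    "WITH_CONTRIB": "OFF", "WITH_MKL": "ON", "WITH_MKLDNN": "ON",
    "WITH_TESTING": "OFF", "CMAKE_BUILD_TYPE": "Release",
    "WITH_INFERENCE_API_TEST": "OFF", "WITH_GPU": "ON",
    "CUDNN_ROOT": "/usr/local/cuda", "ON_INFER": "ON",
    "WITH_PYTHON": "ON", "WITH_TENSORRT": "OFF",
}


def cmake_command(env, source_dir=".."):
    """Return (argv, missing_env_vars) for the configure step."""
    argv = ["cmake", source_dir] + [f"-D{k}={v}" for k, v in FIXED.items()]
    missing = []
    for option, var in FROM_ENV.items():
        value = env.get(var, "")
        if not value:
            missing.append(var)
        argv.append(f"-D{option}={value}")
    return argv, missing


if __name__ == "__main__":
    argv, missing = cmake_command(os.environ)
    print(shlex.join(argv))
    if missing:
        print("unset or empty:", ", ".join(missing))
```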
- Build (note: this step needs unrestricted access to GitHub, e.g. through a proxy, because make downloads third-party sources from GitHub). It takes roughly 1-2 hours.
make -j10
- Partway through, the build failed with "error too many open files". The maximum number of open files (1024 by default) has to be raised:
# Before: 1024
$ ulimit -Sn
1024
# Raise it to 9192 (ulimit only affects the current shell, so run make from the same shell)
ulimit -n 9192
# After
$ ulimit -Sn
9192
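If the build is driven from a Python script instead of an interactive shell, the equivalent of ulimit -n is resource.setrlimit on RLIMIT_NOFILE. A non-root process may raise its soft limit up to the hard limit. A minimal sketch follows; 9192 is simply the value used above. Processes started afterwards, for example with subprocess.run(["make", "-j10"]), inherit the raised limit.

```python
import resource


def raise_nofile_limit(target=9192):
    """Raise the soft open-file limit to `target`, capped at the hard limit.

    Never lowers the limit; affects this process and children started later.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    cap = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < cap:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (cap, hard))
        except (ValueError, OSError):
            pass  # the kernel refused; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_limit())
```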
make -j10
- Get the wheel. It is in the python/dist directory under build:
cd python/dist
ls -lh
.rw-r--r-- ubuntu ubuntu 167 MB Wed Jul 27 17:33:44 2022 paddlepaddle_gpu-0.0.0-cp39-cp39-linux_x86_64.whl
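Wheel names follow the standard name-version-pythontag-abitag-platformtag.whl layout. A script can therefore pick the file from python/dist and check that its cp39 tag matches the interpreter it will be installed into. A mismatch makes pip reject the file as "not a supported wheel on this platform". A minimal sketch, to be run from the build directory; the python/dist path is taken from above.

```python
import sys
from pathlib import Path


def parse_wheel_name(filename):
    """Split 'name-version[-build]-py-abi-plat.whl' into its main fields."""
    stem = Path(filename).name
    if not stem.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = stem[: -len(".whl")].split("-")
    if len(parts) == 6:  # optional build tag after the version
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name: {filename}")
    name, version, py_tag, abi_tag, plat_tag = parts
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": plat_tag}


def current_cpython_tag():
    """Tag of the running interpreter, e.g. 'cp39' for Python 3.9."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


if __name__ == "__main__":
    for whl in sorted(Path("python/dist").glob("*.whl")):
        info = parse_wheel_name(whl.name)
        ok = info["python"] == current_cpython_tag()
        print(whl.name, info["version"], "matches" if ok else f"needs {info['python']}")
```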
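- Install the wheel. In theory, anyone with the same CUDA/cuDNN/NCCL versions as mine (cuDNN and NCCL installed from the archives as above) and an RTX 30-series GPU on Ubuntu 22.04/20.04 can use this package. The version in the file name may differ from run to run (the listing above shows 0.0.0), so install the file that is actually in python/dist:
pip install paddlepaddle_gpu-*-cp39-cp39-linux_x86_64.whl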
- Test the installation
$ python3
Python 3.9.12 (main, Jun 1 2022, 11:38:51)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0727 17:46:03.775210 12918 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.6
W0727 17:46:03.796252 12918 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
PaddlePaddle works well on 1 GPU.
I0727 17:46:06.006351 12918 parallel_executor.cc:486] Cross op memory reuse strategy is enabled, when build_strategy.memory_optimize = True or garbage collection strategy is disabled, which is not recommended
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
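The first warning line summarizes three versions: the GPU compute capability, the CUDA version supported by the driver (11.7) and the CUDA runtime Paddle was built with (11.6). As a rule of thumb, the driver should support at least the runtime's CUDA version, although CUDA 11's minor-version compatibility relaxes this somewhat. The sketch below extracts these fields. Its regular expression is written against the exact line format printed above.

```python
import re

_GPU_RE = re.compile(
    r"GPU Compute Capability: (?P<cc>\d+\.\d+), "
    r"Driver API Version: (?P<driver>\d+\.\d+), "
    r"Runtime API Version: (?P<runtime>\d+\.\d+)"
)


def _ver(text):
    """'11.6' -> (11, 6)"""
    major, minor = text.split(".")
    return int(major), int(minor)


def parse_gpu_line(line):
    """Return compute capability, driver and runtime versions as tuples, or None."""
    m = _GPU_RE.search(line)
    if m is None:
        return None
    return {key: _ver(value) for key, value in m.groupdict().items()}


def driver_covers_runtime(info):
    """Simple check: the driver's CUDA version is at least the runtime's."""
    return info["driver"] >= info["runtime"]
```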
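- A test script based on the official example (test_paddle.py, listed after its output) also seems fine: training on the GPU works normally.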
$ python3 test_paddle.py
The dataset has 10 label classes: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
W0727 17:54:46.313586 13232 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.6
W0727 17:54:46.325191 13232 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/5
step 938/938 [==============================] - loss: 0.1149 - acc: 0.9398 - 14ms/step
Epoch 2/5
step 938/938 [==============================] - loss: 0.0688 - acc: 0.9760 - 13ms/step
Epoch 3/5
step 938/938 [==============================] - loss: 0.0354 - acc: 0.9809 - 11ms/step
Epoch 4/5
step 938/938 [==============================] - loss: 0.0052 - acc: 0.9833 - 13ms/step
Epoch 5/5
step 938/938 [==============================] - loss: 0.0110 - acc: 0.9855 - 12ms/step
- Contents of test_paddle.py:
import paddle
from paddle.vision.datasets import MNIST
from paddle.vision.models import LeNet
from paddle.vision.transforms import Normalize
# Use the first GPU
paddle.device.set_device("gpu:0")
# Download the MNIST dataset (pixel values normalized to [-1, 1])
transform = Normalize(mean=[127.5], std=[127.5], data_format="CHW")
train_dataset = MNIST(mode="train", transform=transform)
valid_dataset = MNIST(mode="test", transform=transform)
# Collect the label classes
y_list = [da[1][0] for da in train_dataset]
num_list = list(set(y_list))
num_classes = len(num_list)
print(f"The dataset has {num_classes} label classes: {num_list}")
# Build the model
pre_model = LeNet(num_classes=num_classes)
model = paddle.Model(pre_model)
adam = paddle.optimizer.Adam(learning_rate=1e-3, parameters=model.parameters())
model.prepare(adam, loss=paddle.nn.CrossEntropyLoss(), metrics=paddle.metric.Accuracy())
# Train the model
model.fit(train_data=train_dataset, batch_size=64, verbose=1, epochs=5)
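To compare runs, for example against another machine or a CPU build, the per-epoch lines printed by model.fit can be turned into numbers. The sketch below assumes the progress-line format shown in the log above (step N/N [...] - loss: ... - acc: ... - Nms/step). It keeps only the final line of each epoch, where the step count equals the total.

```python
import re

_STEP_RE = re.compile(
    r"step (\d+)/(\d+) .*?- loss: ([\d.]+) - acc: ([\d.]+) - (\d+)ms/step"
)


def parse_fit_log(lines):
    """Return one dict per completed epoch (progress line with step == total)."""
    epochs = []
    for line in lines:
        m = _STEP_RE.search(line)
        if m and m.group(1) == m.group(2):
            epochs.append({
                "loss": float(m.group(3)),
                "acc": float(m.group(4)),
                "ms_per_step": int(m.group(5)),
            })
    return epochs
```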
- nvidia-smi output at the time (the python3.9 process appears as a compute process on the GPU):
$ nvidia-smi
Wed Jul 27 17:55:58 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57 Driver Version: 516.59 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:01:00.0 Off | N/A |
| 36% 34C P2 120W / 370W | 3228MiB / 24576MiB | 4% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 13353 C /python3.9 N/A |
+-----------------------------------------------------------------------------+
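For monitoring from a script, nvidia-smi also offers a machine-readable query mode (--query-gpu=... --format=csv,noheader,nounits), which avoids parsing the table above. The sketch below uses the standard name, memory.used, memory.total and utilization.gpu query fields. It returns an empty list when nvidia-smi is missing or fails, and stores None for fields reported as [N/A].

```python
import shutil
import subprocess

FIELDS = ["name", "memory.used", "memory.total", "utilization.gpu"]


def _to_int(text):
    """'3228' -> 3228; '[N/A]' or other non-numeric values -> None."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def parse_query_csv(text):
    """Parse 'noheader,nounits' CSV rows into dicts (memory in MiB, utilization in %)."""
    rows = []
    for line in text.strip().splitlines():
        name, used, total, util = line.split(",")
        rows.append({"name": name.strip(), "memory_used_mib": _to_int(used),
                     "memory_total_mib": _to_int(total), "utilization_pct": _to_int(util)})
    return rows


def query_gpus():
    """Run nvidia-smi in query mode; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    proc = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    return parse_query_csv(proc.stdout) if proc.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```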