Baidu Paddle Quick Reference: MNIST Training and Prediction on CPU and GPU, Model Export, Model Reloading and Prediction, ONNX Export and Inference

What this is for To help graduate students and alchemist algorithm engineers everywhere get up to speed with Baidu PaddlePaddle quickly, I wrote this article. It assumes the reader already has a basic grasp of deep-learning and dataset concepts.
System environment
python 3.7.4
paddlepaddle-gpu 2.2.2
paddle2onnx 0.9.1
onnx 1.9.0
onnxruntime-gpu 1.9.0
Data preparation The MNIST dataset CSV file is a 42000x785 matrix:
42000 means there are 42000 images;
of the 785 columns, the first holds the image label (0, 1, 2, ..., 9), and the second through last columns hold the image data (each 28x28 image flattened into a 784-dimensional vector). The dataset looks like this:
1 0 0 0 0 0 0 0 0 0 ..
0 0 0 0 0 0 0 0 0 0 ..
1 0 0 0 0 0 0 0 0 0 ..
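As a quick sanity check, each row can be split into a label plus a 28x28 image. A minimal sketch (reusing the mnist_train.csv path that the parameters below point at):

import numpy as np
import pandas as pd

df = pd.read_csv("mnist_train.csv", header=None)  # 42000 x 785
row = df.iloc[0].to_numpy()
label = int(row[0])               # first column: class id 0..9
image = row[1:].reshape(28, 28)   # remaining 784 columns: pixel values
print(label, image.shape)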
1. Import the required packages

import os

import onnx
import paddle
import numpy as np
import pandas as pd
import onnxruntime as ort
import paddle.nn.functional as F
from paddle.metric import Accuracy
from paddle.static import InputSpec
from sklearn.metrics import accuracy_score

2. Prepare the parameters
N_EPOCH = 2
N_BATCH = 64
N_BATCH_NUM = 250  # 64 * 250 = 16000 samples go to training; the remaining 26000 to the test loader
S_DATA_PATH = r"mnist_train.csv"
S_PADDLE_MODEL_PATH = r"cnn_model"
S_ONNX_MODEL_PATH = r"cnn_model_batch%d.onnx" % N_BATCH
S_DEVICE, N_DEVICE_ID, S_DEVICE_FULL = "gpu", 0, "gpu:0"
# S_DEVICE, N_DEVICE_ID, S_DEVICE_FULL = "cpu", 0, "cpu"
paddle.set_device(S_DEVICE_FULL)

Output:
CUDAPlace(0)
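If you are unsure whether your Paddle build can see the GPU, a small check with Paddle's device API avoids surprises. A minimal sketch that falls back to CPU when CUDA is unavailable:

import paddle

# Use the GPU only when this Paddle build was compiled with CUDA support.
if paddle.is_compiled_with_cuda():
    paddle.set_device("gpu:0")
else:
    paddle.set_device("cpu")
print(paddle.device.get_device())  # e.g. "gpu:0" or "cpu"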

3. Load the data
df = pd.read_csv(S_DATA_PATH, header=None)
print(df.shape)
np_mat = np.array(df)
print(np_mat.shape)

X = np_mat[:, 1:]
Y = np_mat[:, 0]
X = X.astype(np.float32) / 255

X_train = X[:N_BATCH * N_BATCH_NUM]
X_test = X[N_BATCH * N_BATCH_NUM:]
Y_train = Y[:N_BATCH * N_BATCH_NUM]
Y_test = Y[N_BATCH * N_BATCH_NUM:]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
print(X_train.shape)
print(Y_train.shape)
print(X_test.shape)
print(Y_test.shape)

class MnistDataSet(paddle.io.Dataset):
    def __init__(self, X, Y):
        self.l_data, self.l_label = [], []
        for i in range(X.shape[0]):
            self.l_data.append(X[i, :, :, :])
            self.l_label.append(Y[i])

    def __getitem__(self, index):
        return self.l_data[index], self.l_label[index]

    def __len__(self):
        return len(self.l_data)

train_loader = paddle.io.DataLoader(MnistDataSet(X_train, Y_train), batch_size=N_BATCH, shuffle=True)
test_loader = paddle.io.DataLoader(MnistDataSet(X_test, Y_test), batch_size=N_BATCH, shuffle=False)

Output
(42000, 785)
(42000, 785)
(16000, 1, 28, 28)
(16000,)
(26000, 1, 28, 28)
(26000,)
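To confirm the loaders yield what the network expects, pull a single batch (a minimal sketch reusing train_loader from above):

for x_batch, y_batch in train_loader:
    print(x_batch.shape, y_batch.shape)  # expected: [64, 1, 28, 28] and [64]
    break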

4. Build the model
class Net(paddle.nn.Layer):
    def __init__(self):
        super(Net, self).__init__()
        self.encoder = paddle.nn.Sequential(
            paddle.nn.Conv2D(1, 16, 3, 1),
            paddle.nn.MaxPool2D(2),
            paddle.nn.Flatten(1),
            paddle.nn.Linear(2704, 128),
            paddle.nn.ReLU(),
            paddle.nn.Linear(128, 10))

    def forward(self, x):
        out = self.encoder(x)
        return out
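The 2704 in the first Linear layer follows from the shapes: the 3x3 convolution with stride 1 maps 28x28 to 26x26, the 2x2 max-pool halves that to 13x13, and 16 channels give 16 * 13 * 13 = 2704. paddle.summary can verify this (a minimal sketch):

paddle.summary(Net(), (1, 1, 28, 28))  # prints per-layer output shapes and parameter counts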

5. Train and save the model
print("model train") model = paddle.Model(Net(), InputSpec([None, 1, 28, 28], 'float32', 'x'), InputSpec([None, 10], 'float32', 'x')) model.prepare(paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()), paddle.nn.CrossEntropyLoss(), Accuracy()) model.fit(train_loader, test_loader, epochs=N_EPOCH, batch_size=N_BATCH, save_dir=S_PADDLE_MODEL_PATH + "_iter", verbose=1) model.save(S_PADDLE_MODEL_PATH + "_final_model") print() # model.save(S_PADDLE_MODEL_PATH) # Model save

Output
model train
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/2
step 250/250 [==============================] - loss: 0.3151 - acc: 0.9073 - 4ms/step
save checkpoint at D:\Document\_Code_Py\ai_fast_handbook\cnn_model_iter\0
Eval begin...
step 407/407 [==============================] - loss: 0.0230 - acc: 0.9330 - 2ms/step
Eval samples: 26000
Epoch 2/2
step 250/250 [==============================] - loss: 0.0744 - acc: 0.9642 - 3ms/step
save checkpoint at D:\Document\_Code_Py\ai_fast_handbook\cnn_model_iter\1
Eval begin...
step 407/407 [==============================] - loss: 0.0614 - acc: 0.9575 - 2ms/step
Eval samples: 26000
save checkpoint at D:\Document\_Code_Py\ai_fast_handbook\cnn_model_iter\final
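Besides the per-epoch checkpoints written via save_dir and the .pdparams/.pdopt pair written by model.save above, the high-level API can also save an inference-format model by passing training=False. A hedged sketch (the prefix cnn_model_infer is illustrative):

model.save("cnn_model_infer", training=False)  # writes inference-format *.pdmodel / *.pdiparams files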

6. Model prediction
print("model pred") model.evaluate(test_loader, batch_size=N_BATCH, verbose=1) print()

Output
model pred
Eval begin...
step 407/407 [==============================] - loss: 0.0614 - acc: 0.9575 - 2ms/step
Eval samples: 26000
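For spot checks on individual samples rather than a whole loader, Model.predict_batch accepts a numpy batch directly. A minimal sketch reusing the arrays from step 3:

logits = model.predict_batch(X_test[:8])[0]  # returns a list with one output array
pred = np.argmax(logits, axis=1)
print(pred)        # predicted classes
print(Y_test[:8])  # ground-truth labels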

7. Load the model and use it
print("load model and pred test data") model_load = paddle.Model(Net(), InputSpec([None, 1, 28, 28], 'float32', 'x'), InputSpec([None, 10], 'float32', 'x')) # model_load.load(S_PADDLE_MODEL_PATH + "_iter/final") model_load.load(S_PADDLE_MODEL_PATH + "_final_model") model_load.prepare(paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()), paddle.nn.loss.CrossEntropyLoss(), Accuracy()) model_load.evaluate(test_loader, batch_size=N_BATCH, verbose=1) print()

Output
load model and pred test data
Eval begin...
step 407/407 [==============================] - loss: 0.0614 - acc: 0.9575 - 2ms/step
Eval samples: 26000
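If you do not need the high-level Model wrapper at inference time, the same weights can be loaded onto the bare Layer instead. A hedged sketch (model.save above writes cnn_model_final_model.pdparams next to the given prefix):

net = Net()
state_dict = paddle.load(S_PADDLE_MODEL_PATH + "_final_model.pdparams")
net.set_state_dict(state_dict)
net.eval()  # switch to inference mode before calling net(x)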

8. Export to ONNX
x_spec = InputSpec([None, 1, 28, 28], 'float32', 'x')
# Export the trained network; exporting a fresh Net() would save untrained weights.
# paddle.onnx.export appends a ".onnx" suffix to the save path, which is why the log
# below shows cnn_model_batch64.onnx.onnx and why later code loads S_ONNX_MODEL_PATH + ".onnx".
paddle.onnx.export(model.network, S_ONNX_MODEL_PATH, input_spec=[x_spec])

Output
2022-03-24 08:08:21 [INFO] ONNX model saved in cnn_model_batch64.onnx.onnx

9. Load and run the ONNX model
S_DEVICE = "cuda" if S_DEVICE == "gpu" else S_DEVICE model = onnx.load(S_ONNX_MODEL_PATH + ".onnx") print(onnx.checker.check_model(model))# Check that the model is well formed print(onnx.helper.printable_graph(model.graph))# Print a human readable representation of the graph ls_input_name, ls_output_name = [input.name for input in model.graph.input], [output.name for output in model.graph.output] print("input name ", ls_input_name) print("output name ", ls_output_name) s_input_name = ls_input_name[0]x_input = X_train[:N_BATCH * 2, :, :, :].astype(np.float32) ort_val = ort.OrtValue.ortvalue_from_numpy(x_input, S_DEVICE, N_DEVICE_ID) print("val device ", ort_val.device_name()) print("val shape ", ort_val.shape()) print("val data type ", ort_val.data_type()) print("is_tensor ", ort_val.is_tensor()) print("array_equal ", np.array_equal(ort_val.numpy(), x_input)) providers = 'CUDAExecutionProvider' if S_DEVICE == "cuda" else 'CPUExecutionProvider' print("providers ", providers) ort_session = ort.InferenceSession(S_ONNX_MODEL_PATH + ".onnx", providers=[providers])# gpu运行 ort_session.set_providers([providers]) outputs = ort_session.run(None, {s_input_name: ort_val}) print("sess env ", ort_session.get_providers()) print(type(outputs)) print(outputs[0]) ''' For example ['CUDAExecutionProvider', 'CPUExecutionProvider'] means execute a node using CUDAExecutionProvider if capable, otherwise execute using CPUExecutionProvider. '''

Output
None
graph paddle-onnx (
  %x[FLOAT, -1x1x28x28]
) {
  %conv2d_2.w_0 = Constant[value = <Tensor>]()
  %conv2d_2.b_0 = Constant[value = <Tensor>]()
  %linear_4.w_0 = Constant[value = <Tensor>]()
  %linear_4.b_0 = Constant[value = <Tensor>]()
  %linear_5.w_0 = Constant[value = <Tensor>]()
  %linear_5.b_0 = Constant[value = <Tensor>]()
  %conv2d_3.tmp_0 = Conv[dilations = [1, 1], group = 1, kernel_shape = [3, 3], pads = [0, 0, 0, 0], strides = [1, 1]](%x, %conv2d_2.w_0)
  %Constant_0 = Constant[value = <Tensor>]()
  %Reshape_0 = Reshape(%conv2d_2.b_0, %Constant_0)
  %conv2d_3.tmp_1 = Add(%conv2d_3.tmp_0, %Reshape_0)
  %pool2d_0.tmp_0 = MaxPool[kernel_shape = [2, 2], pads = [0, 0, 0, 0], strides = [2, 2]](%conv2d_3.tmp_1)
  %Shape_0 = Shape(%pool2d_0.tmp_0)
  %Slice_0 = Slice[axes = [0], ends = [1], starts = [0]](%Shape_0)
  %Constant_1 = Constant[value = <Tensor>]()
  %Concat_0 = Concat[axis = 0](%Slice_0, %Constant_1)
  %flatten_3.tmp_0 = Reshape(%pool2d_0.tmp_0, %Concat_0)
  %linear_6.tmp_0 = MatMul(%flatten_3.tmp_0, %linear_4.w_0)
  %linear_6.tmp_1 = Add(%linear_6.tmp_0, %linear_4.b_0)
  %relu_0.tmp_0 = Relu(%linear_6.tmp_1)
  %linear_7.tmp_0 = MatMul(%relu_0.tmp_0, %linear_5.w_0)
  %linear_7.tmp_1 = Add(%linear_7.tmp_0, %linear_5.b_0)
  return %linear_7.tmp_1
}
input name  ['x']
output name  ['linear_7.tmp_1']
val device  cuda
val shape  [128, 1, 28, 28]
val data type  tensor(float)
is_tensor  True
array_equal  True
providers  CUDAExecutionProvider
sess env  ['CUDAExecutionProvider', 'CPUExecutionProvider']
<class 'list'>
[[ 0.763783   -0.16668957 -0.16518936 ...  0.07235195 -0.01643395   0.06049304]
 [ 1.8068395  -0.74552214  0.3836273  ...  0.75880224 -0.88902843   0.32921085]
 [ 0.2381373  -0.14879732 -0.21634206 ... -0.06579521 -0.461351     0.15305203]
 ...
 [ 0.97004616  0.07693841  0.05774391 ...  0.21991295  0.07179791  -0.22383693]
 [ 0.5787286  -0.34370935 -0.12914304 ... -0.03083546 -0.01817408  -0.5147962 ]
 [ 0.60808766 -0.12549599 -0.32095248 ... -0.32175955 -0.03176413  -0.06790417]]
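This is where the accuracy_score import from step 1 comes in: the ONNX outputs are raw logits, so argmax over the class axis gives predictions that can be scored against the true labels. A minimal sketch reusing outputs and Y_train from above:

y_pred = np.argmax(outputs[0], axis=1)
print(accuracy_score(Y_train[:N_BATCH * 2], y_pred))  # fraction of the 128 inputs classified correctly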

The GitHub repo you won't even star: ai_fast_handbook
