Gradient Descent
- Stochastic Gradient Descent, or SGD for short, is an optimization algorithm used to train machine learning models. Its job is to find a set of internal model parameters that perform well against some performance measure, such as logarithmic loss or mean squared error.
- The optimization algorithm is called "gradient descent", where "gradient" refers to the calculation of an error gradient, or slope of error, and "descent" refers to moving down along that slope towards some minimum level of error.
- The algorithm is iterative. This means that the search process occurs over multiple discrete steps, each step hopefully slightly improving the model parameters.
- Each step involves using the model with the current set of internal parameters to make predictions on some samples, comparing the predictions to the real expected outcomes, calculating the error, and using the error to update the internal model parameters.
- This update procedure is different for different algorithms, but in the case of artificial neural networks, the backpropagation update algorithm is used.
- The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
- The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.
- One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters. An epoch is comprised of one or more batches. For example, an epoch that consists of a single batch corresponds to the batch gradient descent learning algorithm described below.
- It is common to create line plots that show epochs along the x-axis as time and the error or skill of the model on the y-axis. These plots are sometimes called learning curves. They can help to diagnose whether the model has over-learned (overfit), under-learned (underfit), or is suitably fit to the training dataset.
- When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.
- In the case of mini-batch gradient descent, popular batch sizes include 32, 64, and 128 samples. You may see these values used in models in the literature and in tutorials; a minimal sketch illustrating the three variants follows this list.
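To make these definitions concrete, here is a minimal sketch (not from the referenced post) of mini-batch gradient descent on a linear least-squares problem. All names (X, y, w, b, lr) are illustrative; setting batch_size equal to len(X) recovers batch gradient descent, while batch_size=1 recovers stochastic gradient descent.

import numpy as np

def minibatch_gd(X, y, batch_size=32, epochs=50, lr=0.01):
    """Fit y ~ X @ w + b by minimizing mean squared error.
    batch_size=len(X) gives batch GD; batch_size=1 gives SGD."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    rng = np.random.default_rng(0)
    history = []                                  # per-epoch loss, for a learning curve
    for epoch in range(epochs):                   # one epoch = one full pass over the data
        order = rng.permutation(n)                # reshuffle the samples each epoch
        for start in range(0, n, batch_size):     # ceil(n / batch_size) updates per epoch
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]         # prediction error on this mini-batch
            w -= lr * 2 * X[idx].T @ err / len(idx)  # "descent": step against the error gradient
            b -= lr * 2 * err.mean()
        history.append(np.mean((X @ w + b - y) ** 2))  # full-dataset error after the epoch
    return w, b, history

Plotting history against the epoch index gives exactly the learning curve described above; for example, with 1,000 samples and batch_size=32, each epoch performs ceil(1000/32) = 32 parameter updates.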
import torch
import torch.utils.data as Data

# train_triples, val_edges, val_ground_truth, Net, Optimizer, A, and device
# are assumed to be defined elsewhere in the surrounding script.
params = {'batch_size': 64,
          'shuffle': True,
          'num_workers': 6}

# First, convert the NumPy arrays into Datasets that torch can recognize
train_dataset = Data.TensorDataset(torch.from_numpy(train_triples))
valid_dataset = Data.TensorDataset(torch.from_numpy(val_edges),
                                   torch.from_numpy(val_ground_truth))

# Then put each Dataset into a DataLoader
train_loader = Data.DataLoader(dataset=train_dataset, **params)
valid_loader = Data.DataLoader(dataset=valid_dataset, **params)

def batch_train(epoch):
    Net.train()
    # each iteration yields one mini-batch of 64 training triples
    for (batch_train_triples,) in train_loader:
        Optimizer.zero_grad()
        batch_train_triples = batch_train_triples.to(device)
        loss_train = Net(A, batch_train_triples)  # the forward pass returns the training loss
        loss_train.backward()
        Optimizer.step()
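The post does not show the outer loop that drives batch_train, so here is a minimal sketch of one; num_epochs is a hypothetical value, and exactly how valid_loader is consumed depends on Net's evaluation interface, which the original code does not show.

num_epochs = 100  # hypothetical value, not from the original post
for epoch in range(num_epochs):
    batch_train(epoch)  # one call = one complete pass over train_loader, i.e. one epoch
# valid_loader would typically be iterated the same way inside a torch.no_grad()
# block after each epoch, to track validation loss for a learning curve.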
【Batch Training】Reference:
https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/