Batch Training

gradient descent

  • Stochastic Gradient Descent, or SGD for short, is an optimization algorithm used to train machine learning models. The job of the algorithm is to find a set of internal model parameters that perform well against some performance measure, such as logarithmic loss or mean squared error.
  • The optimization algorithm is called “gradient descent”, where “gradient” refers to the calculation of an error gradient, or slope of error, and “descent” refers to moving down along that slope towards some minimum level of error.
  • The algorithm is iterative. This means that the search process occurs over multiple discrete steps, each step hopefully slightly improving the model parameters.
  • Each step involves using the model with the current set of internal parameters to make predictions on some samples, comparing the predictions to the real expected outcomes, calculating the error, and using the error to update the internal model parameters (see the sketch after this list).
  • This update procedure is different for different algorithms, but in the case of artificial neural networks, the backpropagation update algorithm is used.
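A minimal sketch of this loop for a linear model trained with mean squared error (the toy data, variable names, learning rate, and step count below are illustrative assumptions, not taken from the reference):

import numpy as np

# Toy data: one input feature and a real-valued target (illustrative only).
X = np.random.randn(100, 1)
y = 3.0 * X[:, 0] + np.random.randn(100) * 0.1

w, b = 0.0, 0.0   # internal model parameters
lr = 0.1          # learning rate: how far to move down the slope each step

for step in range(200):                   # iterative: many discrete steps
    pred = w * X[:, 0] + b                # predictions with current parameters
    err = pred - y                        # compare predictions to expected outcomes
    grad_w = 2 * np.mean(err * X[:, 0])   # error gradient (slope) w.r.t. w
    grad_b = 2 * np.mean(err)             # error gradient (slope) w.r.t. b
    w -= lr * grad_w                      # descend: move parameters against the gradient
    b -= lr * grad_b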
batches and epochs in stochastic gradient descent
  • The batch size is a hyperparameter of gradient descent that controls the number of training samples to work through before the model’s internal parameters are updated.
  • The number of epochs is a hyperparameter of gradient descent that controls the number of complete passes through the training dataset.
  • One epoch means that each sample in the training dataset has had an opportunity to update the internal model parameters. An epoch consists of one or more batches; for example, an epoch whose single batch contains the whole training dataset corresponds to the batch gradient descent learning algorithm described below. A quick calculation of how the two hyperparameters interact follows this list.
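To make the relationship between batch size and epochs concrete, a quick back-of-the-envelope calculation (the dataset size, batch size, and epoch count here are made-up numbers for illustration):

import math

n_samples = 200      # size of the training dataset
batch_size = 5       # samples processed before each parameter update
epochs = 1000        # complete passes through the training dataset

updates_per_epoch = math.ceil(n_samples / batch_size)   # 40 batches, i.e. 40 updates per epoch
total_updates = updates_per_epoch * epochs              # 40,000 parameter updates in total
print(updates_per_epoch, total_updates)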
learning curves
  • It is common to create line plots that show epochs along the x-axis as time and the error or skill of the model on the y-axis. These plots are sometimes called learning curves. They can help to diagnose whether the model has over-learned (overfit), under-learned (underfit), or is suitably fit to the training dataset; a minimal plotting sketch follows.
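A minimal sketch of such a plot with matplotlib, assuming per-epoch training and validation losses have already been recorded (the lists below are placeholder values, not real results):

import matplotlib.pyplot as plt

# Placeholder loss histories; in practice, record one value per epoch during training.
train_losses = [0.90, 0.60, 0.45, 0.38, 0.35, 0.33, 0.32]
val_losses   = [0.95, 0.70, 0.55, 0.50, 0.49, 0.50, 0.52]

epochs = range(1, len(train_losses) + 1)
plt.plot(epochs, train_losses, label='training loss')
plt.plot(epochs, val_losses, label='validation loss')
plt.xlabel('epoch')   # epochs along the x-axis as time
plt.ylabel('loss')    # error of the model on the y-axis
plt.legend()
plt.show()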
batch gradient descent, stochastic gradient descent, mini-batch gradient descent
  • When all training samples are used to create one batch, the learning algorithm is called batch gradient descent. When the batch is the size of one sample, the learning algorithm is called stochastic gradient descent. When the batch size is more than one sample and less than the size of the training dataset, the learning algorithm is called mini-batch gradient descent.
  • In the case of mini-batch gradient descent, popular batch sizes include 32, 64, and 128 samples. You may see these values used in models in the literature and in tutorials. The sketch after this list shows how the three variants map onto a single DataLoader argument in PyTorch.
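In PyTorch, the three variants differ only in the batch_size passed to the DataLoader; a hedged sketch, assuming dataset is any torch Dataset (for example, a TensorDataset like the one built in the next section):

from torch.utils.data import DataLoader

# batch gradient descent: a single batch containing every training sample
batch_gd_loader = DataLoader(dataset, batch_size=len(dataset), shuffle=True)

# stochastic gradient descent: one sample per batch, one update per sample
sgd_loader = DataLoader(dataset, batch_size=1, shuffle=True)

# mini-batch gradient descent: 1 < batch_size < len(dataset), e.g. 32, 64, or 128
minibatch_loader = DataLoader(dataset, batch_size=64, shuffle=True)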
use mini-batch gradient descent in torch
import torch
import torch.utils.data as Data

params = {'batch_size': 64, 'shuffle': True, 'num_workers': 6}

# First wrap the NumPy arrays in Datasets that torch can work with
# (train_triples, val_edges, val_ground_truth, Net, Optimizer, A, and device
# are assumed to be defined elsewhere in the project).
train_dataset = Data.TensorDataset(torch.from_numpy(train_triples))
valid_dataset = Data.TensorDataset(torch.from_numpy(val_edges), torch.from_numpy(val_ground_truth))

# Put each Dataset into a DataLoader, which yields shuffled mini-batches of 64 samples
train_loader = Data.DataLoader(dataset=train_dataset, **params)
valid_loader = Data.DataLoader(dataset=valid_dataset, **params)

def batch_train(epoch):
    Net.train()
    for (batch_train_triples,) in train_loader:   # TensorDataset yields a tuple per batch
        Optimizer.zero_grad()
        batch_train_triples = batch_train_triples.to(device)
        loss_train = Net(A, batch_train_triples)  # forward pass on the current mini-batch
        loss_train.backward()                     # backpropagate the batch loss
        Optimizer.step()                          # update the internal model parameters
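The function above runs one complete pass over the training batches; a typical driver loop (the epoch count here is an arbitrary illustrative value) simply calls it once per epoch:

n_epochs = 100   # illustrative value
for epoch in range(n_epochs):
    batch_train(epoch)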

【Batch Training】Reference:
https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
