pytorch: grad can be implicitly created only for scalar outputs

I ran into this error quite a while ago, but I have not seen a clear explanation of it online, so I am writing one up here.
First, here is the docstring of autograd.grad():

grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)

    Computes and returns the sum of gradients of outputs w.r.t. the inputs.

    ``grad_outputs`` should be a sequence of length matching ``output`` containing the pre-computed gradients w.r.t. each of the outputs. If an output doesn't require_grad, then the gradient can be ``None``.

    If ``only_inputs`` is ``True``, the function will only return a list of gradients w.r.t the specified inputs. If it's ``False``, then gradient w.r.t. all remaining leaves will still be computed, and will be accumulated into their ``.grad`` attribute.

    Arguments:
        outputs (sequence of Tensor): outputs of the differentiated function.
        inputs (sequence of Tensor): Inputs w.r.t. which the gradient will be returned (and not accumulated into ``.grad``).
        grad_outputs (sequence of Tensor): Gradients w.r.t. each output. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable for all grad_tensors, then this argument is optional. Default: None.
        retain_graph (bool, optional): If ``False``, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to ``True`` is not needed and often can be worked around in a much more efficient way. Defaults to the value of ``create_graph``.
        create_graph (bool, optional): If ``True``, graph of the derivative will be constructed, allowing to compute higher order derivative products. Default: ``False``.
        allow_unused (bool, optional): If ``False``, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to ``False``.

Consider the following code:
>>> a = Variable(torch.FloatTensor([1, 2, 3]), requires_grad=True)
>>> b = 3 * a
>>> autograd.grad(outputs=b, inputs=a)  # b is a vector here
RuntimeError: grad can be implicitly created only for scalar outputs

When grad_outputs is not specified (or is None), the outputs must be a scalar when computing gradients, so the code above raises an error. The following code, however, runs fine:
>>> a = Variable(torch.FloatTensor([1, 2, 3]), requires_grad=True)
>>> b = 3 * a
>>> z = b.sum()
>>> autograd.grad(outputs=z, inputs=a)  # z is a scalar here
(tensor([3., 3., 3.]),)
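The same rule is behind this error when calling .backward() directly on a tensor, which is where most people first see it. A minimal sketch (same toy example as above):
>>> a = Variable(torch.FloatTensor([1, 2, 3]), requires_grad=True)
>>> b = 3 * a
>>> b.backward()        # b is a vector -> same RuntimeError
RuntimeError: grad can be implicitly created only for scalar outputs
>>> b.sum().backward()  # reduce to a scalar first, then it works
>>> a.grad
tensor([3., 3., 3.])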

Alternatively, you can pass grad_outputs explicitly; then the gradient computation no longer requires outputs to be a scalar:
>>> a = Variable(torch.FloatTensor([1, 2, 3]), requires_grad=True)
>>> b = 3 * a
>>> autograd.grad(outputs=b, inputs=a, grad_outputs=torch.ones_like(a))
(tensor([3., 3., 3.]),)
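What grad_outputs supplies is the vector v in the vector–Jacobian product vᵀ·J, so each output's gradient gets weighted by the corresponding entry of grad_outputs. A quick sketch to make that visible (the weights [1, 10, 100] are arbitrary, chosen only for illustration):
>>> a = Variable(torch.FloatTensor([1, 2, 3]), requires_grad=True)
>>> b = 3 * a
>>> autograd.grad(outputs=b, inputs=a, grad_outputs=torch.FloatTensor([1, 10, 100]))
(tensor([  3.,  30., 300.]),)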

When the computation runs on GPU, grad_outputs must live on the same device as outputs. Since torch.ones_like(a) creates a tensor with a's device and dtype, the same form still works there; it can also be written as
grad_outputs = Variable(torch.ones_like(a), requires_grad=False)
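As an aside, in PyTorch 0.4 and later Variable has been merged into Tensor, so the same example can be written with plain tensors. A minimal sketch (the device="cuda" variant is optional and assumes a CUDA device is available):

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)   # or add device="cuda"
b = 3 * a
# torch.ones_like(b) lives on the same device as b, so this works on CPU or GPU
grads = torch.autograd.grad(outputs=b, inputs=a, grad_outputs=torch.ones_like(b))
print(grads)  # (tensor([3., 3., 3.]),)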
