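I ran into this error a long time ago but never found a clear explanation of it online, so here is a short write-up.
For reference, this is the docstring of autograd.grad():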
grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
Computes and returns the sum of gradients of outputs w.r.t. the inputs.
``grad_outputs`` should be a sequence of length matching ``output``
containing the pre-computed gradients w.r.t. each of the outputs. If an
output doesn't require_grad, then the gradient can be ``None``.
If ``only_inputs`` is ``True``, the function will only return a list of gradients
w.r.t the specified inputs. If it's ``False``, then gradient w.r.t. all remaining
leaves will still be computed, and will be accumulated into their ``.grad``
attribute.
Arguments:
outputs (sequence of Tensor): outputs of the differentiated function.
inputs (sequence of Tensor): Inputs w.r.t. which the gradient will be
returned (and not accumulated into ``.grad``).
grad_outputs (sequence of Tensor): Gradients w.r.t. each output.
None values can be specified for scalar Tensors or ones that don't require
grad. If a None value would be acceptable for all grad_tensors, then this
argument is optional. Default: None.
retain_graph (bool, optional): If ``False``, the graph used to compute the grad
will be freed. Note that in nearly all cases setting this option to ``True``
is not needed and often can be worked around in a much more efficient
way. Defaults to the value of ``create_graph``.
create_graph (bool, optional): If ``True``, graph of the derivative will
be constructed, allowing to compute higher order derivative products.
Default: ``False``.
allow_unused (bool, optional): If ``False``, specifying inputs that were not
used when computing outputs (and therefore their grad is always zero)
is an error. Defaults to ``False``.
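The `create_graph` and `allow_unused` arguments described above are easier to follow with a small example. The following is a minimal sketch, not taken from the original post: the tensors `x` and `unused` and the functions being differentiated are illustrative assumptions.

```python
import torch
from torch import autograd

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
# Hypothetical tensor that the outputs below never depend on
unused = torch.tensor([5.0], requires_grad=True)

y = (x ** 3).sum()  # scalar output, so grad_outputs can stay None

# create_graph=True keeps the graph of the first derivative,
# so the result can itself be differentiated again.
(dy_dx,) = autograd.grad(y, x, create_graph=True)   # 3 * x**2
(d2y_dx2,) = autograd.grad(dy_dx.sum(), x)          # 6 * x

# allow_unused=True returns None for inputs that did not contribute,
# instead of raising an error.
z = (2 * x).sum()
grad_x, grad_unused = autograd.grad(z, (x, unused), allow_unused=True)

print(dy_dx, d2y_dx2, grad_x, grad_unused)
```

Without `allow_unused=True`, the last call would raise an error because `unused` is not part of the graph of `z`.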
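Consider the following code:
>>> import torch
>>> from torch import autograd
>>> from torch.autograd import Variable
>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> autograd.grad(outputs=b,inputs=a)  # b is a vector here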
RuntimeError: grad can be implicitly created only for scalar outputs
When grad_outputs is omitted or None, outputs must be a scalar for the gradient to be computed, so the code above raises an error. The following code runs fine:
>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> z=b.sum()
>>> autograd.grad(outputs=z,inputs=a)  # z is a scalar here
(tensor([ 3.,3.,3.]),)
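The same rule applies to `Tensor.backward()`, which is where this error is most often seen in practice. The sketch below uses the same `a` and `b` as the example above, written with `torch.tensor` instead of `Variable`; the `error_message` and `a_grad` names are only there for illustration.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = 3 * a  # vector output

try:
    # No gradient argument is given and b is not a scalar
    b.backward()
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print(error_message)

# Reducing to a scalar first makes the implicit gradient well defined
b.sum().backward()
a_grad = a.grad.clone()
print(a_grad)
```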
Alternatively, you can pass grad_outputs explicitly. In that case outputs no longer has to be a scalar:
>>> a=Variable(torch.FloatTensor([1,2,3]),requires_grad=True)
>>> b=3*a
>>> autograd.grad(outputs=b,inputs=a,grad_outputs=torch.ones_like(b))  # grad_outputs matches the shape of outputs
(tensor([ 3.,3.,3.]),)
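It may help to see what grad_outputs actually does: it weights each element of outputs before the gradients are summed, so passing a vector `v` gives the same result as differentiating the scalar `(b * v).sum()`. The sketch below checks this under illustrative assumptions: the function `a ** 2` and the weights `v` are not from the original post.

```python
import torch
from torch import autograd

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = a ** 2                          # Jacobian of b w.r.t. a is diag(2 * a)
v = torch.tensor([1.0, 0.5, 0.0])   # illustrative weights, one per element of b

# Weighted gradient via grad_outputs; retain_graph=True keeps the graph
# of b alive so it can be differentiated a second time below.
(g_weighted,) = autograd.grad(b, a, grad_outputs=v, retain_graph=True)

# Same quantity, written as the gradient of an explicit scalar
(g_manual,) = autograd.grad((b * v).sum(), a)

print(g_weighted, g_manual)
```

With `v = torch.ones_like(b)` this reduces to the `b.sum()` approach shown earlier.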
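On the GPU, grad_outputs can be built the same way. torch.ones_like creates the tensor on the same device as its argument and does not require grad by default, so no extra Variable or torch.Tensor wrapper is needed:
grad_outputs = torch.ones_like(b)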
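A device-agnostic version of the example might look like the following sketch. It assumes `torch.ones_like` is used to build grad_outputs, and it falls back to the CPU when no GPU is available; the `device` variable is introduced here only for illustration.

```python
import torch
from torch import autograd

# Use the GPU when present, otherwise run the same code on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)
b = 3 * a

# ones_like copies shape, dtype and device from b; requires_grad is False
grad_outputs = torch.ones_like(b)

(grad_a,) = autograd.grad(outputs=b, inputs=a, grad_outputs=grad_outputs)
print(grad_a.device, grad_a)
```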