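Extending torch.autograd
To add a new operation to autograd, implement it as a subclass of Function. autograd uses Function objects to compute results and gradients and to encode the operation history. Every new operation (function) needs to implement three methods: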
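- __init__ (optional) - if your operation takes non-Variable arguments, pass them to the operation as arguments of __init__. For example, an AddConstant Function takes the constant to add, and a Transpose Function needs to know which two dimensions to swap. If your operation needs no extra arguments, you can skip __init__.
- forward() - the code that performs the operation. It can take any number of arguments. Arguments for which you specify a default value are optional. Remember that forward() accepts only Variable arguments. It can return either a single Variable or a tuple of Variables. Also consult the Function documentation to find out which methods may be called only from forward().
- backward() - the gradient formula. It receives as many arguments as forward() returned outputs, each one being the gradient with respect to the corresponding output. It should return as many values as the operation had inputs, each one being the gradient with respect to the corresponding input. If an input does not require a gradient, or is not differentiable, you can return None for it. If forward() has optional arguments, you can also return more gradients than there were inputs, as long as the extra ones are all None.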
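The three-method contract above can be sketched without any framework. The following is a toy illustration only: `MulConstant` is a hypothetical plain-Python class working on floats, not a `torch.autograd.Function`, and the `saved` attribute is an assumption standing in for whatever storage the real base class provides.

```python
class MulConstant:
    """Toy stand-in for an autograd-style operation: out = constant * x * y."""

    def __init__(self, constant):
        # Non-differentiable configuration is passed through __init__.
        self.constant = constant

    def forward(self, x, y=None):
        # y has a default value, so it is an optional input.
        self.saved = (x, y)
        out = self.constant * x
        if y is not None:
            out = out * y
        return out  # a single output

    def backward(self, grad_output):
        # One forward output means one incoming gradient.
        x, y = self.saved
        grad_x = grad_output * self.constant * (y if y is not None else 1.0)
        # An optional input that was not supplied gets None.
        grad_y = grad_output * self.constant * x if y is not None else None
        # Two forward inputs mean two returned gradients.
        return grad_x, grad_y


op = MulConstant(3.0)
print(op.forward(2.0, 5.0))  # 30.0
print(op.backward(1.0))      # (15.0, 6.0)
```

Note how the constant never appears among the returned gradients: it was supplied through `__init__`, so it is not an input of `forward()`.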
Below is the implementation of Linear:
    from torch.autograd import Function

    # Inherit from Function
    class Linear(Function):

        # bias is an optional argument
        def forward(self, input, weight, bias=None):
            self.save_for_backward(input, weight, bias)
            output = input.mm(weight.t())
            if bias is not None:
                output += bias.unsqueeze(0).expand_as(output)
            return output

        # This function has only a single output, so it gets only one gradient
        def backward(self, grad_output):
            # This is a pattern that is very convenient - at the top of backward
            # unpack saved_tensors and initialize all gradients w.r.t. inputs to
            # None. Thanks to the fact that additional trailing Nones are
            # ignored, the return statement is simple even when the function has
            # optional inputs.
            input, weight, bias = self.saved_tensors
            grad_input = grad_weight = grad_bias = None

            # These needs_input_grad checks are optional and there only to
            # improve efficiency. If you want to make your code simpler, you can
            # skip them. Returning gradients for inputs that don't require it is
            # not an error.
            if self.needs_input_grad[0]:
                grad_input = grad_output.mm(weight)
            if self.needs_input_grad[1]:
                grad_weight = grad_output.t().mm(input)
            if bias is not None and self.needs_input_grad[2]:
                grad_bias = grad_output.sum(0).squeeze(0)

            return grad_input, grad_weight, grad_bias
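For reference, the three expressions in backward() follow from the chain rule applied to the forward formula. The symbols below are notation assumed here for illustration, not taken from the original: X is the input (N x in), W the weight (out x in), b the bias (out), Y the output (N x out), G the incoming gradient grad_output, and the loss is written as a scalar ℓ.

```latex
\begin{aligned}
Y &= X W^{\top} + \mathbf{1}\, b^{\top},
\qquad G = \frac{\partial \ell}{\partial Y} \\
\frac{\partial \ell}{\partial X} &= G\, W
  && \text{(grad\_input = grad\_output.mm(weight))} \\
\frac{\partial \ell}{\partial W} &= G^{\top} X
  && \text{(grad\_weight = grad\_output.t().mm(input))} \\
\frac{\partial \ell}{\partial b} &= G^{\top} \mathbf{1}
  && \text{(grad\_bias = grad\_output summed over dimension 0)}
\end{aligned}
```

Each gradient has the same shape as the quantity it belongs to, which is a quick sanity check worth doing for any hand-written backward().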
Now, to make the custom operation easier to use, we recommend wrapping it in a small helper function:
    def linear(input, weight, bias=None):
        # The first pair of parentheses creates a Function object. Any
        # arguments given here will be passed to __init__. The second pair
        # invokes the call operator, which then uses forward() to compute
        # the result and returns it.
        return Linear()(input, weight, bias)
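You will probably want to check whether the backward method you just implemented computes the gradients correctly. You can do so by comparing its results with numerical approximations obtained from small finite differences.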
    import torch
    from torch.autograd import Variable, gradcheck

    # gradcheck takes a tuple of tensors as input, checks whether the gradients
    # evaluated with these tensors are close enough to numerical
    # approximations, and returns True if they all satisfy this condition.
    # Linear.forward needs both input and weight; bias is optional.
    input = (Variable(torch.randn(20, 20).double(), requires_grad=True),
             Variable(torch.randn(30, 20).double(), requires_grad=True))
    test = gradcheck.gradcheck(Linear(), input, eps=1e-6, atol=1e-4)
    print(test)
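To make the idea behind this check concrete, here is a minimal framework-free sketch of a finite-difference gradient test. It is a simplified illustration, not how gradcheck is implemented: the helper names `numerical_grad` and `check_grad` are hypothetical, the comparison uses an absolute tolerance only, and the function under test is an arbitrary example, f(x0, x1) = x0^2 * x1.

```python
def numerical_grad(f, xs, eps=1e-6):
    """Central-difference estimate of df/dx_i for every entry of xs."""
    grads = []
    for i in range(len(xs)):
        plus = list(xs)
        minus = list(xs)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads


def check_grad(f, analytic_grad, xs, eps=1e-6, atol=1e-4):
    """Return True if every analytic gradient is within atol of the estimate."""
    numeric = numerical_grad(f, xs, eps)
    analytic = analytic_grad(xs)
    return all(abs(n - a) <= atol for n, a in zip(numeric, analytic))


def f(xs):
    return xs[0] ** 2 * xs[1]


def grad_f(xs):
    # Correct hand-derived gradient of f.
    return [2 * xs[0] * xs[1], xs[0] ** 2]


def wrong_grad_f(xs):
    # Deliberately wrong: the factor 2 is missing from the first entry.
    return [xs[0] * xs[1], xs[0] ** 2]


print(check_grad(f, grad_f, [1.5, -2.0]))        # True
print(check_grad(f, wrong_grad_f, [1.5, -2.0]))  # False
```

Double precision matters for this kind of test: with eps around 1e-6, single-precision rounding error would swamp the difference quotient, which is why the tensors passed to gradcheck above are converted with .double().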