PyTorch Loss Functions

PyTorch weight initialization and loss functions: why gradients explode or vanish.



If each layer computes H = X·W with X and W independent, zero-mean and unit-variance, then Var(H_i) = n · Var(X) · Var(W) = n, where n is the number of input neurons of the layer (not the batch size).

So the data scale is multiplied by a factor related to n at every forward pass; after a few layers the activations and gradients blow up — gradient explosion — and this has to be avoided.
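A minimal sketch of this blow-up (the width 256 and depth 100 are arbitrary choices): stack plain linear maps with naive standard-normal weights and watch the standard deviation of the activations grow layer by layer.

import torch

layers, n = 100, 256           # depth and layer width, chosen arbitrarily for this sketch
x = torch.randn(16, n)         # a batch of 16 samples with unit variance

with torch.no_grad():
    for i in range(layers):
        w = torch.randn(n, n)            # naive init: Var(W) = 1
        x = x @ w                        # variance grows by roughly a factor of n each layer
        print(i, x.std().item())
        if torch.isnan(x.std()):         # eventually overflows to inf/nan
            break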


After adding a saturating activation function, the layer outputs get smaller and smaller, and the gradients vanish.
Variance consistency: keep the data scale in a suitable range, usually variance 1. For saturating activations such as Sigmoid and Tanh, use Xavier initialization, which requires Var(W) = 2 / (n_in + n_out).

Xavier usually uses a uniform distribution W ~ U(-a, a), whose variance is a² / 3.

Setting a² / 3 = 2 / (n_in + n_out) gives a = sqrt(6 / (n_in + n_out)).
For non-saturating activations such as ReLU and its variants, use Kaiming initialization: Var(W) = 2 / n_in.
For variants with a slope a on the negative half-axis (e.g. Leaky ReLU): Var(W) = 2 / ((1 + a²) · n_in).

tanh_gain = nn.init.calculate_gain('tanh')

nn.init.xavier_uniform_(m.weight.data, gain=tanh_gain)
# Xavier only suits saturating activations; it is not appropriate for ReLU.
# For ReLU and its variants use Kaiming initialization instead:
nn.init.kaiming_normal_(m.weight.data)
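A minimal sketch of where these calls usually live, inside a model's initialize method (the MLP class and the sizes below are made up for illustration, not from the original notes):

import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, neural_num=256, layers=3):
        super().__init__()
        self.linears = nn.ModuleList(
            [nn.Linear(neural_num, neural_num, bias=False) for _ in range(layers)]
        )

    def forward(self, x):
        for linear in self.linears:
            x = torch.tanh(linear(x))     # saturating activation, hence Xavier init below
        return x

    def initialize(self):
        tanh_gain = nn.init.calculate_gain('tanh')
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight.data, gain=tanh_gain)
                # for a ReLU network, use nn.init.kaiming_normal_(m.weight.data) instead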

Loss function: the loss of a single sample, Loss = f(y_hat, y); this is what is normally used.
Cost function: the average loss over the whole sample set.
Objective function: Obj = Cost + regularization term.
Loss functions in PyTorch inherit from nn.Module. The key argument is reduction: 'none' returns one loss per element, 'sum' adds them up, 'mean' averages them.
1. nn.CrossEntropyLoss() — cross-entropy loss
Internally it combines nn.LogSoftmax() and nn.NLLLoss().
Cross-entropy measures the difference between two probability distributions; related concepts: information entropy and relative entropy.
Cross-entropy = information entropy + relative entropy (entropy expresses the uncertainty of a piece of information).
Self-information: I(x) = -log P(x)
Entropy: H(P) = E[I(x)] = -Σ_i P(x_i) log P(x_i)
Relative entropy (KL divergence): D_KL(P||Q) = Σ_i P(x_i) log(P(x_i) / Q(x_i))
Cross-entropy: H(P, Q) = -Σ_i P(x_i) log Q(x_i) = H(P) + D_KL(P||Q)
Since H(P) is fixed by the data, minimizing the cross-entropy amounts to minimizing the KL divergence, so the smaller the better.
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
# equivalent one-hot form: target = torch.tensor([[1, 0], [0, 1], [0, 1]], dtype=torch.float)
# 3 samples; class labels start at 0; each label indexes the class dimension of the input,
# e.g. input [x1, x2, x3, x4] with one-hot label [0, 1, 0, 0] means the class is 1
# the loss acts like the last layer of the network: it measures the gap between the final
# outputs and the true labels, which is why target is of type long (it works like an index)

weights = torch.tensor([1, 2], dtype=torch.float)   # weight 1 for class-0 samples, weight 2 for class-1 samples
# weights = torch.tensor([0.7, 0.3], dtype=torch.float)

loss_f_none_w = nn.CrossEntropyLoss(weight=weights, reduction='none')
loss_f_sum = nn.CrossEntropyLoss(weight=weights, reduction='sum')
loss_f_mean = nn.CrossEntropyLoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)

# 'none': one loss value per sample
# 'sum' : the per-sample losses added up
# 'mean': the average; with weight set, each sample effectively counts weight times
#         (here the class-0 sample counts once and each class-1 sample twice),
#         so the divisor is the sum of the weights: 1 + 2 + 2 = 5
# output: tensor([1.3133, 0.2539, 0.2539])  tensor(1.8210)  tensor(0.3642)
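A quick hand check of the first sample (reusing the tensors above, in the same style as the manual checks later in these notes). Per sample, CrossEntropyLoss is weight[class] * (-x[class] + log(Σ_j exp(x[j]))):

import numpy as np

idx = 0
x = inputs.numpy()[idx]          # [1., 2.]
cls = target.numpy()[idx]        # class 0
loss_1 = weights.numpy()[cls] * (-x[cls] + np.log(np.sum(np.exp(x))))
print(loss_1)   # ≈ 1.3133, matching the first element of the 'none' output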

2. nn.NLLLoss() — negative log-likelihood loss
It simply takes the negative of the input value at the target class position.
inputs = torch.tensor([[1, 2], [1, 3], [1, 3]], dtype=torch.float)
target = torch.tensor([0, 1, 1], dtype=torch.long)
weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.NLLLoss(weight=weights, reduction='none')
loss_f_sum = nn.NLLLoss(weight=weights, reduction='sum')
loss_f_mean = nn.NLLLoss(weight=weights, reduction='mean')

# forward
loss_none_w = loss_f_none_w(inputs, target)
loss_sum = loss_f_sum(inputs, target)
loss_mean = loss_f_mean(inputs, target)
# output: tensor([-1., -3., -3.])  tensor(-7.)  tensor(-2.3333)
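A quick hand check (reusing the tensors above): for sample 0 the loss is just the negated input at the target class, scaled by that class weight.

idx = 0
loss_1 = -weights[target[idx]] * inputs[idx, target[idx]]
print(loss_1)   # tensor(-1.), matching the first element above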

3. nn.BCELoss() — binary cross-entropy loss

The input and the target must be float tensors of the same shape; the loss is computed element by element: l_n = -[y_n·log(x_n) + (1 - y_n)·log(1 - x_n)].
Every input value must lie in [0, 1], so the inputs have to be passed through sigmoid() first.
inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
# the loss is computed element-wise, neuron by neuron
target_bce = target

inputs = torch.sigmoid(inputs)   # squash inputs into [0, 1]

weights = torch.tensor([1, 1], dtype=torch.float)

loss_f_none_w = nn.BCELoss(weight=weights, reduction='none')
loss_f_sum = nn.BCELoss(weight=weights, reduction='sum')
loss_f_mean = nn.BCELoss(weight=weights, reduction='mean')

# forward (same pattern as above)
loss_none_w = loss_f_none_w(inputs, target_bce)
loss_sum = loss_f_sum(inputs, target_bce)
loss_mean = loss_f_mean(inputs, target_bce)
# output:
# BCE Loss
# tensor([[0.3133, 2.1269],
#         [0.1269, 2.1269],
#         [3.0486, 0.0181],
#         [4.0181, 0.0067]])
# tensor(11.7856)  tensor(1.4732)
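A quick hand check of the top-left element (reusing the tensors above; note inputs has already been passed through sigmoid):

import numpy as np

x_0 = inputs.numpy()[0, 0]       # sigmoid(1) ≈ 0.7311
y_0 = target_bce.numpy()[0, 0]   # 1.0
loss_0 = -(y_0 * np.log(x_0) + (1 - y_0) * np.log(1 - x_0))
print(loss_0)   # ≈ 0.3133, matching the top-left element of the 'none' output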

4. nn.BCEWithLogitsLoss() — combines Sigmoid with binary cross-entropy
The last layer of the network must not add its own sigmoid, because this loss already applies sigmoid internally.

inputs = torch.tensor([[1, 2], [2, 2], [3, 4], [4, 5]], dtype=torch.float)
target = torch.tensor([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=torch.float)
target_bce = target

# no explicit sigmoid here — the loss applies it itself
# inputs = torch.sigmoid(inputs)

weights = torch.tensor([1], dtype=torch.float)
pos_w = torch.tensor([3], dtype=torch.float)   # each positive-target element is weighted by 3
# parameters: pos_weight — weight of the positive class; weight — per-class loss weight

loss_f_none_w = nn.BCEWithLogitsLoss(weight=weights, reduction='none', pos_weight=pos_w)
loss_f_sum = nn.BCEWithLogitsLoss(weight=weights, reduction='sum', pos_weight=pos_w)
loss_f_mean = nn.BCEWithLogitsLoss(weight=weights, reduction='mean', pos_weight=pos_w)

# forward
loss_none_w = loss_f_none_w(inputs, target_bce)

# output with weights = tensor([1., 1.]) and no pos_weight (identical to plain BCELoss):
# tensor([[0.3133, 2.1269], [0.1269, 2.1269], [3.0486, 0.0181], [4.0181, 0.0067]])
# tensor(11.7856)  tensor(1.4732)
# output with pos_weight = tensor([3.]) — every positive-target element is multiplied by 3:
# tensor([[0.9398, 2.1269], [0.3808, 2.1269], [3.0486, 0.0544], [4.0181, 0.0201]])
# tensor(12.7158)  tensor(1.5895)
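A quick hand check of the top-left element: its target is 1 (a positive), so the plain BCE value 0.3133 is scaled by pos_weight = 3.

import numpy as np

x_00 = 1.0   # the raw logit of the first element
loss_00 = -3 * np.log(1 / (1 + np.exp(-x_00)))   # pos_weight * BCE term for a positive target
print(loss_00)   # ≈ 0.9398 = 3 * 0.3133, matching the output above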

5. nn.L1Loss() — computes the absolute difference between inputs and target: l_n = |x_n - y_n|

inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3
loss_f = nn.L1Loss(reduction='none')
loss = loss_f(inputs, target)
# output:
# input:  tensor([[1., 1.], [1., 1.]])
# target: tensor([[3., 3.], [3., 3.]])
# L1 loss: tensor([[2., 2.], [2., 2.]])

6. nn.MSELoss() — computes the squared difference between inputs and target: l_n = (x_n - y_n)²
reduction: computation mode, one of none / sum / mean
inputs = torch.ones((2, 2))
target = torch.ones((2, 2)) * 3
loss_f_mse = nn.MSELoss(reduction='none')
loss_mse = loss_f_mse(inputs, target)
# MSE loss: tensor([[4., 4.], [4., 4.]])

7. nn.SmoothL1Loss() — a smoothed version of L1Loss

Near zero (the bottom of the curve) it is smoother than plain L1: with x = input - target, loss = 0.5·x² when |x| < 1, and |x| - 0.5 otherwise.
[figure: SmoothL1Loss compared with L1Loss]

inputs = torch.linspace(-3, 3, steps=500)
target = torch.zeros_like(inputs)
loss_f = nn.SmoothL1Loss(reduction='none')
loss_smooth = loss_f(inputs, target)
loss_l1 = np.abs(inputs.numpy())   # plain L1 for comparison (plotted in the figure above)
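A quick numeric check of the two branches of the piecewise definition, reusing loss_f from above (the test points 0.5 and 2.0 are arbitrary):

print(loss_f(torch.tensor([0.5, 2.0]), torch.zeros(2)))
# tensor([0.1250, 1.5000]) = [0.5 * 0.5**2, 2.0 - 0.5]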

8. nn.PoissonNLLLoss() — negative log-likelihood loss for a Poisson-distributed target
log_input: whether the input is already in log form; this decides which formula is used — if True, loss = exp(input) - target·input; if False, loss = input - target·log(input + eps)

full: whether to add the Stirling approximation term; default False
eps: a small constant (default 1e-8) that avoids log(0) when log_input=False
inputs = torch.randn((2, 2))
target = torch.randn((2, 2))
loss_f = nn.PoissonNLLLoss(log_input=True, full=False, reduction='none')
loss = loss_f(inputs, target)
print("input:{}\ntarget:{}\nPoisson NLL loss:{}".format(inputs, target, loss))
# output:
# input:  tensor([[0.6614, 0.2669], [0.0617, 0.6213]])
# target: tensor([[-0.4519, -0.1661], [-1.5228, 0.3817]])
# Poisson NLL loss: tensor([[2.2363, 1.3503], [1.1575, 1.6242]])

idx = 0
loss_1 = torch.exp(inputs[idx, idx]) - target[idx, idx] * inputs[idx, idx]
# exp(0.6614) - (-0.4519) * 0.6614 ≈ 2.2363, matching the first element above

9. nn.KLDivLoss() — KL divergence (relative entropy), measuring how close two distributions are

What this function computes per element is l_n = y_n · (log y_n - x_n), i.e. it expects the input x to already be log-probabilities.

Here the target is already a probability distribution (the target distribution), e.g. [0.9, 0.05, 0.05] means class 0 has probability 0.9, class 1 has 0.05 and class 2 has 0.05; the input is the distribution produced by the multi-layer network, e.g. [0.8, 0.1, 0.1].
Note on the formula: the log of the input is not taken for you, so you must first apply torch.log to the probabilities, or use nn.LogSoftmax() on the raw outputs when handling a multi-class probability distribution.
batchmean: sum the loss and divide only by the batch size (plain 'mean' divides by the total number of elements).
inputs = torch.tensor([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])   # already a distribution
inputs_log = torch.log(inputs)                               # what KLDivLoss formally expects
target = torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.7, 0.2]], dtype=torch.float)

loss_f_none = nn.KLDivLoss(reduction='none')
loss_f_mean = nn.KLDivLoss(reduction='mean')
loss_f_bs_mean = nn.KLDivLoss(reduction='batchmean')

# note: the raw probabilities are passed here (not inputs_log), to match the hand check below
loss_none = loss_f_none(inputs, target)
loss_mean = loss_f_mean(inputs, target)
loss_bs_mean = loss_f_bs_mean(inputs, target)

print("loss_none:\n{}\nloss_mean:\n{}\nloss_bs_mean:\n{}".format(loss_none, loss_mean, loss_bs_mean))
# output:
# loss_none: tensor([[-0.5448, -0.1648, -0.1598], [-0.2503, -0.4597, -0.4219]])
# warning: "reduction: 'mean' divides the total loss by both the batch size and the support size."
# loss_mean:    -0.3335360586643219
# loss_bs_mean: -1.000608205795288

idx = 0
loss_1 = target[idx, idx] * (torch.log(target[idx, idx]) - inputs[idx, idx])
# 0.9 * (log 0.9 - 0.5) ≈ -0.5448, matching the first element above

10. nn.MarginRankingLoss() — measures the ordering of two N-dimensional inputs, used for ranking tasks. Each element of one group is compared against every target value, returning an n×n loss matrix: loss(x1, x2, y) = max(0, -y·(x1 - x2) + margin).

y = 1: we want x1 to be larger than x2; no loss when x1 > x2.
y = -1: we want x1 to be smaller than x2; no loss when x2 > x1.
margin: boundary value
reduction: computation mode
x1 = torch.tensor([[1], [2], [3]], dtype=torch.float)
x2 = torch.tensor([[2], [2], [2]], dtype=torch.float)
target = torch.tensor([1, 1, -1], dtype=torch.float)   # this is y

loss_f_none = nn.MarginRankingLoss(margin=0, reduction='none')
loss = loss_f_none(x1, x2, target)
# e.g. row 2: x1[2] - x2[2] = 3 - 2 = 1 is compared against every y value [1, 1, -1],
# so only the y = -1 column produces a loss --> [0, 0, 1]
# output: loss: tensor([[1., 1., 0.], [0., 0., 0.], [0., 0., 1.]])
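A quick hand check of element [0][0] of the matrix (x1 = 1, x2 = 2, y = 1):

margin = 0
loss_00 = max(0, -1 * (1 - 2) + margin)   # -y * (x1 - x2) + margin with y = 1
print(loss_00)   # 1, matching loss[0][0]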

11. nn.MultiLabelMarginLoss() — multi-label margin loss. Multi-label means one sample carries several labels, e.g. one image belongs to several categories.
Example: in a 4-class task where sample x belongs to class 0 and class 3, the label is written as [0, 3, -1, -1] (padded with -1), not as a one-hot vector.

The formula is loss(x, y) = Σ_ij max(0, 1 - (x[y[j]] - x[i])) / x.size(0), where j runs over the label positions and i over the non-label positions.
The formula means: the output at a label neuron minus the output at a non-label neuron.
A term only contributes when that gap is smaller than 1; the larger the gap between the two classes, the better.
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([[0, 3, -1, -1]], dtype=torch.long)   # the sample belongs to class 0 and class 3; long type
loss_f = nn.MultiLabelMarginLoss(reduction='none')
loss = loss_f(x, y)
# output: tensor([0.8500])

# step-by-step computation
x = x[0]
item_1 = (1 - (x[0] - x[1])) + (1 - (x[0] - x[2]))   # label 0 vs. non-labels 1 and 2
item_2 = (1 - (x[3] - x[1])) + (1 - (x[3] - x[2]))   # label 3 vs. non-labels 1 and 2
loss_h = (item_1 + item_2) / x.shape[0]               # divide by the number of classes: 0.85

12. nn.SoftMarginLoss() — two-class logistic loss

The result is an average over the elements: loss(x, y) = Σ_i log(1 + exp(-y[i]·x[i])) / x.nelement(), with labels y taking values -1 or 1.
inputs = torch.tensor([[0.3, 0.7], [0.5, 0.5]])
target = torch.tensor([[-1, 1], [1, -1]], dtype=torch.float)

loss_f = nn.SoftMarginLoss(reduction='none')
loss = loss_f(inputs, target)
# output: SoftMargin: tensor([[0.8544, 0.4032], [0.4741, 0.9741]])

idx = 0
inputs_i = inputs[idx, idx]
target_i = target[idx, idx]
loss_h = np.log(1 + np.exp(-target_i * inputs_i))
# output: tensor(0.8544), matching the top-left element above

13. nn.MultiLabelSoftMarginLoss() — the multi-label version of SoftMarginLoss

C is the number of classes; the loss averages one logistic term per class.
For a 4-class task the label takes values like [1, 0, 0, 1], meaning the sample belongs to the 1st and the 4th class; note this label format differs from MultiLabelMarginLoss above.
inputs = torch.tensor([[0.3, 0.7, 0.8]])
target = torch.tensor([[0, 1, 1]], dtype=torch.float)   # labels are float here
loss_f = nn.MultiLabelSoftMarginLoss(reduction='none')
loss = loss_f(inputs, target)
# output: MultiLabel SoftMargin: tensor([0.5429])

# hand computation
i_0 = torch.log(torch.exp(-inputs[0, 0]) / (1 + torch.exp(-inputs[0, 0])))   # target 0 term
i_1 = torch.log(1 / (1 + torch.exp(-inputs[0, 1])))                          # target 1 term
i_2 = torch.log(1 / (1 + torch.exp(-inputs[0, 2])))                          # target 1 term
loss_h = (i_0 + i_1 + i_2) / -3   # average over C = 3 classes, with the leading minus sign

14. nn.MultiMarginLoss() — multi-class hinge (margin) loss


y holds class indices of long type (values in 0..C-1), one per sample; in the example below, sample 0 is class 1 and sample 1 is class 2.
The formula is loss(x, y) = Σ_{i≠y} max(0, margin - (x[y] - x[i]))^p / x.size(0): the value at the label position minus each non-label value, where i must never equal the label index itself.
Main parameters: weight — per-class loss weight; margin — boundary value, default 1; p — exponent, 1 or 2, default 1.
x = torch.tensor([[0.1, 0.2, 0.7], [0.2, 0.5, 0.3]])
y = torch.tensor([1, 2], dtype=torch.long)   # labels must be long

loss_f = nn.MultiMarginLoss(reduction='none')
loss = loss_f(x, y)
# output: Multi Margin Loss: tensor([0.8000, 0.7000])

x = x[0]
margin = 1
i_0 = margin - (x[1] - x[0])   # label class 1 vs. class 0
# i_1 = margin - (x[1] - x[1]) # skipped: i must not equal the label index
i_2 = margin - (x[1] - x[2])   # label class 1 vs. class 2
loss_h = (i_0 + i_2) / x.shape[0]
print(loss_h)   # tensor(0.8000)

15. nn.TripletMarginLoss() — triplet loss, commonly used in face recognition

It works on distances between points: the anchor-positive distance should be smaller than the anchor-negative distance, loss(a, p, n) = max(0, d(a, p) - d(a, n) + margin). E.g. the anchor is a photo of yourself, pos is another photo of yourself, neg is someone else's photo.
anchor = torch.tensor([[1.]])
pos = torch.tensor([[2.]])
neg = torch.tensor([[0.5]])

loss_f = nn.TripletMarginLoss(margin=1.0, p=1)
loss = loss_f(anchor, pos, neg)
# output: Triplet Margin Loss tensor(1.5000)

margin = 1
a, p, n = anchor[0], pos[0], neg[0]
d_ap = torch.abs(a - p)   # anchor-positive distance = 1
d_an = torch.abs(a - n)   # anchor-negative distance = 0.5
loss = d_ap - d_an + margin   # 1 - 0.5 + 1 = 1.5

16. nn.HingeEmbeddingLoss() — measures the similarity of two inputs, commonly used for non-linear embeddings and semi-supervised learning. The input x should be the absolute difference of the two inputs; the loss is x_n when y_n = 1 and max(0, margin - x_n) when y_n = -1.

inputs = torch.tensor([[1., 0.8, 0.5]])
target = torch.tensor([[1, 1, -1]])   # int type

loss_f = nn.HingeEmbeddingLoss(margin=1., reduction='none')
loss = loss_f(inputs, target)
# Hinge Embedding Loss tensor([[1.0000, 0.8000, 0.5000]])

margin = 1.
loss = max(0, margin - inputs.numpy()[0, 2])   # the y = -1 element
print(loss)   # 0.5

17. nn.CosineEmbeddingLoss() — uses cosine similarity to measure how alike two inputs are; used for embeddings, where we care about the difference in direction rather than in magnitude.


margin takes values in [-1, 1], with [0, 0.5] recommended; the loss is 1 - cos(x1, x2) when y = 1, and max(0, cos(x1, x2) - margin) when y = -1.
x1 = torch.tensor([[0.3, 0.5, 0.7], [0.3, 0.5, 0.7]])
x2 = torch.tensor([[0.1, 0.3, 0.5], [0.1, 0.3, 0.5]])
target = torch.tensor([[1, -1]], dtype=torch.float)

loss_f = nn.CosineEmbeddingLoss(margin=0., reduction='none')
loss = loss_f(x1, x2, target)
print("Cosine Embedding Loss", loss)
# Cosine Embedding Loss tensor([[0.0167, 0.9833]])

margin = 0.

def cosine(a, b):
    numerator = torch.dot(a, b)
    denominator = torch.norm(a, 2) * torch.norm(b, 2)   # norm computes the vector length (p = 2 by default)
    return float(numerator / denominator)

l_1 = 1 - cosine(x1[0], x2[0])        # the y = 1 case
l_2 = max(0, cosine(x1[0], x2[0]))    # the y = -1 case (margin = 0)
print(l_1, l_2)   # 0.0167, 0.9833

18. nn.CTCLoss() — CTC (Connectionist Temporal Classification) loss, for classifying sequence/time-series data.
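The original notes stop here without an example; below is a minimal usage sketch following the standard nn.CTCLoss interface (all shapes and sizes are arbitrary):

import torch
import torch.nn as nn

T, N, C = 50, 16, 20            # input length, batch size, number of classes (class 0 = blank)
S, S_min = 30, 10               # max / min target length

log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()
targets = torch.randint(1, C, (N, S), dtype=torch.long)        # labels 1..C-1 (0 is reserved for blank)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(S_min, S, (N,), dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0, reduction='mean')
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss)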
