Take a call to a pretrained BERT model as an example:
```python
outputs = self.bert(input_ids,
                    attention_mask=attention_mask,
                    token_type_ids=token_type_ids)
```
outputs contains up to 4 elements: sequence_output, pooled_output, (hidden_states), (attentions).
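To make the call concrete, here is a minimal, self-contained sketch of the same call outside of a custom module. The checkpoint name bert-base-chinese and the sample sentence are placeholders, and a recent transformers (4.x) API is assumed; `return_dict=False` is passed so the model returns the plain tuple described below rather than a ModelOutput object:

```python
import torch
from transformers import BertModel, BertTokenizer

# Placeholder checkpoint; any BERT checkpoint behaves the same way.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

# Tokenize one sentence into input_ids / attention_mask / token_type_ids.
encoded = tokenizer("自然语言处理很有趣", return_tensors="pt")

with torch.no_grad():
    outputs = model(input_ids=encoded["input_ids"],
                    attention_mask=encoded["attention_mask"],
                    token_type_ids=encoded["token_type_ids"],
                    return_dict=False)  # force the plain-tuple return form

sequence_output, pooled_output = outputs[0], outputs[1]
print(sequence_output.shape)  # (batch_size, sequence_length, hidden_size) == (1, seq_len, 768)
print(pooled_output.shape)    # (batch_size, hidden_size) == (1, 768)
```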
The official documentation for the BERT return values:
Return:
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the model.
pooler_output (:obj:`torch.FloatTensor`: of shape :obj:`(batch_size, hidden_size)`):
Last layer hidden-state of the first token of the sequence (classification token)
further processed by a Linear layer and a Tanh activation function. The Linear
layer weights are trained from the next sentence prediction (classification)
objective during pre-training.
This output is usually *not* a good summary
of the semantic content of the input, you're often better with averaging or pooling
the sequence of hidden-states for the whole input sequence.
hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
of shape :obj:`(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
:obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
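The two optional outputs have to be requested explicitly. A sketch, again assuming a recent transformers version and the placeholder checkpoint from above, that enables them at load time and checks the documented tuple lengths and shapes:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
# Extra kwargs are forwarded to the config, i.e. this sets
# config.output_hidden_states = True and config.output_attentions = True.
model = BertModel.from_pretrained("bert-base-chinese",
                                  output_hidden_states=True,
                                  output_attentions=True)
model.eval()

encoded = tokenizer("深度学习", return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoded, return_dict=False)

last_hidden_state, pooler_output, hidden_states, attentions = outputs

num_layers = model.config.num_hidden_layers  # 12 for BERT-base
assert len(hidden_states) == num_layers + 1  # embeddings + one per layer
assert len(attentions) == num_layers         # one attention map per layer
print(hidden_states[-1].shape)  # (batch_size, sequence_length, hidden_size)
print(attentions[0].shape)      # (batch_size, num_heads, sequence_length, sequence_length)
```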
- last_hidden_state: a tensor of shape (batch_size, sequence_length, hidden_size), with hidden_size = 768 for BERT-base; the hidden states output by the last layer of the model.
- pooler_output: a tensor of shape (batch_size, hidden_size); the last-layer hidden state of the first token of the sequence ([CLS]), further processed by a Linear layer and a Tanh activation. This output is usually not a good summary of the semantic content of the input; averaging or pooling the hidden states over the whole input sequence usually works better (see the pooling sketch after this list).
- hidden_states: an optional output, returned only when config.output_hidden_states=True. It is a tuple whose first element is the embedding output and whose remaining elements are the outputs of each layer; every element has shape (batch_size, sequence_length, hidden_size).
- attentions: also an optional output, returned only when config.output_attentions=True. It is likewise a tuple; each element holds one layer's attention weights, which are used to compute the weighted average in the self-attention heads.
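As noted for pooler_output, a mask-aware average of last_hidden_state is often a better sentence representation than pooler_output itself. A minimal sketch of such mean pooling (the helper name mean_pool is just for illustration, not part of the transformers API):

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor,
              attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the token hidden states, ignoring padding positions.

    last_hidden_state: (batch_size, sequence_length, hidden_size)
    attention_mask:    (batch_size, sequence_length), 1 for real tokens, 0 for padding
    returns:           (batch_size, hidden_size)
    """
    mask = attention_mask.unsqueeze(-1).float()     # (batch_size, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)  # sum over non-padding tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of real tokens per example
    return summed / counts

# e.g. sentence_embedding = mean_pool(last_hidden_state, encoded["attention_mask"])
# sentence_embedding then has shape (batch_size, hidden_size).
```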