【NLP | Running text classification with Hugging Face transformers】The Hugging Face team has updated the transformers library again: it now includes DistilRoBERTa, DistilBERT, and ALBERT, three models worth comparing against the others. So how do we run a text-classification job with them?
First, go to GitHub and search for transformers:
https://github.com/huggingface/transformers
Open this repo and
git clone it, or download it as an archive.
Then open the repo in PyCharm or another editor:
https://github.com/huggingface/transformers/tree/master/examples
In examples, open run_glue.py.
Find the __main__ block at the bottom, cut all of its code out, and wrap it in a separate function main() that takes two parameters: model and task (the dataset).
task is the name of the dataset, which is also the name of the folder the data lives in; model is the model type. The most important part here is the command-line arguments: since we don't want to type parameters on the command line, we can pass a default value to each parser.add_argument call and set required=False, so every argument has a usable default.
Next we set data_dir, the training batch size, and the number of epochs.
def main(model, task):
    parser = argparse.ArgumentParser()
    # local path and simple training hyper-parameters used as argparse defaults
    model_dir = model_to_dir[model]

    ## Required parameters
    data_dir = '/home/socialbird/Downloads/transformers-master/examples/glue_data/{}'.format(task)
    # task = 'RTE'
    train_bs = 8   # per-GPU training batch size
    eps = 3.0      # number of training epochs
    parser.add_argument("--data_dir", default=data_dir, type=str, required=False,
                        help="The input data dir. Should contain the .tsv files (or other data files) for the task.")
    parser.add_argument("--model_type", default=model, type=str, required=False,
                        help="Model type selected in the list: " + ", ".join(MODEL_CLASSES.keys()))
    parser.add_argument("--model_name_or_path", default=model_dir, type=str, required=False,
                        help="Path to pre-trained model or shortcut name selected in the list: " + ", ".join(ALL_MODELS))
    parser.add_argument("--task_name", default=task, type=str, required=False,
                        help="The name of the task to train selected in the list: " + ", ".join(processors.keys()))
    parser.add_argument("--output_dir", default='output', type=str, required=False,
                        help="The output directory where the model predictions and checkpoints will be written.")

    ## Other parameters
    parser.add_argument("--config_name", default="", type=str,
                        help="Pretrained config name or path if not the same as model_name")
    parser.add_argument("--tokenizer_name", default="", type=str,
                        help="Pretrained tokenizer name or path if not the same as model_name")
    parser.add_argument("--cache_dir", default="", type=str,
                        help="Where do you want to store the pre-trained models downloaded from s3")
    parser.add_argument("--max_seq_length", default=128, type=int,
                        help="The maximum total input sequence length after tokenization. Sequences longer "
                             "than this will be truncated, sequences shorter will be padded.")
    parser.add_argument("--do_train", action='store_true', default=True,
                        help="Whether to run training.")
    parser.add_argument("--do_eval", action='store_true', default=True,
                        help="Whether to run eval on the dev set.")
    parser.add_argument("--evaluate_during_training", action='store_true', default=True,
                        help="Run evaluation during training at each logging step.")
    parser.add_argument("--do_lower_case", action='store_true',
                        help="Set this flag if you are using an uncased model.")
    parser.add_argument("--per_gpu_train_batch_size", default=train_bs, type=int,
                        help="Batch size per GPU/CPU for training.")
    parser.add_argument("--per_gpu_eval_batch_size", default=8, type=int,
                        help="Batch size per GPU/CPU for evaluation.")
    parser.add_argument('--gradient_accumulation_steps', type=int, default=1,
                        help="Number of updates steps to accumulate before performing a backward/update pass.")
    parser.add_argument("--learning_rate", default=5e-5, type=float,
                        help="The initial learning rate for Adam.")
    parser.add_argument("--weight_decay", default=0.0, type=float,
                        help="Weight decay if we apply some.")
    parser.add_argument("--adam_epsilon", default=1e-8, type=float,
                        help="Epsilon for Adam optimizer.")
    parser.add_argument("--max_grad_norm", default=1.0, type=float,
                        help="Max gradient norm.")
    parser.add_argument("--num_train_epochs", default=eps, type=float,
                        help="Total number of training epochs to perform.")
    parser.add_argument("--max_steps", default=-1, type=int,
                        help="If > 0: set total number of training steps to perform. Override num_train_epochs.")
    parser.add_argument("--warmup_steps", default=0, type=int,
                        help="Linear warmup over warmup_steps.")
    parser.add_argument('--logging_steps', type=int, default=200,
                        help="Log every X updates steps.")
    parser.add_argument('--save_steps', type=int, default=500,
                        help="Save checkpoint every X updates steps.")
    parser.add_argument("--eval_all_checkpoints", action='store_true',
                        help="Evaluate all checkpoints starting with the same prefix as model_name and ending with step number")
    parser.add_argument("--no_cuda", default=False, required=False,
                        help="Avoid using CUDA when available")
    parser.add_argument('--overwrite_output_dir', action='store_true', default=True,
                        help="Overwrite the content of the output directory")
    parser.add_argument('--overwrite_cache', action='store_true',
                        help="Overwrite the cached training and evaluation sets")
    parser.add_argument('--seed', type=int, default=42,
                        help="random seed for initialization")
    parser.add_argument('--fp16', action='store_true',
                        help="Whether to use 16-bit (mixed) precision (through NVIDIA apex) instead of 32-bit")
    parser.add_argument('--fp16_opt_level', type=str, default='O1',
                        help="For fp16: Apex AMP optimization level selected in ['O0', 'O1', 'O2', and 'O3']."
                             "See details at https://nvidia.github.io/apex/amp.html")
    parser.add_argument("--local_rank", type=int, default=-1,
                        help="For distributed training: local_rank")
    parser.add_argument('--server_ip', type=str, default='', help="For distant debugging.")
    parser.add_argument('--server_port', type=str, default='', help="For distant debugging.")
    args = parser.parse_args()
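A side note on the trick used above: giving a flag both action='store_true' and default=True means it evaluates to True even when nothing is passed on the command line, which is exactly what lets us skip the CLI entirely. A minimal standalone illustration (not part of run_glue.py; the flag name is just an example):

import argparse

p = argparse.ArgumentParser()
p.add_argument("--do_train", action='store_true', default=True,
               help="Whether to run training.")
args = p.parse_args([])      # parse with no command-line arguments at all
print(args.do_train)         # prints True, so training still runs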
Then define model_to_dir, a dictionary I use to map model names to model paths; you can skip it entirely. Running the script loads BERT or another model, so you have to specify the model type and the model location (model_name_or_path); if you have not downloaded a model beforehand, you can simply use a shortcut name such as roberta-base.
model_to_dir = {
    'distilbert': 'distilbert-base-uncased',
    'distilroberta': MODEL_DIRS['distilroberta'],
    'albert': 'albert-base-v2',
    'bert': MODEL_DIRS['bert-base'],
    'roberta': 'roberta-base',
    'camembert': 'camembert-base',
    'xlm': 'xlm-mlm-ende-1024',
    'xlnet': 'xlnet-base-cased'
}
Finally, we still need a processor. data/processors/glue.py already ships several processors that we can use directly, for example the RTE processor.
To use RTE, you need to know its registered task name, i.e., the standard name of the dataset. You can find it at the bottom of that script.
glue_processors = {
    "cola": ColaProcessor,
    "mnli": MnliProcessor,
    "mnli-mm": MnliMismatchedProcessor,
    "mrpc": MrpcProcessor,
    "sst-2": Sst2Processor,
    "sts-b": StsbProcessor,
    "qqp": QqpProcessor,
    "qnli": QnliProcessor,
    "rte": RteProcessor,
    "wnli": WnliProcessor,
}
Finally, we still need the data itself. Grab the download_glue_data script, which you can find in the W4ngatang repo on GitHub.
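For example, an invocation along the lines of python download_glue_data.py --data_dir glue_data --tasks RTE (the exact flag names depend on the version of the script you grab) should put the RTE data under glue_data/RTE, matching the data_dir format string used in main().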
Then simply run run_glue.py and you're done.
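Putting it together, one minimal way to drive the modified script without the command line is a sketch like the following; the model and task values are only examples, and any key of model_to_dir paired with a GLUE task folder works:

if __name__ == '__main__':
    # model must be a key of model_to_dir, task a folder name under glue_data
    main('roberta', 'RTE')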