A Collection of Papers on Text Summarization

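awesome-text-summarization GitHub repository: https://github.com/luopeixiang/awesome-text-summarization
This repository gives a brief introduction to text summarization, covering the task definition, the types of summaries, and summary evaluation methods.
It also collects the datasets commonly used for text summarization and recent papers in the field,
which makes it a good starting point for anyone who wants to get up to speed on the area quickly.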
Table of Contents

  • awesome-text-summarization
    • Basic Concept
      • Definition
      • Types of summarization
      • Summary Informativeness evaluation
    • DataSet
    • Papers
      • Survey
      • Abstractive Document summarization
        • Based Reinforcement Learning
      • Extractive Document summarization
        • Based Reinforcement Learning
    • Sentence Summarization
    • Unsupervised Abstractive Summarization
    • Multi Document Summarization
    • Evaluation Metrics
    • Other Resources
Basic Concept
Definition
Summarization is the task of producing a shorter version of one or several documents that preserves most of the input’s meaning.
Types of summarization
Extractive summaries (extracts) are produced by concatenating
several sentences taken exactly as they appear in the materials being
summarized.
Abstractive summaries (abstracts) are written to convey
the main information in the input and may reuse phrases or clauses
from it, but the summaries are overall expressed in the words of the
summary author.
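To make the extractive case concrete, here is a minimal sketch, assuming a very simple scoring rule that is not taken from any paper listed here: each sentence is scored by the average document frequency of its words, and the top k sentences are copied verbatim in their original order. The function names `split_sentences` and `extractive_summary` and the regex-based splitter are illustrative assumptions.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences, verbatim and in document order."""
    sentences = split_sentences(text)
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    # Word frequencies over the whole document.
    freq = Counter(w for words in tokenized for w in words)
    # Sentence score = average frequency of its words (illustrative heuristic).
    scores = [
        sum(freq[w] for w in words) / len(words) if words else 0.0
        for words in tokenized
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort the selected indices so the extract follows the source order.
    return " ".join(sentences[i] for i in sorted(top))


if __name__ == "__main__":
    doc = "Cats sleep a lot. Cats chase mice. The weather is nice today."
    print(extractive_summary(doc, k=2))
```

Every sentence in the output appears unchanged in the input, which is what distinguishes an extract from an abstract; an abstractive system would instead generate new wording.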
Summary Informativeness evaluation
  • ROUGE-N: measures the N-gram units common between a particular summary and a collection of
    reference summaries, where N determines the N-gram's length. E.g., ROUGE-1
    for unigrams and ROUGE-2 for bigrams.
  • ROUGE-L: scores a summary by the Longest Common Subsequence (LCS) it shares with a reference summary.
  • BLEU : calculated from the n-gram co-occurrence between the generated summary and the gold (reference) summary. Unlike ROUGE-N, you do not pick a single "n": the score combines precisions over several n-gram orders.
  • METEOR : based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision.
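The metrics above can be sketched in a few lines. This is a simplified illustration on pre-tokenized input, not the official ROUGE or METEOR tooling: `rouge_n_recall` follows the recall-oriented definition with clipped n-gram counts, `rouge_l` reports an LCS-based precision, recall and F1 (the equal weighting of precision and recall is an assumption), and `meteor_fmean` implements only the recall-weighted harmonic mean from the original METEOR formulation, leaving out its fragmentation penalty and stem/synonym matching. All function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n):
    """Clipped n-gram matches summed over references / total reference n-grams."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        matched += sum(min(count, cand[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Return (precision, recall, F1) computed from the LCS length."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    p, r = lcs / len(candidate), lcs / len(reference)
    return p, r, 2 * p * r / (p + r)


def meteor_fmean(precision, recall):
    """Harmonic mean with recall weighted 9:1 over precision (no penalty term)."""
    if precision == 0 or recall == 0:
        return 0.0
    return 10 * precision * recall / (recall + 9 * precision)


if __name__ == "__main__":
    cand = "the cat sat on the mat".split()
    ref = "the cat was on the mat".split()
    print(rouge_n_recall(cand, [ref], 1))  # 5 of 6 reference unigrams matched
    print(rouge_n_recall(cand, [ref], 2))  # 3 of 5 reference bigrams matched
    print(rouge_l(cand, ref))
```

For reported results, an established evaluation package is preferable, since details such as stemming, tokenization and multi-reference handling change the scores.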
DataSet
  • Annotated English Gigaword
    • for sentence summarization
  • CNN/Daily Mail dataset
    • for document summarization
  • DUC 2004
  • CORNELL NEWSROOM
    • is a large dataset for training and evaluating summarization systems. It contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries are obtained from search and social metadata between 1998 and 2017 and use a variety of summarization strategies combining extraction and abstraction.
  • Google Dataset
    • Large corpus of uncompressed and compressed sentences from news articles.
Papers
Survey
Recent automatic text summarization techniques: a survey
Automatic summarization
Abstractive Document summarization
1.words-lvt2k-temp-att (Nallapati et al., 2016) : Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
2.Graph-Based Attn : Abstractive Document Summarization with a Graph-Based Attentional Neural Model
3.Pointer-generator + coverage (See et al., 2017) : Get To The Point: Summarization with Pointer-Generator Networks
4.KIGN+Prediction-guide : Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network
5.Explicit Info Selection Modeling(Li et al., 2018a) : Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling
6.Structural Regularization(Li et al., 2018b) : Improving Neural Abstractive Document Summarization with Structural Regularization
7.end2end w/ inconsistency loss (Hsu et al., 2018): A Unified Model for Extractive and Abstractive Summarization using Inconsistency Loss
8.Pointer + Coverage + EntailmentGen + QuestionGen (Guo et al., 2018) : Soft Layer-Specific Multi-Task Summarization with Entailment and Question Generation
Based Reinforcement Learning
1.ML+RL ROUGE+Novel, with LM (Kryscinski et al., 2018) : Improving Abstraction in Text Summarization
2.RL + pg + cbdec (Jiang and Bansal, 2018): Closed-Book Training to Improve Summarization Encoder Memory
3.rnn-ext + abs + RL + rerank (Chen and Bansal, 2018): Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
4.ML+RL, with intra-attention : A Deep Reinforced Model for Abstractive Summarization
5.ML+RL ROUGE+Novel, with LM : Improving Abstraction in Text Summarization
6.GAN : Generative Adversarial Network for Abstractive Text Summarization
7.DCA (Celikyilmaz et al., 2018) : Deep Communicating Agents for Abstractive Summarization
8.ROUGESal+Ent RL (Pasunuru and Bansal, 2018): Multi-Reward Reinforced Summarization with Saliency and Entailment
Extractive Document summarization
1.TEXTRANK (graph based) : TextRank: Bringing Order into Texts
2.SWAP-NET : Extractive Summarization with SWAP-NET: Sentences and Words from Alternating Pointer Networks
3.NN-SE : Neural summarization by extracting sentences and words
4.HSSAS : A Hierarchical Structured Self-Attentive Model for Extractive Document Summarization (HSSAS)
5.NeuSUM (Zhou et al., 2018) : Neural Document Summarization by Jointly Learning to Score and Select Sentences
6.Latent (Zhang et al., 2018) : Neural Latent Extractive Document Summarization
Based Reinforcement Learning
1.rnn-ext + RL (Chen and Bansal, 2018): Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting
2.Bottom-Up Summarization (Gehrmann et al., 2018): Bottom-Up Abstractive Summarization
3.BANDITSUM : BANDITSUM: Extractive Summarization as a Contextual Bandit
4.SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents
5.Refresh: Ranking sentences for extractive summarization with reinforcement learning
6.DQN: Deep reinforcement learning for extractive document summarization
7.RNES w/o coherence : Learning to Extract Coherent Summary via Deep Reinforcement Learning
Sentence Summarization
1.Re^3 Sum (Cao et al., 2018) : Retrieve, Rerank and Rewrite: Soft Template Based Neural Summarization
2.FTSum_g (Cao et al., 2018) : Faithful to the Original: Fact Aware Neural Abstractive Summarization
3.Seq2seq + E2T_cnn (Amplayo et al., 2018) : Abstractive Sentence Summarization with Attentive Recurrent Neural Networks
4.EndDec+WFE (Suzuki and Nagata, 2017) : Cutting-off Redundant Repeating Generations for Neural Abstractive Summarization
5.DRGD (Li et al., 2017) : Deep Recurrent Generative Decoder for Abstractive Text Summarization
6.BiRNN + LM Evaluator (Zhao et al. 2018) : A Language Model based Evaluator for Sentence Compression
Unsupervised Abstractive Summarization
1.MeanSum : MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization
2.Semantic Abstractive Sum based AMR(2018 Dohare): Unsupervised Semantic Abstractive Summarization
3.Paraphrastic Sentence Fusion Model(2018 Nayeem): Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion
Multi Document Summarization
1.(Z Cao 2017) : Improving Multi-Document Summarization via Text Classification
2.Based AMR : Abstract Meaning Representation for Multi-Document Summarization.
3.Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion.
4.Adapting the Neural Encoder-Decoder Framework from Single to Multi-Document Summarization.
5.Salience Estimation via Variational Auto-Encoders for Multi-Document Summarization.
6.Supervised Learning of Automatic Pyramid for Optimization-Based Multi-Document Summarization.
7.Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps
Evaluation Metrics
1.ROUGE(2004) : Rouge: A package for automatic evaluation of summaries
2.BLEU(2002) : BLEU: a Method for Automatic Evaluation of Machine Translation
3.BE(2006) : Automated Summarization Evaluation with Basic Elements
4.Pyramid Method(2007) : Evaluating Content Selection in Summarization: The Pyramid Method
5.(2018 ShafieiBavani) : Summarization Evaluation in the Absence of Human Model Summaries Using the Compositionality of Word Embeddings
6.(2018 Honda) : Pruning Basic Elements for Better Automatic Evaluation of Summaries
Other Resources
awesome-text-summarization :
  • The guide to tackle with the Text Summarization
  • A curated list of resources dedicated to text summarization
SOTA in summarization : The current state-of-the-art
