An Overview of Text Detection Models (Part 2)

TextBoxes
TextBoxes is a text detection model built on the SSD framework; it is trained end to end and runs fast. To fit the elongated shape of text lines, the aspect ratios of the default boxes are extended with values such as 1, 2, 3, 5, 7 and 10, and the prediction layers use long, strip-shaped convolution kernels instead of the square kernels common in other models. To avoid missing text lines, extra default boxes are added along the vertical direction. To detect character blocks of different sizes, text boxes are predicted in parallel on feature maps of several scales, and the predictions are then filtered with NMS.【1】
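Below is a minimal sketch, not the authors' code, of two of these ideas in a PyTorch SSD-style prediction layer: the elongated default-box aspect ratios and a long 1x5 convolution kernel; the exact kernel shape and the x2 vertical-offset copies are assumptions based on the description above.

import torch
import torch.nn as nn

ASPECT_RATIOS = [1, 2, 3, 5, 7, 10]   # elongated ratios for thin text lines

class TextBoxLayer(nn.Module):
    """Per-feature-map prediction layer using a 1x5 kernel instead of 3x3."""
    def __init__(self, in_channels, num_ratios=len(ASPECT_RATIOS)):
        super().__init__()
        # each location: num_ratios default boxes, duplicated with a vertical
        # offset (x2); each box predicts 4 offsets + 2 text/non-text scores
        out_channels = num_ratios * 2 * (4 + 2)
        self.pred = nn.Conv2d(in_channels, out_channels,
                              kernel_size=(1, 5), padding=(0, 2))

    def forward(self, feat):
        return self.pred(feat)   # decoded against default boxes, then NMS

feat = torch.randn(1, 256, 38, 38)     # one of the multi-scale feature maps
print(TextBoxLayer(256)(feat).shape)   # torch.Size([1, 72, 38, 38])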

Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu. TextBoxes: A Fast Text Detector with a Single Deep Neural Network, AAAI 2017
TextBoxes++
TextBoxes++ is an upgraded version of TextBoxes whose goal is to add support for oriented text. To this end, the annotations are changed to rotated rectangles and general quadrilaterals, and the default-box aspect ratios and the kernel shapes of the prediction feature layers are adjusted accordingly.【1】
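A minimal sketch of the quadrilateral parameterization (my reading of the idea, not the paper's code): each default box additionally regresses offsets from its four corners to the four vertices of the ground-truth quadrilateral, normalized by the default-box width and height.

def quad_offsets(default_box, quad):
    """default_box: (cx, cy, w, h); quad: four (x, y) vertices in order."""
    cx, cy, w, h = default_box
    # default-box corners listed in the same vertex order as the quadrilateral
    corners = [(cx - w / 2, cy - h / 2), (cx + w / 2, cy - h / 2),
               (cx + w / 2, cy + h / 2), (cx - w / 2, cy + h / 2)]
    return [((x - bx) / w, (y - by) / h)
            for (x, y), (bx, by) in zip(quad, corners)]

# a nearly horizontal text quadrilateral against a 0.4 x 0.1 default box
print(quad_offsets((0.5, 0.5, 0.4, 0.1),
                   [(0.32, 0.46), (0.70, 0.44), (0.71, 0.55), (0.33, 0.57)]))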

Minghui Liao, Baoguang Shi, Xiang Bai. TextBoxes++: A Single-Shot Oriented Scene Text Detector, TIP 2018.
WordSup
WordSup is work from Baidu, and the approach is fairly direct: a weakly supervised framework uses word-level annotations to train a character detector, and the detected characters are then grouped into words through structural analysis.【3】The character mask model is first pre-trained on a synthetic dataset.
Han Hu, Chengquan Zhang, Yuxuan Luo, Yuzhuo Wang, Junyu Han, Errui Ding. WordSup: Exploiting Word Annotations for Character based Text Detection, ICCV 2017.
DDR
(1) To handle oriented scene text, the authors divide existing detection methods into two categories: indirect regression and direct regression. Indirect regression means the network does not predict the bounding box directly; it first generates proposals and then predicts offsets relative to each proposal. Direct regression instead regresses the four corner points of the text directly from every pixel. (2) To avoid confusing the network, not every pixel inside the text region is treated as a positive sample; an extra "don't care" transition region is added around the text boundary.【2】(3) Features from different layers are fused, and text segmentation and text-box regression are learned jointly through multi-task learning.【3】
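The following is a minimal sketch, under my own assumptions rather than the paper's code, of a direct-regression head in PyTorch: from a fused feature map, every pixel predicts a text/non-text score plus the offsets to the four corners of the text quadrilateral (8 values).

import torch
import torch.nn as nn

class DirectRegressionHead(nn.Module):
    def __init__(self, in_channels=128):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, 1, kernel_size=1)  # text / non-text
        self.reg = nn.Conv2d(in_channels, 8, kernel_size=1)  # (dx, dy) for 4 corners

    def forward(self, fused_feat):
        score = torch.sigmoid(self.cls(fused_feat))   # per-pixel text probability
        corners = self.reg(fused_feat)                # per-pixel corner offsets
        return score, corners

# Pixels falling in the "don't care" transition region would simply be ignored
# in the classification loss, e.g. with a zero-weight mask at those locations.
score, corners = DirectRegressionHead()(torch.randn(1, 128, 64, 64))
print(score.shape, corners.shape)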

Wenhao He, Xu-Yao Zhang, Fei Yin, Cheng-Lin Liu. Deep Direct Regression for Multi-Oriented Scene Text Detection, ICCV 2017.
BorderLearning
1) We analyze the insufficiencies of the classic non-text and text settings for text detection.
2) We introduce the border class to the text detection problem for the first time, and validate that the decoding process is largely simplified with the help of text border.
3) We collect and release a new text detection PPT dataset containing 10,692 images with non-text, border, and text annotations.
4) We develop a lightweight (only 0.28M parameters), fully convolutional network (FCN) to effectively learn borders in text images.
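A minimal sketch of the border idea, assuming a PyTorch segmentation head (the channel count and decoding are illustrative, not the paper's network): each pixel is classified as non-text, border, or text, so adjacent text lines stay separated by border pixels and decoding reduces to connected-component labeling on the text channel.

import torch
import torch.nn as nn

NON_TEXT, BORDER, TEXT = 0, 1, 2                 # per-pixel classes

seg_head = nn.Conv2d(64, 3, kernel_size=1)       # 3-way per-pixel classifier
features = torch.randn(1, 64, 128, 128)          # FCN feature map
labels = seg_head(features).argmax(dim=1)        # (1, 128, 128) class map
text_mask = labels == TEXT                       # connected components of this mask give the text regions
print(text_mask.float().mean())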

Yue Wu, Prem Natarajan. Self-Organized Text Detection with Minimal Post-processing via Border Learning, ICCV 2017.

Single Shot Text Detector with Regional Attention
This model adds a module on top of SSD that introduces an attention mechanism: it predicts a text mask, and this text/non-text discrimination makes the detector focus more strongly on text regions.【3】
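A minimal sketch of such an attention module (my reading of the mechanism, not the authors' code): an auxiliary branch predicts a per-pixel text probability map, which is broadcast-multiplied back onto the detection features so that the SSD-style detector attends to text regions; the mask itself would also be supervised with a text/non-text loss.

import torch
import torch.nn as nn

class TextAttention(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.mask_head = nn.Conv2d(channels, 1, kernel_size=1)  # text mask predictor

    def forward(self, feat):
        mask = torch.sigmoid(self.mask_head(feat))   # (N, 1, H, W) attention map
        return feat * mask, mask                     # reweighted features + mask

feat = torch.randn(1, 256, 64, 64)
attended, mask = TextAttention()(feat)
print(attended.shape, mask.shape)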

Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li. Single Shot Text Detector with Regional Attention, ICCV 2017.
RRD
The authors argue that bounding-box regression relies on rotation-sensitive features of the text, whereas text/non-text classification relies on rotation-invariant features. Conventional methods share the same features for both tasks, and the incompatibility between the two tasks degrades performance. The proposed method is particularly effective for long text and for text with large orientation variation.
To address this issue, we propose to perform classification and regression on features of different characteristics, extracted by two network branches of different designs. Concretely, the regression branch extracts rotation-sensitive features by actively rotating the convolutional filters, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features.

  • Rotation-Sensitive Regression: Different from standard CNN features, RRD extracts rotation-sensitive features with active rotating filters (ARF). An ARF convolves a feature map with a canonical filter and its rotated clones. ARF makes N − 1 clones of the canonical filter by rotating it to different angles. It produces a response map of N channels, each corresponding to the response of the canonical filter or one of its rotated clones. Besides, since the parameters of the N filters are shared, learning the ARF requires far fewer training examples.
  • Rotation-Invariant Classification: ORN achieves rotation invariance by pooling the responses of all N response maps. The rotation-sensitive feature maps are pooled along their depth axis. Since the pooling operation is orderless and applied to all N response maps, the resulting feature map is locally invariant to object rotation (a simplified sketch of both branches follows below).
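The sketch below illustrates the core idea of the two branches; it is an approximation, not the paper's ARF implementation: rotation-sensitive responses come from convolving with rotated clones of one shared filter (here only 90-degree rotations via torch.rot90 for simplicity, whereas ARF uses N orientations with interpolation), and the rotation-invariant classification feature is obtained by pooling over the orientation dimension.

import torch
import torch.nn.functional as F

def rotation_sensitive_responses(feat, canonical_filter):
    """feat: (N, C, H, W); canonical_filter: (C_out, C, 3, 3), shared weights."""
    responses = []
    for k in range(4):                               # canonical filter + 3 rotated clones
        w = torch.rot90(canonical_filter, k, dims=(2, 3))
        responses.append(F.conv2d(feat, w, padding=1))
    return torch.stack(responses, dim=1)             # (N, 4, C_out, H, W)

feat = torch.randn(2, 16, 32, 32)
weight = torch.randn(8, 16, 3, 3)                    # one canonical filter bank
resp = rotation_sensitive_responses(feat, weight)    # input to the regression branch
cls_feat = resp.max(dim=1).values                    # orientation pooling -> rotation-invariant
print(resp.shape, cls_feat.shape)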

Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-song Xia, Xiang Bai. Rotation-Sensitive Regression for Oriented Scene Text Detection, CVPR 2018.


References
【1】自然场景文本检测识别技术综述 (A survey of text detection and recognition in natural scenes)
【2】论文阅读与实现——DDR (Paper reading and implementation: DDR)
【3】深度学习大讲堂 (Deep Learning Lecture Hall)
