Paper reading: "Hamming embedding and weak geometric consistency for large scale image search" (image search | descriptor | BOW | image retrieval)

Reference

Original paper
Slides (important)
Hamming embedding similarity-based image classification
Various distances in machine learning

My understanding

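Image search: the usual way to search images with BOW is to compute a SIFT descriptor on the region around each keypoint of every image in the database, and then cluster all these SIFT descriptors with k-means. The k cluster centres are the visual words, and together they form the codebook. The choice of k is a trade-off: if k is small, the quantization error is large and weakly related descriptors fall into the same cell; if k is large, descriptor noise becomes a problem, because strongly related descriptors (ones that should match) fall into different cells. Each SIFT descriptor of an image is then assigned to its nearest visual word, which gives a histogram of how often each visual word occurs in that image. Image search ultimately needs a distance between two images. In principle this means comparing their two sets of SIFT descriptors; the histograms reduce it to computing a distance between two histograms.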
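A minimal sketch of the codebook step described above, assuming plain Lloyd's k-means with random initialization and toy 2-D vectors standing in for 128-D SIFT descriptors. The function names (`kmeans`, `nearest_word`) are hypothetical, and vocabularies built from millions of real descriptors need much faster k-means variants than this.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(descriptor, codebook):
    """Quantize: index of the visual word (centroid) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))

def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; the k centroids form the visual-word codebook."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centroids)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Toy 2-D "descriptors" forming two obvious groups (real SIFT is 128-D).
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
codebook = kmeans(descriptors, k=2)
words = [nearest_word(d, codebook) for d in descriptors]
print(words)
```

Here k is the knob from the trade-off above: with k = 1 everything collapses into one cell, while a very large k would start splitting each tight group across several words.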

In order to deal with large image datasets, Sivic and Zisserman [4] introduced the bag-of-features (BOF) image representation in the context of image search. Descriptors are quantized into visual words with the k-means algorithm. An image is then represented by the frequency histogram of visual words obtained by assigning each descriptor of the image to the closest visual word.
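Given the visual-word id of every descriptor, the frequency histogram from the quote and a histogram distance can be sketched as follows. Assumptions: the word ids are made up, histograms are L1-normalized, and plain L1 distance is used only for simplicity; practical BOF systems usually weight the words (e.g. tf-idf) and use an inverted file rather than comparing full histograms. The function names are hypothetical.

```python
from collections import Counter

def bof_histogram(word_ids, k):
    """Frequency histogram over k visual words, L1-normalized so that images
    with different numbers of descriptors can be compared."""
    counts = Counter(word_ids)
    return [counts[w] / len(word_ids) for w in range(k)]

def l1_distance(h1, h2):
    """Smaller distance means more similar images."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

k = 4
# Hypothetical visual-word ids, one per descriptor, after quantization.
query = bof_histogram([0, 0, 1, 2], k)
database = {
    "img_a": bof_histogram([0, 1, 2, 2], k),
    "img_b": bof_histogram([3, 3, 3, 1], k),
}
ranking = sorted(database, key=lambda name: l1_distance(query, database[name]))
print(ranking)  # ['img_a', 'img_b']
```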
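Hamming embedding: "embedding" here means mapping each descriptor into a binary code, i.e. embedding it in Hamming space, so that the Hamming distance can be used inside the image search pipeline. The paper uses a relatively small k, which increases the quantization error, so the descriptors inside each cluster have to be distinguished a second time. To keep this cheap, a random db×d matrix with orthonormal rows is generated and used to project each d-dimensional descriptor to a vector of length db. Then, for the descriptors that fall into the same cluster, the median of each of the db dimensions is computed, giving db thresholds, and every descriptor is binarized into db bits by comparing each projected component with its threshold. When a query descriptor is matched, a database descriptor counts as a match only if it falls into the same cluster and the Hamming distance between their db-bit codes is below a given threshold. In effect, two levels of quantization (the coarse k-means cell, then the binary code) give a more precise match.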
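The two steps above (projection plus per-cluster median thresholds, then matching by cluster and Hamming distance) can be sketched in plain Python. This is a toy sketch under stated assumptions: tiny sizes (d = 8, db = 4) instead of 128-D SIFT, Gaussian random vectors in place of real training descriptors, Gram-Schmidt to obtain orthonormal rows, a hypothetical visual word id 7, and hypothetical function names; the Hamming threshold `ht` is a free parameter.

```python
import random
import statistics

def random_orthogonal_rows(db, d, seed=0):
    """db x d matrix with orthonormal rows (Gram-Schmidt on Gaussian vectors)."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < db:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove the component along each row already chosen
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def project(P, x):
    """Project a d-dim descriptor to db dims."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]

def learn_thresholds(P, training_descriptors):
    """Per-dimension medians of the projected descriptors of ONE visual word."""
    projected = [project(P, x) for x in training_descriptors]
    return [statistics.median(col) for col in zip(*projected)]

def signature(P, x, thresholds):
    """db-bit binary code packed into an int: bit i is 1 if z_i > tau_i."""
    code = 0
    for i, (zi, ti) in enumerate(zip(project(P, x), thresholds)):
        if zi > ti:
            code |= 1 << i
    return code

def hamming(a, b):
    return bin(a ^ b).count("1")

def he_match(q_word, q_code, d_word, d_code, ht):
    """A match needs the same visual word AND a small Hamming distance."""
    return q_word == d_word and hamming(q_code, d_code) <= ht

rng = random.Random(1)
d, db = 8, 4  # toy sizes; the paper works with 128-D SIFT
P = random_orthogonal_rows(db, d)
# Hypothetical training descriptors that all fell into visual word 7.
train = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(101)]
tau = learn_thresholds(P, train)
codes = [signature(P, x, tau) for x in train]
print(he_match(7, codes[0], 7, codes[1], ht=1))
```

Using the median as the threshold splits each dimension's training descriptors roughly in half, so every bit carries as much information as possible.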
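Weak geometric consistency: the core idea is that when two images show the same object, one rotated and scaled relative to the other, the changes in orientation and scale between corresponding regions are consistent, so the matched descriptor pairs should share roughly the same orientation difference and the same scale ratio. The details are not yet clear to me.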
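The voting idea behind weak geometric consistency can be sketched as follows. This illustrates the idea described above, not the paper's exact scoring: the number of bins, the clamping range, and combining the two histogram peaks with `min` are assumptions, and `wgc_score` is a hypothetical name. Each match is assumed to carry the dominant orientation (in radians) and the scale of both keypoints, which SIFT-style detectors provide.

```python
import math
from collections import Counter

def wgc_score(matches, angle_bins=8, scale_bins=8, max_log_scale=2.0):
    """Weak-geometric-consistency-style score for ONE database image.

    matches: list of (angle_q, scale_q, angle_db, scale_db) for matched keypoints.
    Each match votes for its orientation difference and its log-scale ratio;
    a consistent rotation + zoom piles the votes into one bin of each histogram.
    """
    if not matches:
        return 0
    angle_hist, scale_hist = Counter(), Counter()
    for angle_q, scale_q, angle_db, scale_db in matches:
        dtheta = (angle_db - angle_q) % (2 * math.pi)
        angle_hist[int(dtheta / (2 * math.pi) * angle_bins) % angle_bins] += 1
        dlog = math.log(scale_db / scale_q)
        dlog = max(-max_log_scale, min(max_log_scale, dlog))  # clamp extreme zooms
        s_bin = int((dlog + max_log_scale) / (2 * max_log_scale) * scale_bins)
        scale_hist[min(s_bin, scale_bins - 1)] += 1
    # Assumed combination: the image is only as consistent as its weaker peak.
    return min(max(angle_hist.values()), max(scale_hist.values()))

# Every match rotated by 0.3 rad and zoomed by 2x: fully consistent.
consistent = [(a, s, a + 0.3, 2 * s)
              for a, s in [(0.1, 1.0), (1.0, 3.0), (2.5, 0.5), (4.0, 1.5)]]
# Each match implies a different rotation and zoom.
inconsistent = [(0.0, 1.0, 0.3, 2.0), (0.0, 1.0, 2.0, 0.5),
                (0.0, 1.0, 4.0, 1.0), (0.0, 1.0, 5.5, 4.0)]
print(wgc_score(consistent), wgc_score(inconsistent))  # 4 1
```

Computing this score per database image over the matches that survive Hamming embedding would let geometrically consistent images outrank images whose matches point in all directions.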
