Paper Reading: R-CNN


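R-CNN is a two-stage object detector. It first generates candidate regions (region proposals) that may contain objects, then classifies and refines those proposals to produce the final detections. R-CNN was one of the first works to bring CNNs into object detection, and it greatly improved detection accuracy. Its successors, Fast R-CNN and Faster R-CNN, both build on it, so it can fairly be called the founder of this line of detectors.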
Rich feature hierarchies for accurate object detection and semantic segmentation
Let's start with the paper's abstract:

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012—achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.


Problems addressed

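  • Representation learning needs large amounts of labeled data. R-CNN addresses this with supervised pre-training.
  • The model is slow to compute. R-CNN addresses this by generating only a small number of candidate regions (about 2k per image).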

Basic architecture

[Figure: R-CNN architecture]

In more detail:

[Figure: detailed R-CNN architecture]

Steps

1. Generate about 2k candidate regions (region proposals) with the Selective Search algorithm.
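The ~2k proposals come in arbitrary sizes and aspect ratios, but the CNN in step 2 expects a fixed-size input. The paper handles this by warping each proposal to 227×227 pixels, whatever its shape. Below is a minimal sketch of that crop-and-warp, under several assumptions. The image is a plain list of pixel rows. Boxes are integer `(x1, y1, x2, y2)` with exclusive `x2`/`y2`. Resizing uses nearest-neighbor sampling. The paper's extra 16-pixel context padding is omitted. `warp_region` is a hypothetical helper, not code from the paper.

```python
# Hypothetical helper: crop a proposal box and warp it to a fixed square size.
# Assumptions: `image` is a list of rows (H x W) of pixel values, the box is
# (x1, y1, x2, y2) in integer pixels with x2/y2 exclusive, resizing is
# nearest-neighbor, and the paper's 16-pixel context padding is NOT applied.

def warp_region(image, box, out_size=227):
    """Crop `box` from `image` and resize it to out_size x out_size,
    ignoring the box's aspect ratio (anisotropic warp)."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    if h <= 0 or w <= 0:
        raise ValueError("box must have positive width and height")
    warped = []
    for r in range(out_size):
        # Map each output row back to the nearest source row inside the box.
        src_y = y1 + (r * h) // out_size
        row = []
        for c in range(out_size):
            # Same mapping for columns.
            src_x = x1 + (c * w) // out_size
            row.append(image[src_y][src_x])
        warped.append(row)
    return warped


# Usage: a tiny 6x8 "image" whose pixel value encodes its position (10*y + x).
img = [[10 * y + x for x in range(8)] for y in range(6)]
patch = warp_region(img, (2, 1, 6, 5), out_size=8)  # 4x4 box upsampled to 8x8
full = warp_region(img, (0, 0, 8, 6))               # default 227x227 output
```

Because every proposal is stretched to the same square, features from all proposals have the same dimensionality and can be fed to the same classifiers.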

2. Extract a feature vector for each region with a CNN.

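  • Step 1: supervised pre-training on ImageNet.
    [Figure: supervised pre-training]
    - Use the AlexNet network directly, starting from its already-trained parameters.
      (ImageNet is large: 1000 classes and about 1.2 million images.)
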
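  • Step 2: fine-tune on the target task.
    [Figure: fine-tuning]
    - Replace the 1000-way classification layer with a 21-way one (20 object classes + 1 background class). Optimize with SGD at a learning rate of 0.001. A region proposal whose IoU with a ground-truth box is at least 0.5 is a positive sample; otherwise it is a negative (background) sample.
      (PASCAL VOC is comparatively small: 20 classes and only thousands to tens of thousands of images.)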
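The fine-tuning labels depend on IoU (intersection over union) between a proposal and the ground-truth boxes. Below is a minimal sketch of that labeling rule, under these assumptions. Boxes are `(x1, y1, x2, y2)` with positive area. Class `0` is background and `1..20` are the VOC classes. The function names are hypothetical. The paper's mini-batch sampling of positives and background is not shown.

```python
# Hypothetical sketch of the fine-tuning label rule (IoU >= 0.5 -> positive).
# Assumptions: boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1;
# class 0 = background, classes 1..20 = PASCAL VOC object classes.

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def finetune_labels(proposals, gt_boxes, gt_classes, pos_thresh=0.5):
    """Give each proposal the class of its best-overlapping ground-truth box
    when that IoU is >= pos_thresh; otherwise label it background (0)."""
    labels = []
    for p in proposals:
        best_iou, best_cls = 0.0, 0
        for g, cls in zip(gt_boxes, gt_classes):
            o = iou(p, g)
            if o > best_iou:
                best_iou, best_cls = o, cls
        labels.append(best_cls if best_iou >= pos_thresh else 0)
    return labels


# Usage: one ground-truth object of (hypothetical) class 7.
gt_boxes = [(0, 0, 10, 10)]
gt_classes = [7]
proposals = [(0, 0, 10, 10), (0, 0, 10, 5), (5, 5, 15, 15), (20, 20, 30, 30)]
labels = finetune_labels(proposals, gt_boxes, gt_classes)
```

The second proposal covers exactly half of the object (IoU = 0.5), so it still counts as a positive. The third overlaps only slightly and becomes background.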

3. Classify each region (with linear SVM classifiers).

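Train one linear binary SVM for each object class. Each region's feature vector is fed into every class's SVM, which decides whether the region belongs to that class. When training these SVMs, the IoU threshold is 0.3. Regions whose IoU with that class's ground-truth boxes is below 0.3 are negatives, and only the ground-truth boxes themselves are positives. The paper tested thresholds of 0, 0.1, 0.2, 0.3, 0.4, and 0.5 and found that 0.3 worked best. Setting it to 0.5 lowers mAP by 5 points, and setting it to 0 lowers it by 4 points.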
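The SVM stage uses a different labeling rule from fine-tuning, and scoring is one dot product per class. Below is a minimal sketch of both. It assumes boxes are `(x1, y1, x2, y2)` and features are plain lists of floats. `svm_examples` and `svm_scores` are hypothetical helpers. The SVM weights are taken as already trained; the training itself, which uses hard negative mining in the paper, is not shown. Proposals with IoU between 0.3 and the positives are simply left out.

```python
# Hypothetical sketch of per-class linear SVM data selection and scoring.
# Assumptions: boxes are (x1, y1, x2, y2) with positive area; features are
# lists of floats; SVM weights/biases are already trained (not shown here).

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def svm_examples(proposals, gt_boxes_of_class, neg_thresh=0.3):
    """Training boxes for ONE class's SVM: ground-truth boxes are positives,
    proposals with IoU < neg_thresh against every ground-truth box of the
    class are negatives, and all other proposals are ignored."""
    positives = list(gt_boxes_of_class)
    negatives = []
    for p in proposals:
        max_iou = max((iou(p, g) for g in gt_boxes_of_class), default=0.0)
        if max_iou < neg_thresh:
            negatives.append(p)
    return positives, negatives


def svm_scores(feature, weights, biases):
    """Score one feature vector with every class's linear SVM:
    score_c = w_c . x + b_c (the usual SVM decision is score_c > 0)."""
    return [sum(wi * xi for wi, xi in zip(w, feature)) + b
            for w, b in zip(weights, biases)]


# Usage
gt = [(0, 0, 10, 10)]
props = [(0, 0, 10, 5), (5, 5, 15, 15), (2, 0, 12, 10), (20, 20, 30, 30)]
pos, neg = svm_examples(props, gt)  # IoUs 0.5 and ~0.67 are ignored
scores = svm_scores([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0])
```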

4. Refine the bounding boxes with a regressor.

[Figure: bounding-box refinement]
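In the paper, the regressor does not predict box corners directly. A box is written as center, width and height, and the regressor predicts normalized offsets from a proposal P toward its ground-truth box G. The training targets are t_x = (G_x − P_x)/P_w, t_y = (G_y − P_y)/P_h, t_w = log(G_w/P_w) and t_h = log(G_h/P_h). At test time, the predicted offsets d are applied in reverse. The sketch below shows only this encode/decode round trip. The learned regressor that maps CNN features to d is not shown (the paper uses a linear model on pool5 features), and the function names are hypothetical.

```python
import math

# Hypothetical sketch of R-CNN's bounding-box regression parameterization.
# Boxes are (center_x, center_y, width, height) with positive width/height.

def regression_targets(P, G):
    """Offsets (t_x, t_y, t_w, t_h) that move proposal P onto ground truth G."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw,        # center shift, relative to proposal width
            (gy - py) / ph,        # center shift, relative to proposal height
            math.log(gw / pw),     # log-space width scaling
            math.log(gh / ph))     # log-space height scaling


def apply_deltas(P, d):
    """Apply predicted offsets d = (d_x, d_y, d_w, d_h) to proposal P."""
    px, py, pw, ph = P
    dx, dy, dw, dh = d
    return (pw * dx + px, ph * dy + py, pw * math.exp(dw), ph * math.exp(dh))


# Usage: feeding the exact targets back in recovers G (up to float error).
P = (50.0, 50.0, 20.0, 40.0)
G = (55.0, 45.0, 30.0, 20.0)
t = regression_targets(P, G)
refined = apply_deltas(P, t)
```

Normalizing by the proposal size makes the targets scale-invariant. The log keeps width and height positive after decoding.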

Results

53.7% mAP on the PASCAL VOC 2010 test set (20 classes).
53.3% mAP on the PASCAL VOC 2012 test set (20 classes).
[Figure: R-CNN results]

31.4% mAP on the ILSVRC2013 detection dataset (200 classes).

Time spent computing region proposals and features: 13 s/image on a GPU, 53 s/image on a CPU.

References

Rich feature hierarchies for accurate object detection and semantic segmentation
https://zhuanlan.zhihu.com/p/23006190
https://blog.csdn.net/shenxiaolu1984/article/details/51066975
https://www.cnblogs.com/skyfsm/p/6806246.html


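The author currently works mainly on autonomous-driving R&D. This public account focuses on robotics and autonomous-driving technology and recent progress, and occasionally covers books, life, and the latest consumer electronics. I hope you enjoy it, and let's improve together!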