Paper Reading: R-CNN - 岁月轻狂


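R-CNN is a two-stage object detector: it first generates candidate regions (region proposals) that may contain objects, then classifies and refines those regions to produce the final detections. R-CNN was among the first to bring CNNs into object detection, and it greatly improved detection accuracy. The later Fast R-CNN and Faster R-CNN both build on it, so it can fairly be called the founder of this line of work.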
Rich feature hierarchies for accurate object detection and semantic segmentation
Let's first look at the paper's abstract:

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012—achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.

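Problems addressed

Supervised pre-training addresses the problem that representation learning needs large amounts of labeled data.
Generating only a small number of region proposals addresses the problem of slow model computation.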

Basic structure

In more detail:

Steps

1. Use the Selective Search algorithm to generate roughly 2k candidate regions (region proposals).

2. Use a CNN to extract features from each region.

step1: Supervised pre-training on ImageNet.

- Use the AlexNet network directly, with its already-trained parameters.
  (ImageNet is large: 1000 classes and about 1.2 million images.)

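step2: Fine-tuning on the target task.

- Replace the 1000-class classification layer with a 21-class one (20 classes + 1 background class). The network is optimized with SGD at a learning rate of 0.001. The IoU threshold is 0.5: proposals whose IoU with a ground-truth box is at least 0.5 are positive samples, and the rest are negative samples.
  (Pascal VOC is comparatively small: 20 classes and only several thousand to a few tens of thousands of images.)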
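
The IoU-based labeling used during fine-tuning can be sketched in a few lines of plain Python. This is only a minimal illustration, not the authors' code: the `(x1, y1, x2, y2)` corner format, the convention that label 0 means background and 1..20 are the VOC classes, and the helper names `iou` and `label_proposals` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]

# Assumed convention: index 0 is background, 1..20 are the object classes.
BACKGROUND = 0


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(
    proposals: Sequence[Box],
    gt_boxes: Sequence[Box],
    gt_classes: Sequence[int],
    threshold: float = 0.5,
) -> List[int]:
    """Give each proposal the class of its best-overlapping ground-truth box
    if that overlap reaches the threshold, otherwise the background label."""
    labels = []
    for p in proposals:
        best_iou, best_cls = 0.0, BACKGROUND
        for g, c in zip(gt_boxes, gt_classes):
            overlap = iou(p, g)
            if overlap > best_iou:
                best_iou, best_cls = overlap, c
        labels.append(best_cls if best_iou >= threshold else BACKGROUND)
    return labels


if __name__ == "__main__":
    gt_boxes = [(0.0, 0.0, 10.0, 10.0)]
    gt_classes = [7]  # hypothetical class index
    proposals = [
        (0.0, 0.0, 10.0, 10.0),    # IoU 1.0  -> positive
        (0.0, 0.0, 10.0, 5.0),     # IoU 0.5  -> positive (at the threshold)
        (5.0, 5.0, 15.0, 15.0),    # IoU 25/175 -> background
        (20.0, 20.0, 30.0, 30.0),  # no overlap -> background
    ]
    print(label_proposals(proposals, gt_boxes, gt_classes))  # [7, 7, 0, 0]
```

With roughly 2k proposals per image, most of them end up labeled as background under this rule, which is why the background class is needed alongside the 20 object classes.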