What Is the Best Practice for CNNs Applied to Visual Instance Retrieval?

摘要

Previous work has shown that feature maps of deep convolutional neural networks (CNNs) can be interpreted as feature representation of a particular image region. Features aggregated from these feature maps have been exploited for image retrieval tasks and achieved state-of-the-art performances in recent years. The key to the success of such methods is the feature representation. However, the different factors that impact the effectiveness of features are still not explored thoroughly. There are much less discussion about the best combination of them. The main contribution of our paper is the thorough evaluations of the various factors that affect the discriminative ability of the features extracted from CNNs. Based on the evaluation results, we also identify the best choices for different factors and propose a new multi-scale image feature representation method to encode the image effectively. Finally, we show that the proposed method generalises well and outperforms the state-of-the-art methods on four typical datasets used for visual instance retrieval.

出版物
arXiv preprint arXiv:1611.01640
董晶
董晶
研究员、硕导

主要从事多媒体内容安全、人工智能安全、多模态内容分析与理解等方面的研究工作。详情访问:http://cripac.ia.ac.cn/people/jdong

王伟
王伟
副研究员、硕导

主要从事多媒体内容安全、人工智能安全、多模态内容分析与理解等方面的研究工作。

谭铁牛
谭铁牛
研究员,博导

主要从事图像处理、计算机视觉和模式识别等相关领域的研究工作,目前的研究主要集中在生物特征识别、图像视频理解和信息内容安全等三个方向。