A New Ensemble Method for Concessively Targeted Multi-Model Attack

摘要

It is well known that deep learning models are vulnerable to adversarial examples crafted by maliciously adding perturbations to original inputs. There are two types of attacks: targeted attack and non-targeted attack, and most researchers often pay more attention to the targeted adversarial examples. However, targeted attack has a low success rate, especially when aiming at a robust model or under a black-box attack protocol. In this case, non-targeted attack is the last chance to disable AI systems. Thus, in this paper, we propose a new attack mechanism which performs the non-targeted attack when the targeted attack fails. Besides, we aim to generate a single adversarial sample for different deployed models of the same task, e.g. image classification models. Hence, for this practical application, we focus on attacking ensemble models by dividing them into two groups: easy-to-attack and robust models. We alternately attack these two groups of models in the non-targeted or targeted manner. We name it a bagging and stacking ensemble (BAST) attack. The BAST attack can generate an adversarial sample that fails multiple models simultaneously. Some of the models classify the adversarial sample as a target label, and other models which are not attacked successfully may give wrong labels at least. The experimental results show that the proposed BAST attack outperforms the state-of-the-art attack methods on the new defined criterion that considers both targeted and non-targeted attack performance.

出版物
arXiv preprint arXiv:1912.10833
何子文
何子文
在读博士、联合指导

主要从事人工智能安全、对抗样本等方面的研究。

王伟
王伟
副研究员、硕导

主要从事多媒体内容安全、人工智能安全、多模态内容分析与理解等方面的研究工作。

轩心升
轩心升
硕士

主要从事人工智能安全、图像取证等方面的研究。

董晶
董晶
研究员、硕导

主要从事多媒体内容安全、人工智能安全、多模态内容分析与理解等方面的研究工作。详情访问:http://cripac.ia.ac.cn/people/jdong

谭铁牛
谭铁牛
研究员,博导

主要从事图像处理、计算机视觉和模式识别等相关领域的研究工作,目前的研究主要集中在生物特征识别、图像视频理解和信息内容安全等三个方向。