MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation

Abstract

Pose-guided person image generation usually relies on paired source-target images to supervise training, which significantly increases the data preparation effort and limits the applicability of such models. To address this problem, we propose a novel multi-level statistics transfer model, which disentangles and transfers multi-level appearance features from person images and merges them with pose features to reconstruct the source person images themselves, so that the source images can serve as supervision for self-driven person image generation. Specifically, our model extracts multi-level features from the appearance encoder and learns the optimal appearance representation through an attention mechanism and attribute statistics. It then transfers them to a pose-guided generator for re-fusion of appearance and pose. Our approach allows flexible manipulation of person appearance and pose attributes to perform pose transfer and clothes style transfer tasks. Experimental results on the DeepFashion dataset demonstrate our method's superiority over state-of-the-art supervised and unsupervised methods. In addition, our approach also performs well in the wild.

Publication
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Contribution

Figure 1. Self-driven person image generation and visualization of pose transfer with clothes style transfer.

Our model can be trained in a self-driven way, without paired source-target images, and flexibly controls appearance and pose attributes at inference time to achieve pose transfer and clothes style transfer. The images in (c) show results generated by this model for simultaneous pose and clothes style transfer: source A is transferred to the target pose, and its clothes are replaced with source B's.

Proposed method

Figure 2. Overview of our MUST-GAN model for self-driven person image generation.

The appearance encoder extracts features from the person image parts Ia_parts, which are obtained from the source image via the semantic segmentation map Sa. The pose encoder encodes the pose image Pa and the pose connection map Pa_con, guiding the generator to synthesize the source posture. The MUST module disentangles and transfers the multi-level appearance features, and the generator fuses them with the pose codes to reconstruct the source person image Ia.
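The core idea of statistics transfer can be illustrated with a minimal sketch: appearance information is carried in the channel-wise statistics (mean and standard deviation) of feature maps, which are transferred onto pose-guided features, AdaIN-style, at each feature level. The function names below are hypothetical, and the paper's actual MUST module additionally uses attention and learned components; this only illustrates the underlying statistics-transfer mechanism on NumPy arrays.

```python
import numpy as np

def transfer_statistics(appearance_feat, pose_feat, eps=1e-5):
    """Channel-wise statistics transfer (AdaIN-style sketch).

    Normalizes the pose features per channel, then re-scales and
    shifts them with the appearance features' statistics, so the
    output carries the appearance style while keeping pose structure.
    Both inputs are arrays of shape (C, H, W).
    """
    a_mean = appearance_feat.mean(axis=(1, 2), keepdims=True)
    a_std = appearance_feat.std(axis=(1, 2), keepdims=True)
    p_mean = pose_feat.mean(axis=(1, 2), keepdims=True)
    p_std = pose_feat.std(axis=(1, 2), keepdims=True)
    normalized = (pose_feat - p_mean) / (p_std + eps)
    return normalized * a_std + a_mean

def multi_level_transfer(appearance_feats, pose_feats):
    """Apply statistics transfer independently at each feature level."""
    return [transfer_statistics(a, p)
            for a, p in zip(appearance_feats, pose_feats)]
```

After the transfer, each output channel has (approximately) the mean and standard deviation of the corresponding appearance channel, while its spatial layout still follows the pose features.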

Experimental results

Figure 3. The results of our method in the pose transfer task.
Figure 4. The results of our method in the clothes style transfer task.

Video

MUST-GAN: Multi-level Statistics Transfer for Self-driven Person Image Generation

Tianxiang Ma
Master's student; assists in student supervision

His research focuses on computer vision, deep learning, and content security. Homepage: https://tianxiangma.github.io/

Bo Peng
Associate Professor
Wei Wang
Associate Professor, Master's Supervisor

His research focuses on multimedia content security, AI security, and multimodal content analysis and understanding.

Jing Dong
Professor, Master's Supervisor

Her research focuses on multimedia content security, AI security, and multimodal content analysis and understanding. For details, visit: http://cripac.ia.ac.cn/people/jdong