Deep learning has recently shown strong performance in steganalysis. However, existing deep models are often trained on noise residuals pre-computed with fixed high-pass filters rather than on raw images. In this paper, we propose a new end-to-end learning framework that learns steganalytic features directly from pixels, while the high-pass filters are also learned automatically. In addition to class labels, we exploit pixel-level supervision from cover-stego image pairs to jointly and iteratively train the proposed network, which consists of a residual calculation network and a steganalysis network. Experimental results demonstrate the effectiveness of the proposed architecture.
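The toy sketch below illustrates how a high-pass filter can be learned rather than fixed. It fits a single 3×3 linear filter by gradient descent so that the filter's response on a stego image matches a pixel-level target. Several details are assumptions and do not come from the text above: the pixel-level target is taken to be the embedding change (stego − cover), the filter starts from a 3×3 Laplacian, and the data is synthetic ±1 embedding. The helper names `extract_patches` and `train_residual_filter` are hypothetical. A single linear filter stands in for the residual calculation network, so the sketch does not reproduce the proposed architecture or its joint training with the steganalysis network.

```python
import numpy as np

def extract_patches(img, k=3):
    """Return an (N, k*k) matrix of all valid k x k patches in row-major order (im2col)."""
    h, w = img.shape
    rows = [img[i:i + k, j:j + k].ravel()
            for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.array(rows)

def train_residual_filter(cover, stego, steps=200, k=3):
    """Fit a k x k linear filter on the stego image to predict (stego - cover).

    Assumption (hypothetical): pixel-level supervision = embedding change,
    cropped to the 'valid' region so it aligns with each patch's center.
    """
    m = k // 2
    h, w = cover.shape
    target = (stego - cover)[m:h - m, m:w - m].ravel()
    P = extract_patches(stego, k)
    n = len(target)
    # Assumption: initialize from a fixed high-pass kernel (3x3 Laplacian / 4).
    w_vec = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float).ravel() / 4.0
    # MSE is quadratic in the weights; step size 1/L (L = largest Hessian
    # eigenvalue) guarantees the loss never increases.
    hessian = 2.0 / n * P.T @ P
    lr = 1.0 / np.linalg.eigvalsh(hessian)[-1]
    losses = []
    for _ in range(steps):
        err = P @ w_vec - target
        losses.append(float(np.mean(err ** 2)))
        w_vec -= lr * (2.0 / n) * (P.T @ err)  # gradient step on the MSE
    return w_vec.reshape(k, k), losses

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(32, 32)).astype(float)
# Hypothetical +/-1 embedding at about 40% of pixels, for illustration only.
change = rng.choice([-1.0, 0.0, 1.0], size=cover.shape, p=[0.2, 0.6, 0.2])
stego = np.clip(cover + change, 0, 255)

kernel, losses = train_residual_filter(cover, stego)
print(kernel.round(3))
print(losses[0], losses[-1])
```

The learned kernel moves away from its hand-designed initialization toward whatever linear filter best exposes the embedding change. A deep residual calculation network would generalize this idea with nonlinear layers.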