TY - JOUR
T1 - CSformer: Bridging Convolution and Transformer for Compressive Sensing
AU - Ye, Dongjie
AU - Ni, Zhangkai
AU - Wang, Hanli
AU - Zhang, Jian
AU - Wang, Shiqi
AU - Kwong, Sam
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Convolutional Neural Networks (CNNs) dominate image processing but are limited by their local inductive bias, a limitation that the transformer framework addresses through its inherent ability to capture global context via self-attention. However, how to inherit and integrate the advantages of both to improve compressed sensing remains an open issue. This paper proposes CSformer, a hybrid framework that exploits the representation capacity of both local and global features. The proposed approach is designed for end-to-end compressive image sensing and is composed of adaptive sampling and recovery. In the sampling module, images are measured block by block by a learned sampling matrix. In the reconstruction stage, the measurements are projected into an initialization stem, a CNN stem, and a transformer stem. The initialization stem mimics the traditional reconstruction of compressive sensing but generates the initial reconstruction in a learnable and efficient manner. The CNN stem and the transformer stem run concurrently, simultaneously computing fine-grained and long-range features and efficiently aggregating them. Furthermore, we explore a progressive strategy and a window-based transformer block to reduce the number of parameters and the computational complexity. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing, which achieves superior performance compared to state-of-the-art methods on different datasets. Our code is available at: https://github.com/Lineves7/CSformer.
AB - Convolutional Neural Networks (CNNs) dominate image processing but are limited by their local inductive bias, a limitation that the transformer framework addresses through its inherent ability to capture global context via self-attention. However, how to inherit and integrate the advantages of both to improve compressed sensing remains an open issue. This paper proposes CSformer, a hybrid framework that exploits the representation capacity of both local and global features. The proposed approach is designed for end-to-end compressive image sensing and is composed of adaptive sampling and recovery. In the sampling module, images are measured block by block by a learned sampling matrix. In the reconstruction stage, the measurements are projected into an initialization stem, a CNN stem, and a transformer stem. The initialization stem mimics the traditional reconstruction of compressive sensing but generates the initial reconstruction in a learnable and efficient manner. The CNN stem and the transformer stem run concurrently, simultaneously computing fine-grained and long-range features and efficiently aggregating them. Furthermore, we explore a progressive strategy and a window-based transformer block to reduce the number of parameters and the computational complexity. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing, which achieves superior performance compared to state-of-the-art methods on different datasets. Our code is available at: https://github.com/Lineves7/CSformer.
KW - CNN
KW - Compressive sensing
KW - image reconstruction
KW - transformer
UR - https://www.mendeley.com/catalogue/08cd33ae-14ac-33b0-ab48-17060a4597c3/
UR - http://www.scopus.com/inward/record.url?scp=85159773152&partnerID=8YFLogxK
U2 - 10.1109/TIP.2023.3274988
DO - 10.1109/TIP.2023.3274988
M3 - Journal Article (refereed)
C2 - 37186533
SN - 1057-7149
VL - 32
SP - 2827
EP - 2842
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -