Abstract
Video snapshot compressive imaging (SCI) is a computational imaging technique that achieves efficient acquisition through hybrid compression in the temporal and spatial domains. By exploiting the sparsity of the signal and its temporal and spatial correlations, a suitable reconstruction algorithm can effectively recover the original video. Although recent deep learning-based reconstruction algorithms achieve good results on most tasks, they still suffer from high model complexity and slow reconstruction. To address these issues, this paper proposes a video SCI reconstruction network based on trilateral (spatial-channel-temporal) self-attention, Spatial-Channel-Temporal Snapshot Compressive Imaging (SCT-SCI), which uses a multi-branch grouped self-attention mechanism to exploit spatial and temporal correlations. The SCT-SCI network consists of a feature extraction module, a video reconstruction module, and multiple Spatial-Channel-Temporal Blocks (SCT-Blocks). Each SCT-Block comprises a window self-attention branch, a channel self-attention branch, and a temporal self-attention branch, and introduces a spatial aggregation module, Spatial-Channel 2D Fusion (SC-2DFusion), and a global aggregation module, Spatial-Channel-Temporal 3D Fusion (SCT-3DFusion), to strengthen feature fusion. Results on simulated video datasets show that the algorithm has low complexity and, while maintaining comparable reconstruction quality, is 31.58% faster in reconstruction time than state-of-the-art algorithms. This speed advantage represents a technical advance for video SCI reconstruction and improves real-time performance.
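The abstract does not spell out the measurement model behind the "hybrid compression". As a minimal sketch, assuming the forward model commonly used in the video SCI literature (each of T frames is modulated by its own mask and the results are summed into a single 2D snapshot), the acquisition step might look like the following. The function name `sci_measure` and the binary random masks are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sci_measure(frames, masks):
    """Collapse T frames into one snapshot: Y = sum_t masks[t] * frames[t].

    frames, masks: arrays of shape (T, H, W). Hypothetical helper for
    illustration; the paper's actual acquisition setup is not given here.
    """
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share shape (T, H, W)")
    # Element-wise modulation, then integration over the time axis.
    return (masks * frames).sum(axis=0)

# Toy example: 8 frames of 4x4 pixels compressed into one 4x4 measurement.
rng = np.random.default_rng(0)
frames = rng.random((8, 4, 4))
masks = rng.integers(0, 2, size=(8, 4, 4))  # assumed binary masks
snapshot = sci_measure(frames, masks)
print(snapshot.shape)  # (4, 4)
```

A reconstruction network such as SCT-SCI would take `snapshot` and `masks` as input and estimate the 8 original frames.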
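The three branches of an SCT-Block differ mainly in which axis of the feature tensor supplies the attention tokens. The sketch below illustrates that idea only. It assumes a feature tensor of shape (T, H, W, C), uses identity query/key/value projections, and replaces SC-2DFusion and SCT-3DFusion with plain averaging, since the abstract does not describe their internals. All function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Scaled dot-product self-attention, identity Q/K/V (assumption).
    tokens: (..., N, D) -> (..., N, D)."""
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ tokens

def window_branch(x, win):
    """Tokens are the win*win pixels inside each non-overlapping window."""
    T, H, W, C = x.shape
    w = x.reshape(T, H // win, win, W // win, win, C)
    w = w.transpose(0, 1, 3, 2, 4, 5).reshape(T, H // win, W // win, win * win, C)
    out = self_attention(w)
    out = out.reshape(T, H // win, W // win, win, win, C)
    return out.transpose(0, 1, 3, 2, 4, 5).reshape(T, H, W, C)

def channel_branch(x):
    """Tokens are the C channels, each described by its H*W pixel values."""
    T, H, W, C = x.shape
    tok = x.reshape(T, H * W, C).transpose(0, 2, 1)  # (T, C, H*W)
    out = self_attention(tok)
    return out.transpose(0, 2, 1).reshape(T, H, W, C)

def temporal_branch(x):
    """Tokens are the T frames observed at each pixel location."""
    tok = x.transpose(1, 2, 0, 3)  # (H, W, T, C)
    out = self_attention(tok)
    return out.transpose(2, 0, 1, 3)  # back to (T, H, W, C)

def sct_block_sketch(x, win=2):
    # Averaging stands in for SC-2DFusion (spatial + channel) ...
    spatial_channel = 0.5 * (window_branch(x, win) + channel_branch(x))
    # ... and for SCT-3DFusion (adding the temporal branch).
    fused = 0.5 * (spatial_channel + temporal_branch(x))
    return x + fused  # residual connection (assumption)

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 4, 4, 8))  # (T, H, W, C)
out = sct_block_sketch(features, win=2)
print(out.shape)  # (4, 4, 4, 8)
```

Restricting each branch to short token sequences (a window, the channel set, or the frame set) keeps every attention matrix small, which is one plausible reading of how a multi-branch design lowers complexity relative to full spatio-temporal attention.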
Translated title of the contribution | Reconstruction Method of Video Snapshot Compressive Imaging Based on Trilateral Self-Attention |
---|---|
Original language | Chinese (Simplified) |
Journal | 计算机工程 = Computer Engineering |
DOIs | |
Publication status | E-pub ahead of print - 16 Aug 2024 |
Keywords
- Video snapshot compressive imaging
- Compressive sensing
- Transformer
- Deep learning
- Feature fusion