Mansformer: Efficient Transformer of Mixed Attention for Image Deblurring and Beyond

Kuo, Pin-Hung; Pan, Jinshan; Chien, Shao-Yi; Yang, Ming-Hsuan

arXiv:2404.06135 (cs)

[Submitted on 9 Apr 2024]

Abstract:The Transformer has achieved enormous success in natural language processing and high-level vision over the past few years. However, the complexity of self-attention is quadratic in the image size, which makes it infeasible for high-resolution vision tasks. In this paper, we propose the Mansformer, a Transformer of mixed attention that combines multiple self-attentions, a gate, and multi-layer perceptrons (MLPs) to explore and employ more possibilities of self-attention. Taking efficiency into account, we design four kinds of self-attention, all of linear complexity. By carefully adjusting the tensor shapes and dimensions used for the dot product, we split the typical self-attention of quadratic complexity into four operations of linear complexity. To adaptively merge these different kinds of self-attention, we use an architecture similar to Squeeze-and-Excitation Networks. Furthermore, we merge the two-stage Transformer design into one stage using the proposed gated-dconv MLP. Image deblurring is our main target, but extensive quantitative and qualitative evaluations show that this method performs favorably against state-of-the-art methods on tasks well beyond deblurring. The source code and trained models will be made available to the public.
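
To make the complexity claim concrete, the following is a rough cost model, not the paper's method. It counts multiply-accumulate operations for standard spatial self-attention, whose attention map is N x N for N = H * W pixels, and for a channel-wise (transposed) attention, whose map is C x C. Channel-wise attention is one well-known way to obtain cost linear in the number of pixels; whether it matches any of the four attention types in the Mansformer is an assumption here. The function names and the sizes used are hypothetical, chosen only for illustration.

```python
def spatial_attention_macs(height, width, channels):
    """MACs for standard self-attention over pixels (assumed single head).

    Q K^T is (N x C) @ (C x N) and A V is (N x N) @ (N x C),
    so each costs N * N * C with N = height * width.
    """
    n = height * width
    return 2 * n * n * channels


def channel_attention_macs(height, width, channels):
    """MACs for channel-wise (transposed) attention, an illustrative variant.

    Q^T K is (C x N) @ (N x C) and V A is (N x C) @ (C x C),
    so each costs N * C * C, which is linear in N.
    """
    n = height * width
    return 2 * n * channels * channels


def growth_when_side_doubles(cost_fn, height=64, width=64, channels=32):
    """Ratio of cost at (2H, 2W) to cost at (H, W) for a given cost model."""
    small = cost_fn(height, width, channels)
    large = cost_fn(2 * height, 2 * width, channels)
    return large / small


if __name__ == "__main__":
    # Doubling each side quadruples the pixel count N.
    print(growth_when_side_doubles(spatial_attention_macs))  # 16.0 (quadratic in N)
    print(growth_when_side_doubles(channel_attention_macs))  # 4.0 (linear in N)
```

Under this model, doubling the image side length multiplies the spatial-attention cost by 16 but the channel-wise cost by only 4. This is the kind of gap that makes quadratic attention impractical at high resolution.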

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2404.06135 [cs.CV]
	(or arXiv:2404.06135v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2404.06135

Submission history

From: Pin-Hung Kuo
[v1] Tue, 9 Apr 2024 09:02:21 UTC (49,137 KB)
