Publications from Uppsala University
Controlling Vision-Language Models for Multi-Task Image Restoration
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Artificial Intelligence. ORCID iD: 0000-0003-3334-8655
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Artificial Intelligence. ORCID iD: 0000-0001-5456-5515
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Artificial Intelligence. ORCID iD: 0000-0002-0368-786X
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Artificial Intelligence. ORCID iD: 0000-0002-9099-3522
2024 (English). Conference paper, Poster (with or without abstract) (Other academic)
Abstract [en]

Vision-language models such as CLIP have had a strong impact on diverse downstream tasks through zero-shot or label-free prediction. However, on low-level vision tasks such as image restoration, their performance deteriorates dramatically because of corrupted inputs. In this paper, we present a degradation-aware vision-language model (DA-CLIP) that better transfers pretrained vision-language models to low-level vision tasks, serving as a universal framework for image restoration. More specifically, DA-CLIP trains an additional controller that adapts the fixed CLIP image encoder to predict high-quality feature embeddings. By integrating the embedding into an image restoration network via cross-attention, we are able to guide the model toward high-fidelity image reconstruction. The controller also outputs a degradation feature that matches the real corruptions of the input, yielding a natural classifier for different degradation types. In addition, we construct a mixed degradation dataset with synthetic captions for DA-CLIP training. Our approach advances state-of-the-art performance on both degradation-specific and unified image restoration tasks, showing a promising direction for prompting image restoration with large-scale pretrained vision-language models. Our code is available at https://github.com/Algolzw/daclip-uir.
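The abstract says the controller's degradation feature acts as a natural classifier for degradation types. One plausible reading, in the spirit of CLIP's zero-shot classification, is to score that feature against text embeddings of degradation-type prompts by cosine similarity and pick the best match. The Python sketch below assumes exactly that; it is not taken from the DA-CLIP code. The embeddings are toy 3-D vectors rather than real CLIP or DA-CLIP outputs, and the function names and the temperature value are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sketch: CLIP-style zero-shot classification of a degradation
# embedding against text embeddings of degradation-type prompts.
# All vectors here are toy values, NOT outputs of DA-CLIP.

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Dot product of the two L2-normalized vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exp)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_degradation(
    degradation_embedding: Vector,
    text_embeddings: Dict[str, Vector],
    temperature: float = 100.0,  # hypothetical logit scale
) -> Tuple[str, Dict[str, float]]:
    """Return the best-matching degradation label and per-label probabilities."""
    labels = list(text_embeddings)
    logits = [
        temperature * cosine_similarity(degradation_embedding, text_embeddings[label])
        for label in labels
    ]
    scores = dict(zip(labels, softmax(logits)))
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy "text embeddings" for three degradation prompts (illustrative only).
    prompts = {
        "noisy": [1.0, 0.1, 0.0],
        "rainy": [0.0, 1.0, 0.2],
        "low-light": [0.1, 0.0, 1.0],
    }
    # Toy degradation embedding pointing mostly along the "rainy" direction.
    label, probs = classify_degradation([0.05, 0.9, 0.3], prompts)
    print(label)  # → rainy
```

Normalizing both vectors makes the score depend only on direction, as in CLIP; the temperature sharpens the probabilities but never changes which label wins.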

Place, publisher, year, edition, pages
Vienna, Austria: The International Conference on Learning Representations (ICLR), 2024.
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:uu:diva-544058
OAI: oai:DiVA.org:uu-544058
DiVA, id: diva2:1916833
Conference
The Twelfth International Conference on Learning Representations, Vienna, Austria, May 7, 2024
Available from: 2024-11-28. Created: 2024-11-28. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

fulltext (15104 kB), 333 downloads
File information
File name: FULLTEXT01.pdf
File size: 15104 kB
Checksum (SHA-512):
8d602db632e8d9f4d15193e8f8f820c2369d91a6292b9eeeb4368f840c5a1de0403a4cb26690c6a10bd070608ebcfe7e96fc3b45656ea706bbfcc2536feb1e92
Type: fulltext
MIME type: application/pdf
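To confirm that a downloaded copy of FULLTEXT01.pdf matches the SHA-512 checksum in the file information, the digest can be recomputed locally. The following is a minimal sketch using Python's standard-library `hashlib`; the function names `sha512_of_file` and `verify`, and the assumption that the PDF sits in the working directory under its record file name, are illustrative choices rather than anything DiVA provides.

```python
import hashlib
import sys
from pathlib import Path

# SHA-512 digest as listed in the DiVA record for FULLTEXT01.pdf.
EXPECTED_SHA512 = (
    "8d602db632e8d9f4d15193e8f8f820c2369d91a6292b9eeeb4368f840c5a1de0"
    "403a4cb26690c6a10bd070608ebcfe7e96fc3b45656ea706bbfcc2536feb1e92"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading chunk_size bytes at a time."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling f.read until end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA512):
    """True if the file's digest equals the expected hex string (case-insensitive)."""
    return sha512_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical default location: the PDF saved under its record file name.
    pdf = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("FULLTEXT01.pdf")
    if pdf.exists():
        print("OK" if verify(pdf) else "MISMATCH")
    else:
        print(f"{pdf} not found")
```

Reading in fixed-size chunks keeps memory use constant, which matters little for a 15 MB PDF but makes the same helper usable for much larger files.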

Other links

Preprint at arXiv

Authority records

Luo, Ziwei; Gustafsson, Fredrik K.; Zhao, Zheng; Sjölund, Jens; Schön, Thomas B.

Total: 333 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 487 hits