Wait, all my eye-rolling at the TV/film trope of "Computer, Enhance!" de-blurring is now redundant, and that stuff is real?!
This looks incredibly impressive as a result, but I'm wary of the use of metrics like FID to evaluate performance. I can take a high-res image, downsample it, then use the method and measure performance very easily: what percentage of pixels were correctly restored? Instead they're using metrics like FID which - while useful for purely generative techniques - seem a little vague for this purpose.
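To make the pixel-level check concrete, here is a minimal sketch, assuming a grayscale image stored as a NumPy array. The function names, the box-filter downsampling, and the tolerance are all my own choices for illustration, and `upscale_nearest` is only a placeholder standing in for whatever restoration method is being evaluated.

```python
import numpy as np

def downsample(img, factor):
    """Average non-overlapping factor x factor blocks (a simple box filter)."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks divide evenly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def upscale_nearest(img, factor):
    """Placeholder for the method under test: nearest-neighbour upscaling."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def pixel_accuracy(reference, restored, tol=0.0):
    """Fraction of pixels restored to within tol of the reference value."""
    return float(np.mean(np.abs(reference - restored) <= tol))

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for a perfect restoration."""
    mse = np.mean((reference - restored) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(peak ** 2 / mse))

# Synthetic stand-in for a real high-res image (assumption: 8-bit grayscale range).
rng = np.random.default_rng(0)
high_res = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

low_res = downsample(high_res, 4)
restored = upscale_nearest(low_res, 4)

print("pixel accuracy (tol=8):", pixel_accuracy(high_res, restored, tol=8.0))
print("PSNR (dB):", psnr(high_res, restored))
```

Swapping `upscale_nearest` for the real model gives a reference-based score, which is a different kind of measurement from a distribution-level metric like FID.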
The data processing inequality holds regardless of how many layers are in your neural net (processing data does not increase its information content). You can impute missing data, and with something very regular like text it could work pretty well, but that way lies hallucination.
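For reference, the inequality can be stated as follows. The symbols are my own labels, not anything from the paper: X is the original high-res image, Y the degraded low-res input, and \hat{X} = f(Y) the network's output.

```latex
% Assumed setup: X -> Y -> \hat{X} forms a Markov chain,
% because the output \hat{X} = f(Y) depends on X only through Y.
X \to Y \to \hat{X}
\quad\Longrightarrow\quad
I(X; \hat{X}) \le I(X; Y)
```

So the output can never carry more information about the original than the low-res input did. Any extra detail has to come from the prior the model learned in training.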