We start from a high-resolution image from the DIV2K dataset and extract bursts of 8 lower-resolution frames, with translation-only motion between frames. The frames are resampled from the high-resolution image using a Gaussian kernel with σ = 1, expressed in the new (low-resolution) pixel size.
We assume exact alignment and bin the samples at a higher pixel resolution. We also apply a Bayer filter, discarding 2/3 of the color information.
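The burst-synthesis pipeline above can be sketched as follows. This is a minimal NumPy version with two simplifications: the sub-pixel translation is drawn as an integer shift of the high-resolution grid (which is a sub-pixel shift at low resolution), and the Gaussian prefilter is replaced by block averaging. The function name and the RGGB Bayer layout are assumptions, not taken from the source.

```python
import numpy as np

def synthesize_burst(hr, num_frames=8, scale=4, rng=None):
    """Simulate a raw burst from a high-res RGB image `hr` of shape (H, W, 3).

    Per frame: a random integer translation of the HR grid (a sub-pixel
    shift in LR units), block-average downsampling by `scale` (a stand-in
    for the Gaussian prefilter with sigma = 1 LR pixel), then a Bayer
    RGGB mosaic keeping one color sample per pixel. Hypothetical sketch.
    """
    rng = np.random.default_rng(rng)
    H, W, _ = hr.shape
    h, w = (H - scale) // scale, (W - scale) // scale  # leave room to shift
    frames, offsets = [], []
    for _ in range(num_frames):
        dy, dx = rng.integers(0, scale, size=2)  # translation only
        crop = hr[dy:dy + h * scale, dx:dx + w * scale]
        # downsample: average each scale x scale block (Gaussian stand-in)
        lr = crop.reshape(h, scale, w, scale, 3).mean(axis=(1, 3))
        # Bayer RGGB mosaic: discard 2/3 of the color information
        mosaic = np.zeros((h, w))
        mosaic[0::2, 0::2] = lr[0::2, 0::2, 0]  # R
        mosaic[0::2, 1::2] = lr[0::2, 1::2, 1]  # G
        mosaic[1::2, 0::2] = lr[1::2, 0::2, 1]  # G
        mosaic[1::2, 1::2] = lr[1::2, 1::2, 2]  # B
        frames.append(mosaic)
        offsets.append((dy / scale, dx / scale))  # shift in LR pixel units
    return np.stack(frames), np.array(offsets)
```

Since the shifts are recorded exactly, the "exact alignment" assumption holds by construction.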
The ground truth:
4 models:
- ours: permutation-invariant + splatting kernels, softmax-normalized per output pixel
- ours unnormalized: same without normalization (kernels can be negative)
- direct: permutation-invariant architecture, but outputs pixel values directly instead of kernels
- single frame: same as direct but using only 1 of the 8 frames
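The normalized splatting variant can be sketched as below. Each input sample casts its value onto an output pixel with weight exp(logit); per output pixel the accumulated weights are normalized, which is exactly a softmax over the samples landing there. Nearest-pixel splatting here is a simplification of the k × k kernels, and all names are assumptions.

```python
import numpy as np

def splat_normalized(values, ys, xs, logits, out_shape):
    """Softmax-normalized splatting.

    `values`: per-sample predicted values; `ys`, `xs`: integer output-pixel
    coordinates; `logits`: per-sample kernel logits. Weights are
    exponentiated and normalized per output pixel (a softmax over the
    samples hitting that pixel). Hypothetical sketch.
    """
    num = np.zeros(out_shape)
    den = np.zeros(out_shape)
    w = np.exp(logits - logits.max())  # stabilized; shift cancels in the ratio
    np.add.at(num, (ys, xs), w * values)
    np.add.at(den, (ys, xs), w)
    return num / np.maximum(den, 1e-12)
```

The unnormalized variant would return `num` with raw (possibly negative) kernel weights in place of `w`, skipping the division.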
Evaluation on images from the Flickr2K dataset.
| Ours (k = 21) | Ours unnormalized (k = 21) | Direct | Single frame |
|---|---|---|---|
| 35.1 dB | 38.9 dB | 40.1 dB | 32.3 dB |
Left to right: ground truth, result, difference map. Top to bottom: ours, ours unnormalized, direct.
All models are permutation-invariant with respect to frame order.
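The standard recipe for such invariance is a shared per-frame encoder followed by a symmetric reduction, sketched below under that assumption (the source does not specify the aggregation; mean pooling is one common choice, max is another):

```python
import numpy as np

def perm_invariant_features(frames, encode):
    """Apply a shared encoder to each frame, then reduce with a
    symmetric operation (mean). Any reordering of the frames yields
    the same output. Hypothetical sketch of the aggregation scheme."""
    feats = np.stack([encode(f) for f in frames])
    return feats.mean(axis=0)
```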
Zeroing out the sample coordinates (note the resulting shift):
Randomizing the coordinates uniformly in [0, 1]:
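These two ablations amount to replacing the coordinates fed to the model before inference; a minimal sketch (function name and interface are assumptions):

```python
import numpy as np

def ablate_coords(coords, mode, rng=None):
    """Coordinate ablations: 'zero' sets every sample coordinate to 0
    (as if all frames were perfectly aligned, hence the visible shift),
    'random' replaces them with uniform noise in [0, 1]. Hypothetical
    sketch; `coords` holds per-sample (dy, dx) offsets in LR pixels."""
    rng = np.random.default_rng(rng)
    if mode == "zero":
        return np.zeros_like(coords)
    if mode == "random":
        return rng.uniform(0.0, 1.0, size=coords.shape)
    return coords
```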