2019-12-08

Data

We start from a high-resolution image from the DIV2K dataset.

We extract bursts of 8 frames that differ by translation only. The frames have a lower resolution: they are resampled from the high-res image with a Gaussian kernel of σ = 1, measured in the new (low-res) pixel size.
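A minimal sketch of how such a burst could be generated, for a single-channel image. The function names, the downsampling factor `scale = 2`, the shift range of ±0.5 low-res pixels, the kernel truncation radius and the edge clamping are all assumptions made for illustration; the notes only fix the burst size (8), the translation-only motion and σ = 1 in low-res pixels.

```python
import numpy as np

def sample_frame(hr, scale, shift, sigma_lr=1.0, radius=3):
    """Resample one low-res frame from a high-res image `hr` of shape (H, W).

    `shift` is the (dy, dx) translation in low-res pixels. The Gaussian has
    sigma = sigma_lr low-res pixels, i.e. sigma_lr * scale high-res pixels.
    """
    h, w = hr.shape
    sigma_hr = sigma_lr * scale
    r = int(np.ceil(radius * sigma_hr))          # truncate the kernel (assumption)
    lr_h, lr_w = h // scale, w // scale
    frame = np.zeros((lr_h, lr_w))
    for i in range(lr_h):
        for j in range(lr_w):
            # Center of low-res pixel (i, j), expressed in high-res coordinates.
            cy = (i + 0.5 + shift[0]) * scale - 0.5
            cx = (j + 0.5 + shift[1]) * scale - 0.5
            ys = np.arange(int(np.floor(cy)) - r, int(np.floor(cy)) + r + 1)
            xs = np.arange(int(np.floor(cx)) - r, int(np.floor(cx)) + r + 1)
            wy = np.exp(-0.5 * ((ys - cy) / sigma_hr) ** 2)
            wx = np.exp(-0.5 * ((xs - cx) / sigma_hr) ** 2)
            # Clamp indices to the image border (edge replication, an assumption).
            patch = hr[np.clip(ys, 0, h - 1)][:, np.clip(xs, 0, w - 1)]
            weights = np.outer(wy, wx)
            frame[i, j] = (weights * patch).sum() / weights.sum()
    return frame

def make_burst(hr, n_frames=8, scale=2, seed=0):
    """Return (frames, shifts): n_frames translated low-res views of `hr`."""
    rng = np.random.default_rng(seed)
    shifts = rng.uniform(-0.5, 0.5, size=(n_frames, 2))   # sub-pixel translations
    frames = np.stack([sample_frame(hr, scale, s) for s in shifts])
    return frames, shifts
```

Because the weights are normalized per output pixel, a constant image stays constant after resampling, which is a quick sanity check for this kind of generator.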

We assume exact alignment and bin the samples on a grid with a higher pixel resolution. We also apply a Bayer filter, which discards 2/3 of the color information.
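The binning and Bayer steps might look roughly like the sketch below. The RGGB layout, the nearest-bin assignment, the clamping at the border and the names `bayer_mask` and `bin_burst` are assumptions; the notes only say that samples are binned at a higher resolution with known alignment and that the Bayer filter keeps one of three color values per pixel.

```python
import numpy as np

def bayer_mask(h, w):
    """RGGB mask of shape (h, w, 3): exactly one channel is kept per pixel."""
    mask = np.zeros((h, w, 3), dtype=bool)
    mask[0::2, 0::2, 0] = True   # red
    mask[0::2, 1::2, 1] = True   # green
    mask[1::2, 0::2, 1] = True   # green
    mask[1::2, 1::2, 2] = True   # blue
    return mask

def bin_burst(frames, shifts, scale):
    """Accumulate Bayer-masked samples of all frames on a grid `scale` times finer.

    frames: (N, h, w, 3) low-res RGB frames.
    shifts: (N, 2) known (dy, dx) translations in low-res pixels.
    Returns per-bin value sums and sample counts, both of shape (h*scale, w*scale, 3).
    """
    n, h, w, _ = frames.shape
    sums = np.zeros((h * scale, w * scale, 3))
    counts = np.zeros((h * scale, w * scale, 3))
    mask = bayer_mask(h, w)
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    for frame, (dy, dx) in zip(frames, shifts):
        # Sample centers in fine-grid coordinates, then the bin each one falls in.
        by = np.clip(np.floor((ii + 0.5 + dy) * scale).astype(int), 0, h * scale - 1)
        bx = np.clip(np.floor((jj + 0.5 + dx) * scale).astype(int), 0, w * scale - 1)
        for c in range(3):
            m = mask[..., c]
            # np.add.at accumulates correctly when several samples hit the same bin.
            np.add.at(sums, (by[m], bx[m], c), frame[..., c][m])
            np.add.at(counts, (by[m], bx[m], c), 1.0)
    return sums, counts
```

Most bins stay empty for a given channel, so the counts are worth keeping alongside the sums: they tell a downstream model which bins actually hold a measurement.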

The ground truth:

Variants

3 models, plus a single-frame baseline:
- ours: permutation-invariant architecture + splatting kernels, softmax-normalized per output pixel
- ours unnormalized: same as ours, without the normalization (kernel weights can be negative)
- direct: permutation-invariant architecture, but outputs the pixel values directly instead of kernels
- single frame: same as direct, but using only 1 of the 8 frames
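To make the difference between the normalized and unnormalized kernel variants concrete, here is a small sketch. It treats each output pixel's kernel as weights over a k × k neighborhood of the input grid, which is an assumption about how the kernels are applied; `apply_kernels` and the `logits` array are hypothetical stand-ins for the network output.

```python
import numpy as np

def apply_kernels(values, logits, normalize=True):
    """Combine a k x k neighborhood of `values` for every output pixel.

    values: (H, W) input grid.
    logits: (H, W, k, k) raw per-pixel kernel values (hypothetical network output).
    normalize=True  -> softmax per output pixel: weights are >= 0 and sum to 1.
    normalize=False -> raw values are used as weights and may be negative.
    """
    h, w, k, _ = logits.shape
    pad = k // 2
    padded = np.pad(values, pad, mode="edge")
    if normalize:
        flat = logits.reshape(h, w, -1)
        flat = np.exp(flat - flat.max(axis=-1, keepdims=True))   # stable softmax
        weights = (flat / flat.sum(axis=-1, keepdims=True)).reshape(h, w, k, k)
    else:
        weights = logits
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            # Weighted contribution of the neighbor at offset (dy - pad, dx - pad).
            out += weights[:, :, dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

With softmax weights every output is a convex combination of its neighborhood, so it can never exceed the local minimum or maximum. Negative weights remove that constraint, which is what allows a kernel to sharpen rather than only average.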

Results

Evaluation on images from the Flickr2K dataset.

Ours (k = 21)    Ours unnormalized (k = 21)    Direct     Single frame
35.1 dB          38.9 dB                       40.1 dB    32.3 dB
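The notes do not name the metric behind the dB figures. Assuming it is PSNR computed on images scaled to [0, 1] (both the metric and the `peak` value are assumptions), it could be computed as follows:

```python
import numpy as np

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Under this definition a uniform error of 0.1 on a [0, 1] image gives 20 dB, and every factor of 10 reduction in mean squared error adds 10 dB.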

Left to right: ground truth, result, diff map. Top to bottom: Ours, Ours unnormalized, Direct.

Left to right: kernel (normalized), kernel (unnormalized), direct.

More results (ours, unnormalized)

Blurry output with normalized kernels

Take-away

Stability w.r.t. permutation

All models are invariant to permutations of the frames within a burst.
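One way to check this property empirically is to shuffle the frame axis and compare outputs. In the sketch below, `pooled_model` is a toy stand-in (not the architecture used here) that applies a shared per-frame transform followed by symmetric pooling; `max_permutation_error` is a hypothetical helper name.

```python
import numpy as np

def pooled_model(burst):
    """Toy permutation-invariant model (hypothetical stand-in).

    Applies the same function to every frame, then reduces over the frame
    axis with symmetric operations (mean and max).
    """
    feats = np.tanh(burst)                       # shared per-frame transform
    return feats.mean(axis=0) + feats.max(axis=0)

def max_permutation_error(model, burst, n_trials=10, seed=0):
    """Largest absolute output change over random reorderings of the frames."""
    rng = np.random.default_rng(seed)
    ref = model(burst)
    worst = 0.0
    for _ in range(n_trials):
        perm = rng.permutation(burst.shape[0])
        worst = max(worst, float(np.abs(model(burst[perm]) - ref).max()))
    return worst
```

For an invariant model the error should be zero up to floating-point rounding, since summation order can change the last few bits. An order-dependent model, such as one that returns only the first frame, produces a clearly non-zero error.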

Utilization of sub-pixel sample coordinate

Zeroing out the sample coordinates (note the shift):

Randomizing the coordinates in [0, 1]:
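Both ablations amount to replacing the sub-pixel coordinate input before running the model. A minimal sketch, assuming the coordinates are stored as an array of offsets in [0, 1); the function name and the mode strings are hypothetical:

```python
import numpy as np

def ablate_coords(coords, mode, seed=0):
    """Return a modified copy of the sub-pixel coordinates fed to the model.

    mode "none":   coordinates are left unchanged (reference run).
    mode "zero":   every coordinate is replaced by 0.
    mode "random": every coordinate is redrawn uniformly in [0, 1).
    """
    coords = np.asarray(coords, dtype=float)
    if mode == "none":
        return coords.copy()
    if mode == "zero":
        return np.zeros_like(coords)
    if mode == "random":
        rng = np.random.default_rng(seed)
        return rng.uniform(0.0, 1.0, size=coords.shape)
    raise ValueError(f"unknown mode: {mode!r}")
```

If the output changes noticeably between the "none" run and either ablation, the model is relying on the coordinate input rather than ignoring it.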