We start from a high-resolution image from the DIV2K dataset and extract bursts of 8 lower-resolution frames, with translation-only motion between frames. The frames are resampled from the high-resolution image using a Gaussian kernel with σ = 1, expressed in the new (low-resolution) pixel size.
We assume exact alignment and bin the samples at a higher pixel resolution. We also apply a Bayer filter, discarding 2/3 of the color information.
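The burst-synthesis pipeline above can be sketched as follows. This is a minimal NumPy version with two simplifications: the sub-pixel translation is drawn as an integer shift of the high-resolution grid (which is a sub-pixel shift at low resolution), and the Gaussian prefilter is replaced by block averaging. The function name and the RGGB Bayer layout are assumptions, not taken from the source.

```python
import numpy as np

def synthesize_burst(hr, num_frames=8, scale=4, rng=None):
    """Simulate a raw burst from a high-res RGB image `hr` of shape (H, W, 3).

    Per frame: a random integer translation of the HR grid (a sub-pixel
    shift in LR units), block-average downsampling by `scale` (a stand-in
    for the Gaussian prefilter with sigma = 1 LR pixel), then a Bayer
    RGGB mosaic keeping one color sample per pixel. Hypothetical sketch.
    """
    rng = np.random.default_rng(rng)
    H, W, _ = hr.shape
    h, w = (H - scale) // scale, (W - scale) // scale  # leave room to shift
    frames, offsets = [], []
    for _ in range(num_frames):
        dy, dx = rng.integers(0, scale, size=2)  # translation only
        crop = hr[dy:dy + h * scale, dx:dx + w * scale]
        # downsample: average each scale x scale block (Gaussian stand-in)
        lr = crop.reshape(h, scale, w, scale, 3).mean(axis=(1, 3))
        # Bayer RGGB mosaic: discard 2/3 of the color information
        mosaic = np.zeros((h, w))
        mosaic[0::2, 0::2] = lr[0::2, 0::2, 0]  # R
        mosaic[0::2, 1::2] = lr[0::2, 1::2, 1]  # G
        mosaic[1::2, 0::2] = lr[1::2, 0::2, 1]  # G
        mosaic[1::2, 1::2] = lr[1::2, 1::2, 2]  # B
        frames.append(mosaic)
        offsets.append((dy / scale, dx / scale))  # shift in LR pixel units
    return np.stack(frames), np.array(offsets)
```

Since the shifts are recorded exactly, the "exact alignment" assumption holds by construction.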
The ground truth:
4 models:
- ours: permutation-invariant + splatting kernels, softmax-normalized per output pixel
- ours unnormalized: same without normalization (kernels can be negative)
- direct: permutation-invariant architecture, but outputs pixel values directly instead of kernels
- single frame: same as direct but using only 1 of the 8 frames
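The normalized splatting variant can be sketched as below. Each input sample casts its value onto an output pixel with weight exp(logit); per output pixel the accumulated weights are normalized, which is exactly a softmax over the samples landing there. Nearest-pixel splatting here is a simplification of the k × k kernels, and all names are assumptions.

```python
import numpy as np

def splat_normalized(values, ys, xs, logits, out_shape):
    """Softmax-normalized splatting.

    `values`: per-sample predicted values; `ys`, `xs`: integer output-pixel
    coordinates; `logits`: per-sample kernel logits. Weights are
    exponentiated and normalized per output pixel (a softmax over the
    samples hitting that pixel). Hypothetical sketch.
    """
    num = np.zeros(out_shape)
    den = np.zeros(out_shape)
    w = np.exp(logits - logits.max())  # stabilized; shift cancels in the ratio
    np.add.at(num, (ys, xs), w * values)
    np.add.at(den, (ys, xs), w)
    return num / np.maximum(den, 1e-12)
```

The unnormalized variant would return `num` with raw (possibly negative) kernel weights in place of `w`, skipping the division.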
Evaluation on images from the Flickr2K dataset.
| Ours (k = 21) | Ours unnormalized (k = 21) | Direct | Single frame |
|---|---|---|---|
| 35.1 dB | 38.9 dB | 40.1 dB | 32.3 dB |
Left to right: ground truth, result, difference map. Top to bottom: ours, ours unnormalized, direct.
All models are permutation-invariant with respect to frame order.
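The standard recipe for such invariance is a shared per-frame encoder followed by a symmetric reduction, sketched below under that assumption (the source does not specify the aggregation; mean pooling is one common choice, max is another):

```python
import numpy as np

def perm_invariant_features(frames, encode):
    """Apply a shared encoder to each frame, then reduce with a
    symmetric operation (mean). Any reordering of the frames yields
    the same output. Hypothetical sketch of the aggregation scheme."""
    feats = np.stack([encode(f) for f in frames])
    return feats.mean(axis=0)
```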
Zeroing out the sample coordinates (note the resulting shift):
Randomizing the coordinates uniformly in [0, 1]:
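These two ablations amount to replacing the coordinates fed to the model before inference; a minimal sketch (function name and interface are assumptions):

```python
import numpy as np

def ablate_coords(coords, mode, rng=None):
    """Coordinate ablations: 'zero' sets every sample coordinate to 0
    (as if all frames were perfectly aligned, hence the visible shift),
    'random' replaces them with uniform noise in [0, 1]. Hypothetical
    sketch; `coords` holds per-sample (dy, dx) offsets in LR pixels."""
    rng = np.random.default_rng(rng)
    if mode == "zero":
        return np.zeros_like(coords)
    if mode == "random":
        return rng.uniform(0.0, 1.0, size=coords.shape)
    return coords
```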