Rotationally Invariant per-sample mixing layer for AA

We have a layer that takes as input:

A tensor of samples \(I[bs, h, w, spp, n_{in}]\)

A tensor of per-pixel angles \(\theta[bs, h, w]\)

A tensor of kernel weights \(K[k_h, k_w, n_{in}, n_{out}]\)

An array of subpixel offsets (corresponding to the samples' \(xy\) positions) \(p[bs, h, w, spp, 2]\)

And outputs a tensor of sample features \(O[bs, h, w, spp, n_{out}]\).

The operator bilinearly interpolates into the kernel based on each sample's rotated spatial position; the kernel is centered at each output sample's location.

\[O[b, y, x, s', c_{out}] = \sum_{c_{in}} \sum_{s=1}^{spp} I[b, y, x, s, c_{in}]\, K\!\left[\mathrm{rotate}\!\left(p[b, y, x, s] - p[b, y, x, s'],\ \theta[b, y, x]\right), c_{in}, c_{out}\right]\]
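A direct, unvectorized NumPy sketch of this operator — the function names, the centering convention (kernel offsets taken relative to the output sample \(s'\)), and the mapping from pixel offsets to continuous kernel coordinates are all my assumptions, not the actual implementation:

```python
import numpy as np

def bilinear_lookup(K, uv):
    """Bilinearly interpolate K[k_h, k_w, n_in, n_out] at continuous (u, v)."""
    kh, kw = K.shape[:2]
    u = np.clip(uv[0], 0, kh - 1)
    v = np.clip(uv[1], 0, kw - 1)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, kh - 1), min(v0 + 1, kw - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * K[u0, v0]
            + du * (1 - dv) * K[u1, v0]
            + (1 - du) * dv * K[u0, v1]
            + du * dv * K[u1, v1])

def sample_mixing(I, theta, K, p, field=5.0):
    """O[b,y,x,s',c_out] = sum_{s,c_in} I[...,s,c_in] * K[rotated offset, c_in, c_out]."""
    bs, h, w, spp, n_in = I.shape
    kh, kw, _, n_out = K.shape
    O = np.zeros((bs, h, w, spp, n_out))
    for b in range(bs):
        for y in range(h):
            for x in range(w):
                c, sn = np.cos(theta[b, y, x]), np.sin(theta[b, y, x])
                R = np.array([[c, -sn], [sn, c]])
                for sp in range(spp):        # output sample s'
                    for s in range(spp):     # input sample s
                        # offset of sample s relative to sample s', rotated by theta
                        off = R @ (p[b, y, x, s] - p[b, y, x, sp])
                        # map pixel-space offset to continuous kernel coordinates
                        uv = (off / field + 0.5) * np.array([kh - 1, kw - 1])
                        W = bilinear_lookup(K, uv)       # [n_in, n_out]
                        O[b, y, x, sp] += I[b, y, x, s] @ W
    return O
```

The double loop over \(s'\) and \(s\) makes the quadratic per-pixel cost explicit.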

This operator is pretty slow because of the quadratic (\(spp^2\)) sample-sample interaction within each pixel. We also have a layer that outputs per-pixel quantities instead: \(O[bs, h, w, n_{out}]\).
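A sketch of what the per-pixel variant could look like: centering the kernel once per pixel removes the loop over output samples, so the cost is linear in spp. Nearest-neighbor kernel lookup is used here for brevity where the real layer interpolates bilinearly, and all names and conventions are my assumptions:

```python
import numpy as np

def per_pixel_mixing(I, theta, K, p, field=5.0):
    """O[b,y,x,c_out]: kernel centered once per pixel -> cost linear in spp.
    Assumes sample offsets p are relative to the pixel center."""
    bs, h, w, spp, n_in = I.shape
    kh, kw, _, n_out = K.shape
    O = np.zeros((bs, h, w, n_out))
    for b in range(bs):
        for y in range(h):
            for x in range(w):
                c, sn = np.cos(theta[b, y, x]), np.sin(theta[b, y, x])
                R = np.array([[c, -sn], [sn, c]])
                for s in range(spp):
                    off = R @ p[b, y, x, s]
                    uv = (off / field + 0.5) * np.array([kh - 1, kw - 1])
                    # nearest-neighbor lookup for brevity (real layer: bilinear)
                    u = int(np.clip(np.rint(uv[0]), 0, kh - 1))
                    v = int(np.clip(np.rint(uv[1]), 0, kw - 1))
                    O[b, y, x] += I[b, y, x, s] @ K[u, v]
    return O
```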

Here's an example of learned kernels:

Datasets

We use three independent test cases:

rotated grayscale edges

random flat triangles

rotated sine waves.
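As an illustration, the rotated-sine-wave case can be synthesized in a few lines. The function below is my reconstruction; the resolution and frequency defaults are made up:

```python
import numpy as np

def rotated_sine(h=64, w=64, freq=0.1, theta=0.7):
    """Grayscale sine grating at angle theta: sin(2*pi*f*(x cos + y sin)), in [0, 1]."""
    ys, xs = np.mgrid[0:h, 0:w]
    phase = xs * np.cos(theta) + ys * np.sin(theta)
    return 0.5 + 0.5 * np.sin(2 * np.pi * freq * phase)
```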

The angle input is either learnable or hardcoded as: \[\theta = \arctan\left(\frac{\partial I / \partial y}{\partial I / \partial x}\right)\]
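A minimal sketch of the hardcoded variant using finite differences. It uses `np.arctan2` rather than a plain arctan of the ratio, which keeps the quadrant and avoids division by zero — a small deviation from the formula as written:

```python
import numpy as np

def gradient_angle(I):
    """Per-pixel orientation theta = atan2(dI/dy, dI/dx) via finite differences."""
    dy, dx = np.gradient(I)  # np.gradient returns axis-0 (y) then axis-1 (x) derivatives
    return np.arctan2(dy, dx)
```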

For all the results below, we use 1 hidden layer with 32 filters per layer. The filters have a receptive field of 5 pixels and a size of \(25\times25\) taps. That is, a sub-pixel resolution of roughly \(\frac{1}{5}\) of a pixel.
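The sub-pixel resolution can be sanity-checked with trivial arithmetic; the \(32\times32\) channel count for the hidden layer's kernel tensor below is my assumption:

```python
# Numbers from the text: 25x25 kernel taps spanning a 5-pixel receptive field.
taps, field = 25, 5
subpixel_res = field / taps            # pixels per tap -> 1/5 of a pixel
# Assumed 32-in / 32-out hidden layer: parameter count of its kernel tensor.
params_per_layer = taps * taps * 32 * 32
```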

FixedEdges

Noisy outputs, with dirty edges.

Sines

Fixed frequency, random angle. I forgot to output colors, but we get cleaner outputs.