Two Stream Semantic Compression

Solomon Garber

Brandeis University

solomongarber@brandeis.edu

Ryan Marcus

MIT CSAIL

ryanmarcus@csail.mit.edu

Antonella DiLillo

Brandeis University

dilant@brandeis.edu

James Storer

Brandeis University

storer@brandeis.edu

Abstract

Video containing only oscillatory motion can be compressed and approximately reconstructed using static descriptors and global motion parameters. In this work, we propose a system for generating these descriptors for video in which semantic foreground object motion may occlude the background oscillations in different regions at different times, for example an outdoor soccer video. Our technique improves visual quality in the video background over traditional video codecs while preserving fidelity in semantically important foreground regions. These improvements are most pronounced at low bitrates.

Figure 1. Our two-level video encoder pipeline. Foreground regions are separated and compressed using a standard codec with image persistence. Masks are spatially downsampled and compressed with a standard codec. Background regions are filtered and reduced to a static sprite, complex-valued modes of horizontal and vertical motion, and frame-wide global motion parameters.

Figure 2. Semantic foreground subtraction pipeline. Foreground regions are removed and filled using feedback and inpainting.

Figure 3. Sample frames from four versions of a 1m48s 1080p video, 59.94 fps, 5868 frames total. (a): video encoded with AVC, original quality (397.9 MB, 0.26 bpp, approximately 29 Mbit/s). (b): same frame encoded with AVC, low quality settings (3.4 MB, 0.0023 bpp, approximately 255 Kbit/s). (c): same frame encoded with our method (3.4 MB, 0.0023 bpp, approximately 255 Kbit/s). (d): A/B comparison of our method and AVC. Click on the still image to view the corresponding video.
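The bits-per-pixel and bitrate values quoted in the Figure 3 caption follow from file size, frame count, and duration. The sketch below illustrates that arithmetic and is not part of the encoder. It assumes that 1080p means 1920 x 1080 pixels and that 1 MB is 10^6 bytes; the function names are hypothetical.

```python
def bits_per_pixel(size_bytes, n_frames, width, height):
    """Average number of encoded bits spent on each pixel of the video."""
    return size_bytes * 8 / (n_frames * width * height)


def bitrate_bps(size_bytes, duration_s):
    """Average bitrate in bits per second over the clip duration."""
    return size_bytes * 8 / duration_s


# Panel (a) of Figure 3: 397.9 MB, 5868 frames, 1m48s (108 s).
# Assumptions: 1080p = 1920 x 1080 pixels, 1 MB = 10**6 bytes.
size = 397.9e6
bpp = bits_per_pixel(size, 5868, 1920, 1080)
rate = bitrate_bps(size, 108)
print(f"{bpp:.2f} bpp, {rate / 1e6:.1f} Mbit/s")
```

Under these assumptions, panel (a) works out to about 0.26 bpp and roughly 29 Mbit/s, in line with the caption.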

Figure 5. The video clip is 2 minutes and 40 seconds in duration, shot at 23.976 frames per second, a total of 3714 frames. (a) The clip, when encoded with our HEVC tool at its default good quality setting, was 62.7 MB (0.065 bpp, approximately 3 Mbit/s). (b) With our HEVC tool at its lowest quality setting, the clip was 2 MB (0.0021 bits per pixel, approximately 102 Kbit/s). (c) Our encoder produced a clip of 2 MB (0.0021 bits per pixel, 102 Kbit/s). (d) A side-by-side comparison of our approach and the HEVC tool at its lowest quality.

Two Stream Semantic Compression of Videos with Dynamic Backgrounds

Abstract