Two Stream Semantic Compression of Videos with Dynamic Backgrounds

Solomon Garber
Brandeis University
Ryan Marcus
MIT CSAIL
Antonella DiLillo
Brandeis University
James Storer
Brandeis University

Abstract

Videos containing only oscillatory motion can be compressed and approximately reconstructed using static descriptors and global motion parameters. In this work, we propose a system for generating these descriptors for videos that also contain semantic object motion in the foreground, which may occlude the background oscillations in different regions at different times; an outdoor soccer video is one example. Our technique improves visual quality in the video background over traditional video codecs while preserving fidelity in the semantically important foreground regions. These improvements are most pronounced at low bitrates.
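The first sentence of the abstract can be made concrete with the following model. It is a sketch under our own assumptions, not the authors' exact formulation: each background frame $I_t$ is approximated by warping a single static sprite $S$ with a per-frame global motion $\mathbf{g}_t$ (assumed here to be a translation) plus a local oscillatory displacement $\mathbf{d}_t$ built from $K$ complex-valued mode maps $\mathbf{m}_k$ at temporal frequencies $f_k$. The symbols $S$, $\mathbf{g}_t$, $\mathbf{d}_t$, $\mathbf{m}_k$, $f_k$, and $K$ are introduced here for illustration only.

```latex
\begin{aligned}
I_t(\mathbf{x}) &\approx S\!\left(\mathbf{x} + \mathbf{g}_t + \mathbf{d}_t(\mathbf{x})\right),\\
\mathbf{d}_t(\mathbf{x}) &= \operatorname{Re}\sum_{k=1}^{K} \mathbf{m}_k(\mathbf{x})\, e^{\,i 2\pi f_k t},
\qquad \mathbf{m}_k(\mathbf{x}) \in \mathbb{C}^2 .
\end{aligned}
```

Under this model, the background is described by one sprite, $K$ mode maps (each with a horizontal and a vertical complex component), $K$ frequencies, and one small global-motion vector per frame, so only the global-motion term grows with the length of the video.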

Figure 1. Our two-level video encoder pipeline. Foreground regions are separated and compressed using a standard codec with image persistence. Masks are spatially downsampled and compressed with a standard codec. Background regions are filtered and reduced to a static sprite, complex-valued modes of horizontal and vertical motion, and frame-wide global motion parameters.
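The following Python/NumPy sketch illustrates one way the complex-valued motion modes and frame-wide global motion parameters of Figure 1 could be extracted from a per-pixel displacement field. It assumes that the displacement along one axis (horizontal or vertical) has already been estimated, for example by optical flow, and is given as an array of shape (T, H, W). It treats global motion as a pure per-frame translation equal to the spatial mean and selects modes as the strongest temporal DFT bins. These are our assumptions, and the function names `split_global`, `extract_modes`, and `synthesize` are hypothetical rather than taken from the paper.

```python
import numpy as np


def split_global(disp):
    """Split a (T, H, W) displacement field into a per-frame global translation
    (assumed model: the spatial mean) and a residual local oscillation."""
    g = disp.mean(axis=(1, 2))                        # shape (T,)
    return g, disp - g[:, None, None]


def extract_modes(resid, num_modes):
    """Return the `num_modes` strongest temporal frequencies (cycles/frame)
    and one complex (H, W) mode map per frequency."""
    T = resid.shape[0]
    spectrum = np.fft.rfft(resid, axis=0)             # (T//2 + 1, H, W)
    energy = np.sum(np.abs(spectrum) ** 2, axis=(1, 2))
    energy[0] = 0.0                                   # skip the static (DC) bin
    if T % 2 == 0:
        energy[-1] = 0.0                              # skip Nyquist, which needs different scaling
    top = np.argsort(energy)[::-1][:num_modes]
    freqs = top / T
    # Scale so that Re(mode * exp(2j*pi*f*t)) has the sinusoid's true amplitude and phase.
    modes = 2.0 * spectrum[top] / T
    return freqs, modes


def synthesize(freqs, modes, T):
    """Rebuild a (T, H, W) displacement field from frequencies and complex modes."""
    t = np.arange(T)[:, None, None]
    out = np.zeros((T,) + modes.shape[1:])
    for f, m in zip(freqs, modes):
        out += np.real(m[None] * np.exp(2j * np.pi * f * t))
    return out


if __name__ == "__main__":
    T, H, W = 64, 4, 5
    t = np.arange(T)[:, None, None]
    phase = np.linspace(0.0, np.pi, H * W).reshape(H, W)
    local = 1.5 * np.cos(2 * np.pi * 5 * t / T + phase)   # swaying background
    pan = 0.1 * np.arange(T)                               # slow camera pan
    disp = local + pan[:, None, None]

    g, resid = split_global(disp)
    freqs, modes = extract_modes(resid, num_modes=1)
    err = np.abs(synthesize(freqs, modes, T) - resid).max()
    print("frequency:", freqs[0], "max reconstruction error:", err)
```

A full encoder in this style would run the extraction once for horizontal and once for vertical displacement, and the decoder would then warp the sprite with the reconstructed field; that warping step is not shown here.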
Figure 2. Semantic foreground subtraction pipeline. Foreground regions are removed and filled using feedback and inpainting.
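The feedback-and-inpainting step of Figure 2 can be sketched as follows, under assumptions of our own: frames are grayscale arrays, the foreground mask for each frame is already known (for example from a semantic segmentation model), occluded pixels reuse the most recently visible background value, and pixels never seen as background are filled by iterative four-neighbour averaging. That averaging is a basic diffusion inpainting and not necessarily the inpainting method used in the paper; the names `fill_background` and `diffuse_inpaint` are hypothetical.

```python
import numpy as np


def diffuse_inpaint(img, holes, iters=50):
    """Fill pixels where `holes` is True by repeatedly averaging their 4-neighbours."""
    out = img.astype(float).copy()
    # Initialise holes with the mean of the known pixels (or 0 if nothing is known yet).
    out[holes] = out[~holes].mean() if (~holes).any() else 0.0
    for _ in range(iters):
        p = np.pad(out, 1, mode="edge")
        avg = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        out[holes] = avg[holes]                       # only hole pixels change
    return out


def fill_background(frames, masks, iters=50):
    """frames: (T, H, W) floats; masks: (T, H, W) bool, True where foreground.
    Returns a (T, H, W) background-only sequence."""
    T, H, W = frames.shape
    estimate = np.zeros((H, W))                       # running background estimate (feedback)
    known = np.zeros((H, W), dtype=bool)              # pixels seen as background at least once
    out = np.empty((T, H, W))
    for t in range(T):
        visible = ~masks[t]
        estimate[visible] = frames[t][visible]        # refresh wherever background is visible
        known |= visible
        out[t] = diffuse_inpaint(estimate, ~known, iters)
    return out
```

Because occluded pixels simply hold their last visible value, this sketch freezes any background oscillation behind a foreground object, which is only a first approximation.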
Panels: (a) Original, (b) AVC, (c) Ours, (d) Ours / AVC
Figure 3. Sample frames from four versions of a 1m48s, 1080p video at 59.94 fps (5868 frames total). (a) Video encoded with AVC at original quality (397.9 MB, 0.26 bpp, approximately 29 Mbit/s). (b) The same frame encoded with AVC at low-quality settings (3.4 MB, 0.0023 bpp, approximately 255 Kbit/s). (c) The same frame encoded with our method (3.4 MB, 0.0023 bpp, approximately 255 Kbit/s). (d) A/B comparison of our method and AVC.
Panels: (a) Original, (b) AVC, (c) Ours, (d) Ours / AVC
Figure 5. The video clip is 2 minutes and 40 seconds long, shot at 23.976 fps, for a total of 3714 frames. (a) Encoded with the HEVC encoder we used at its default (good) quality setting, the clip is 62.7 MB (0.065 bpp, approximately 3 Mbit/s). (b) Encoded with the same HEVC encoder at its lowest quality setting, the clip is 2 MB (0.0021 bpp, approximately 102 Kbit/s). (c) Our encoder produces a clip of 2 MB (0.0021 bpp, approximately 102 Kbit/s). (d) Side-by-side comparison of our approach and the HEVC encoder at its lowest quality setting.
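The file sizes, bits per pixel (bpp), and bitrates quoted in Figures 3 and 5 are related by simple arithmetic. The sketch below assumes decimal megabytes (1 MB = 10^6 bytes), computes bpp from the stated frame count and a 1920×1080 frame size, and computes Kbit/s from the stated running time. Figure 5 does not state its resolution, so 1920×1080 is an assumption there, and `stream_stats` is a hypothetical helper.

```python
def stream_stats(size_bytes, frames, width, height, duration_s):
    """Return (bits per pixel, kilobits per second) for an encoded stream."""
    bits = size_bytes * 8
    bpp = bits / (frames * width * height)
    kbps = bits / duration_s / 1000.0
    return bpp, kbps


if __name__ == "__main__":
    MB = 1_000_000  # assumption: decimal megabytes
    cases = {
        "Fig. 3, AVC original quality": (397.9 * MB, 5868, 1920, 1080, 108),
        "Fig. 3, low bitrate (AVC and ours)": (3.4 * MB, 5868, 1920, 1080, 108),
        "Fig. 5, HEVC default quality": (62.7 * MB, 3714, 1920, 1080, 160),
        "Fig. 5, low bitrate (HEVC and ours)": (2 * MB, 3714, 1920, 1080, 160),
    }
    for name, args in cases.items():
        bpp, kbps = stream_stats(*args)
        print(f"{name}: {bpp:.4f} bpp, {kbps:,.0f} Kbit/s")
```

Each computed value agrees with the corresponding caption to within the rounding of the quoted file sizes.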