[Figure: qualitative comparison. Panels: Noisy inputs, True outputs, BP-Net's top-1 sample, BP-Net's random sample, PredRNN++ [2].]

[Figure: qualitative comparison. Panels: Noisy inputs, True outputs, BP-Net's top-1 sample, SVG-LP's best sample [3], PredRNN++ [2].]
We study a new research problem: probabilistic prediction of future frames from a sequence of noisy inputs. The problem is practically relevant because the quality of input frames is difficult to guarantee in real spatiotemporal prediction applications. It is also challenging because it involves two levels of uncertainty: the perceptual uncertainty arising from noisy observations and the dynamics uncertainty in forward modeling.
Fig. 1: A comparison of existing video prediction models and our model. Our model works within the Bayesian filtering framework to jointly consider the perceptual uncertainty ε_t and the dynamics uncertainty z_t.
We tackle this problem with an end-to-end trainable model named Bayesian Predictive Networks (BP-Net). Unlike previous work on stochastic video prediction that assumes spatiotemporal coherence and therefore cannot handle perceptual uncertainty, BP-Net models both levels of uncertainty in an integrated framework.
Further, unlike previous work that can only provide unsorted estimates of future frames, BP-Net uses a differentiable sequential importance sampling (SIS) approach to predict future frames from the inferred underlying physical states. It thereby provides prediction candidates sorted by their SIS importance weights, which serve as confidences.
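To make the idea of weight-ranked candidates concrete, the following is a minimal toy sketch of sequential importance sampling on a 1-D state, not the BP-Net implementation. It is neither neural nor differentiable, and everything in it is an assumption made for illustration: the function name `sis_ranked_predictions`, the constant-velocity dynamics, the Gaussian noise models, and the parameters `obs_std` and `dyn_std`. Each particle is one hypothesis about the latent state; noisy observations reweight the particles, and future rollouts are returned in order of decreasing weight.

```python
import numpy as np


def sis_ranked_predictions(observations, num_particles=200, horizon=3,
                           obs_std=0.5, dyn_std=0.1, seed=0):
    """Toy SIS on a 1-D constant-velocity state (illustrative assumption).

    Returns future rollouts sorted by importance weight, highest first,
    together with the sorted, normalized weights.
    """
    rng = np.random.default_rng(seed)
    # Each particle is a hypothetical latent state (position, velocity).
    pos = rng.normal(observations[0], obs_std, size=num_particles)
    vel = rng.normal(0.0, 1.0, size=num_particles)
    log_w = np.zeros(num_particles)

    for obs in observations[1:]:
        # Dynamics uncertainty: sample a stochastic transition per particle.
        vel = vel + rng.normal(0.0, dyn_std, size=num_particles)
        pos = pos + vel
        # Perceptual uncertainty: accumulate the Gaussian log-likelihood
        # (up to a constant) of the noisy observation under each particle.
        log_w += -0.5 * ((obs - pos) / obs_std) ** 2

    # Normalize in log space for numerical stability.
    w = np.exp(log_w - log_w.max())
    w /= w.sum()

    # Roll every particle forward to obtain one future candidate each.
    futures = np.empty((num_particles, horizon))
    p, v = pos.copy(), vel.copy()
    for k in range(horizon):
        v = v + rng.normal(0.0, dyn_std, size=num_particles)
        p = p + v
        futures[:, k] = p

    order = np.argsort(-w)  # descending importance weight
    return futures[order], w[order]


# Usage: noisy observations of an object moving at roughly unit speed.
rng = np.random.default_rng(1)
noisy_obs = np.arange(10, dtype=float) + rng.normal(0.0, 0.5, size=10)
candidates, confidences = sis_ranked_predictions(noisy_obs)
top1 = candidates[0]  # the candidate with the largest importance weight
```

Here `candidates[0]` plays the role of a "top-1 sample", while a row drawn at random corresponds to an unranked sample. This sketch omits resampling, so weights may concentrate on a few particles over long sequences.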