Human-Assisted Motion Annotation

Ce Liu¹   William T. Freeman¹   Edward H. Adelson¹   Yair Weiss¹,²

¹CSAIL, MIT      ²The Hebrew University of Jerusalem

{celiu,billf,adelson}@csail.mit.edu, yweiss@cs.huji.ac.il

To be presented at CVPR 2008 [pdf]

(a) A frame of a video sequence
(b) User-aided layer segmentation
(c) User-annotated motion
(d) Output of a flow algorithm [2]
Figure 1. We designed a system to allow the user to specify layer configurations and motion hints (b). Our system uses these hints to calculate a dense flow field for each layer. We show that the flow (c) is repeatable and accurate. (d): The output of a representative optical flow algorithm [2], trained on the Yosemite sequence, shows many differences from the labeled ground truth for this and other realistic sequences we have labeled. This indicates the value of our database for training and evaluating optical flow algorithms.

Abstract

    Obtaining ground-truth motion for arbitrary, real-world video sequences is a challenging but important task for both algorithm evaluation and model design. Existing ground-truth databases are either synthetic, such as the Yosemite sequence, or limited to indoor, experimental setups, such as the database developed in [1]. We propose a human-in-the-loop methodology to create a ground-truth motion database for videos taken with ordinary cameras in both indoor and outdoor scenes, exploiting the fact that human beings are experts at segmenting objects and inspecting the match between two frames. We designed an interactive computer vision system to allow a user to efficiently annotate motion. Our methodology is cross-validated by showing that human-annotated motion is repeatable, consistent across annotators, and close to the ground truth obtained by [1]. Using our system, we collected and annotated 10 indoor and outdoor real-world videos to form a ground-truth motion database. The source code, annotation tool, and database are online for public evaluation and benchmarking.

What is motion?

Motion can be defined as the physical movement of points in the scene, or as what humans perceive. Can we rely on human perception of motion for motion annotation? We hope this page on "what is motion" will make you think about it.

Download the code and database

The layer segmentation and motion annotation tools are distributed separately because many people only want the layer segmentation tool. Both systems were built with Visual Studio 2005 and Qt 4.3 and are released under the GPL license. You can download the source code and binaries (compiled and tested on Windows Vista) for the two systems. A detailed readme is still being written.

    Layer segmentation: LayerAnnotationSource.zip, LayerAnnotationBinary.zip

    Motion annotation: MotionGroundTruthSource.zip, MotionGroundTruthBinary.zip

Demos of how to use the two systems are shown below.

Here you can download some videos and the annotated flow.

Table (54MB)
Toy (166MB)
Hand (249MB)
Cameramotion (155MB)
Fish (295MB)

Here you can download a zip file of the layer segmentations, together with MATLAB files for loading the layer information [download].
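
For those who prefer Python to MATLAB, below is a minimal loader sketch. It assumes the flow fields are stored in the Middlebury .flo binary format used by [1]; that assumption may not hold for this database, so treat the bundled MATLAB files as the authoritative loaders.

    import numpy as np

    def read_flo(path):
        # Assumed layout (Middlebury .flo): a float32 magic number 202021.25,
        # then int32 width, int32 height, then height*width*2 float32 values
        # interleaved as (u, v) per pixel.
        with open(path, "rb") as f:
            magic = np.fromfile(f, np.float32, count=1)[0]
            if magic != 202021.25:
                raise ValueError("bad magic number; not a .flo file")
            width = int(np.fromfile(f, np.int32, count=1)[0])
            height = int(np.fromfile(f, np.int32, count=1)[0])
            data = np.fromfile(f, np.float32, count=2 * width * height)
        flow = data.reshape(height, width, 2)
        return flow[:, :, 0], flow[:, :, 1]  # u (horizontal), v (vertical)

If the loader rejects a file with a bad magic number, the database uses a different layout and the MATLAB scripts should be consulted instead.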

Interface

Figure 2. A screen shot of our motion annotation system.

Motion statistics

Now that we have sufficient realistic ground-truth motion data, we can, as a side benefit, learn the statistics of realistic motion fields. These statistics can lead to more accurate priors for flow fields and help to improve flow estimation algorithms [3]. We computed the marginal and joint statistics of the ground-truth flow in our database and display the log histograms in Figure 3. In (a) and (b), the marginal of u (horizontal flow) is flatter than that of v (vertical flow), indicating that horizontal motion dominates vertical motion. As shown in (b) and (i), the marginal of v is asymmetric: more pixels move downward than upward (due to gravity). The marginals of the first-order derivatives of the flow are sparse, as shown in (c) to (f). Unlike the marginals of synthetic flow fields [3], our statistics show that the vertical flow is sparser than the horizontal flow, consistent with the fact that horizontal motion has a larger range. The temporal derivatives of the flow are not as sparse as the spatial ones, as depicted in (g) and (h). The joint histogram in (j) suggests that horizontal and vertical motion tend to increase or decrease together over time. The joint histograms in (k) and (l) reveal that the discontinuities of the flow are isotropic. At motion discontinuities, the change in vertical motion may dominate the change in horizontal motion, and vice versa, as shown in (m) and (n).

Figure 3. The marginal ((a)∼(h)) and joint ((i)∼(n)) statistics of the ground-truth optical flow in our database (log histogram).
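
To give a concrete sense of how such statistics are computed, here is a minimal sketch of the marginal log histograms for a single flow field, assuming u and v are 2-D NumPy arrays (e.g., loaded with the read_flo sketch above); the bin count and the epsilon are our choices, not values from the paper.

    import numpy as np

    def log_histogram(x, bins=101):
        # Normalized histogram of the values in x, returned on a log scale
        # (the quantity plotted in Figure 3); the epsilon avoids log(0).
        hist, edges = np.histogram(x.ravel(), bins=bins, density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])
        return centers, np.log(hist + 1e-12)

    def flow_marginals(u, v):
        # Marginals of the flow and its first-order spatial derivatives,
        # mirroring panels (a)-(f) of Figure 3. The temporal derivatives in
        # (g) and (h) would require flow from consecutive frame pairs.
        quantities = {
            "u": u, "v": v,
            "du/dx": np.diff(u, axis=1), "du/dy": np.diff(u, axis=0),
            "dv/dx": np.diff(v, axis=1), "dv/dy": np.diff(v, axis=0),
        }
        return {name: log_histogram(q) for name, q in quantities.items()}

Joint histograms like those in (i)∼(n) can be computed analogously with np.histogram2d.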

References

[1] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. J. Black, and R. Szeliski. A database and evaluation methodology for optical flow. In Proc. ICCV, 2007.
[2] A. Bruhn, J. Weickert, and C. Schnörr. Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods. IJCV, 61(3):211–231, 2005.
[3] S. Roth and M. Black. On the spatial statistics of optical flow. IJCV, 74(1):33–50, 2007.

Last update: June 22, 2008