Trials and Tribulations: Decoding Canon HG10 PF24 Videos

by Gerald Dalley
19 March 2008 [last updated: 25 March 2008]


Canon HG10

Canon was the first camera manufacturer to produce consumer-grade progressive-scan 1080-line HD camcorders.  Older camcorders used HDV tapes or DVDs to store the recorded videos using MPEG2 compression.  The HG10 camcorder instead stores AVCHD-encoded videos to an internal 40GB hard drive, allowing more than 5 hours of video at maximum quality settings.  Near-synonyms of AVCHD are H.264, MPEG4 AVC, and MPEG4 Part 10; I won't get into the finer distinctions here.  This codec produces higher-quality compression than MPEG2 for the same file size, though it generally requires significantly greater computational horsepower to use.  The HG10's videos are about 4x smaller than the HDV videos recorded on its sibling camera, the HV30.

When creating its AVCHD camcorders, Canon propagated some tape-based design decisions that unfortunately mean that none of the consumer-grade software currently on the market can decode their data without reducing the output quality (as of mid-March 2008).  This document is meant to describe how Canon produces AVCHD, how codecs should handle it, and how many existing ones mishandle it.  The hope is that this can help the codec developers make Canon-aware codecs. 

Why do we care?  In our custom video analysis software, we wish to be able to extract the original data (modulo the artifacts introduced by the camera) from the file that comes directly from the camera without wasting massive amounts of disk space using a lossless intermediate codec or being forced to transcode our video and thus impose additional compression losses.  Presently, we cannot do this.

The rest of this document is organized as follows: we first review interlacing and deinterlacing, then describe how the HG10 encodes pf24 video, examine how existing software decodes (and misdecodes) it, present our GraphEdit decoding results, and finally describe our test data.

Interlacing and Deinterlacing Review

Figure 1: Interlacing Overview
Images produced using a pair of frames recorded on a Canon HV30 camcorder in pf30 mode; (b) and (c) simulate recording i30 content.
(a) progressive frame, time 0ms; (b) upper field, time 0ms; (c) lower field, 1/30sec later; (d) interlaced reconstruction; (e) deinterlaced.

Those who already know what interlacing is and why one would typically like to avoid it may safely skip this section.

Interlacing is a common technique used in videos where half the data is thrown away in each frame as a crude compression mechanism and/or as a way to effectively double frame rates.  All standard definition broadcast television uses interlacing and most consumer-grade video cameras can only record interlaced content.

In Figure 1a, we have a progressive frame captured from a tape-based Canon HV30 camcorder.  This means that all pixels were captured during the same time slice, the way a digital camera works.  

Most video cameras do not work this way; they capture interlaced data instead.  This means that lines 1, 3, 5, and so on are captured at the same time as a field.  Lines 2, 4, 6, and so on are not captured at the same time; they are captured later, as a second field.  For interlaced 30 fields per second (i30) video (~half the US television field rate), the second field containing lines 2, 4, 6, and so on is captured 1/30 second later.  A full frame's worth of data is thus only captured every 1/15 of a second.  Figures 1b and 1c represent the fields that correspond to an i30 capture of the same video.  Notice that in 1c the taxi has moved forward many pixels because this data was captured 1/30s later than 1b. 
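To make the field timing concrete, here is a small Python/NumPy sketch (our own illustration, not camera code; the function name is ours) of simulating an i30 capture from two progressive frames taken 1/30s apart:

```python
import numpy as np

def capture_i30_fields(frame_t0, frame_t1):
    """Simulate i30 capture: the upper field keeps lines 1, 3, 5, ...
    (even row indices, zero-based) of the frame at time 0; the lower
    field keeps lines 2, 4, 6, ... of a frame captured 1/30s later."""
    upper = frame_t0[0::2]   # lines 1, 3, 5, ... at time 0
    lower = frame_t1[1::2]   # lines 2, 4, 6, ... captured 1/30s later
    return upper, lower

# Two hypothetical 1080-line frames; the second represents the scene 1/30s later.
f0 = np.full((1080, 1440), 10, dtype=np.uint8)
f1 = np.full((1080, 1440), 20, dtype=np.uint8)
upper, lower = capture_i30_fields(f0, f1)
```

Each field is half-height (540 lines), which is why a full frame's worth of data only arrives every 1/15 of a second.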

If we interleave the two fields, we get an effect we'll call shearing in this document.  In Figure 1d, notice that the fast-moving car looks bad because the image contains data from two different points in time.  The road and gray overhanging lightpost look fine because they did not change between the times the two fields were captured.  Interlaced content does not look bad on standard television sets for various technical reasons outside the scope of this document.

Since interlaced content looks very bad outside some very specific contexts, a common trick is to deinterlace the content.  Deinterlacing is the process of trying to guess what the original progressive content actually was and/or smooth over the differences between fields captured at different times.  In Figure 1e, we show the results of a very simple deinterlacing algorithm.  It uses a somewhat special blurring operation to get rid of the jagged edges.  Note that the result is that the fast-moving car looks like two semi-transparent cars superimposed on each other.  It looks like there's a ghost car.  More sophisticated deinterlacing algorithms exist, but there are fundamental limitations to what they can do because they have to make a number of educated guesses.  One of the codecs we discuss later in this document does do deinterlacing and the results are qualitatively very similar to what we show in Figure 1e.
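For completeness, here is an illustrative Python/NumPy sketch of the two operations shown in Figures 1d and 1e: weaving two fields into one frame, and a very simple blend-style deinterlacer.  The function names are ours, and real deinterlacers are considerably more sophisticated:

```python
import numpy as np

def weave(upper, lower):
    """Interleave two 540-line fields into one 1080-line frame (Fig. 1d).
    If the fields came from different instants, moving objects shear."""
    frame = np.empty((upper.shape[0] * 2, upper.shape[1]), dtype=upper.dtype)
    frame[0::2] = upper   # lines 1, 3, 5, ...
    frame[1::2] = lower   # lines 2, 4, 6, ...
    return frame

def blend_deinterlace(upper, lower):
    """Crude blur-style deinterlacer (Fig. 1e): line-double each field,
    then average them.  Motion turns into the 'ghost car' effect."""
    up2 = np.repeat(upper, 2, axis=0).astype(np.float32)
    lo2 = np.repeat(lower, 2, axis=0).astype(np.float32)
    return ((up2 + lo2) / 2).astype(upper.dtype)
```

The averaging is what superimposes the two semi-transparent cars: data from both instants ends up in every output line.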

We next discuss how the Canon HG10 camera pretends to use interlacing and then we will show how this confuses applications that try to decode the video recorded from those cameras.

PF24 Encoding

Figure 2: PF24 Encoding
Here we depict what happens inside a Canon HG10 camera when recording in pf24 mode.  The original content is horizontally downsampled, then odd and even image rows are separated into "fields".  Finally, these fields are duplicated and swapped to simulate interlaced 60fps content.
    stage:        Image Sensor | HDV-Style Scaling | Interlaced Embedding | Pulldown
    size:         1920x1080    | 1440x1080         | 1440x540             | 1440x540
    time format:  p24          | p24               | i48                  | pf24 / i60

First, some background...

For some technical reasons, tape-based camcorders cannot store progressive (non-interlaced) 24 frames per second (fps) content.  Many users want to record in this mode, so Canon chose to use something called pf24, 24pf, PsF24, or 24PsF.  This is a way of taking progressive 24fps data and storing it as if it were interlaced 60fps data.

The HG10 camera uses a hard drive instead of a tape and it uses the new AVCHD encoding instead of MPEG2.  Even though the old tape-based restrictions need not apply, Canon chose to continue using the pf24 scheme instead of actually encoding the videos using full frames at 24 frames per second.  To illustrate the effects of this decision, we will use Figure 2 in the following subsections. 

Image Sensor

When in pf24 mode, Canon's image sensor captures full 1920x1080 progressive frames.  This means that it captures all 2.1 megapixels at one single instant in time, like a normal digital camera does.  Progressive frame capture is a primary selling point of this camera and we will not concern ourselves with its alternative high frame rate interlaced modes in this discussion (e.g. it can do real i60 capture, but we don't want to use that mode because we don't want interlacing artifacts).

In Figure 2, the progressive frames are shown with light bands representing "upper" rows (1, 3, 5,...) and dark bands for lower rows (2, 4, 6, ...).  So far, all is as the marketing materials suggest.  We have true full 1920x1080 progressive frames, but not for long...

HDV-Style Horizontal Scaling

Due to some technical limitations, tape-based high-definition camcorders can only support frames up to 1440x1080 pixels, not wide 1920x1080 frames.  Even though the HG10 uses a hard drive instead of tape and it uses an MPEG4-based codec instead of MPEG2, the HG10 camcorder immediately does a horizontal downscaling (depicted in the 2nd column of Figure 2).  The 1920x1080 frame is unrecoverable, but fortunately for us, the frames are still progressive and most decoders understand non-square pixels.  Again, this is not to last...

Interlaced Embedding

When the camera prepares to send the data to the AVCHD encoder, it breaks up the frame into two fields as if it were interlaced content, even though the content is progressive.  It basically is pretending that the content is interlaced at 48 fields per second (depicted in the 3rd column of Figure 2).  It's unclear to the author why they could not encode the progressive 24 fps content directly.  Presumably the compression quality would be superior had they done so and we would avoid all the difficulties we are about to describe...

Pulldown

The final decision Canon made was to play some tricks to make the video look like it was interlaced 60 fields-per-second (i60) content.  This is because i48 content is not normal, but i60 is quite standard.  For example, good HDTVs support 1080i60 (1080 lines interlaced at 60 fields per second) natively.  To produce fake i60 content, they take every 4 fields (2 frames) and duplicate one field to make it 5 fields long.  Since 48×5/4=60, this makes the i48 content look like i60 content, at least as far as a naive decoder can tell.  If your TV or display device really requires i60 content, this actually is the best thing to do, but it is something that could be handled by the player and in our opinion is not desirable in the original recorded content. 

Because it's important for a naive decoder to work, this new content must have upper fields in every other field slot and lower fields in the remaining slots.  This involves some field reordering.  In the 4th column of Figure 2, we depict the results of this process, called 2:3 pulldown.  Note that the upper fields (lightly-colored) of frames B and F are duplicated, the lower fields of frames D and H are duplicated, and the temporal order of the fields of frames C and G is reversed.   
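The field sequence just described can be sketched in a few lines of Python.  This is our illustration of the field ordering, not Canon's firmware; fields are represented here by labels so the pattern is easy to read:

```python
def pulldown_23(field_pairs):
    """Apply 2:3 pulldown to a list of (upper, lower) field pairs.

    Each block of 4 progressive frames (A B C D) becomes 10 fields:
        Au Al | Bu Bl Bu | Cl Cu | Dl Du Dl
    B's upper and D's lower fields are duplicated, and C's fields are
    emitted in reversed temporal order, so the stream still alternates
    upper/lower the way a naive i60 decoder expects."""
    out = []
    for i, (u, l) in enumerate(field_pairs):
        phase = i % 4
        if phase == 0:
            out += [u, l]        # frame A: 2 fields
        elif phase == 1:
            out += [u, l, u]     # frame B: upper field duplicated
        elif phase == 2:
            out += [l, u]        # frame C: temporal order reversed
        else:
            out += [l, u, l]     # frame D: lower field duplicated
    return out

frames = [("Au", "Al"), ("Bu", "Bl"), ("Cu", "Cl"), ("Du", "Dl")]
fields = pulldown_23(frames)
```

Four frames (8 fields) become 10 fields, which is exactly the 48 to 60 fields-per-second ratio.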

The pulled-down fields are what are actually sent to the AVCHD encoder and what get stored on the HG10's hard drive.  While no data are directly lost due to the pulldown (space is actually wasted encoding duplicate fields), we'll see that the choice to pretend the frames were interlaced tends to cause problems for decoders. 

PF24 Decoding / Software Interpretation

Figure 3: PF24 Decoding -- Common Realities vs. Correct
Here we depict various ways a decoder can handle the fake interlaced 60fps content that is produced by an HG10 camcorder when recording in pf24 mode. Only the "Correct p24 Reconstruction" recovers all the possible data without unnecessarily discarding, smoothing, or duplicating any.
    strategy:     Canon's Pulldown | Agnostic Interleaving | Deinterlace | Pulldown-Aware Interleaving | Half-sizing | Correct p24 Reconstruction
    size:         1440x540         | 1440x1080             | 1440x1080   | 1440x1080                   | 1440x540    | 1440x1080
    time format:  pf24 / i60       | p30                   | p30         | p30                         | p30         | p24

In this section, we assume we have a pf24 video recorded from a Canon HG10, produced as described in the previous section. 

Recall that pf24 is a progressive 24 frames per second video that has been converted to look like an interlaced 60 fields per second video by breaking up the frames into field pairs and then duplicating fields and reversing their order, as appropriate.  This data is depicted as the 1st column in Figure 3 (copied from the last column of Figure 2). 

Correct p24 Reconstruction

We now wish to decode this video to recover the 1440x1080 progressive frames from the second column in Figure 2.  This correct 1080p24 reconstruction is depicted in the far right column of Figure 3.  The intervening columns depict the most common results from actual codecs. 
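The correct reconstruction amounts to reversing the pulldown described in the previous section: drop the duplicated fields and re-pair the rest.  Here is our own Python/NumPy sketch of it (not code from any shipping codec):

```python
import numpy as np

def weave(upper, lower):
    """Stack two 540-line fields back into a 1080-line progressive frame."""
    frame = np.empty((upper.shape[0] * 2, upper.shape[1]), dtype=upper.dtype)
    frame[0::2] = upper
    frame[1::2] = lower
    return frame

def reverse_pulldown(fields):
    """Recover p24 frames from the pf24/i60 field stream.  Each 10-field
    block  Au Al Bu Bl Bu Cl Cu Dl Du Dl  yields frames A B C D: the two
    duplicated fields are dropped and C's fields are re-paired in their
    original temporal order."""
    frames = []
    for i in range(0, len(fields) - 9, 10):
        Au, Al, Bu, Bl, _, Cl, Cu, Dl, Du, _ = fields[i:i + 10]
        frames += [weave(Au, Al), weave(Bu, Bl),
                   weave(Cu, Cl), weave(Du, Dl)]
    return frames
```

No field is invented, smoothed, or discarded beyond the duplicates, which is why this recovers everything the camera actually stored.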

Of all the codecs we have tried, none can reliably produce what we consider the correct reconstruction.  We are hopeful that this situation will change in the near future.  At least one company claims to have a patch that does, but the patch is not compatible with the trial version of the software, and the combined retail price of all the software that claims to handle pf24 comes to several thousand dollars.  We're willing to purchase a reasonably-priced piece of software that actually works, but having to play roulette by purchasing packages until one finally works seems a bit wrong.

In the next few subsections, we will describe various possible suboptimal decoding strategies and point out some specific codecs that use them.

Agnostic Interleaving

One possible decoder strategy is to take i60 content and interleave field pairs to produce p30-style frames (depicted in the 2nd column of Figure 3).  Out of every resulting block of 5 frames, 3 will be reconstructed correctly, but 2 will contain fields from two adjacent frames.  Those two frames will look interlaced and contain data from two different points in time.  In our depiction, notice that the 3rd and 4th frames have red and blue stripes in them.  The third reconstructed frame has half its data from the original B frame and half from the original C frame.  This is undesirable.
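This strategy is easy to express in code.  In the following illustrative Python sketch (ours), fields are represented by labels so the mixed frames are easy to spot:

```python
def agnostic_interleave(fields):
    """Blindly pair consecutive fields of the i60 stream into p30 frames.
    For a pf24 field stream (Au Al Bu Bl Bu Cl Cu Dl Du Dl) the pairing
    is (Au,Al) (Bu,Bl) (Bu,Cl) (Cu,Dl) (Du,Dl): frames 1, 2, and 5 are
    clean, but frames 3 and 4 each mix fields from two instants."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]

fields = ["Au", "Al", "Bu", "Bl", "Bu", "Cl", "Cu", "Dl", "Du", "Dl"]
frames = agnostic_interleave(fields)
```

The third output frame pairs B's upper field with C's lower field, which is exactly the shearing shown in Figure 1d.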

We note that if one really needs to convert from 24fps to 30fps, then one is basically stuck doing something like this; however, in our opinion, this conversion should be done only when encoding the final video, not when recording the original content on the camera.

If any decoders actually produced the type of output described in this subsection, a smart application could carefully re-separate the fields, reverse the pulldown internally, and produce the correct p24 reconstruction.  This would impose a certain burden on the application, but no additional data would be lost this way.  Both ffdshow and CoreAVC can be configured to produce this output, but as of 25 March 2008, both have some significant decoder errors.

Deinterlaced Interleaving

Figure 4: Sample InterVideo Results
Shows part of a frame deinterlaced via the InterVideo decoder.  The deinterlaced frame was captured using GraphEdit and the progressive frame was manually extracted using Ulead's still frame extractor. 

(a) interlaced then deinterlaced frame

(b) cropped progressive frame...life as it should be

"Agnostic Interleaving" produces interlacing artifacts.  Given that interlacing looks bad, most people will eventually want to deinterlace the content if they do not remove the pulldown.  The results of a simple deinterlacer are depicted in the 3rd column of Figure 3.   Unless the deinterlacer is really smart, it will end up destroying some information in the frames that are actually progressive.  In our depiction, note that the first frame is no longer striped with dark and light bands like the original...that detail has been destroyed.  Furthermore, since deinterlacing is essentially an ill-posed problem, it will never do a perfect job on the 2 out of each 5 frames that have been made interlaced.  Depending on the deinterlacing scheme used, ghosting and/or blurring will result, especially when there is movement in the scene.   

The "InterVideo H264 Decoder" that ships with the HG10 camcorder does deinterlacing.  It takes the progressive data that was stuffed in an interlaced format, thinks it's actually interlaced, creates interlacing artifacts when reconstructing frames, and finally does deinterlacing.  Since it deinterlaces, data is destroyed and the progressive frames cannot be reconstructed.

To further illustrate this issue, in Figure 4 we show some actual results using the InterVideo decoder on hg10-pf24-clip2.mts (see the Test Data section for details on the clip).  We loaded the video into Microsoft's GraphEdit applet (GraphEdit is a developer tool for debugging video processing pipelines), manually paused the playback at the selected frame, took a screenshot, and cropped out a small area for display here.  In Figure 4a, the image looks like two copies of the SUV shifted by about 25 pixels then blended together with each other and the road.  In Figure 4b, we see a progressive frame from the same clip.  We want to be able to always extract progressive frames.  In the next subsection, we'll explain the special circumstances that allowed us to obtain that frame. 

If the InterVideo decoder allowed users to disable deinterlacing, we would have "Agnostic Interleaving" and a smart application could reverse the pulldown correctly.  Unfortunately, the codec is not configurable.

Pulldown-Aware Interleaving

In the last two subsections, we saw that interleaving data from neighboring fields in the i60 stream while pretending the content is really interlaced 60fps content leads to interlacing artifacts.  One might imagine that if a codec knew that the data stream had pulldown applied to it, one might be able to reverse the process and reconstruct progressive frames.  This is depicted in the 4th column back in Figure 3, where the pulldown is effectively reversed, the progressive frames are reconstructed, then a frame-based pulldown is re-applied to produce progressive 30fps content.  In the depiction, frames A and E are duplicated. 

If a codec worked this way, an application could relatively easily remove the duplicated frames and get correct p24 reconstruction.  No interlacing artifacts would be produced and no deinterlacing would be necessary.  In this scheme, every 5th frame is a duplicate.
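An application-side sketch of this cleanup step, in Python/NumPy (the function name is ours; note that after lossy AVCHD encoding the duplicated frames may not decode bit-exactly, so a real implementation might need a similarity threshold rather than exact comparison):

```python
import numpy as np

def drop_pulldown_duplicates(p30_frames):
    """Recover p24 from pulldown-aware p30 output by dropping the one
    duplicated frame in each block of 5.  Rather than hard-code which
    slot holds the duplicate, compare each frame to its predecessor."""
    out = []
    for f in p30_frames:
        if out and np.array_equal(f, out[-1]):
            continue  # skip the repeat of the previous frame
        out.append(f)
    return out
```

Since only whole-frame duplicates are removed, no interlacing artifacts are created and no deinterlacing is needed.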

In a very constrained situation, the InterVideo decoder works this way.  More specifically, it has a single-frame capture mode for manually dumping individual uncompressed frames to disk as .bmp files.  This mode can be used from within Ulead Video Studio 11.5 Plus.  Unfortunately, it requires at least four mouse clicks and 1-10 seconds per exported image.  There is no known way to efficiently automate this process and even if there were, producing one .bmp file per frame would get very unwieldy for all but very short video clips (469GB and about 108,000 files for each hour of video). 

If the InterVideo decoder allowed the user to (programmatically) configure it to do a pulldown-aware interleaving, a smart application could discard the extra frames and recover the real p24 data.

Half-sizing

Yet another strategy one could employ is to simply throw away half of the data while decoding.  To display the frames, one would do a 2x vertical scaling.  While this avoids creating interlacing artifacts and the need for deinterlacing, it has the disadvantage of...throwing away half the data.  An application interested in extracting the 24fps data would also need to remember to discard every 5th frame.  This approach seems like an odd choice, but the ArcSoft Video Decoder actually does this. 
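For illustration, the display-side 2x vertical scaling amounts to a single line-doubling step.  This NumPy sketch is ours, not ArcSoft's code:

```python
import numpy as np

def display_half_sized(frame_540):
    """Line-double a half-sized (e.g. 1440x540) frame for full-height
    display.  The discarded field's detail is gone for good; this only
    fills out the frame height by repeating each remaining line."""
    return np.repeat(frame_540, 2, axis=0)
```

Every pair of output lines is identical, which is why half-sized output can never match a true progressive frame.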

GraphEdit Decoding Results

In Figure 5 below, we supply results for all combinations of Splitter/Demultiplexers combined with all DirectShow H.264 decoders to which we have access.  The tests were performed using Microsoft's GraphEdit application and connecting the clip to the splitter/demux, then its output to the decoder, then the decoder output to a "Video Renderer".  We're using the most basic, it-should-always-work-for-compliant-filters approach.

In the body of the table in Figure 5, quoted text like "Half-sizing" refers to the subsections in the software interpretation discussion and the corresponding column in Figure 3.  Letterboxing means that the frames have been downsampled and a black strip has been added to the top and bottom of the image.  Think of what you see when playing a widescreen DVD on a standard definition TV or monitor.  Red cells indicate no output, severely garbled output, or application crashes.  Yellow cells indicate suboptimal output of some sort.  If there were any, green cells would indicate filter combinations that produce progressive 24fps output without pulldown or deinterlacing artifacts.

Figure 5: GraphEdit Decoding Results
Decoders tested (table columns): ArcSoft Video Decoder; CoreAVC Video Decoder; DivX Decoder Filter (DivXDeux); DivX Decoder Filter (GUID); ffdshow Video Decoder; InterVideo Video Decoder; InterVideo Video Decoder for Ulead; PDR H.264/AVC Decoder; Ulead H264 Decoder.

Results by splitter/demux:

ArcSoft MPEG Demux: 1440x540 "Half-sizing"; crashes when trying to connect the filtergraph; 1440x1080 blank; 1440x810 letterboxed to 1440x1080, garbled; 1440x810 letterboxed to 1440x1080, "Deinterlace"; filter cannot be instantiated; 1440x810 letterboxed to 1440x1080, "Deinterlace"; crashes when trying to connect the filtergraph.

Haali Media Splitter: locks up; 1440x1080 "Agnostic Interleaving" + corruption on seek; 1440x540 blank; 1440x1080 "Agnostic Interleaving" + some corruption; forces other filters to be instantiated, then does not play.

InterVideo Demultiplexer: no output pin supplied.

InterVideo Demux: 720x576 blank; crashes when trying to connect the filtergraph; forces other filters to be instantiated, then does not play (two decoders); 1440x1080 blank; 720x576 blank; crashes when trying to connect the filtergraph.

MainConcept Stream Parser: 1440x540 "Half-sizing"; 1440x810 letterboxed to 1440x1080, "Deinterlace", jerky motion; 1440x810 letterboxed to 1440x1080, "Deinterlace"; crashes when trying to connect the filtergraph.

PDR Demultiplexer: 1440x1080 "Deinterlace".

Ulead MPEG Splitter: forces other filters to be instantiated, then does not play (three decoders); crashes when trying to connect the filtergraph; 1440x810 letterboxed to 1440x1080.

Here we give additional details on a per-codec basis.  We also include notes about some standalone applications.

Test Data

We captured a 30-second clip with the HG10 camera and transferred it to our computer as an MPEG Transport Stream [hg10-24pf.MTS].  Using tsremux, we were able to extract the H.264 elementary stream [hg10-24pf.ts] and using Elecard's XMuxer, we embedded this in a mp4 container [hg10-24pf_remux.mp4].  A second 1 minute clip from the HG10 [hg10-pf24-clip2.mts] and two MPEG2-compressed HV30 clips are also available [30pf-hv30.mpg, hv30-pf30-clip2.mpg].

What we are looking for is a video with frames that decode like the images below:

Figure 7: HG10 Progressive Still Frames
This frame was exported from Ulead Video Studio 11.5+ from [hg10-24pf.MTS].  Shown as a thumbnail and a cropped detail.

These images are (reasonably) sharp.  The only rescaling they have undergone is the 1920 --> 1440 scaling that happens inside the camera.  They have not been rescaled up to 1920, and they have not been vertically downsampled to 810, 804, or 540 pixels.  They have not had scanlines removed.  There is no interlacing.  There is no deinterlacing.  There is no letterboxing.  It's just a decoded progressive frame.  What we're looking for is the ability to have a custom DirectShow application obtain frames like this.

Additional External Resources

Updates