2:3 Pulldown Removal Example
Buffer a chunk of frames for pulldown detection
First, let's read 51 fields of our video into memory (five 10-field pulldown blocks plus one extra field). We'll use this chunk to detect the phase of the pulldown. This demo script only works for 2:3:2:3 pulldown.
% On Windows, this requires the Haali Media Splitter and ffdshow,
% configured as described in ../readme.html, except skip the "SelectEvery"
% and "Weave" calls.  This also requires my videoIO package
% (http://sourceforge.net/projects/videoio).
clc; clear all; close all;
vr = videoReader('../hg10-pf24-clip2.mts', 'preciseFrames',-1);
info = getinfo(vr);

pulldownBlockSize = 10; % # of fields in a complete pulldown block ((2+3)*2 = 10)
nBlocks = 5;            % how many pulldown blocks to acquire
fields = zeros(info.height, info.width, 3, pulldownBlockSize*nBlocks+1);
assert(logical(seek(vr,-1)));
for i=0:pulldownBlockSize*nBlocks
  fields(:,:,:,i+1) = double(getnext(vr))/255;
end

interactive = 0; % disable interactivity when using publish-to-html
Per-field frame differencing
Here we compute the mean squared error (MSE) between pairs of adjacent frames on a per-field basis: we compare the top field of frame 1 with the top field of frame 2, and the bottom field of frame 1 with the bottom field of frame 2.
Some decoders (e.g. some versions of Microsoft's Windows Media 9 decoder) are not "consistent." By this I mean that if you decode some frames of a video, seek back to the beginning, and decode those same frames again, the pixel values of the two decodes may differ. Visually they'll look the same, but numerically they can differ. A consistent decoder always gives the same decoded output for a given frame, no matter what.
If the encoder is consistent (it encoded the duplicate fields exactly the same way) and the decoder is consistent, then the MSE will be 0 every time there is a duplicate field. If either the encoder or the decoder is inconsistent, the MSE should still be very small for duplicated fields.
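% One entry per field that has a same-parity partner two fields later.
% (Preallocate to match the loop length so the array does not grow.)
pull2err = zeros(pulldownBlockSize*nBlocks-1,1);
for i=1:pulldownBlockSize*nBlocks - 1
  f1 = fields(:,:,:,i);
  f2 = fields(:,:,:,i+2); % same parity (top/top or bottom/bottom)
  pull2err(i) = mean((f1(:) - f2(:)).^2);
end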
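To see why duplicates show up in this differencing, here is a small sketch that is not part of the original script. It labels each field with the film frame it came from under an assumed 2:3:2:3 cadence, then compares each field with the one two positions later. The variable names (`src`, `dupIdx`, `nFilmFrames`) are illustrative only.

```matlab
% Sketch: label each field by its source film frame under a 2:3:2:3 cadence.
cadence = [2 3 2 3];   % number of fields emitted per film frame
nFilmFrames = 8;       % two complete pulldown blocks (assumed for illustration)
src = [];
for f = 1:nFilmFrames
  reps = cadence(mod(f-1, numel(cadence)) + 1);
  src = [src, repmat(f, 1, reps)]; %#ok<AGROW>
end

% Fields i and i+2 always share parity (top/top or bottom/bottom).  They are
% exact duplicates only when both come from the same film frame.
dupIdx = find(src(1:end-2) == src(3:end));
disp(dupIdx);        % 1-indexed positions of duplicated fields
disp(diff(dupIdx));  % spacing between duplicates
```

Under these assumptions the duplicated fields land at positions 3, 8, 13, and 18, five fields apart, which matches the spacing of the dips expected in `pull2err`.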
Plot per-field frame differencing results
Here we look at the per-field frame differencing described above. For this video, ffmpeg is a consistent decoder, so the MSE actually touches 0 every 5th field. The rest of this code does not assume a consistent decoder (it would be simpler if it did).
if interactive, clf; else close all; end
plot(0:length(pull2err)-1, pull2err);
xlabel('field number');
ylabel('MSE');
title({'MSE between pairs of matched top/top and bottom/bottom fields',...
       '(we assume top field first)'});
Find the block starting frame
The very first frame we see may not be at the beginning of a block of 5 frames (10 fields) in a pulldown sequence. This could be because the camera started at some non-standard point in the sequence, or because the splitter eats some frames (hint: as best we can tell, Haali's eats at least 2 frames).
We expect the MSE to dip to zero (or near zero) every 5 fields for a 2:3:2:3 pulldown sequence. For smoothly varying motion in the image, a robust way of detecting this is to take a discrete Fourier transform of the MSE sequence and look at the frequency component that corresponds to an every-5-fields undulation. The phase of that FFT coefficient tells us where the dips occur. Other approaches might work too; this was the first thing we tried that worked well.
if interactive, clf; else close all; end

% compute the fft
p = pull2err((1:end-pulldownBlockSize+1)+0);
f = p;
F = fft(f,length(p));

% Zero out all elements we don't care about (since we know exactly the
% frequency we want).  Make sure that we keep both the positive and
% negative frequency components so we get a cosine out of ifft.
FF = zeros(size(F));
keeper = length(FF)/(pulldownBlockSize/2);
keepers = [keeper length(FF)-keeper]+1;
FF(keepers) = F(keepers);

% Visualizations
subplot(311);
plot(0:length(f)/2-1,abs(F(1:length(F)/2)), 'x:');
xlabel('pseudo frequency'); ylabel('fft magnitude');
subplot(312);
plot(0:length(f)/2-1,angle(F(1:length(F)/2)),'x:'); grid on;
xlabel('pseudo frequency'); ylabel('fft phase (radians)');
subplot(313);
plot(0:length(f)-1,f,'b-x', (0:length(f)-1),ifft(FF),'r:x'); grid on;
xlabel('field'); ylabel('MSE');
legend('original MSE', 'reconstructed MSE for phase detection');

% Compute which field was the first to be pulled 3 times instead of 2.
% This will be the same field that has (near-)zero error.
firstPull3Field = round((pi*3/2 - angle(F(keeper+1))) / (pi*2) * pulldownBlockSize/2) % 1-indexed

% Compute the shift that tells us what part of the pulldown block the first
% field corresponds to.
if (mod(firstPull3Field,2)==0)
  % first pull3 is a bottom field
  pull23shift = -(firstPull3Field-1 - 7);
else
  % first pull3 is a top field
  pull23shift = -(firstPull3Field-1 - 2);
end
pull23shift = mod(pull23shift+pulldownBlockSize, pulldownBlockSize)
firstPull3Field =
5
pull23shift =
8
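The idea of reading the dip position from the FFT phase can be checked on a synthetic signal. The following is a simplified sketch, not the script's own formula: it uses an idealized MSE sequence (1 everywhere, 0 at the dips), 0-indexed dip positions, and a phase offset of pi derived for that idealized signal, whereas the script above uses a different offset and 1-indexing tuned to real data. All names here are illustrative.

```matlab
% Sketch: recover the position of a period-5 dip from the FFT phase.
period = 5;        % expected dip spacing, in fields
N = 40;            % signal length, chosen as a multiple of the period
k = N / period;    % 0-indexed FFT bin of the every-5-fields component
detected = zeros(1, period);

for n0 = 0:period-1
  p = ones(N, 1);
  p(n0+1:period:end) = 0;   % dips at 0-indexed positions n0, n0+5, n0+10, ...
  F = fft(p);
  phi = angle(F(k+1));      % phase of the period-5 component

  % For this idealized signal the component behaves like
  % -cos(2*pi*(n - n0)/period), so phi = pi - 2*pi*n0/period (mod 2*pi).
  est = round((pi - phi) / (2*pi) * period);
  detected(n0+1) = mod(est, period);
end

disp(detected);   % estimated dip offsets for true offsets 0..4
```

The final `mod` handles the wrap-around case where the phase sits at the boundary between pi and -pi. Real MSE sequences are noisier than this idealized one, so the estimate there is only as good as the assumption of smoothly varying motion.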
Show some decoded frames
Here we decode some frames and use our learned phase information to do the pulldown reversal. We reread our frames each time to make debugging easier. A real implementation would avoid some of the work duplication.
We show results starting well after the first frame, both to see some interesting movement and to avoid some mistakes made by the decoder because it was not fed a few initial fields. This is the splitter's fault, not the decoder's, assuming we're correct that it is the splitter that drops the first 4 fields.
if interactive
  clf; subplot('Position',[0 0 1 1]);
  fieldset = 0:2:floor(info.numFrames/2)-1 - 4;
else
  close all;
  fieldset = 60:2:100; % just 21 frames (30 through 50) for publish-to-html
end

for i=fieldset
  if ~interactive
    figure('Position',[0 0 1440 1080]);
    subplot('Position',[0 0 1 1]);
  end

  % get the next 4 fields
  assert(logical(seek(vr,i)));
  at = getframe(vr);
  ab = getnext(vr);
  bt = getnext(vr);
  bb = getnext(vr);

  % remove the pulldown
  progressive = ivtc(at,ab,bt,bb, pull23shift+i, [2 3 2 3]);

  % Show results if the pulldown removal doesn't indicate we should drop
  % this field.
  if ~isempty(progressive)
    imshow(progressive);
    t = sprintf('frame %d', i/2);
    title(t); ylabel(t);
    if interactive
      pause;
    end
  elseif ~interactive
    % axis returns [xmin xmax ymin ymax]; place the text at the center
    a = axis;
    text((a(1)+a(2))/2, (a(3)+a(4))/2, ...
      {sprintf('Frame %d is being dropped', i/2),'(this is good)'}, ...
      'FontSize',72, ...
      'VerticalAlignment','middle', 'HorizontalAlignment','center');
  end
end
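Once the repeated fields are discarded, a progressive frame is typically rebuilt by interleaving the rows of one top field and one bottom field (a "weave"). The sketch below shows only that interleaving step on tiny matrices. It assumes each field holds just its own rows (half the frame height), which may not match how this reader returns fields, and it is not the `ivtc` function called above.

```matlab
% Sketch: weave a top field and a bottom field into one progressive frame.
top    = [1 1 1; 3 3 3];   % rows 1 and 3 of the frame (illustrative values)
bottom = [2 2 2; 4 4 4];   % rows 2 and 4 of the frame

frame = zeros(size(top,1) + size(bottom,1), size(top,2));
frame(1:2:end, :) = top;     % odd rows come from the top field
frame(2:2:end, :) = bottom;  % even rows come from the bottom field

disp(frame);
```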
Cleanup
close all;
if ~interactive
  vr = close(vr);
end