AV Sync


Table of Contents

Video and Audio Syncing 
Video and Audio Syncing Problem: Why and How. 
35mm Film to NTSC Video Conversion 
Telecine in MPEG-2 Video 
Deinterlacing DVDs 
An example of 3:2 pulldown encoding 
A really weird 3:2 pulldown encoding!! 
2-3 Pulldown Explained 
2-3 Pulldown 
2-3 Pulldown vs. 3-2 
MPEG-2 pulldown 
How 3:2 pulldown works in MPEG-2 

Video and Audio Syncing 

http://www.doom9.org/index.html?/synch.htm

Video and Audio Syncing Problem: Why and How. 

Since the first release of Powerip in mid-1999, people have struggled to determine the correct video and audio speed when converting an NTSC MPEG-2 video/audio stream to another format (e.g. MPEG-1, AVI, ASF, or DivX) so that video and audio stay perfectly in sync.

This video and audio syncing problem is the result of an incorrect conversion of the MPEG-2 video stream (whether with Powerip, mpeg2avi, or any other conversion utility out there). This document is not meant to dismiss Squeezer or Flask. It can be considered a supplement, so that PERHAPS the explanation can be used to perfect Squeezer, Flask, or even the AGrabber plugin. Note that many successful "synced" conversions have been made with utilities such as "SQUEEZER" and "FLASK". But in some cases none of the conversion utilities produces fully "synced" video and audio.

Why? Let's look at the process of transferring 35mm film to NTSC video to find the root of this evil.

35mm Film to NTSC Video Conversion 

Movies are usually shot on 35mm Film Negative. This format runs at 24 FRAMES per second. A Frame is the smallest unit of the FILM format. NTSC Video is a "field-based" format running at 59.94 FIELDS per second. A Field is the smallest unit of the Video format. 2 Fields make up 1 FRAME, so 59.94 FIELDS per second equals 29.97 FRAMES per second. Now we can see the difference: 1 second of FILM (24 frames) is NOT equal to 1 second of NTSC Video (29.97 frames).

To "match" the speed of NTSC Video, a transfer from FILM to NTSC Video undergoes a process called "2:3 pulldown" or TELECINE. In its simplest terms, this process means "add 6 frames so that 24 fps becomes 30fps, which is VERY close to 29.97fps". The problem that arises in a TELECINE transfer is deciding WHICH 6 FRAMES to add, or rather REPEAT.
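The arithmetic behind "add 6 frames" can be checked in a few lines of Python. This is only a sketch: the exact NTSC rate of 30000/1001 is an assumption not stated in the text above (which rounds it to 29.97), and all names are made up for illustration.

```python
from fractions import Fraction

FILM_FPS = Fraction(24)             # film frames per second
NTSC_FPS = Fraction(30000, 1001)    # assumed exact NTSC frame rate, about 29.97
FIELDS_PER_FRAME = 2

film_fields = FILM_FPS * FIELDS_PER_FRAME        # fields in 24 film frames
target_fields = 30 * FIELDS_PER_FRAME            # fields in 30 video frames
extra_fields = target_fields - film_fields       # fields that must be repeated
extra_frames = extra_fields / FIELDS_PER_FRAME   # the "6 added frames"

# What is still left between 30 fps and the real NTSC speed
slowdown = NTSC_FPS / 30

print(extra_fields, extra_frames, slowdown, float(NTSC_FPS))
```

So the 6 "added frames" are really 12 repeated fields, and the remaining gap between 30 and 29.97 is a factor of 1000/1001.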

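Film makers, video makers, and engineers settled on a STANDARD for this TELECINE conversion. Since a Video FRAME consists of 2 Fields, why not split the FILM frames into Fields first, so that the smallest unit of both formats is the same? Each FILM frame is split into a top and a bottom Field, and the Fields are then shown in an alternating 3, 2, 3, 2 pattern, so that 4 FILM frames become 10 Fields, which is 5 Video FRAMES.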
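The field split can be sketched in Python. This is an illustration only, assuming a cadence that starts with three fields and a top field shown first. The function names are hypothetical and not taken from any tool.

```python
def telecine_fields(film_frames):
    """Split film frames into fields using a 3, 2, 3, 2 ... field cadence."""
    fields = []
    parity = "top"                  # assumption: the first field shown is a top field
    for index, frame in enumerate(film_frames):
        count = 3 if index % 2 == 0 else 2
        for _ in range(count):
            fields.append((frame, parity))
            # field parity must always alternate on an interlaced display
            parity = "bottom" if parity == "top" else "top"
    return fields


def pair_into_video_frames(fields):
    """Group consecutive fields two by two into interlaced video frames."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


fields = telecine_fields(["A", "B", "C", "D"])
video_frames = pair_into_video_frames(fields)
for first, second in video_frames:
    print(first[0] + first[1][0], second[0] + second[1][0])
```

For the four film frames A B C D this gives five video frames: At Ab, At Bb, Bt Cb, Ct Cb, Dt Db. The second and third video frames mix fields from two different film frames, which is where the interlacing artifacts of telecined material come from.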

Telecine in MPEG-2 Video 

In MPEG-2 Video, storing 30 frames per second creates much bigger files than storing 24 frames. If you do the calculation, 1 second at 24 frames is 20% SMALLER in SIZE than 1 second at 30 frames (24/30 = 0.8). But, as we have already discussed, NTSC video should be 29.97fps. Would that mean that ALL movies shot on 35mm FILM must be TELECINED and then ENCODED as a 29.97fps MPEG-2 Video stream? ... NO!

A good thing about MPEG-2 Video is that it can contain FLAGS that tell a SOFTWARE or HARDWARE player to perform the TELECINE while playing the Video. Since the extra fields that bring the stream up to 29.97fps are only REPEATED fields, they are REDUNDANT and TRASHABLE. Just let the FLAGS tell the player to perform the TELECINE. Really, it CAN do that ;). The benefit is that the movie CAN be stored at its original 24 FRAMES per second, and thus SAVE 20% of the total file size!

The FLAGS related to this are REPEAT_FIRST_FIELD and TOP_FIELD_FIRST. The rules for applying these FLAGS follow the STANDARD, so you don't have to worry about the process not meeting it :). Let's see an example:

Adding T_F_F and R_F_F Flags 

As we can see, a Value of 1 for both T_F_F and R_F_F ORDERS the player to DISPLAY FRAME A in the sequence Atop Abottom Atop, and a Value of 0 for both T_F_F and R_F_F ORDERS the player to display FRAME B in the sequence Bbottom Btop.

When T_F_F is 0 and R_F_F is 1 (FRAME C), the player displays FRAME C in the sequence Cbottom Ctop Cbottom, and so forth. Since it is a STANDARDIZED conversion, the Values of T_F_F and R_F_F repeat as follows:

T_F_F sequence: 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
R_F_F sequence: 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
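The two sequences above are one four-frame cadence repeated over and over. A small Python sketch can reproduce them and expand the flags into the fields a player would show. The names are hypothetical and nothing here parses a real MPEG-2 stream.

```python
from itertools import cycle, islice

# One full cadence: (T_F_F, R_F_F) for frames A, B, C, D
CADENCE = [(1, 1), (0, 0), (0, 1), (1, 0)]


def flags_for(n_frames):
    """Return the (T_F_F, R_F_F) pairs for the first n_frames stored frames."""
    return list(islice(cycle(CADENCE), n_frames))


def displayed_fields(frame_names, flags):
    """Expand stored frames into the field order a player would display."""
    out = []
    for name, (tff, rff) in zip(frame_names, flags):
        first, second = ("top", "bottom") if tff else ("bottom", "top")
        out += [name + first, name + second]
        if rff:
            out.append(name + first)    # repeat the first field once more
    return out


flags = flags_for(17)
tff_seq = " ".join(str(t) for t, _ in flags)
rff_seq = " ".join(str(r) for _, r in flags)
print("T_F_F sequence:", tff_seq)
print("R_F_F sequence:", rff_seq)
print(" ".join(displayed_fields("ABCD", flags)))
```

The last line prints Atop Abottom Atop Bbottom Btop Cbottom Ctop Cbottom Dtop Dbottom: top and bottom fields always alternate, which is why T_F_F has to flip after every frame that repeats a field.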

So now we have an MPEG-2 Video stream CONTAINING 24 FRAMES per second, with the TFF and RFF flags in action. This creates a CONFLICT between 24 fps stretched to 30fps and the EXACT 29.97fps of the NTSC Video standard. To solve this, 2 other features of the MPEG-2 Video stream can be applied: the FPS flag and the DROP_FRAME flag.

When the FPS flag is PROGRAMMED in the header of an MPEG-2 Video stream, it ORDERS the player to PLAY the Video stream at an exact SPEED. So, if the FPS flag is set to 29.97fps, the Video stream plays at exactly 29.97 frames per second.

When the DROP_FRAME flag is 1, it ORDERS the player to REMEMBER that frame numbers 00 and 01 are skipped in the time code at the start of each minute, except for minutes that are multiples of 10. The result is much the same as applying the 29.97fps value.
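The drop-frame rule is easier to follow with numbers. The sketch below turns a running frame count into a drop-frame time code, assuming the usual convention of 30 frame numbers per second with two numbers skipped as described above. Only the numbering changes and no picture is thrown away. The function name is hypothetical.

```python
def drop_frame_timecode(frame_number):
    """Convert a 0-based frame count into an HH:MM:SS;FF drop-frame time code."""
    frames_per_10min = 10 * 60 * 30 - 9 * 2    # 17982 real frames per 10 minutes
    frames_per_min = 60 * 30 - 2               # 1798 real frames in a "dropping" minute

    tens, rem = divmod(frame_number, frames_per_10min)
    skipped = 18 * tens                        # 9 dropping minutes x 2 numbers each
    if rem >= 1800:
        # past the first (non-dropping) minute of this 10-minute block
        skipped += 2 * ((rem - 1800) // frames_per_min + 1)

    n = frame_number + skipped                 # position in plain 30 fps numbering
    ff = n % 30
    ss = (n // 30) % 60
    mm = (n // 1800) % 60
    hh = n // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d};{ff:02d}"


print(drop_frame_timecode(1799))     # 00:00:59;29
print(drop_frame_timecode(1800))     # 00:01:00;02
print(drop_frame_timecode(17982))    # 00:10:00;00
```

Frame 1800 jumps straight to ;02 because numbers 00 and 01 of minute 1 are skipped, while minute 10 starts at ;00 as usual. Over one hour 108 numbers are skipped, which keeps the time code close to real time at 29.97fps.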

So THAT is how an MPEG-2 NTSC video stream is stored at 24 FRAMES per second but played back at 29.97fps. Now that we understand the process, we are ready to REVERSE it, in order to achieve total Video and Audio sync when converting BACK from a 24-stored-29.97-fps MPEG-2 Video stream into any video format we want.
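Before reversing the process, it may help to see how long the stored frames really take to play. The sketch below is an illustration under two assumptions that are not from the text above: a perfect alternating R_F_F cadence and an exact NTSC field rate of 60000/1001. The names are hypothetical.

```python
from fractions import Fraction

NTSC_FIELD_RATE = Fraction(60000, 1001)    # assumed exact rate, about 59.94 fields/s


def playback_seconds(rff_flags):
    """Playing time of the stored frames when every field is shown at NTSC speed."""
    fields = sum(3 if rff else 2 for rff in rff_flags)
    return fields / NTSC_FIELD_RATE


rff = [1, 0] * 12                          # 24 stored frames, alternating R_F_F
seconds = playback_seconds(rff)
effective_fps = len(rff) / seconds
print(float(seconds), float(effective_fps))
```

Under these assumptions the 24 stored frames take 1.001 seconds, so the stored frames effectively run at about 23.976 fps rather than 24 or 29.97. That is probably the figure to keep in mind when matching the audio after the repeated fields are removed.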

How? Let's start with "mpeg2avi", a utility that converts an MPEG-2 Video stream into .avi format (with the codec of your choice).