DVD Authoring, Spec-wise

AUTHORING GUIDES -> Introduction to DVD authoring (and FAQ)

http://www.digitalfaq.com/dvdguides/authorburn/intro.htm#audio

The authoring process is best described as the organizing and burning stage of making a DVD. This is when you make menus and arrange the video and audio files. The authoring software then takes all your audio, video, subtitles and menus and creates the VOB, BUP and IFO files found on DVDs. This is followed by the final burning stage.

Because most authoring programs include a burning engine, authoring and burning are normally streamlined into a single process. Most authoring guides on this site also give information on how to burn in the same program. However, I prefer to author DVD files to a folder on the hard drive for testing. Testing is done in PowerDVD using DVD-on-HDD mode. Once the test passes, the authored folder is burned to disc in Nero.

This guide has four parts: (1) Authoring Basics, (2) Video/Audio DVD Specs, (3) Audio FAQ, and (4) Multiple VTS FAQ

Authoring basics

DVD-Video MPEG-2

Resolutions

= NTSC (4:3): 352x240, 352x480, 704x480, 720x480
= NTSC (16:9 widescreen): 704x480, 720x480
= PAL (4:3): 352x288, 352x576, 704x576, 720x576
= PAL (16:9 widescreen): 704x576, 720x576

Video bit-rates

Up to 10.08Mb/s total combined bitrate. Up to 9.8Mb/s max video bit-rate. CBR, CVBR, or VBR

Audio specs

= AC3 Dolby Digital stereo or surround. Average AC3 stereo is 192-384k. Average surround is 448k or higher.
= LPCM uncompressed 1536k WAV/AIFF.
= DTS, same bit-rate as AC3.
= MPEG Layer II (MP2) stereo, 192-256k bit-rate, not officially supported in the spec
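
The two bit-rate ceilings above are easy to check before authoring. The following is only a rough sketch: the function name and the kb/s units are my own choices, and it ignores subtitle streams and multiplexing overhead, which also eat into the combined limit.

```python
# Limits quoted in the spec summary above, expressed in kb/s.
MAX_TOTAL_KBPS = 10080  # 10.08 Mb/s combined video + audio
MAX_VIDEO_KBPS = 9800   # 9.8 Mb/s video alone


def check_bitrate_budget(video_kbps, audio_kbps_list):
    """Return a list of problems; an empty list means the budget fits.

    video_kbps      -- peak video bit-rate in kb/s
    audio_kbps_list -- one entry per audio track, in kb/s
    """
    problems = []
    total = video_kbps + sum(audio_kbps_list)
    if video_kbps > MAX_VIDEO_KBPS:
        problems.append(f"video {video_kbps}k exceeds {MAX_VIDEO_KBPS}k")
    if total > MAX_TOTAL_KBPS:
        problems.append(f"total {total}k exceeds {MAX_TOTAL_KBPS}k")
    return problems


# 8000k video with one 256k AC3 track fits comfortably.
print(check_bitrate_budget(8000, [256]))
# Maximum-rate video plus 1536k PCM goes over the combined limit.
print(check_bitrate_budget(9800, [1536]))
```

The second call shows why PCM audio forces the video bit-rate down: the audio track alone uses about 15% of the combined budget.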

What is a VOB file? IFO/BUP files?

A DVD uses VOB files as containers that hold the video and audio data, and IFO (InFOrmation) files to store the navigation data. The BUP is a BackUP of the IFO files, and must be on the disc. The disc is composed of a VIDEO_TS folder and an AUDIO_TS folder, although the audio folder is unused. Both the audio and video are stored in the VOB files that are present in the VIDEO_TS folder. Your video and audio files are often "imported" into an authoring application and referred to as "assets".

ENCODE YOUR FILES SEPARATELY!!! Authoring applications were never meant to encode audio or video. They were meant to author a disc. The encoding function found in most authoring programs is not very good and was an afterthought. It only exists for lazy newbies.

Never let an authoring package encode your video. It is an authoring application, not an encoder. While many authoring programs have the ability to encode your video for you, it is often not good quality and gives you no control over the size, resolution, bit-rates and important factors that determine the quality of the disc. Bad authoring encoders include DVDit!, DVD Workshop, NeoDVD, MyDVD, and several other Ulead products, just to name a few.

Never let an authoring package encode non-AC3 audio. With few exceptions, the same rules for video apply to audio. Again, authoring applications were meant to author, not encode. The only exception I find acceptable is when using TMPGEnc DVD Author to convert VCD to DVD: it is easier to let the program convert the 44.1kHz MPEG Layer-II audio to 48kHz than to risk losing audio sync. Although TMPGEnc is not the best audio encoder, in this situation it is the lesser of two evils. If the program has a true AC3 encoder, like DVDit! PE or DVD Workshop AC3, then feel free to let it encode the AC3.

Video and audio specifications for DVD

Video files must conform to MPEG specifications allowed by the DVD format. This means you can ONLY use MPEG-1 or MPEG-2 files that meet the specifications, not AVI or DivX or whatever other non-MPEG files you may have. And even then, the MPEG must have an allowed resolution, bit-rate, GOP size and sequence headers as required.

|            |                      |                          |                        |                                |
|Video format|File format           |Resolutions               |Video bit-rates         |Audio specs                     |
|            |                      |                          |                        |                                |
|DVD-Video   |MPEG-2, sequence      |= NTSC (4:3): 352x240,    |Up to 10.08Mb/s total   |= (1) AC3 Dolby Digital stereo  |
|            |headers at each GOP,  |352x480, 704x480,         |combined bitrate.       |or surround. Average AC3        |
|            |4:2:0, MP@ML          |720x480                   |Up to 9.8Mb/s max       |stereo is 192-384k. Average     |
|            |                      |= NTSC (16:9 widescreen): |video bit-rate.         |surround is 448k or higher.     |
|            |                      |704x480, 720x480          |CBR, CVBR, or VBR       |= (2) LPCM uncompressed 1536k   |
|            |                      |= PAL (4:3): 352x288,     |                        |WAV/AIFF.                       |
|            |                      |352x576, 704x576,         |                        |= (3) DTS, same bit-rate as AC3.|
|            |                      |720x576                   |                        |= (4) MPEG Layer II (MP2)       |
|            |                      |= PAL (16:9 widescreen):  |                        |stereo, 192-256k bit-rate, not  |
|            |                      |704x576, 720x576          |                        |officially supported in the     |
|            |                      |                          |                        |spec                            |
|            |                      |                          |                        |                                |
|DVD-Video   |MPEG-1, sequence      |= NTSC (4:3): 352x240     |Between 1.150Mb/s and   |Same audio spec as the MPEG-2   |
|            |headers at each GOP,  |= PAL (4:3): 352x288      |1.856Mb/s CBR video     |version                         |
|            |4:2:0, MP@ML          |                          |bitrate                 |                                |
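
If you want to sanity-check a file's frame size before importing it, the resolution column of the table can be turned into a small lookup. This is a minimal sketch under my own naming (the `ALLOWED` table and `is_dvd_resolution` are illustrative, not part of any authoring tool), and it checks resolution only, not bit-rate, GOP structure or sequence headers.

```python
# Frame sizes from the table above, keyed by (codec, TV system).
ALLOWED = {
    ("MPEG-2", "NTSC"): {(352, 240), (352, 480), (704, 480), (720, 480)},
    ("MPEG-2", "PAL"): {(352, 288), (352, 576), (704, 576), (720, 576)},
    ("MPEG-1", "NTSC"): {(352, 240)},
    ("MPEG-1", "PAL"): {(352, 288)},
}

# The table lists 16:9 widescreen only for the 704- and 720-wide MPEG-2 sizes.
WIDESCREEN_WIDTHS = {704, 720}


def is_dvd_resolution(codec, system, width, height, widescreen=False):
    """Return True if the frame size appears in the table for this codec/system."""
    sizes = ALLOWED.get((codec, system), set())
    if (width, height) not in sizes:
        return False
    if widescreen:
        return codec == "MPEG-2" and width in WIDESCREEN_WIDTHS
    return True


print(is_dvd_resolution("MPEG-2", "NTSC", 720, 480, widescreen=True))
print(is_dvd_resolution("MPEG-2", "PAL", 352, 576, widescreen=True))
```

A frame size that passes this check can still be rejected by a particular authoring program, as the examples further down show.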

The video must also be all PAL or all NTSC. Multi-format discs are not supported, and most authoring applications prevent this mistake from being made. DVD players do not support playback of such discs.

Authorware variances. Each authoring program has its own preferences and rules regarding the file types that are required and/or allowed for use. Your video and audio files must conform to the rules of the program, and this information is often found in the program manual or help files. It is imperative that you read the help files and manual of your authoring program.

Example One: DVDit! PE. The DVDit! line of software requires that all files be the same resolution and format. This program does not support multiple VTS (see the multiple VTS authoring guide for more information on this topic). It only supports 352x240 MPEG-1 between 1150k and 1856k, and MPEG-2 at either 352x480 or 720x480 with bit-rates of up to 8000k for the video. It does not allow 352x240 or 704x480 MPEG-2 video. It does not allow MPEG audio, only AC3 and PCM (WAV or AIFF only) sound files. It also requires closed GOP for the MPEG. It will not accept non-MPEG files. It requires video and audio to be imported separately.
Example Two: TMPGEnc DVD Author: This program allows pretty much anything in the MPEG specification. It will not accept non-MPEG files. This program also allows multiple VTS.
Example Three: Ulead DVD Workshop: This program will take MPEG source without transcoding it, and will accept multiplexed MPEG streams, a rarity for mid-level to high-level authoring software. It does motion menus and multi VTS. It will accept some non-MPEG files and will transcode them to MPEG-2 format.

Audio FAQ: AC3 vs. PCM vs. MP2

The DVD format supports three types of audio: Dolby Digital (AC-3), uncompressed PCM, and MPEG Layer II. Each of them has advantages and disadvantages. All audio must be 48kHz and at least stereo. It can also be surround sound. The most popular of them is AC3, as it is a small file that retains high quality. Dolby sound also allows for surround sound, most commonly the Dolby Digital 5.1 sound scheme.

AC3 audio

Dolby Digital Audio is a highly compressed audio format stored in an AC3 file. Dolby can be stereo or surround, and has allowable stereo bit-rates from 128k to 384k. Most commercial DVDs with stereo sound use 192k or 224k audio, having come from perfect sources and using hardware encoding. For home use, I suggest the 256k bit-rate in order to retain rich sound, especially if it is converted from AVI or MPEG captures.

Surround sound must have at least 6 separate source channels. Do not use 5.1 unless you have a surround source. Taking a stereo or mono audio file and forcing it into Dolby 5.1 format will only waste space and provide no advantages. Most surround systems are able to emulate surround by translating the audio and feeding it to all the speakers, which is essentially the same as converting a 2/0 file into a 5/1 file.

Advantage: size and quality.
Disadvantage: none, really.

PCM audio

Uncompressed PCM audio is often stored as WAV or AIFF sound files. PCM is merely uncompressed audio, and is enormous in size, often 10 times the file size of MP2 or AC3. Do not let the term "uncompressed" fool you: in AC3 and MP2, most of the compression is done on frequencies and information outside the range of human hearing. Compressed AC3 and MP2 audio can sound just as good as PCM, at a lower cost (in terms of disc space). In general, much like AVI video, the PCM audio format is honestly only good for editing. Final DVD audio should be AC3 if at all possible. Only leave it as PCM if final disc size is unimportant or if unusual distortion occurs from AC3 or MP2 compression. The bit-rate of PCM sound is set at 1536k or a close approximation.

Advantage: quality
Disadvantage: size
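
Where the 1536k figure comes from is simple arithmetic. The sketch below assumes 48kHz, 16-bit, two-channel PCM (my assumption; it is the combination that reproduces the 1536k quoted above), and the helper names are made up for illustration.

```python
def pcm_kbps(sample_rate_hz=48000, bits_per_sample=16, channels=2):
    """Bit-rate of uncompressed PCM in kb/s (1k = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels // 1000


def megabytes_per_hour(kbps):
    """Disc space used by one hour of audio, in MB (10^6 bytes)."""
    bits_per_hour = kbps * 1000 * 3600
    return bits_per_hour / 8 / 1_000_000


pcm = pcm_kbps()
print(pcm)                      # 1536
print(megabytes_per_hour(pcm))  # about 691 MB per hour
print(megabytes_per_hour(256))  # about 115 MB per hour for 256k AC3
```

So one hour of stereo PCM costs roughly 691 MB against roughly 115 MB for 256k AC3, which is space that could otherwise go to the video bit-rate.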

MPEG Layer II audio

The MPEG Layer II format (commonly using the .MP2 or .MPA file extension) is not the same as the MPEG Layer III (MP3) audio format. While both are forms of MPEG audio, they are not interchangeable. DVD and CD-based video does not use MP3 audio.

For NTSC video, MPEG audio is not officially supported. That said, most DVD players can play back MPEG audio anyway. If the player can play S/X/VCD formats, then it will most likely play back MPEG Layer II. Most MPEG video chips are also hard-coded to play the audio. For PAL users, MPEG Layer II is currently supported, though recent shifts in the DVD Forum have hinted that this will change in the future.

The biggest advantage to leaving the audio as MP2 is to preserve its quality, assuming it was clean sounding from the beginning. Encoding to PCM merely makes the file larger, and encoding to AC3 can potentially harm the quality, especially if the MP2 had a low bitrate. In general, 256k at 48kHz is optimal, and 192k is the minimum. Most X/S/VCD formats used 224k.

Advantage: size and ease of conversion from XVCD/SVCD/CVD/VCD
Disadvantage: quality and player support

Multiple VTS

The VOB and IFO files on a DVD are stored with a VTS naming structure. VTS stands for Video Title Set, though it is often referred to as "tracks".

Example: VTS_01_0, VTS_01_1, VTS_01_2, VTS_02_0, VTS_02_1, VTS_03_0, VTS_03_1, etc. The first number following the VTS, as in "VTS_xx", is the VTS identification number, so VTS_01_5 would be VTS one, part five. Each VTS can only hold one video format. All video within that VTS must have the same aspect ratio, resolution and MPEG type.
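
To see the naming scheme in action, here is a small sketch that splits a file name from the VIDEO_TS folder into its VTS number and part number. It assumes the usual on-disc form with underscores and an extension, such as VTS_01_1.VOB; the function name and the regular expression are my own.

```python
import re

# VTS_<two-digit title set>_<one-digit part>.<VOB|IFO|BUP>
VTS_NAME = re.compile(r"^VTS_(\d{2})_(\d)\.(VOB|IFO|BUP)$", re.IGNORECASE)


def parse_vts_name(filename):
    """Return (title_set, part, extension), or None if the name does not match."""
    match = VTS_NAME.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3).upper()


print(parse_vts_name("VTS_01_1.VOB"))  # (1, 1, 'VOB')
print(parse_vts_name("VTS_02_0.IFO"))  # (2, 0, 'IFO')
print(parse_vts_name("VIDEO_TS.IFO"))  # None
```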

Why is this important, you ask? Well, most consumer authoring software only allows one VTS for the whole disc. This means you are more limited in what your project can contain. It is important to remember this if your disc will contain multiple videos, as they must all have the same specs. Multiple VTS authoring has no such limitation.

Advantages of multiple VTS. The advantages of multiple VTS are simple: it allows greater control over the content of the disc. The main movie can be 720x480 16:9 ratio MPEG-2 video, the disc bonus can be smaller 352x480 4:3 MPEG-2 and the trailers can be 352x240 MPEG-1 video. Or an episode disc can use several sources at different sizes and formats. Or VCD material can be added to an existing DVD with no conversion being required. The possibilities of using multiple VTS are great. This also allows MPEG-1 and MPEG-2 to be put on the same disc.
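
Since every video in a VTS must share the same MPEG type, resolution and aspect ratio, you can work out how many title sets a project needs by grouping the assets on those three properties. The sketch below uses the movie/bonus/trailer example from the paragraph above; the tuple layout and the function name are assumptions made for illustration only.

```python
from collections import defaultdict


def plan_title_sets(assets):
    """Group assets that can share one VTS.

    assets -- iterable of (name, mpeg_type, (width, height), aspect) tuples
    Returns a dict mapping (mpeg_type, resolution, aspect) to a list of names.
    """
    groups = defaultdict(list)
    for name, mpeg_type, resolution, aspect in assets:
        groups[(mpeg_type, resolution, aspect)].append(name)
    return dict(groups)


assets = [
    ("main movie", "MPEG-2", (720, 480), "16:9"),
    ("bonus", "MPEG-2", (352, 480), "4:3"),
    ("trailer 1", "MPEG-1", (352, 240), "4:3"),
    ("trailer 2", "MPEG-1", (352, 240), "4:3"),
]

plan = plan_title_sets(assets)
print(len(plan))  # 3 title sets needed
for spec, names in plan.items():
    print(spec, names)
```

If the count comes out greater than one and your software is single-VTS only, something has to be re-encoded to match the rest.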

Multiple VTS software. The general rule is that professional software supports multiple VTS and that consumer software does not.
- Home software. Most home-use software is single VTS software. The only real exception is TDA (TMPGEnc DVD Author), which calls these "tracks". All you have to do is add a track and a new VTS is created. Some of the newest mid-level authoring software, like DVD Workshop 2 and DVDit! 5, also supports multi VTS.
- Professional software. Most professional software, like DVD Studio Pro, Sonic Scenarist, Sonic Maestro, SpruceUp and others, supports multiple VTS.

Please read the manual, the software help files or online support file in order to learn whether or not your software supports multiple VTS.

Page Last Updated: May 17th 2005

A Quick Guide to Digital Video Resolution and Aspect Ratio Conversions

http://www.iki.fi/znark/video/conversion/

Introduction
The Connection Between the Analog and the Digital
A Conversion Table for Digital Video Formats
Frequently Argued Questions
Related Links

Recent updates

28-Feb-2004

Added the 544x576 resolution (as per DVB specifications) to the 625/50 table

22-Feb-2004

Due to popular demand, I have now finished a major revamp of the conversion table with regard to the 525/59.94 systems.

6-Apr-2002

Initial publication date

1. Introduction

There is a fair number of mind-blowing, scary oddities and secrets in the world of digital video.

One of the very first a beginner will usually encounter is the fact that in digitized video data, pixels are often not considered "square" in their form. In most real-world digital video applications pixels have a width/height ratio (or aspect ratio, as it is more conveniently called) that can be something completely different from 1/1!

The second great revelation usually comes when one runs into the concept of anamorphic 16:9 video for the very first time. If it was initially hard to grasp the idea of pixels changing their shape when displayed in different environments, this one is even more baffling: the very same pixel resolution you have only just learned to associate with 4:3 displays can now suddenly represent another, totally different image geometry. In other words, the pixels have changed their shape again!

Unfortunately, these two are often the only things most ordinary people will ever learn about digital video and aspect ratios.

1.1 The dirty little secret revealed

Tutorials and manuals usually tend to keep very quiet and secretive about the finer technical details of digital video, particularly when it comes to the topic of (pixel) aspect ratios and image geometry.

Even if converting (resampling) video clips to other resolutions is discussed, the accompanying explanation is usually troublingly simplistic and vague — often inaccurate and misleading — and sometimes the suggested methods are just plain wrong. It is not uncommon that the examples only deal with arbitrarily chosen ("x pixels by y pixels") frame dimensions and use ideal frame aspect ratios such as 16:9 or 4:3 as the basis for calculations — not the actual pixel aspect ratios — which is usually a good indicator that the writer may not actually take the real image geometry into account at all.

It is almost as if the whole aspect ratio issue was considered some sort of dirty little secret of the video industry; black magic you could not even begin to explain to mere mortals in reasonable terms. This is a shame. In this case, there is really more to it than meets the eye. Confusing people with incomplete and watered-down explanations does not do any good to the industry.

Now that you have read this far, it is time to reward your effort with The Third Big Revelation about aspect ratios and frame sizes - the one that is usually left unsaid:

Not a single one of the commonly used digital video resolutions exactly represents the actual 4:3 or 16:9 image frame.

Shocking, isn't it? 768x576, 720x576, 704x576, 720x480, 704x480, 640x480… none of them is exactly 4:3 or 16:9; not even the ones you may conventionally think of as "square-pixel" resolutions.

So there. Now you finally know the truth. Let's find out what it actually means.

2. The Connection Between the Analog and the Digital

Digital video standards do not live outside the realm of the analog world. On the contrary, all commonly used modern (SDTV) digital video formats have a well-defined relationship with their counterparts in analog video standards. You could really say they have their roots in analog soil.

And now, my friend, we are rapidly closing in on The Fourth Big Revelation:

It is really the analog video standards that define the image geometry and pixel aspect ratio in digital formats.

Even if you did all of your video work solely in digital domain, those pesky old analog video standards still define the shape of your images and pixels.

How come?

From the video industry's point of view, the current (SDTV, as opposed to HDTV which is another kettle of fish) digital video formats - those that actually get used in practical real-life applications such as DVD, DV, VCD, SVCD, digital television etc. - are all about interoperability. At the advent of digital video - the late 1970s, when committee work was started on CCIR 601 (later to become ITU-R BT.601) - there was already a vast catalog of analog video material in formats defined solely by analog standards. What is more, enormous amounts of money had been poured into analog studio equipment such as cameras, video switchers, proc amps, tape decks and other tools of the trade. What a waste it would have been if the "next generation" digital video formats had been designed in such a way that they had absolutely nothing in common with old analog formats, and required ditching all the analog equipment!

It was clear from the beginning that the industry wanted a smooth, well-defined transition path between the current analog systems and the brave new digital world without running into too many compatibility issues. It was also considered necessary to be able to freely mix and match digital and analog equipment. The result was that the digital (SDTV) video formats we now use are based on the concept of digitizing old, analog video signals, thus interlocking with the analog video standards.

This connection between the digital and analog domains is permanent. Some of the fundamental features of digital video, such as image geometry, are actually defined in the analog standards. Even if we go all-digital, the relationship is still there, as long as we use either ITU-R BT.601 pixels or "industry standard" square pixels.

2.1 What does it mean?

There are three basic sampling rates from which almost all modern digital video formats are derived:

13.5 MHz ITU-R BT.601 (aka CCIR 601 aka Rec. 601) non-square pixels for
         both 625/50 and 525/59.94 systems. This sampling rate was
         originally designed for digitizing component video signals.
         Now used extensively in almost all modern digital video gear.

14.75 MHz "Industry standard" square pixels for 625/50 systems.
          Originally designed for digitizing composite video signals.

12 + 3/11 MHz SMPTE 244M "industry standard" square pixels for 525/59.94
              systems. Originally designed for digitizing composite video
              signals.

Let's see how this works out with 13.5 MHz and both 525/59.94 and 625/50 systems:

If you have the B/W (luminance) part of a component video signal in a coaxial cable, you can plug in an A/D converter and start metering (sampling) the voltage level in the cable at regular intervals.

ITU-R BT.601 defines a standard sampling rate for both 625/50 and 525/59.94 video signals: 13.5 MHz.
13.5 MHz will give you a total of 13,500,000 samples per second, but we are only interested in sampling the parts of the signal that actually contain image information. The parts of the signal spent in horizontal or vertical blanking are of no interest to us, and can be omitted.

625/50 systems have a line length of 64 us, of which 52 us is the "active" part that contains actual image information. (The rest is reserved for horizontal blanking.)

o 52 us x 13.5 MHz = 702 samples (pixels) per scanline
o In the vertical direction, there are 574 complete scanlines and 2 half
  lines. Even the half lines get digitized as if their "missing" other
  half belonged to the active picture, giving a total of 576 scanlines.
o Thus, the active image area at 13.5 MHz sampling is 702x576 pixels. This
  is the actual area that forms the 4:3 (or anamorphic 16:9) frame.

525/59.94 systems have a line length of 63+5/9 (63.555…) us, of which 52+59/90 (52.6555…) us is the "active" part that contains actual image information. (The rest is reserved for horizontal blanking.)

o 52+59/90 us x 13.5 MHz = 710.85 samples (pixels) per scanline.
o In the vertical direction, there are 484 complete scanlines and 2 half
  lines. As above, all of them get digitized and half lines will be
  treated as if their missing other half belonged to the active picture,
  giving a total of 486 scanlines.
o Thus, the active image area at 13.5 MHz sampling is 710.85x486 pixels.
  This is the actual area that forms the 4:3 (or anamorphic 16:9) frame.
o However, we cannot use partial pixels in any practical video work.
  Therefore, the number 710.85 needs to be rounded up to 711, and we get a
  711x486 pixel frame instead.
o 711 samples equal 52+2/3 (52.666...) us at 13.5 MHz, so the
  rounded-to-the-nearest-pixel active area is a little bit wider than it
  ideally ought to be. Fortunately, the difference of 0.0111... us is (for
  all practical purposes) insignificant, and well within the tolerances of
  NTSC-M specifications.
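
The arithmetic above can be reproduced exactly with rational numbers. The following is a sketch only: the function names are mine, and the pixel aspect ratio step assumes that the active area computed above is exactly a 4:3 frame, which is how the text describes it.

```python
import math
from fractions import Fraction

SAMPLING_MHZ = Fraction(27, 2)  # 13.5 MHz, ITU-R BT.601


def active_samples(active_us):
    """Samples per scanline: active line time (us) x sampling rate (MHz)."""
    return active_us * SAMPLING_MHZ


def pixel_aspect_ratio(active_width, active_height, frame_aspect=Fraction(4, 3)):
    """Pixel width/height so that the active area displays at frame_aspect."""
    return frame_aspect * active_height / active_width


w625 = active_samples(Fraction(52))           # 625/50: 52 us active
w525 = active_samples(52 + Fraction(59, 90))  # 525/59.94: 52+59/90 us active

print(w625)                     # 702
print(float(w525))              # 710.85
print(math.ceil(w525))          # 711, the rounded-up width
print(Fraction(711) / SAMPLING_MHZ)   # 158/3 us, i.e. 52+2/3 us

print(pixel_aspect_ratio(w625, 576))  # 128/117
print(pixel_aspect_ratio(w525, 486))  # 4320/4739
```

The last two results are the pixel aspect ratios for BT.601 sampling in 625/50 and 525/59.94 systems respectively.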

It also works the same way for square-pixel sampling rates. You will just get a different number of horizontal samples. The calculations are left as an exercise to the reader.
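
For readers who would rather check the exercise than do it by hand, here is one way it might be worked, using the two square-pixel sampling rates listed in section 2.1 and the same active line times as above. The function name is hypothetical.

```python
from fractions import Fraction


def active_width(rate_mhz, active_us):
    """Active samples per scanline for a given sampling rate and line time."""
    return Fraction(rate_mhz) * Fraction(active_us)


# 625/50 square pixels: 14.75 MHz, 52 us active line
pal_square = active_width(Fraction(59, 4), 52)

# 525/59.94 square pixels: 12 + 3/11 MHz, 52+59/90 us active line
ntsc_square = active_width(12 + Fraction(3, 11), 52 + Fraction(59, 90))

print(pal_square)              # 767
print(divmod(ntsc_square, 1))  # (646, Fraction(5, 22))
```

In other words, the active picture is 767 samples wide at 14.75 MHz and 646+5/22 samples wide at 12 + 3/11 MHz, slightly narrower than the nominal 768 and 640-plus-margin widths you might expect.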

2.3 I am already lost!

If you did not understand a word of the above, you might want to take a look at the following introductory links:

A Note on CCIR / PAL-B Video Standard
Basics of Video
Conventional Analog Television - An Introduction
The 625/50 PAL Video Signal and TV Compatible Graphics Modes.

Also see the Related Links section.

3. A Conversion Table for Digital Video Formats

The following is a frame size and aspect ratio conversion table, representing many commonly used digital video formats:

|                                                                             |
|The formats related to 625-line systems with a 50 Hz field rate              |
|            |         |      |         |              |           |          |
|sampling    |         |pixel |sampling |actual active |           |          |
|matrix      |sampling |aspect|matrix   |picture size  |supports   |          |
|     |      |rate     |ratio |width in |       |      |interlacing|notes     |
|width|height|(MHz)    |(x/y) |us       |width  |height|           |          |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |"Industry |
|     |      |         |      |         |       |      |           |standard" |
| 768 |  576 |   14.75 | 768/ |52.06780 |   767 |  576 |       Y   |625/50    |
|     |      |         | 767  |         |       |      |           |square-   |
|     |      |         |      |         |       |      |           |pixel     |
|     |      |         |      |         |       |      |           |video     |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |"True"    |
|     |      |14 + 10/ |      |         |       |      |           |computer  |
| 768 |  576 |   13(2) |  1/1 |52.00000 |   768 |  576 |       Y   |square-   |
|     |      |         |      |         |       |      |           |pixel     |
|     |      |         |      |         |       |      |           |resolution|
|     |      |         |      |         |       |      |           |          |
| 768 |  560 |   14.75 | 768/ |52.06780 |   767 |  576 |       Y   |CD-i(3)   |
|     |      |         | 767  |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
|     |      |         | 128/ |         |       |      |           |D1, DV,   |
| 720 |  576 |   13.5  | 117  |53.33333 |   702 |  576 |       Y   |DVB, DVD, |
|     |      |         |      |         |       |      |           |SVCD(3)   |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |Oddball   |
|     |      |         |      |         |       |      |           |compromise|
|     |      |         |      |         |       |      |           |format.   |
|     |      |         |      |         |       |      |           |Better to |
| 720 |  540 |ambiguous|  1/1 |ambiguous|   720 |  540 |       N   |avoid     |
|     |      |         |      |         |       |      |           |unless you|
|     |      |         |      |         |       |      |           |really    |
|     |      |         |      |         |       |      |           |know what |
|     |      |         |      |         |       |      |           |you are   |
|     |      |         |      |         |       |      |           |doing.    |
|     |      |         |      |         |       |      |           |          |
|     |      |         | 128/ |         |       |      |           |DVD, H.263|
| 704 |  576 |   13.5  | 117  |52.14815 |   702 |  576 |       Y   |(4CIF),   |
|     |      |         |      |         |       |      |           |VCD(3)    |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |Active    |
|     |      |         |      |         |       |      |           |picture   |
|     |      |         |      |         |       |      |           |frame for |
| 702 |  576 |   13.5  | 128/ |52.00000 |   702 |  576 |       Y   |625/50    |
|     |      |         | 117  |         |       |      |           |systems in|
|     |      |         |      |         |       |      |           |ITU-      |
|     |      |         |      |         |       |      |           |R BT.601- |
|     |      |         |      |         |       |      |           |4 pixels. |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |DVB (3/   |
|     |      |         | 512/ |         |       |      |           |4 of      |
| 544 |  576 |  10.125 | 351  |53.72840 |526+1/2|  576 |       Y   |BT.601    |
|     |      |         |      |         |       |      |           |sampling  |
|     |      |         |      |         |       |      |           |rate)     |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |SVCD (2/  |
|     |      |         |      |         |       |      |           |3 of      |
| 480 |  576 |      9  |128/78|53.33333 |   468 |  576 |       Y   |BT.601    |
|     |      |         |      |         |       |      |           |sampling  |
|     |      |         |      |         |       |      |           |rate)     |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |1/4 of    |
| 384 |  288 |   7.375 | 768/ |52.06780 |383+1/2|  288 |       N   |"industry |
|     |      |         | 767  |         |       |      |           |standard" |
|     |      |         |      |         |       |      |           |768x576   |
|     |      |         |      |         |       |      |           |          |
| 384 |  280 |   7.375 | 768/ |52.06780 |383+1/2|  288 |       N   |CD-i      |
|     |      |         | 767  |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
| 352 |  576 |   6.75  | 256/ |52.14815 |   351 |  576 |       Y   |DVD       |
|     |      |         | 117  |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |VCD, DVD, |
| 352 |  288 |   6.75  | 128/ |52.14815 |   351 |  288 |       N   |H.261 +   |
|     |      |         | 117  |         |       |      |           |H.263     |
|     |      |         |      |         |       |      |           |(CIF)     |
|     |      |         |      |         |       |      |           |          |
|     |      |         | 128/ |         |       |      |           |H.261 +   |
| 176 |  144 |   3.375 | 117  |52.14815 |175+1/2|  144 |       N   |H.263     |
|     |      |         |      |         |       |      |           |(QCIF)    |
|                                                                             |
|The formats related to 525-line systems with a 59.94(1) Hz field rate        |
|            |         |      |         |              |           |          |
|sampling    |         |pixel |sampling |actual active |           |          |
|matrix      |sampling |aspect|matrix   |picture size  |supports   |          |
|     |      |rate     |ratio |width in |       |      |interlacing|notes     |
|width|height|(MHz)    |(x/y) |us       |width  |height|           |          |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |Oddball   |
|     |      |         |      |         |       |      |           |compromise|
|     |      |         |      |         |       |      |           |format.   |
|     |      |         |      |         |       |      |           |Better to |
| 720 |  540 |ambiguous|  1/1 |ambiguous|   720 |  540 |       N   |avoid     |
|     |      |         |      |         |       |      |           |unless you|
|     |      |         |      |         |       |      |           |really    |
|     |      |         |      |         |       |      |           |know what |
|     |      |         |      |         |       |      |           |you are   |
|     |      |         |      |         |       |      |           |doing.    |
|     |      |         |      |         |       |      |           |          |
| 720 |  486 |   13.5  |4320/ |53.33333 |710.85 |  486 |       Y   |D1        |
|     |      |         | 4739 |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
| 720 |  480 |   13.5  |4320/ |53.33333 |710.85 |  486 |       Y   |DV, DVB,  |
|     |      |         | 4739 |         |       |      |           |DVD, SVCD |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |Active    |
|     |      |         |      |         |       |      |           |picture   |
|     |      |         |      |         |       |      |           |frame for |
| 711 |  486 |   13.5  |4320/ |52.66667 |710.85 |  486 |       Y   |525/59.94 |
|     |      |         | 4739 |         |       |      |           |systems in|
|     |      |         |      |         |       |      |           |ITU-      |
|     |      |         |      |         |       |      |           |R BT.601- |
|     |      |         |      |         |       |      |           |4 pixels. |
|     |      |         |      |         |       |      |           |          |
| 704 |  486 |   13.5  |4320/ |52.14815 |710.85 |  486 |       Y   |          |
|     |      |         | 4739 |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
| 704 |  480 |   13.5  |4320/ |52.14815 |710.85 |  486 |       Y   |ATSC, DVD,|
|     |      |         | 4739 |         |       |      |           |VCD(3)    |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |"True"    |
|     |      |         |      |         |       |      |           |computer  |
|     |      |   12 +  |      |         |       |      |           |square-   |
| 648 |  486 |   1452/ |  1/1 |52.65556 |   648 |  486 |       Y   |pixel     |
|     |      | 4739(2) |      |         |       |      |           |resolution|
|     |      |         |      |         |       |      |           |(all 486  |
|     |      |         |      |         |       |      |           |active    |
|     |      |         |      |         |       |      |           |scanlines)|
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |D2:       |
|     |      |         |      |         |       |      |           |"industry |
|     |      |         |4752/ |         |646+5/ |      |           |standard" |
| 640 |  480 |12 + 3/11| 4739 |52.14815 |  22   |  486 |       Y   |525/59.94 |
|     |      |         |      |         |       |      |           |square-   |
|     |      |         |      |         |       |      |           |pixel     |
|     |      |         |      |         |       |      |           |video     |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |"True"    |
|     |      |   12 +  |      |         |       |      |           |computer  |
| 640 |  480 |   1452/ |  1/1 |52.00549 |   648 |  486 |       Y   |square-   |
|     |      | 4739(2) |      |         |       |      |           |pixel     |
|     |      |         |      |         |       |      |           |format    |
|     |      |         |      |         |       |      |           |(cropped) |
|     |      |         |      |         |       |      |           |          |
|     |      |         |      |         |       |      |           |SVCD (2/  |
|     |      |         |6480/ |         |       |      |           |3 of      |
| 480 |  480 |      9  | 4739 |53.33333 | 473.9 |  486 |       Y   |BT.601    |
|     |      |         |      |         |       |      |           |sampling  |
|     |      |         |      |         |       |      |           |rate)     |
|     |      |         |      |         |       |      |           |          |
| 352 |  480 |   6.75  |8640/ |52.14815 |355.425|  486 |       Y   |DVD       |
|     |      |         | 4739 |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
| 352 |  240 |   6.75  |4320/ |52.14815 |355.425|  243 |       N   |VCD, DVD  |
|     |      |         | 4739 |         |       |      |           |          |
|     |      |         |      |         |       |      |           |          |
| 320 |  240 |6 + 3/22 |4572/ |52.14815 |   324 |  243 |       N   |1/4 of    |
|     |      |         | 4739 |         |       |      |           |640x480   |
|                                                                             |
|(1) 59.94 Hz is only a conventional approximation; the mathematically exact  |
|field rate is 60 Hz * 1000/1001.                                             |
|(2) A calculated sampling rate, represented here only for completeness. Does |
|not exist in actual 525/625 video equipment.                                 |
|(3) Only used for still images.                                              |
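
The "matrix width in us" column of the table is the sampling matrix width divided by the sampling rate (samples divided by samples per microsecond). A minimal Python sketch of that relationship follows. The function and constant names are purely illustrative and do not come from any standard.

```python
from fractions import Fraction

def matrix_width_us(width_px, sampling_rate_mhz):
    """Duration of one sampled line in microseconds (samples / MHz)."""
    return Fraction(width_px) / Fraction(sampling_rate_mhz)

# Sampling rates as exact fractions, taken from the table rows.
BT601 = Fraction(27, 2)            # 13.5 MHz
SQUARE_525 = 12 + Fraction(3, 11)  # "industry standard" square pixels

print(float(matrix_width_us(720, BT601)))       # 720x480 row
print(float(matrix_width_us(704, BT601)))       # 704x480 row
print(float(matrix_width_us(640, SQUARE_525)))  # 640x480 "industry standard" row
```

Note that 704 samples at 13.5 MHz and 640 samples at 12 + 3/11 MHz cover exactly the same stretch of the scanline, which is why those two rows share the same width in microseconds.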

3.1 How to use the table for conversions

Let's assume you have a video clip in one format and wish to convert it to another while keeping the correct aspect ratio throughout the process.

Locate your source and target formats in the table.
Calculate the vertical conversion factor by using the following formula: vertical conversion factor = target active picture height / source active picture height. (Be sure to use the active picture values from the table, not the sampling matrix size values.)
- If vertical conversion factor is 0.5 and your source material is interlaced, you will probably need to deinterlace before resampling. (I recommend using a special smart deinterlacing algorithm, such as the one found in VirtualDub's Smart Deinterlacer filter.)
- If vertical conversion factor is anything other than 0.5, 1 or 2, you are probably trying to do a standards conversion between a 625/50 system and a 525/59.94 system. Standards conversion (when done right) is a highly demanding process and outside the scope of this document. I recommend reading The Engineer's Guide to Standards Conversion and The Engineer's Guide to Motion Compensation from the Snell & Wilcox Reference Library to get a grasp of the related issues. In short, merely converting the frame size and image aspect ratio is not enough - you would also have to take interlacing into account and correct any aliasing problems in the temporal dimension (which means synthesizing new fields out of thin air using motion compensation algorithms).
Calculate the horizontal conversion factor: horizontal conversion factor = (source pixel aspect ratio) / (target pixel aspect ratio) * (vertical conversion factor)
Calculate the new horizontal size: target sampling matrix width = horizontal conversion factor * source sampling matrix width
Calculate the new vertical size: target sampling matrix height = vertical conversion factor * source sampling matrix height
Resample the image to the new size
Check if the new size matches the target resolution's sampling matrix dimensions. If not, crop (i.e. cut at the edges) and pad (i.e., add black borders) accordingly so that it will.
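
The resampling arithmetic in the steps above can be sketched in a few lines of Python. This is only an illustration under my own naming (resample_size and its parameters are hypothetical). It covers the size calculation only, not deinterlacing, the resampling itself or the final crop/pad step.

```python
from fractions import Fraction

def resample_size(src_matrix, src_active_height, src_par,
                  dst_active_height, dst_par):
    """Return the exact (width, height) to resample the source matrix to.

    All arguments are values read from the table; the pixel aspect
    ratios (x/y) are passed as Fractions to avoid rounding errors.
    """
    src_width, src_height = src_matrix
    # Vertical factor uses the ACTIVE picture heights, not the matrix heights.
    vertical = Fraction(dst_active_height, src_active_height)
    # Horizontal factor = source PAR / target PAR * vertical factor.
    horizontal = Fraction(src_par) / Fraction(dst_par) * vertical
    return horizontal * src_width, vertical * src_height

# Example: 352x480 (DVD) to 720x480 (DVD), both 525/59.94 rows of the table.
width, height = resample_size((352, 480), 486, Fraction(8640, 4739),
                              486, Fraction(4320, 4739))
print(width, height)  # 704 480
```

The result is 704x480, so reaching a 720x480 frame would still need 8 pixels of padding on each side, as described in the last step of the list.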

3.2 Some practical examples of the above

3.2.1 640x480 to 720x480

640x480 "industry standard" square pixels to 720x480 ITU-R BT.601 pixels

Let's say I have captured a video clip from a 525/59.94 source using an old M-JPEG card that only allows sampling in "industry standard" (12 + 3/11 MHz) square pixel format. The resolution of the clip is 640x480. Now I would like to incorporate this into a DV project that uses ITU-R BT.601 pixels and a resolution of 720x480.

The first step is to look up the correct source and target formats from the table.
- In this case, the source format is 640x480 in a 525/59.94 system, using the sampling rate of 12 + 3/11 MHz and a pixel aspect ratio of 4752/4739.
- The target format is 720x480 (likewise in a 525/59.94 system), using the sampling rate of 13.5 MHz and a pixel aspect ratio of 4320/4739.
The second step is to calculate the vertical conversion factor. In our case, it is 486/486 = 1
Now we need a horizontal rescaling factor, which in this case is (4752/4739) / (4320/4739) * 1, which equals 11/10.
Then we can calculate the new image width from the old one: 11/10 * 640 = 704 pixels
The image height will stay unchanged, since 1 * 480 is still 480.
Thus, we need to resample the 640x480 image to 704x480.
However, our original target resolution was 720x480. Now we need to pad the image (with black vertical bars on the sides) so that the frame width becomes 720 pixels. Since 720 - 704 = 16, we need to add 8 black pixels to each side edge.

3.2.2 720x576 ITU-R BT.601 pixels to 720x480 ITU-R BT.601 pixels

In other words, a "PAL" to "NTSC" conversion:

Again, the first step is to look up the correct source and target formats from the table.
- In this case, the source format is 720x576 in a 625/50 system, using the sampling rate of 13.5 MHz and a pixel aspect ratio of 128/117.
- The target format is 720x480 in a 525/59.94 system, using the sampling rate of 13.5 MHz and a pixel aspect ratio of 4320/4739.
We need to calculate the vertical conversion factor. In our case, it is 486/576 = 27/32
Now we need a horizontal rescaling factor, which in our case is (128/117) / (4320/4739) * (27/32), which equals 4739/4680.
Then we can calculate the new image width from the old one: 4739/4680 * 720 = 729+1/13 pixels
The new image height will be 27/32 * 576 = 486 pixels.
Thus, we need to resample the 720x576 image to (729+1/13)x486. As we normally cannot use subpixel sampling, we must round the figure 729+1/13 to some reasonable number - in this case probably 729.
However, our original target resolution was 720x480. Now we need to crop the 729x486 image sufficiently from the edges so that the frame width will become 720 pixels and frame height 480 pixels.
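
The final crop-or-pad step of both examples comes down to splitting the size difference between opposite edges. A minimal sketch follows, assuming the picture should stay centered. The helper name centered_margins is hypothetical, and sending the odd pixel to the right/bottom edge is an arbitrary choice made for this sketch.

```python
def centered_margins(resampled, target):
    """Per-axis (before, after) amounts to add; negative values mean crop.

    resampled and target are (width, height) tuples in whole pixels.
    Any odd pixel goes to the right/bottom edge.
    """
    margins = []
    for have, want in zip(resampled, target):
        diff = want - have
        # Split the difference in two, truncating toward zero.
        before = diff // 2 if diff >= 0 else -((-diff) // 2)
        margins.append((before, diff - before))
    return margins

print(centered_margins((704, 480), (720, 480)))  # [(8, 8), (0, 0)]
print(centered_margins((729, 486), (720, 480)))  # [(-4, -5), (-3, -3)]
```

For the conversion in 3.2.2, this means cropping 4 and 5 pixels from the side edges and 3 scanlines each from the top and bottom.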

4. Frequently Argued Questions

4.1 Isn't 720 the real width of a 4:3 image?

If not, then why are 720 pixels sampled instead of 711 or 702 (or whatever)?

720 pixels are sampled to allow for a little deviation from the ideal timing values for blanking and active line length in the analog signal. In practice, an analog video signal - especially if coming from a wobbly home video tape recorder - can never be that precise in timing. It is useful to have a little headroom for digitizing all of the signal even if it is of somewhat shoddy quality or otherwise non-standard.

720 pixels are also sampled to make sure that the signal-to-be-digitized has had the time to slope back to blanking level at both ends. (This is to avoid nasty overshooting or ringing effects, comparable to the clicks and pops you can hear at the start and end of an audio sample.)

Last but not least, 720 pixels are sampled because a common sampling rate (13.5 MHz) and number of samples per line (720) make it easier for hardware manufacturers to design multi-standard digital video equipment.

4.2 What does this mean, considering ITU-R BT.601 compliant equipment?

It means that the sampled horizontal range of the signal is a bit wider than the actual active image frame:

On 625/50 systems, only the centermost 702x576 pixels (of 720x576) belong to the actual 4:3 (or anamorphic 16:9) frame.
On 525/59.94 systems, only the centermost 710.85x486 pixels (of 720x486) belong to the actual 4:3 (or anamorphic 16:9) frame. (For practical video applications, 710.85 will have to be rounded up to 711 pixels.)
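
To put the two figures above into perspective, here is a small Python sketch that computes how much of the 720-sample line falls outside the active frame on each side. It assumes the active area is exactly centered, and the function name side_border is just an illustration.

```python
from fractions import Fraction

def side_border(matrix_width, active_width):
    """Width of the non-picture border on each side of a centered active area."""
    return (Fraction(matrix_width) - Fraction(active_width)) / 2

print(float(side_border(720, 702)))                 # 625/50 systems: 9.0
print(float(side_border(720, Fraction("710.85"))))  # 525/59.94 systems: 4.575
```

With the practical 711-pixel figure, the border on 525/59.94 systems becomes 4.5 pixels per side.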

Yes, you understood correctly. 720x576 is not exactly 4:3, and neither is 720x480. The real 4:3 frame (as defined in the analog video standards) is a bit narrower than the horizontal range of signal that actually gets digitized.

Yes, it is the same for all generally available digitizing equipment; tv tuner cards, digital video cameras and such. It is true even for all-digital systems; otherwise they would not be compatible with ITU-R BT.601.

4.3 You must be kidding!

I am pretty sure there is a mistake in your calculations. It says everywhere that 720x576 or 720x480 really is 4:3. Please stop propagating this misinformation!

I admit that the figures presented on this web site are not very well-known facts even amongst professional videographers, not to mention hobbyists. Aspect ratio is one of the most misunderstood "black magic" issues in digital video. That is precisely why I constructed the web site in the first place - to share the knowledge.

As for my calculations; feel free to prove them wrong. For starters, you might want to read the documents in the Related Links section.

4.4 I have been doing digital video projects for the last 50 years.

I know my stuff! If you were correct, everything I have done to process my precious video has always been wrong, aspect-ratio wise!

That may very well be the sad truth. Fortunately, even if you had used wrong methods for scaling/resampling the image, the difference between the correct aspect ratio and a wrong aspect ratio is often small enough to go unnoticed unless you really start looking for it.

4.5 It still does not make any sense.

For starters, all the 525/59.94 equipment I have only works in 720x480, not in 720x486 (and definitely not in 711x486)! How do you explain that?

525/59.94 video signal has 486 active (image-carrying) scanlines, but modern digital video equipment usually crops 6 of them off. Why? To get the height of the image down to 480 pixels, which is neatly divisible by 16. See for yourself:

486 / 16 = 30.375 whereas
480 / 16 = exactly 30.

Also note that 720 / 16 equals exactly 45, so the width of the image is divisible by 16 as well!

4.5.1 Why is it important to have the height and width of the raster image divisible by 16?

Modern digital video applications such as DV, DVD and digital television (DVB, ATSC) often use MPEG-1 or MPEG-2 formats (or their derivatives) which are all based on 16x16 pixel macroblocks. Having the height and width of the image readily divisible by 16 makes it easier and more efficient for an MPEG encoder to compress video.
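
The divisibility argument is easy to check with a couple of lines of Python. This sketch only does the arithmetic. What a real encoder does with leftover lines is outside its scope, and the function name macroblock_grid is made up for the example.

```python
def macroblock_grid(width, height, block=16):
    """Return (full columns, full rows, leftover pixels, leftover lines)."""
    cols, extra_px = divmod(width, block)
    rows, extra_lines = divmod(height, block)
    return cols, rows, extra_px, extra_lines

print(macroblock_grid(720, 480))  # (45, 30, 0, 0)
print(macroblock_grid(720, 486))  # (45, 30, 0, 6)
```

A 720x480 frame splits cleanly into 45 x 30 macroblocks, whereas 720x486 leaves 6 scanlines that do not fill a complete row of blocks.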

4.5.2 Doesn't this mean that when capturing in 720x480, I will lose six scanlines worth of valuable information that was once present in the original video signal?

Correct, but the information might not have been that valuable in the first place. Most 525/59.94 video work is already done solely in the digital domain and in the 720x480 format, so there is usually nothing to digitize on those scanlines anymore. Moreover, in the good old days (when all of those 486 scanlines were still in active use) most of the time the edges only carried flickering VCR head noise.

The video image is masked by the overscan edges of a CRT based television, so you would not normally see the "missing" scanlines, anyway.

4.5.3 You keep saying the "real" 4:3 resolution is at about 711x486 for 525/59.94 systems. OK, maybe there really are 9 extra pixels on the sides, but how do I cope with the fact my equipment only records 480 active scanlines, not 486?

Think of it this way:

First, you have a frame of 720x480 pixels.
There is another frame of 710.85x486 pixels, overlaid and centered on top of the first one. This frame represents the "real" 4:3 resolution in 525/59.94 systems. (In any practical real-world video application we would have to use 711 pixels, but 710.85 is the ideal, mathematically exact number.)
The parts of the first frame that go over the side edges of the second frame are excess space that is outside the actual active image area. You can put picture there in digital systems, but there is no guarantee it will survive on any analog system, or display on any CRT monitor, even in underscan mode.
The parts of the second frame that go over the top and bottom edges of the first frame are the cropped 6 scanlines. As you only have 480 scanlines at your disposal, you cannot put picture there, but aspect ratio wise this imaginary area counts as a part of the "real" 4:3 image.

There is also another way of thinking about it:

Disregard the notion that 525/59.94 systems have traditionally had 486 active scanlines. Instead, think that the new standard is now 480 scanlines.
Now, your ideal 4:3 frame is 480 * (4/3) / (4320/4739) = 702 + 2/27 pixels wide. In the real world, a minimum of 703 pixels would need to be sampled to convey all the information in the active part of the scanline.
703 is a nasty uneven number for computers. 704 is much better since it is divisible by 16 (again!)
Now you have something like a frame of 704x480 pixels, inside which lives an-approximately-702x480-frame, which in turn represents the real 4:3 image area. But wait! 704x480 is a familiar number, isn't it? See the connection? It is used in VCD high-res still images and in ATSC digital television! How convenient!
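
The chain of reasoning above (ideal width, minimum whole number of samples, next multiple of 16) can be reproduced with exact fractions. This is a minimal sketch with variable names of my own choosing.

```python
from fractions import Fraction
import math

PAR_525 = Fraction(4320, 4739)  # BT.601 pixel aspect ratio, 525-line rows of the table

# Ideal width of a 4:3 frame that is 480 scanlines tall.
ideal_width = Fraction(480) * Fraction(4, 3) / PAR_525
# Smallest whole number of samples that covers the ideal width.
min_samples = math.ceil(ideal_width)
# Round up to the next multiple of 16.
aligned_width = math.ceil(Fraction(min_samples, 16)) * 16

print(ideal_width, min_samples, aligned_width)  # 18956/27 703 704
```

Here 18956/27 is the same number as 702 + 2/27.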

The latter way of thinking will also lead to cropping off the side edges of the image to get it inside a 4:3 rectangle (albeit a bit smaller than the "real" one), but then again, if you are restricted to using 704x480, that decision has already pretty much been made for you.

4.6 What about standards conversion?

Isn't PAL 720x576 exactly equal to NTSC 720x480?

As can be seen from the example in section 3.2.2, the answer is no. If you simply resample from 720x576 to 720x480, the analog active areas of the source and target formats will not match. Fortunately, there is a bit of foolproofness built into the relationship of these two frame sizes. What you will actually get from the process is an image in which the original analog active area (702x576 centermost pixels of 720x576) has become 702x480 in the target format's pixels. This, in turn, almost represents a 4:3 area, albeit a bit smaller than what would be needed for a perfect conversion.

The area that 702x480 covers is not the same as the actual analog active image frame (which would be 710.85x486, or, in practical terms, 711x486). It is more like a smaller 4:3 frame inside it.

In other words, the result is that the active 4:3 image frame in the source format has shrunk a bit in the conversion: it has lost six (target) scanlines in vertical direction and the same relative amount of width. However, for all practical purposes, it has still retained its original aspect ratio. The easiest way to see this is converting 702x480 (in 13.5 MHz 525-line ITU-R BT.601 format) to "true" square pixels: 639 + 4419/4739 square pixels by 480 scanlines is a close enough match to 640x480, which is 4:3. Wonderful coincidence, isn't it? :)
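
The "wonderful coincidence" is easy to verify. The sketch below converts the 702-pixel width to ideal square pixels using the 525-line BT.601 pixel aspect ratio from the table. The variable names are only illustrative.

```python
from fractions import Fraction

PAR_525 = Fraction(4320, 4739)  # 13.5 MHz pixels in 525-line systems

# Width of 702 BT.601 pixels expressed in ideal square pixels.
square_width = 702 * PAR_525
whole, remainder = divmod(square_width, 1)
print(whole, remainder)  # 639 4419/4739

# How far the resulting frame is from an exact 4:3 (i.e. 640x480) frame.
shortfall = 640 - square_width
print(float(shortfall))
```

The shortfall is 320/4739 of a pixel, i.e. well under a tenth of a pixel across the whole line.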

The same peculiar relationship applies to all 525/625 "sister resolutions" derived from 13.5 MHz:

704x576 vs 704x480
480x576 vs 480x480
352x288 vs 352x240
etc.

This holds true on two conditions:

The source sampling matrix width (in microseconds) must be exactly the same as the target's.
You can only convert between a full-height 625-line resolution and a cropped-height 525-line resolution (i.e. use only those formats that represent exactly 480 scanlines worth of 525/60 data, instead of full 486.)

As direct resampling involves shrinkage (or, when going in the other direction, enlargement), I cannot really recommend this method for any real standards conversion work. It is more like a quick hack, suitable for use e.g. if the software does not allow proper resizing and cropping.

Note: Many people use direct resampling for all the wrong reasons: 1) They think that a 720x480 frame is directly equivalent to a 720x576 frame. 2) They also think that both aforementioned frame sizes represent exactly the active 4:3 (or 16:9) picture area, edge to edge. As you already know from Section 2.1, both of these assumptions are wrong. The fact that direct resampling works at all is mostly a quirky coincidence.

4.7 What do you mean by saying it is better to avoid 720x540?

The problem with this resolution is that while you think you are editing in a format that is both 1) 4:3 square pixels and 2) easily convertible to a standard video resolution (either 720x576 or 720x480) just by vertical resampling, you are not. See the table. There is no real-world video format that would use the full 720 pixel horizontal range as the width of the active 4:3 frame.

In order to get to a standard video format from this one, you need to take into account the actual form of the sampling matrices. The 4:3 area in 625-line formats is 702x576, not 720x576. In 525-line formats it is 711x486, not 720x480. Resizing a 720 pixels wide 4:3 format directly to 720x576 or 720x480 simply won't work. You will either have to resample in both directions (contrary to what you originally thought, you do not get to keep the image width neatly at 720 pixels at all times), or crop some top and bottom lines off.

If you need to construct an intermediary square-pixel resolution that is a) exactly 720 pixels wide and b) covers exactly the same area as 720x576 or 720x480 (thus only having to resample in vertical direction for conversions), you will end up with two separate resolutions, one for each video standard:

The 720 pixels wide square-pixel equivalent of 720x576 (ITU-R BT.601 pixels for 625-line systems) is 720x526.5 pixels
The 720 pixels wide square-pixel equivalent of 720x480 (ITU-R BT.601 pixels for 525-line systems) is 720x(4739/9) pixels, i.e. 720x(526 + 5/9) pixels

Fortunately, the numbers will nicely round up to 720x527 for both standards.
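
Both heights follow from dividing the matrix height by the pixel aspect ratio while keeping the width at 720. The following is a minimal Python sketch of that calculation. The function name square_pixel_height is hypothetical.

```python
from fractions import Fraction
import math

def square_pixel_height(matrix_height, par):
    """Height (in square pixels) of a frame kept at its original width.

    Keeping the width fixed and making pixels square means dividing
    the height by the pixel aspect ratio (x/y).
    """
    return Fraction(matrix_height) / Fraction(par)

h_625 = square_pixel_height(576, Fraction(128, 117))    # 1053/2 = 526.5
h_525 = square_pixel_height(480, Fraction(4320, 4739))  # 4739/9 = 526 + 5/9

# math.ceil is used deliberately: Python's round() would send 526.5 to 526.
print(math.ceil(h_625), math.ceil(h_525))  # 527 527
```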

Note that the original interlaced field structure (if any) will go haywire as you mess around scaling in the vertical direction.

4.8 Why does your table list two slightly different definitions for square pixels?

"Square pixels", as digitized by a TV tuner or an M-JPEG card, are not exactly square. The "industry standard" sampling rates used in square-pixel video equipment actually give out pixels that are almost square, but not exactly. As you can see for yourself in the table, the difference is very small - for all practical purposes meaningless - but it is still useful to know that sampled "video" square-pixels differ a bit from ideal "computer" square pixels.

Converting "computer" square pixels to "video" square pixels is usually a futile effort. You will not see the difference, anyway, and probably only lose some quality in the interpolation process.
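
To see how small the difference really is, one can compare the two pixel aspect ratios from the 525/59.94 rows of the table. This is a minimal sketch and the constant names are my own.

```python
from fractions import Fraction

VIDEO_SQUARE_PAR = Fraction(4752, 4739)  # "industry standard" square pixels
COMPUTER_SQUARE_PAR = Fraction(1, 1)     # ideal square pixels

relative_error = VIDEO_SQUARE_PAR / COMPUTER_SQUARE_PAR - 1
print(relative_error)                  # 13/4739
print(f"{float(relative_error):.3%}")  # 0.274%

# Accumulated difference across a 640 pixels wide line, in pixels.
print(float(640 * relative_error))
```

Across a full 640-pixel line the two definitions drift apart by less than two pixels.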

4.9 This is really scary and nasty stuff.

I thought digital video was simple! Now my head hurts!

But that's just the way video is. Fortunately, the conversions are not really that complicated once you practice them a little.

4.10 I think you're just nit-picking.

No-one will ever notice if I consider all "4:3" video formats just 4:3, without doing any complicated aspect ratio or "active image area" calculations.

Feel free to process your video just the way you like it. But there are still many people who would like to get as close to the ideal aspect ratio correctness as possible, instead of only using rough "ballpark figures" in their video work.

4.11 Help! My capture card does not seem to do it this way!

You may be correct. Professional video gear is very strict about conforming to the ITU-R BT.601 standard, and you can also generally trust DV camcorders and DVD players/recorders to use the correct sampling rates and pixel clocks. However, the PC hardware market is different: cheap mass-marketed tv tuner cards and "tv out" cards often seem to have design flaws and inaccuracies in their drivers: sometimes they use the common, industry-standard frame formats (such as 720x480) with sampling rates that are just plain wrong or sufficiently off the mark to create problems.

It is usually not the hardware that is the culprit here — the chips on the card may be perfectly capable of producing images (or digitizing them) using exactly the correct sampling rates and pixel clocks, but the programmer who designed the driver that controls the hardware may have taken some special liberties and shortcuts, leading to inaccuracies. (Possibly the drivers for these problematic devices were designed by someone who has not studied the relevant video standards.)

Fortunately, you can check out your devices and, if necessary, calibrate your capture workflow by following these instructions. (The only way you can find out these flaws for sure is comparing test images as detailed in the above link, or using a test card generator and an oscilloscope.)

5. Related Links

A brief ITU-R BT.470 summary: Characteristics of B,G/PAL and M/NTSC television systems
ITU-R BT.470: Conventional television systems
ITU-R BT.601: Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios (aka CCIR 601 aka Rec. 601)
EBU Technical Recommendation R92-1999: Active picture area and picture centring in analogue and digital 625/50 television systems
Lurkertech / SGI: Square and Non-Square Pixels by Chris Pirazzi. Note that this article takes a slightly different view of aspect ratio issues than this web page. This is mostly because the author of the article chose to consider the "industry standard" square pixels the same as real square pixels, whereas I wanted to make a pedantic distinction between the two. (Yes, I am an impractical fool.)
Snell & Wilcox: Your Essential Guide to Digital [page 17, chapter 4.3] by John Watkinson
Quantel Digital Factbook [the definition of "Aspect ratio… of pixels"]
Google Groups Archives: BBC engineers discussing the very subject in the Usenet: Article #1, Article #2
Google Groups Archives: The Definition of CIF image format. In order to turn this into a point-counterpoint discussion, also see the section Pixel Aspect in MPEG I and MPEG II from Square and Non-Square Pixels. It seems there really is an unresolved standardisation quirk: CIF aspect ratio is theoretically defined as if it were independent of ITU-R BT.601 pixels, but for all practical purposes it still depends on them, since the video acquisition devices, in the practical world, are based on them. ;-) It seems the original intent of the MPEG / H.26x committees was to use "easier" numbers and definitions for aspect ratio, but this actually only serves to create confusion. The moral of the story: do not blindly trust MPEG headers on aspect ratio issues. Instead, make your own educated judgement based on which kind of equipment you use, where the video is from, and how you actually want the conversion done. Most video is sampled at 13.5 MHz and converted directly - pixel-by-pixel - to these formats, regardless of what the headers nominally claim the aspect ratio to be.
Determining the Capture Window of a Capture Card. Cheap consumer-level video capture devices (such as regular tv tuner cards — or rather, their drivers) are not always calibrated to follow the industry standards — even though the capture resolutions available in the driver would appear to suggest they do. This article explains a practical method of finding out how your card does it.