http://www.iki.fi/znark/video/conversion/
There is a fair number of mind-blowing, scary oddities and secrets in the world of digital video.
One of the very first a beginner will usually encounter is the fact that in digitized video data, pixels are often not considered "square" in their form. In most real-world digital video applications pixels have a width/height ratio (or aspect ratio, as it is more conveniently called) that can be something completely different from 1/1!
The second great revelation usually comes when one runs into the concept of anamorphic 16:9 video for the very first time. If it was initially hard to grasp the idea of pixels changing their shape when displayed in different environments, this one is even more baffling: the very same pixel resolution you have only just learned to associate with 4:3 displays can now suddenly represent another, totally different image geometry. In other words, the pixels have changed their shape again!
Unfortunately, these two are often the only things most ordinary people will ever learn about digital video and aspect ratios.
Tutorials and manuals usually tend to keep very quiet and secretive about the finer technical details of digital video, particularly when it comes to the topic of (pixel) aspect ratios and image geometry.
Even if converting (resampling) video clips to other resolutions is discussed, the accompanying explanation is usually troublingly simplistic and vague — often inaccurate and misleading — and sometimes the suggested methods are just plain wrong. It is not uncommon that the examples only deal with arbitrarily chosen ("x pixels by y pixels") frame dimensions and use ideal frame aspect ratios such as 16:9 or 4:3 as the basis for calculations — not the actual pixel aspect ratios — which is usually a good indicator that the writer may not actually take the real image geometry into account at all.
It is almost as if the whole aspect ratio issue was considered some sort of dirty little secret of the video industry; black magic you could not even begin to explain to mere mortals in reasonable terms. This is a shame. In this case, there is really more to it than meets the eye. Confusing people with incomplete and watered-down explanations does not do any good to the industry.
Now that you have read this far, it is time to reward your effort with The Third Big Revelation about aspect ratios and frame sizes - the one that is usually left unsaid:
Not a single one of the commonly used digital video resolutions exactly represents the actual 4:3 or 16:9 image frame.
Shocking, isn't it? 768x576, 720x576, 704x576, 720x480, 704x480, 640x480… none of them is exactly 4:3 or 16:9; not even the ones you may conventionally think of as "square-pixel" resolutions.
So there. Now you finally know the truth. Let's find out what it actually means.
Digital video standards do not live outside the realm of the analog world. On the contrary, all commonly used modern (SDTV) digital video formats have a well-defined relationship with their counterparts in analog video standards. You could really say they have their roots in analog soil.
And now, my friend, we are rapidly closing in on The Fourth Big Revelation:
It is really the analog video standards that define the image geometry and pixel aspect ratio in digital formats.
Even if you did all of your video work solely in the digital domain, those pesky old analog video standards would still define the shape of your images and pixels.
How come?
From the video industry's point of view, the current (SDTV, as opposed to HDTV which is another kettle of fish) digital video formats - those that actually get used in practical real-life applications such as DVD, DV, VCD, SVCD, digital television etc. - are all about interoperability. At the advent of digital video - the late 1970s, when committee work was started on CCIR 601 (later to become ITU-R BT.601) - there was already a vast catalog of analog video material in formats defined solely by analog standards. What is more, enormous amounts of money had been poured into analog studio equipment such as cameras, video switchers, proc amps, tape decks and other tools of the trade. What a waste it would have been if the "next generation" digital video formats had been designed in such a way that they had absolutely nothing in common with the old analog formats, and required ditching all the analog equipment!
It was clear from the beginning that the industry wanted a smooth, well-defined transition path between the current analog systems and the brave new digital world without running into too many compatibility issues. It was also considered necessary to be able to freely mix and match digital and analog equipment. The result was that the digital (SDTV) video formats we now use are based on the concept of digitizing old, analog video signals, thus interlocking with the analog video standards.
This connection between the digital and analog domains is permanent. Some of the fundamental features of digital video, such as image geometry, are actually defined in the analog standards. Even if we go all-digital, the relationship is still there, as long as we use either ITU-R BT.601 pixels or "industry standard" square pixels.
There are three basic sampling rates from which almost all modern digital video formats are derived:
13.5 MHz ITU-R BT.601 (aka CCIR 601 aka Rec. 601) non-square pixels for both 625/50 and 525/59.94 systems. This sampling rate was originally designed for digitizing component video signals. Now used extensively in almost all modern digital video gear.
14.75 MHz "Industry standard" square pixels for 625/50 systems. Originally designed for digitizing composite video signals.
12 + 3/11 MHz SMPTE 244M "industry standard" square pixels for 525/59.94 systems. Originally designed for digitizing composite video signals.
Let's see how this works out with 13.5 MHz and both 525/59.94 and 625/50 systems:
If you have the B/W (luminance) part of a component video signal in a coaxial cable, you can plug in an A/D converter and start metering (sampling) the voltage level in the cable at regular intervals.
625/50 systems have a line length of 64 us, of which 52 us is the "active" part that contains actual image information. (The rest is reserved for horizontal blanking.)
o 52 us x 13.5 MHz = 702 samples (pixels) per scanline.
o In the vertical direction, there are 574 complete scanlines and 2 half lines. Even the half lines get digitized as if their "missing" other half belonged to the active picture, giving a total of 576 scanlines.
o Thus, the active image area at 13.5 MHz sampling is 702x576 pixels. This is the actual area that forms the 4:3 (or anamorphic 16:9) frame.
525/59.94 systems have a line length of 63+5/9 (63.555…) us, of which 52+59/90 (52.6555…) us is the "active" part that contains actual image information. (The rest is reserved for horizontal blanking.)
o 52+59/90 us x 13.5 MHz = 710.85 samples (pixels) per scanline.
o In the vertical direction, there are 484 complete scanlines and 2 half lines. As above, all of them get digitized, and the half lines are treated as if their missing other half belonged to the active picture, giving a total of 486 scanlines.
o Thus, the active image area at 13.5 MHz sampling is 710.85x486 pixels. This is the actual area that forms the 4:3 (or anamorphic 16:9) frame.
o However, we cannot use partial pixels in any practical video work. Therefore, the number 710.85 needs to be rounded up to 711, and we get a 711x486 pixel frame instead.
o 711 samples equals 52+2/3 (52.666...) us at 13.5 MHz, so the rounded-to-the-nearest-pixel active area is a little bit wider than it ideally ought to be. Fortunately, the difference of 0.0111... us is (for all practical purposes) insignificant, and well within the tolerances of NTSC-M specifications.
It also works the same way for square-pixel sampling rates. You will just get a different number of horizontal samples. The calculations are left as an exercise to the reader.
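For readers who would rather check the exercise than do it by hand, here is a minimal Python sketch of the same arithmetic. It only multiplies the active line durations by the sampling rates quoted above; the dictionary and function names are illustrative and not part of any standard.

```python
from fractions import Fraction

# Active line durations in microseconds, as quoted in the text above.
ACTIVE_LINE_US = {
    "625/50": Fraction(52),
    "525/59.94": Fraction(52) + Fraction(59, 90),
}

# Sampling rates in MHz, as listed in the text above.
SAMPLING_RATES_MHZ = {
    ("625/50", "BT.601"): Fraction(27, 2),                    # 13.5
    ("625/50", "square"): Fraction(59, 4),                    # 14.75
    ("525/59.94", "BT.601"): Fraction(27, 2),                 # 13.5
    ("525/59.94", "square"): Fraction(12) + Fraction(3, 11),  # 12 + 3/11
}


def active_samples(system, kind):
    """Samples per active line = active duration (us) x sampling rate (MHz)."""
    return ACTIVE_LINE_US[system] * SAMPLING_RATES_MHZ[(system, kind)]


for system, kind in SAMPLING_RATES_MHZ:
    samples = active_samples(system, kind)
    print(f"{system:10} {kind:7} {float(samples):.5f} samples per active line")
```

Exact fractions are used instead of floats so that values such as 710.85 and 646+5/22 come out without rounding noise.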
If you did not understand a word of the above, you might want to take a look at the following introductory links:
Also see the Related Links section.
The following is a frame size and aspect ratio conversion table, representing many commonly used digital video formats:
The formats related to 625-line systems with a 50 Hz field rate

| Sampling matrix width | Sampling matrix height | Sampling rate (MHz) | Pixel aspect ratio (x/y) | Sampling matrix width in us | Actual active picture width | Actual active picture height | Supports interlacing | Notes |
|---|---|---|---|---|---|---|---|---|
| 768 | 576 | 14.75 | 768/767 | 52.06780 | 767 | 576 | Y | "Industry standard" 625/50 square-pixel video |
| 768 | 576 | 14 + 10/13 (2) | 1/1 | 52.00000 | 768 | 576 | Y | "True" computer square-pixel resolution |
| 768 | 560 | 14.75 | 768/767 | 52.06780 | 767 | 576 | Y | CD-i(3) |
| 720 | 576 | 13.5 | 128/117 | 53.33333 | 702 | 576 | Y | D1, DV, DVB, DVD, SVCD(3) |
| 720 | 540 | ambiguous | 1/1 | ambiguous | 720 | 540 | N | Oddball compromise format. Better to avoid unless you really know what you are doing. |
| 704 | 576 | 13.5 | 128/117 | 52.14815 | 702 | 576 | Y | DVD, H.263 (4CIF), VCD(3) |
| 702 | 576 | 13.5 | 128/117 | 52.00000 | 702 | 576 | Y | Active picture frame for 625/50 systems in ITU-R BT.601-4 pixels. |
| 544 | 576 | 10.125 | 512/351 | 53.72840 | 526+1/2 | 576 | Y | DVB (3/4 of BT.601 sampling rate) |
| 480 | 576 | 9 | 128/78 | 53.33333 | 468 | 576 | Y | SVCD (2/3 of BT.601 sampling rate) |
| 384 | 288 | 7.375 | 768/767 | 52.06780 | 383+1/2 | 288 | N | 1/4 of "industry standard" 768x576 |
| 384 | 280 | 7.375 | 768/767 | 52.06780 | 383+1/2 | 288 | N | CD-i |
| 352 | 576 | 6.75 | 256/117 | 52.14815 | 351 | 576 | Y | DVD |
| 352 | 288 | 6.75 | 128/117 | 52.14815 | 351 | 288 | N | VCD, DVD, H.261 + H.263 (CIF) |
| 176 | 144 | 3.375 | 128/117 | 52.14815 | 175+1/2 | 144 | N | H.261 + H.263 (QCIF) |

The formats related to 525-line systems with a 59.94(1) Hz field rate

| Sampling matrix width | Sampling matrix height | Sampling rate (MHz) | Pixel aspect ratio (x/y) | Sampling matrix width in us | Actual active picture width | Actual active picture height | Supports interlacing | Notes |
|---|---|---|---|---|---|---|---|---|
| 720 | 540 | ambiguous | 1/1 | ambiguous | 720 | 540 | N | Oddball compromise format. Better to avoid unless you really know what you are doing. |
| 720 | 486 | 13.5 | 4320/4739 | 53.33333 | 710.85 | 486 | Y | D1 |
| 720 | 480 | 13.5 | 4320/4739 | 53.33333 | 710.85 | 486 | Y | DV, DVB, DVD, SVCD |
| 711 | 486 | 13.5 | 4320/4739 | 52.66667 | 710.85 | 486 | Y | Active picture frame for 525/59.94 systems in ITU-R BT.601-4 pixels. |
| 704 | 486 | 13.5 | 4320/4739 | 52.14815 | 710.85 | 486 | Y | |
| 704 | 480 | 13.5 | 4320/4739 | 52.14815 | 710.85 | 486 | Y | ATSC, DVD, VCD(3) |
| 648 | 486 | 12 + 1452/4739 (2) | 1/1 | 52.65556 | 648 | 486 | Y | "True" computer square-pixel resolution (all 486 active scanlines) |
| 640 | 480 | 12 + 3/11 | 4752/4739 | 52.14815 | 646+5/22 | 486 | Y | D2: "industry standard" 525/59.94 square-pixel video |
| 640 | 480 | 12 + 1452/4739 (2) | 1/1 | 52.00549 | 648 | 486 | Y | "True" computer square-pixel format (cropped) |
| 480 | 480 | 9 | 6480/4739 | 53.33333 | 473.9 | 486 | Y | SVCD (2/3 of BT.601 sampling rate) |
| 352 | 480 | 6.75 | 8640/4739 | 52.14815 | 355.425 | 486 | Y | DVD |
| 352 | 240 | 6.75 | 4320/4739 | 52.14815 | 355.425 | 243 | N | VCD, DVD |
| 320 | 240 | 6 + 3/22 | 4572/4739 | 52.14815 | 324 | 243 | N | 1/4 of 640x480 |

(1) 59.94 Hz is only a conventional approximation; the mathematically exact field rate is 60 Hz * 1000/1001.
(2) A calculated sampling rate, represented here only for completeness. Does not exist in actual 525/625 video equipment.
(3) Only used for still images.
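The pixel aspect ratios in the table can be cross-checked against the sampling rates. The sketch below rests on one assumption that the text does not spell out as a formula: a pixel's width is inversely proportional to the sampling rate, so the pixel aspect ratio of a full-height format is the "true" square-pixel rate divided by the actual rate. The function names are illustrative only.

```python
from fractions import Fraction

LINE_625_US = Fraction(52)                     # active line, 625/50
LINE_525_US = Fraction(52) + Fraction(59, 90)  # active line, 525/59.94


def square_pixel_rate(active_height, active_line_us):
    """Rate (MHz) at which a 4:3 active frame would get exactly square pixels."""
    square_width = Fraction(active_height) * Fraction(4, 3)
    return square_width / active_line_us


def pixel_aspect_ratio(sampling_rate_mhz, active_height, active_line_us):
    """Pixel width/height for a full-height format: slower clock, wider pixel."""
    return square_pixel_rate(active_height, active_line_us) / sampling_rate_mhz


print(square_pixel_rate(576, LINE_625_US))   # 14 + 10/13 MHz as a fraction
print(square_pixel_rate(486, LINE_525_US))   # 12 + 1452/4739 MHz as a fraction
print(pixel_aspect_ratio(Fraction(27, 2), 576, LINE_625_US))
print(pixel_aspect_ratio(Fraction(27, 2), 486, LINE_525_US))
```

Formats that also halve the number of scanlines (such as 352x288) are outside the scope of this sketch, because their pixel height changes as well.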
Let's assume you have a video clip in one format and wish to convert it to another, so that it remains in correct aspect ratio throughout the process.
Calculate the vertical conversion factor by using the following formula: vertical conversion factor = target active picture height / source active picture height. (Be sure to use the active picture values from the table, not the sampling matrix size values.)
640x480 "industry standard" square pixels to 720x480 ITU-R BT.601 pixels
Let's say I have captured a video clip from a 525/59.94 source using an old M-JPEG card that only allows sampling in "industry standard" (12 + 3/11 MHz) square pixel format. The resolution of the clip is 640x480. Now I would like to incorporate this into a DV project that uses ITU-R BT.601 pixels and a resolution of 720x480.
The first step is to look up the correct source and target formats from the table.
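As a rough sketch of how this example might be worked through in Python: the active picture sizes are read from the 525/59.94 part of the table, and the vertical factor uses the formula given earlier. The horizontal factor is assumed to follow the same pattern (target active width / source active width), and centering the result in the 720-pixel frame is likewise an assumption made for illustration.

```python
from fractions import Fraction

# Active picture sizes (width, height) from the 525/59.94 part of the table.
SOURCE_ACTIVE = (Fraction(646) + Fraction(5, 22), Fraction(486))  # 640x480 "industry standard"
TARGET_ACTIVE = (Fraction("710.85"), Fraction(486))               # 720x480 ITU-R BT.601


def conversion_factors(src_active, dst_active):
    """Return (horizontal, vertical) factors.

    The vertical formula is the one from the text; the horizontal one
    is assumed to be its direct analogue.
    """
    return dst_active[0] / src_active[0], dst_active[1] / src_active[1]


h_factor, v_factor = conversion_factors(SOURCE_ACTIVE, TARGET_ACTIVE)
scaled_width = 640 * h_factor
scaled_height = 480 * v_factor

# Assumption: the scaled image is centered in the 720-pixel-wide target frame.
border_each_side = (720 - scaled_width) / 2

print(h_factor, v_factor)            # 11/10 1
print(scaled_width, scaled_height)   # 704 480
print(border_each_side)              # 8
```

Under these assumptions no vertical resampling is needed at all; the clip is only stretched horizontally from 640 to 704 pixels.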
In other words, a "PAL" to "NTSC" conversion:
Again, the first step is to look up the correct source and target formats from the table.
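A sketch of how such a conversion could be computed, assuming the example starts from 720x576 (625/50, ITU-R BT.601) and targets 720x480 (525/59.94, ITU-R BT.601). The horizontal factor is again assumed to mirror the vertical formula given earlier, and the final crop back to 720x480 is one possible way to fit the result into the target frame, not necessarily the only one.

```python
from fractions import Fraction

# Active picture sizes (width, height) taken from the table.
ACTIVE_625_BT601 = (Fraction(702), Fraction(576))
ACTIVE_525_BT601 = (Fraction("710.85"), Fraction(486))


def scale_frame(frame, src_active, dst_active):
    """Scale a (width, height) frame so the active picture keeps its shape.

    Assumes horizontal factor = target active width / source active width,
    by analogy with the vertical formula in the text.
    """
    h_factor = dst_active[0] / src_active[0]
    v_factor = dst_active[1] / src_active[1]
    return frame[0] * h_factor, frame[1] * v_factor


scaled_w, scaled_h = scale_frame((720, 576), ACTIVE_625_BT601, ACTIVE_525_BT601)
target_w, target_h = 720, 480

# Resample to the nearest whole-pixel size, then crop the surplus.
resample_w, resample_h = round(scaled_w), round(scaled_h)
crop_columns = resample_w - target_w   # total columns to remove, both sides
crop_lines = resample_h - target_h     # total scanlines to remove, top + bottom

print(float(scaled_w), float(scaled_h))
print(resample_w, resample_h, crop_columns, crop_lines)
```

With these inputs the frame is resampled to roughly 729x486 and then loses 9 columns and 6 scanlines in the crop.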
If not, then why are 720 pixels sampled instead of 711 or 702 (or whatever)?
720 pixels are sampled to allow for a little deviation from the ideal timing values for blanking and active line length in the analog signal. In practice, an analog video signal - especially if coming from a wobbly home video tape recorder - can never be that precise in timing. It is useful to have a little headroom for digitizing all of the signal even if it is of a bit shoddy quality or otherwise non-standard.
720 pixels are also sampled to make sure that the signal-to-be-digitized has had the time to slope back to blanking level at both ends. (This is to avoid nasty overshooting or ringing effects, comparable to the clicks and pops you can hear at the start and end of an audio sample.)
Last but not least, 720 pixels are sampled because a common sampling rate (13.5 MHz) and number of samples per line (720) make it easier for hardware manufacturers to design multi-standard digital video equipment.
It means that the sampled horizontal range of the signal is a bit wider than the actual active image frame:
Yes, you understood correctly. 720x576 is not exactly 4:3, and neither is 720x480. The real 4:3 frame (as defined in the analog video standards) is a bit narrower than the horizontal range of signal that actually gets digitized.
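A quick way to convince yourself is to multiply the frame's width/height by the pixel aspect ratio from the table and compare the result with 4/3. The following Python sketch does just that; the function name is illustrative.

```python
from fractions import Fraction

PAR_625 = Fraction(128, 117)    # 13.5 MHz, 625/50 (from the table)
PAR_525 = Fraction(4320, 4739)  # 13.5 MHz, 525/59.94 (from the table)


def frame_aspect(width, height, par):
    """Displayed width/height of a frame whose pixels have the given aspect ratio."""
    return Fraction(width) / Fraction(height) * par


FRAMES = [
    ("720x576", 720, 576, PAR_625),
    ("702x576", 702, 576, PAR_625),
    ("720x480", 720, 480, PAR_525),
    ("711x486", 711, 486, PAR_525),
    ("710.85x486", Fraction("710.85"), 486, PAR_525),
]

for name, width, height, par in FRAMES:
    ratio = frame_aspect(width, height, par)
    verdict = "exactly 4:3" if ratio == Fraction(4, 3) else "not 4:3"
    print(f"{name:11} {float(ratio):.5f}  {verdict}")
```

Only 702x576 and the unrounded 710.85x486 come out as exactly 4:3; the full 720-pixel-wide frames are noticeably wider, at roughly 1.367.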
Yes, it is the same for all generally available digitizing equipment; tv tuner cards, digital video cameras and such. It is true even for all-digital systems; otherwise they would not be compatible with ITU-R BT.601.
I am pretty sure there is a mistake in your calculations. It says everywhere that 720x576 or 720x480 really is 4:3. Please stop propagating this misinformation!
I admit that the figures presented on this web site are not very well-known facts even amongst professional videographers, not to mention hobbyists. Aspect ratio is one of the most misunderstood "black magic" issues in digital video. That is precisely why I constructed the web site in the first place - to share the knowledge.
As for my calculations; feel free to prove them wrong. For starters, you might want to read the documents in the Related Links section.
I know my stuff! If you were correct, everything I have done to process my precious video has always been wrong, aspect-ratio wise!
That may very well be the sad truth. Fortunately, even if you had used wrong methods for scaling/resampling the image, the difference between the correct aspect ratio and a wrong aspect ratio is often small enough to go unnoticed unless you really start looking for it.
For starters, all the 525/59.94 equipment I have only works in 720x480, not in 720x486 (and definitely not in 711x486)! How do you explain that?
525/59.94 video signal has 486 active (image-carrying) scanlines, but modern digital video equipment usually crops 6 of them off. Why? To get the height of the image down to 480 pixels, which is neatly divisible by 16. See for yourself:
Also note that 720 / 16 equals exactly 45, so the width of the image is divisible by 16 as well!
Modern digital video applications such as DV, DVD and digital television (DVB, ATSC) often use MPEG-1 or MPEG-2 formats (or their derivatives) which are all based on 16x16 pixel macroblocks. Having the height and width of the image readily divisible by 16 makes it easier and more efficient for an MPEG encoder to compress video.
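The divisibility argument is easy to verify. This small Python sketch simply splits each dimension into whole 16-pixel macroblocks plus a remainder; the helper name is made up for the illustration.

```python
MACROBLOCK = 16  # macroblock edge length in pixels, as stated in the text


def macroblocks(pixels):
    """Return (whole macroblocks, leftover pixels) for one image dimension."""
    return divmod(pixels, MACROBLOCK)


for label, size in [("486 scanlines", 486), ("480 scanlines", 480), ("720 pixels", 720)]:
    whole, leftover = macroblocks(size)
    print(f"{label}: {whole} macroblocks, {leftover} left over")
```

486 scanlines leave 6 lines that do not fill a macroblock row, which are exactly the lines that get cropped to reach 480.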
Correct, but the information might not have been that valuable in the first place. Most 525/59.94 video work is already done solely in the digital domain and in the 720x480 format, so there is usually nothing to digitize on those scanlines anymore. Moreover, in the good old days (when all of those 486 scanlines were still in active use) most of the time the edges only carried flickering VCR head noise.
The video image is masked by the overscan edges of a CRT based television, so you would not normally see the "missing" scanlines, anyway.
Think of it this way:
There is also another way of thinking about it:
The latter way of thinking will also lead to cropping off the side edges of the image to get it inside a 4:3 rectangle (albeit a bit smaller than the "real" one), but then again, if you are restricted to using 704x480, that decision has already pretty much been made for you.
Doesn't PAL 720x576 exactly equal NTSC 720x480?
As can be seen from the example in section 3.2.2, the answer is no. If you simply resample from 720x576 to 720x480, the analog active areas of the source and target formats will not match. Fortunately, there is a bit of foolproofness built into the relationship between these two frame sizes. What you will actually get from the process is an image in which the original analog active area (702x576 centermost pixels of 720x576) has become 702x480 in the target format's pixels. This, in turn, almost represents a 4:3 area, albeit a bit smaller than what would be needed for a perfect conversion.
The area that 702x480 covers is not the same as the actual analog active image frame (which would be 710.85x486, or, in practical terms, 711x486). It is more like a smaller 4:3 frame inside it.
In other words, the result is that the active 4:3 image frame in the source format has shrunk a bit in the conversion: it has lost six (target) scanlines in vertical direction and the same relative amount of width. However, for all practical purposes, it has still retained its original aspect ratio. The easiest way to see this is converting 702x480 (in 13.5 MHz 525-line ITU-R BT.601 format) to "true" square pixels: 639 + 4419/4739 square pixels by 480 scanlines is a close enough match to 640x480, which is 4:3. Wonderful coincidence, isn't it? :)
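The square-pixel figure quoted above can be reproduced with exact fractions. This is only a sketch of that one multiplication, using the 525-line BT.601 pixel aspect ratio from the table; the function name is illustrative.

```python
from fractions import Fraction

PAR_525_BT601 = Fraction(4320, 4739)  # 13.5 MHz, 525/59.94 (from the table)


def to_square_pixels(width, par):
    """Width in ideal square pixels of `width` pixels with aspect ratio `par`."""
    return Fraction(width) * par


square_width = to_square_pixels(702, PAR_525_BT601)
whole = square_width.numerator // square_width.denominator
remainder = square_width - whole

print(f"{whole} + {remainder}")   # 639 + 4419/4739
print(float(square_width))
```

The result falls short of 640 by less than a tenth of a pixel, which is why the directly resampled area still looks like 4:3.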
The same peculiar relationship applies to all 525/625 "sister resolutions" derived from 13.5 MHz:
This holds true on two conditions:
As direct resampling involves shrinkage (or, when going in the other direction, enlargement), I cannot really recommend this method for any real standards conversion work. It is more like a quick hack, suitable for use e.g. if the software does not allow proper resizing and cropping.
Note: Many people use direct resampling for all the wrong reasons: 1) They think that a 720x480 frame directly equals a 720x576 frame. 2) They also think that both aforementioned frame sizes represent exactly the active 4:3 (or 16:9) picture area, edge to edge. As you already know from Section 2.1, both of these assumptions are wrong. The fact that direct resampling works at all is mostly a quirky coincidence.
The problem with this resolution is that while you think you are editing in a format that is both 1) 4:3 square pixels and 2) easily convertible to a standard video resolution (either 720x576 or 720x480) just by vertical resampling, you are not. See the table. There is no real-world video format that would use the full 720 pixel horizontal range as the width of the active 4:3 frame.
In order to get to a standard video format from this one, you need to take into account the actual form of the sampling matrices. The 4:3 area in 625-line formats is 702x576, not 720x576. In 525-line formats it is 711x486, not 720x480. Resizing a 720 pixels wide 4:3 format directly to 720x576 or 720x480 simply won't work. You will either have to resample in both directions (contrary to what you originally thought, you do not get to keep the image width neatly as 720 pixels at all times), or crop some top and bottom lines off.
If you need to construct an intermediary square-pixel resolution that is a) exactly 720 pixels wide and b) covers exactly the same area as 720x576 or 720x480 (thus only having to resample in vertical direction for conversions), you will end up with two separate resolutions, one for each video standard:
Fortunately, the numbers will nicely round up to 720x527 for both standards.
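One way to arrive at those heights, sketched in Python: if the square-pixel frame keeps the full 720-pixel width and covers the same area as the video frame, its height has to be the video frame height divided by the pixel aspect ratio. That derivation is an assumption about how the numbers were obtained; the pixel aspect ratios themselves come from the table.

```python
import math
from fractions import Fraction

PAR_625 = Fraction(128, 117)    # 720x576, 13.5 MHz (from the table)
PAR_525 = Fraction(4320, 4739)  # 720x480, 13.5 MHz (from the table)


def square_pixel_height(frame_height, par):
    """Height of a 720-wide square-pixel frame covering the same picture area."""
    return Fraction(frame_height) / par


height_625 = square_pixel_height(576, PAR_625)
height_525 = square_pixel_height(480, PAR_525)

for name, height in [("625/50", height_625), ("525/59.94", height_525)]:
    # math.ceil is used because the text rounds up; Python's round() would
    # send 526.5 to 526 (round-half-to-even).
    print(f"{name}: 720 x {float(height):.4f} -> 720x{math.ceil(height)}")
```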
Note that the original interlaced field structure (if any) will go haywire as you mess around scaling in the vertical direction.
"Square pixels", as digitized by a TV tuner or an M-JPEG card, are not exactly square. The "industry standard" sampling rates used in square-pixel video equipment actually give out pixels that are almost square, but not exactly. As you can see for yourself in the table, the difference is very small - for all practical purposes meaningless - but it is still useful to know that sampled "video" square-pixels differ a bit from ideal "computer" square pixels.
Converting "computer" square pixels to "video" square pixels is usually a futile effort. You will not see the difference, anyway, and probably only lose some quality in the interpolation process.
I thought digital video was simple! Now my head hurts!
But that's just the way video is. Fortunately, the conversions are not really that complicated once you practice them a little.
No-one will ever notice if I consider all "4:3" video formats just 4:3, without doing any complicated aspect ratio or "active image area" calculations.
Feel free to process your video just the way you like it. But there are still many people who would like to get as close to the ideal aspect ratio correctness as possible, instead of only using rough "ballpark figures" in their video work.
You may be correct. Professional video gear is very strict about conforming to the ITU-R BT.601 standard, and you can also generally trust DV camcorders and DVD players/recorders to use the correct sampling rates and pixel clocks. However, the PC hardware market is different: cheap mass-marketed TV tuner cards and "TV out" cards often seem to have these design flaws and inaccuracies in their drivers: sometimes they are using the common, industry-standard frame formats (such as 720x480) with sampling rates that are just plain wrong or sufficiently off the mark to create problems.
It is usually not the hardware that is the culprit here — the chips on the card may be perfectly capable of producing images (or digitizing them) using exactly the correct sampling rates and pixel clocks, but the programmer who designed the driver that controls the hardware may have taken some special liberties and shortcuts, leading to inaccuracies. (Possibly the drivers for these problematic devices were designed by someone who has not studied the relevant video standards.)
Fortunately, you can check out your devices and, if necessary, calibrate your capture workflow by following these instructions. (The only way you can find out these flaws for sure is comparing test images as detailed in the above link, or using a test card generator and an oscilloscope.)
This page is maintained by Jukka Aho. Last updated: 1-Mar-2004