... I wouldn't mind hearing the grownup version with Mr. Nyquist included.
I found an old blog post I wrote on a similar subject some time ago, so, with generous self-plagiarism, here's my shot at your answer. Warning: it's long. Mark Twain is supposed to have said, "I didn't have time to write a short letter, so I wrote a long one instead." This post is kinda like that.
Before we talk about antialiasing, we need to understand what aliasing is. For that, we go to AT&T in 1924, where a guy named Harry Nyquist came up with a surprising idea, later generalized by the father of information theory, Claude Shannon.
Even though his ideas had broader applicability, Nyquist worked only with signals that varied with time, and I’m going to consider only those kinds of signals at first. Nyquist was interested in the voice signals that AT&T got paid to transmit from place to place, and surprisingly to me, telegraph signals. These days, almost everything you hear has been sampled: CDs, wireline and cellphone telephone calls, satellite and HD radio, Internet audio, and iPod music.
What came out of Nyquist’s insight (which, in 1924, looked at the problem backwards from the way I'm stating it here) was that, given an idealized sampling system (perfect amplitude precision, no sampling jitter, infinitely small sampling window, etc.), if you took regularly spaced samples of the signal at a rate faster than twice the highest frequency in the signal, you had enough information to perfectly reconstruct the original signal. This came to be known as the Nyquist Criterion, and although I’ve worked with it for 45 years (first in data acquisition and process control, then in telephone switching systems, and finally in image processing), I still find it pretty amazing. It turns out that the Nyquist Criterion’s path from theory to practice has been pretty smooth; real systems, which don’t obey all the idealizing assumptions (some of which are pretty severe: pick any frequency you like, and any signal of finite duration has some frequency content above it) come very close to acting like the ideal case.
What if there is significant signal content at frequencies above half the sampling frequency? Let’s imagine a system which samples the input signal 20,000 times a second. We say the sampling frequency is 20 kilohertz, or 20 kHz. Let’s further say that this system is connected to a system to reconstruct the signal from the samples. Now let’s put a single frequency in the input, and see what we get at the output. If we put in 5 kHz, we get the same thing out. We turn the dial up towards 10 kHz, and we still get the same signal out as we put in. As we go above 10 kHz, a strange thing happens: the output frequency begins to drop. When we put 11 kHz in, we get a 9 kHz output. 12 kHz gives us an output of 8 kHz. This continues to happen all the way to 20 kHz, where we get a 0 Hz (dc) signal. 21 kHz in gives us 1 kHz out, 22 kHz in gives us 2 kHz out, etc.
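If you'd rather see that folding behavior as arithmetic, here's a minimal sketch. It assumes the idealized sampler and reconstructor described above and a single pure tone at the input; the function name alias_frequency is just something I made up for illustration.

```python
def alias_frequency(f_in_hz, f_sample_hz):
    """Frequency at which a pure tone shows up after ideal sampling and reconstruction."""
    # Anything that differs by a whole multiple of the sampling frequency
    # produces the same samples, so first wrap into 0 .. f_sample.
    f = f_in_hz % f_sample_hz
    # Then fold the upper half of that range back onto the lower half.
    return min(f, f_sample_hz - f)


if __name__ == "__main__":
    # The 20 kHz sampling system from the text.
    for f_in in (5_000, 10_000, 11_000, 12_000, 20_000, 21_000, 22_000):
        print(f_in, "Hz in ->", alias_frequency(f_in, 20_000), "Hz out")
```

Run it with the inputs from the paragraph above and 11 kHz comes out as 9 kHz, 12 kHz as 8 kHz, 20 kHz as dc, and 21 kHz as 1 kHz.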
Engineers, trying to relate this situation to everyday life, said that, at over half the sampling frequency, the input signal appears at the output, but “under an alias”. Thus, since in English there is no noun that cannot be verbed, we get aliased signals and aliasing.
Aliasing is almost always a bad thing. If aliasing is present, it’s impossible to tell whether the signals at the output of the reconstructive device were part of the original signal, or were aliased down in frequency from somewhere else in the spectrum. Therefore, designers of systems that sample, transmit or store, and reproduce time-varying signals in the real world, such as CDs, the audio part of DVDs, or telephone systems, place a filter in front of the sampler to diminish (attenuate is the engineering word) signal content at frequencies above half the sampling frequency. This filter is called an antialiasing (or AA, or, if you’re an engineer, “A-squared”) filter.
Now let’s generalize the one-dimensional sampling I talked about above to instantaneous two-dimensional continuous spatial signals, such as images produced by a lens, sampled by idealized and actual image capture chips.
In order to do this, I’m going to have to get into spatial frequency. I’m going to punt on two-dimensional spectra, because I don’t know of any way to deal with them at all rigorously without a lot of math, but fortunately, for the purposes of understanding how to use digital cameras, rather than how to design them, we can think in terms of one-dimensional frequency at various places and at various angles.
If you’re a serious photographer and have a technical bent, you’ve probably been looking at modulation transfer function (MTF) charts as part of your evaluation of lenses. If you haven’t seen one in a while, go to http://us.leica-camera.com/photography/m_system/lenses/6291.html
and click on the link on the right hand side labeled “Technical Data”. Open the Acrobat-encoded data sheet for the (spectacular, as far as I’m concerned) 18mm f/3.8 Leica Super Elmar-M ASPH lens. Page down to the resolution charts. You’ll notice that they give contrast data for 5, 10, 20, and 40 line-pair per millimeter test targets at various places across the frame, at two orientations of the target. Those four targets, while nearly square waves rather than sinusoids, amount to inputs of four different sets of spatial frequencies to the imaging system. The units will seem more analogous to the units of temporal frequency, which were cycles per second before Dr. Hertz was honored, if we replace “line pair” (one black and one white line) with “cycle”, and talk about cycles per mm.
Another place you may have encountered spatial frequency is in doing your own lens testing using a test target. These targets usually have groups of lines at various pitches, angles, and positions, and you look at the image of the target and figure out what places you can’t make out the lines any more. Then you divide the spatial frequencies (measured in line pairs per mm) by the reduction ratio from the chart to the film plane, and that’s your ultimate (contrast equals zero) resolution. This kind of testing doesn’t give the richness of information obtainable from MTF curves, but you can do it at home with no special computer program. It is also more relevant to aliasing, since the frequency at which the lens just starts to deliver zero contrast is the frequency above which there will be no aliasing, no matter what the resolution of the image sensor or the presence or absence of an antialiasing filter.
In order to discuss spatial sampling frequency, we have to turn most of what you probably know about digital sensors on its head. Most manufacturers and photographers talk about sensor resolution in terms of pixel pitch: the distance between the centers of adjacent pixels. The Nikon D3x has a pixel pitch of 5.94 micrometers; the Pentax 645D virtually the same; the Hasselblad H3D, 6.8; and the Nikon D3s, 8.45. We can turn the pixel pitch into the sampling frequency by inverting it, so the D3x has a sampling frequency of about 168.4 K samples/meter, or 168.4 samples/mm. A crude analysis says we won’t have any aliasing if the D3x optical system (lens and antialiasing filter) zero contrast frequency is no more than half of the sensor sampling frequency, or about 84 cycles/mm.
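The pitch-to-frequency conversion is simple enough to sketch in a few lines. This is only the crude analysis from the paragraph above (point sampling, no Bayer array), the function names are my own, and the pitches are the ones quoted in the text; rounding may differ slightly from the figures I gave.

```python
def sampling_frequency_per_mm(pixel_pitch_um):
    """Samples per millimeter for a given pixel pitch in micrometers."""
    # There are 1000 micrometers in a millimeter, so invert the pitch in mm.
    return 1000.0 / pixel_pitch_um


def nyquist_cycles_per_mm(pixel_pitch_um):
    """Half the sampling frequency, in cycles (line pairs) per millimeter."""
    return sampling_frequency_per_mm(pixel_pitch_um) / 2.0


if __name__ == "__main__":
    # Pixel pitches in micrometers, as quoted in the text.
    pitches = {"Nikon D3x": 5.94, "Hasselblad H3D": 6.8, "Nikon D3s": 8.45}
    for name, pitch in pitches.items():
        print(name,
              round(sampling_frequency_per_mm(pitch), 1), "samples/mm,",
              round(nyquist_cycles_per_mm(pitch), 1), "cycles/mm at Nyquist")
```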
Alas, things aren’t that simple. I have bad news, good news and very bad news.
The bad news is that the kind of sensor upon which all the above sampling discussion is based is completely impractical. Nyquist’s original time-based work assumed sampling the signal at an instant in time. The extensions discussed in this post have so far assumed sampling the image at infinitesimally small points on the sensor. In the real world, as we make the light-sensitive area in a pixel smaller and smaller, it gets slower. Turning up the amplifier gain to compensate for this reduction in sensitivity increases noise. We’ve all seen the results of tiny sensors in inexpensive high-pixel-count point and shoot cameras, and we surely don’t want that in our big cameras, which would be what we’d get if we spread tiny little light receptors thinly across a big image sensor. The result of making each light-sensitive area larger than an idealized point is that we don’t get images that are as sharp as they could be; the larger receptor area causes image spatial frequencies near the Nyquist limit of half the sampling frequency to be attenuated.
The good news is that increasing the area of the sensor receptors reduces aliasing, and does it fairly efficiently. William Pratt, in his book Digital Image Processing, 2nd Edition, on pages 110 and 111, compares a square receptor with a diffraction-limited ideal lens and finds that, for the same amount of aliasing error, the lens provides greater resolution loss. He asserts, but does not provide data, that a defocused ideal lens would perform even more poorly than the diffraction-limited lens. In digital cameras, this kind of antialiasing filtering, which comes for free, is called fill-factor filtering, since it is related to how much of the grid allocated to the sensor is sensitive to light.
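To put a rough number on fill-factor filtering, here's a sketch using the standard textbook model of a square receptor with uniform sensitivity, whose contrast response along one axis is a sinc function of frequency times receptor width. That model is my assumption here, not something taken from Pratt's pages, and the function name and the choice of widths are just for illustration.

```python
import math


def aperture_mtf(freq_cycles_per_mm, aperture_width_mm):
    """Contrast passed by a uniform square receptor, along one axis.

    Assumes the idealized model |sin(pi*f*w) / (pi*f*w)| for a receptor
    of width w; real photosites and microlenses will differ.
    """
    x = math.pi * freq_cycles_per_mm * aperture_width_mm
    return 1.0 if x == 0 else abs(math.sin(x) / x)


if __name__ == "__main__":
    pitch_mm = 0.00594                  # D3x pitch from the text, in mm
    nyquist = 1.0 / (2.0 * pitch_mm)    # half the sampling frequency
    for width_fraction in (1.0, 0.5, 0.1):
        mtf = aperture_mtf(nyquist, width_fraction * pitch_mm)
        print("receptor width", width_fraction, "x pitch: MTF at Nyquist",
              round(mtf, 3))
```

Under that model a receptor as wide as the pixel pitch passes about 64% contrast at the Nyquist frequency, and the response approaches 100% as the receptor shrinks toward a point, which is the tradeoff between sharpness and aliasing in a nutshell.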
In the transitional period of my digital photographic career, when I was using film capture and digital output, I used an Optronics Colorgetter drum scanner. The scanner let you control the scanning aperture independent of the pixel resolution. I started making the aperture smaller and smaller, figuring that I’d get better detail. Instead of getting better, things went rapidly downhill. It took me a while to realize that the film grain provided an immense amount of high frequency detail, and that, by making the scanning aperture smaller, I was aliasing more grain noise into the scan.
The really bad news is that, with the exception of those using monochrome sensors (think 3-chip TV cameras) and the handful employing Carver Mead’s Foveon sensors, digital cameras don’t detect color information for each pixel at the same place on the chip.
The most common way of getting color information is to put various color filters over adjacent pixels. The Bayer, or GRGB, pattern, invented by Bryce Bayer at Eastman Kodak, uses twice as many green filters as red or blue ones, on the quite reasonable theory that a broad green filter response is not too far from the way the human eye responds to luminance, and luminance resolution in the eye is greater than chroma resolution. The use of this pattern of filters requires calculations involving neighboring pixels to convert each monochromatic pixel on the sensor to a color pixel in the resultant file. This mathematical operation, called demosaicing, assumes that there is no image detail finer than the group of cells involved in the calculation. If there is this kind of image detail, any aliasing will cause not only luminance errors, but color shifts.
The first 21st-century digital camera without an antialiasing filter that I used was the Kodak DCS-14N. One of the first pictures I made included in the foreground a wet asphalt road. The tiny pebbles in the asphalt combined with the way the sun was reflecting off them created a lot of high-frequency detail. The demosaiced image was a riot of highly saturated noise in all the colors of the rainbow.
It’s not easy to put a number on the amount of filtering necessary to keep aliasing from happening in a Bayer or similar array, but I’m going to give it a try. I’ve noticed that, if you turn the sensor at a forty-five degree angle, the green dots form an array whose centers in turn form a grid of squares. The edge of each of those squares is the square root of two times the pitch of the Bayer array. So, at the very minimum, the sampling frequency for the antialiasing calculation should be based on a pixel pitch of 1.4 times the actual pitch. To get an upper bound, note that the red-filtered photosensors and the blue-filtered ones each form an array whose centers are squares of twice the size of the pixel pitch.
So let’s now go back to our Nikon D3x, and note that, if it didn’t have an antialiasing filter, we’d see aliasing if there were any image content above some number between 60 and 42 cycles per millimeter. These numbers are within the zero-contrast resolving power of almost any decent lens. Note that the Leica 18mm lens referenced at the beginning of this post has about 80% contrast in the center of the field at 40 lp/mm at f/5.6 and wider.
Things are a little better than this because of the fill-factor filtering mentioned above, but note that using a Bayer array means that the fill factor for green light can never go over 50% and for red and blue light, the maximum is 25%.
A more relevant sensor is the one in the Leica M9, which has a pixel pitch of 6.8 micrometers and no antialiasing filter. We’ll see aliasing if there’s any image content above some number between 52 and 37 cycles per millimeter. If we put the 18mm Leica lens on the M9, set it to f/5.6, hold it steady and aim it at a still subject, the only thing that’s going to keep us from seeing aliasing is lack of sharp focus or lack of detail in the subject.
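The two bounds I've quoted for the D3x and the M9 come from scaling the pixel pitch by the square root of two and by two before computing half the sampling frequency. Here's a sketch of that arithmetic; the function name is hypothetical, and it ignores fill-factor filtering and whatever the demosaicing software does.

```python
import math


def bayer_aliasing_bounds(pixel_pitch_um):
    """Return (lower, upper) aliasing thresholds in cycles/mm for a Bayer sensor.

    Crude estimate only: it treats the color planes as point-sampled grids
    with effective pitches of sqrt(2) and 2 times the pixel pitch.
    """
    nyquist = 1000.0 / (2.0 * pixel_pitch_um)   # full-grid Nyquist, cycles/mm
    upper = nyquist / math.sqrt(2.0)            # effective pitch = 1.4 x pixel pitch
    lower = nyquist / 2.0                       # effective pitch = 2 x pixel pitch
    return lower, upper


if __name__ == "__main__":
    # Pixel pitches in micrometers, as quoted in the text.
    for name, pitch in (("Nikon D3x", 5.94), ("Leica M9", 6.8)):
        lower, upper = bayer_aliasing_bounds(pitch)
        print(name, "aliasing above somewhere between",
              round(upper), "and", round(lower), "cycles/mm")
```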
Let’s now move to resampling in an image-editing program. We don’t have to worry about the Bayer pattern any more, which is good. If we take our M9 image and resample it to 10% in both the horizontal and vertical directions, in the absence of any smoothing, we’ll see aliasing if there’s any information in the image above around 4 cycles per millimeter, which any lens is capable of delivering at just about any aperture. If we blew up the M9 sensor so that the pixel pitch were 68 micrometers, we’d get some filtering through the fact that each pixel would be taking in light across a 68x68 micrometer area. (I’m assuming a fill factor of 100%. Because of the Bayer array, it's at best 50%, but the demosaicing software interpolates across neighboring pixels. Yeah, I know it's a crude approximation, but it's better than nothing.) To simulate that effect in our downsampled image, we’d have to average a 10x10 pixel area, or 100 pixels total, for each pixel in the output image.
Bilinear interpolation considers at most the four pixels in the input image that are the closest to where the output pixel will be. Bicubic goes farther, incorporating information from up to 16 pixels into the result, a sixth of the number of pixels we’d need to simulate our big-photosite 180 Kpixel (18 Mpixel/100) M9.
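To make the 10x10 averaging concrete, here's a toy sketch in plain Python that compares throwing pixels away with averaging each block. It is not what any particular image editor does; the function names are made up, and the test image (one-pixel-wide vertical stripes) is chosen just to make the difference obvious.

```python
def decimate(image, factor):
    """Downsample by keeping every factor-th pixel, with no smoothing."""
    return [row[::factor] for row in image[::factor]]


def box_downsample(image, factor):
    """Downsample by averaging each factor x factor block of input pixels."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = 0.0
            # Every one of the factor*factor input pixels contributes equally.
            for dr in range(factor):
                for dc in range(factor):
                    total += image[r * factor + dr][c * factor + dc]
            out_row.append(total / (factor * factor))
        out.append(out_row)
    return out


if __name__ == "__main__":
    # 20x20 image of alternating black (0.0) and white (1.0) columns:
    # detail far finer than a 10x smaller image can hold.
    stripes = [[float(c % 2) for c in range(20)] for _ in range(20)]
    print("decimated:", decimate(stripes, 10))
    print("box averaged:", box_downsample(stripes, 10))
```

With the stripes, keeping every tenth pixel lands only on black columns, so the fine detail shows up as a solid black image, while the 100-pixel average gives the mid gray a big photosite would have recorded.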