Okay, started to write this, decided halfway that it was boring the h*ll out of even myself, then decided it could still be interesting.
You read this to the end at your own risk and peril. You're better off going outside and taking pictures. I didn't even bother to properly finish and edit it.
Anyone following the recent aliasing discussions on these forums will have frequently read about the audio metaphor. I was wondering about the similarities and was thinking about this:
1) The Audio equivalent
Sampling with a color filter array (CFA) is like sampling a stereo audio signal when you only have one sample available per time unit. You have to choose which channel to sample, and you have to skip the other channel for that sample. A Bayer configuration is then comparable to alternating samples between the two channels, much like a single row of a Bayer sensor alternates between two colors (for example red and green).
If the original signal looks like this:
R1, R2, R3, R4, R5, R6, R7, ...
L1, L2, L3, L4, L5, L6, L7, ...
Then the resulting single signal stream will look like this:
R1, L2, R3, L4, R5, L6, R7, ...
Redistributing over left and right:
R1, skip, R3, skip, R5, skip, R7, ...
skip, L2, skip, L4, skip, L6, skip, ...
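A minimal Python sketch of the two-channel analogy, assuming the right channel is sampled at even time indices and the left channel at odd ones. The function names `cfa_sample` and `redistribute` are hypothetical, chosen only for illustration:

```python
from typing import List, Optional, Tuple, TypeVar

T = TypeVar("T")

def cfa_sample(right: List[T], left: List[T]) -> List[T]:
    """Keep one sample per time unit: right channel at even indices, left at odd ones."""
    if len(right) != len(left):
        raise ValueError("both channels must have the same length")
    return [right[i] if i % 2 == 0 else left[i] for i in range(len(right))]

def redistribute(stream: List[T]) -> Tuple[List[Optional[T]], List[Optional[T]]]:
    """Spread the single stream back over two channels; None marks a skipped sample."""
    right = [v if i % 2 == 0 else None for i, v in enumerate(stream)]
    left = [v if i % 2 == 1 else None for i, v in enumerate(stream)]
    return right, left

right = ["R1", "R2", "R3", "R4", "R5", "R6"]
left = ["L1", "L2", "L3", "L4", "L5", "L6"]
stream = cfa_sample(right, left)
print(stream)                # ['R1', 'L2', 'R3', 'L4', 'R5', 'L6']
print(redistribute(stream))  # (['R1', None, 'R3', None, 'R5', None], [None, 'L2', None, 'L4', None, 'L6'])
```

Half of every channel is simply gone; nothing in the stream records what the skipped samples were.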
2) Deconstructing the Aliasing problem
For the sake of discussion, let's assume sample values range from -1.0 to +1.0. If we only had to sample a single channel, the highest frequency that could be captured is one that flips sign on every sample (the Nyquist frequency, half the sampling rate):
+1.0, -1.0, +1.0, -1.0, +1.0, -1.0,
In image terms that would be the equivalent of a white-black-white-black pattern.
However, because we skip every other sample, we end up with:
+1.0, skip, +1.0, skip, +1.0, skip,
In image terms that would equate to: white - unknown - white - unknown - ...
Clearly, by skipping samples, we have lost crucial information necessary for reproduction. It is then the task of the RAW converter to try and reconstruct the missing samples.
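The same loss can be written out as a short worked equation. Assume the alternating pattern is the discrete signal x[n] = cos(πn), and that a channel keeps only the even samples (n = 2m):

```latex
x[n] = \cos(\pi n) = (-1)^n, \qquad n = 0, 1, 2, \dots
y[m] = x[2m] = \cos(2\pi m) = 1, \qquad m = 0, 1, 2, \dots
```

A frequency of π radians per sample, the highest the full stream can hold, becomes 2π radians per kept sample. That is indistinguishable from 0, so the pattern aliases to a constant.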
3) Spatial Aliasing
With only half the samples, the RAW converter has no way to correctly reconstruct the original signal. Worse, undersampling can leave no basis for any guess about the original signal, and can even push the reconstruction toward something unreasonable. For example, filling each gap with a copy of its neighboring samples would produce:
+1.0, +1.0, +1.0, +1.0, +1.0, +1.0,
which would essentially push your speaker to its maximum excursion, leaving no room for movement at other frequencies. In an image, it would show as pure white, leaving no headroom for differences in other colors or highlights.
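A sketch of that naive "copy the neighbor" reconstruction. It assumes the kept samples of the +1.0/-1.0 pattern are the +1.0 ones, and that each skipped sample is copied from its nearest known neighbor, with ties going to the earlier one. This is a deliberately crude, hypothetical rule (`fill_from_neighbor` is an illustrative name); real RAW converters use far more elaborate demosaicing.

```python
from typing import List, Optional

def fill_from_neighbor(channel: List[Optional[float]]) -> List[float]:
    """Replace each skipped sample (None) with the nearest known sample.

    Ties are resolved in favor of the earlier sample. Requires at least one known sample.
    """
    known = [i for i, v in enumerate(channel) if v is not None]
    if not known:
        raise ValueError("no known samples to copy from")
    out: List[float] = []
    for i, v in enumerate(channel):
        if v is not None:
            out.append(v)
        else:
            # Sort key: distance first, then index, so the earlier neighbor wins ties.
            nearest = min(known, key=lambda k: (abs(k - i), k))
            out.append(channel[nearest])
    return out

# The kept samples of the white/black pattern are all +1.0 ...
captured = [1.0, None, 1.0, None, 1.0, None]
# ... so copying neighbors turns fine stripes into solid white.
print(fill_from_neighbor(captured))  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```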
Unfortunately, we also lack the information needed to reproduce even a reasonable average. That is where an anti-aliasing filter comes in. Suppose that instead of taking each value only at its exact location, we also mix a little of the neighboring values into each captured sample.
For example, let each direct neighbor contribute half as much as the sample itself (weights 0.25, 0.5, 0.25). Original signal:
+1.0, -1.0, +1.0, -1.0,
Sampled with partial neighbors:
0.0, skip, 0.0, skip, 0.0, skip,
For a single captured sample (a +1.0 position with -1.0 on either side): S = 0.25 × (-1.0) + 0.5 × (+1.0) + 0.25 × (-1.0) = 0.0
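A minimal sketch of "filter first, then sample". It uses the 0.25/0.5/0.25 weights from the example and treats the pattern as periodic, so the first and last samples also have two neighbors to mix in. Both choices, and the names `blur` and `keep_every_other`, are illustrative assumptions. This does not describe how any particular sensor's optical filter works.

```python
from typing import List

# Mixing weights from the example: each direct neighbor counts half as much as the center.
KERNEL = (0.25, 0.5, 0.25)

def blur(signal: List[float]) -> List[float]:
    """Mix each sample with its direct neighbors, wrapping around at the ends (periodic signal assumed)."""
    n = len(signal)
    return [
        KERNEL[0] * signal[(i - 1) % n] + KERNEL[1] * signal[i] + KERNEL[2] * signal[(i + 1) % n]
        for i in range(n)
    ]

def keep_every_other(signal: List[float]) -> List[float]:
    """Keep only the samples this channel actually captures (even indices)."""
    return signal[::2]

pattern = [1.0, -1.0] * 3                # white, black, white, black, ...
print(keep_every_other(pattern))        # [1.0, 1.0, 1.0]  reads as solid white
print(keep_every_other(blur(pattern)))  # [0.0, 0.0, 0.0]  reads as mid gray
```

Note that a flat area passes through unchanged (`blur([1.0] * 4)` stays all ones), so the filter only removes the fine alternation.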
Now something interesting happens. If we again reconstruct by replicating neighboring samples, we get a stream of zeros. That leaves our speaker in a reasonable middle position, with its full range still available to reproduce any other information. In imaging terms, if -1.0 represents black and +1.0 represents white, we would essentially reproduce a patch of gray in place of high-frequency alternating bands of black and white.
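The filter's frequency response explains why the result is exactly gray. Write the example's mixing rule as y[n] = 0.25 x[n-1] + 0.5 x[n] + 0.25 x[n+1] and feed it a complex sinusoid x[n] = e^{iωn}, with ω in radians per sample. This is the standard signal-processing convention, not notation from the original post:

```latex
H(\omega) = 0.25\,e^{-i\omega} + 0.5 + 0.25\,e^{i\omega} = 0.5 + 0.5\cos\omega = \cos^2(\omega/2)
H(0) = 1, \qquad H(\pi/2) = 0.5, \qquad H(\pi) = 0
```

A flat area (ω = 0) passes unchanged, while the white/black alternation (ω = π) is removed completely, which matches the stream of zeros. The cost is that detail at intermediate frequencies is attenuated too; at ω = π/2 it is halved.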
Fortunately, in real-world image sampling (or sound, for that matter), artificial sequences such as the one above rarely occur. Real sample values are fairly random, so neighboring samples show some differences, and that allows a somewhat more reasonable, albeit still incorrect, reconstruction. In most cases this mitigates the problem considerably.
4) Channel correlation
To make a better guesstimate about the original signal, the RAW converter exploits the correlation that usually exists between channels. This holds both for color channels in images and for stereo channels in sound. You can find a lot of information about this on the internet, because much research has gone into these correlations: many compression schemes are built on them.
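Stereo audio commonly exploits that correlation through mid/side coding, which some audio codecs offer as a joint-stereo option. The image counterpart is a luma/chroma split such as YCbCr. A minimal sketch follows, assuming the plain average/half-difference form of the transform. The function names are hypothetical. When the channels are highly correlated, nearly everything ends up in the mid signal and the side signal stays close to zero.

```python
from typing import List, Tuple

def to_mid_side(left: List[float], right: List[float]) -> Tuple[List[float], List[float]]:
    """Mid = average of the two channels, side = half their difference."""
    mid = [(lv + rv) / 2 for lv, rv in zip(left, right)]
    side = [(lv - rv) / 2 for lv, rv in zip(left, right)]
    return mid, side

def from_mid_side(mid: List[float], side: List[float]) -> Tuple[List[float], List[float]]:
    """Inverse transform: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Highly correlated channels: the side signal is tiny compared with the mid signal.
left = [0.50, 0.80, -0.30, 0.10]
right = [0.50, 0.75, -0.30, 0.15]
mid, side = to_mid_side(left, right)
print(mid)
print(side)
```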
If the original audio is a pure mono signal, then the successive right and left samples together fully reproduce the original samples. The same is true for images that are purely grayscale.
If, however, the two stereo channels differ a lot (which would correspond to strong color differences in an image), this relationship breaks down. Fortunately, in color imaging our eye-brain combination is much less sensitive to fine color detail than to brightness detail, which allows a lot of leeway in reproduction.
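A quick check of both cases, assuming the right channel is sampled at even time indices and the left channel at odd ones (`cfa_sample` is a hypothetical helper). When the channels are identical, the single stream is the original. When they differ strongly, reading the stream as one signal invents a pattern that exists in neither channel.

```python
from typing import List

def cfa_sample(right: List[float], left: List[float]) -> List[float]:
    """One sample per time unit: right channel at even indices, left at odd ones."""
    return [right[i] if i % 2 == 0 else left[i] for i in range(len(right))]

# Mono: both channels are identical, so the single stream *is* the original signal.
mono = [0.1, 0.4, -0.2, 0.7, 0.3, -0.5]
assert cfa_sample(mono, mono) == mono

# Strongly different channels: each channel is constant, yet the stream alternates,
# a false high-frequency pattern if it were taken as a single signal.
right = [1.0] * 6
left = [-1.0] * 6
print(cfa_sample(right, left))  # [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
```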