Author Topic: Will jpeg XR format emancipate us from RAW conversions at last?  (Read 18324 times)
Panopeeper
Sr. Member
****
Offline Offline

Posts: 1805


« Reply #20 on: September 16, 2009, 08:53:43 PM »

Quote from: GLuijk
Present JPEG is good enough to show your images on the net and even to print them. Why should users and the industry make any effort to change what already works fine enough?
Well, it is good enough with crappy presentation. As soon as more tone levels and higher dynamic range can be reproduced at prices for the masses, the old JPEG will have to go.

Think of modern TVs: a contrast ratio of 2000:1 is already normal, and it seems set to reach 10000:1 or more very soon; the number of required levels has to increase too.

Quote
I guess JPEG XR doesn't support undemosaiced data. That reason alone is enough to forget about it for quality digital captures
Would you have guessed that the venerable, almost never used lossless JPEG compression (first version) supports mosaic data? Almost all compressed raw image data is in that format; it is a question of interpretation. JPEG XR allows for many different formats; everything that can be done in JPEG can be done in JPEG XR as well, plus a lot more.

I saw a size comparison between lossless JPEG, specifically DNG, and JPEG XR; the latter won by a wide margin.
Logged

Gabor
bradleygibson
Sr. Member
****
Offline Offline

Posts: 829


WWW
« Reply #21 on: September 17, 2009, 12:43:00 AM »

As Gabor points out, it will support many things.  The real question is, (like TIFF today), which and how many of the various options will end up being widely adopted by the industry?

For example, the floating point example I pointed out earlier is not a slam-dunk for hardware manufacturers.  Floating point arithmetic is not something generally implemented in digital cameras today, so adding it represents an incremental expense.  (JPEG XR's fixed point bit format was included to address this particular issue.)  In the end, though, we simply have to hope that the industry "gets it" and supports the right flavor to ensure we as photographers get real benefits out of the new format.

Fingers crossed.

-Brad
« Last Edit: September 17, 2009, 12:44:21 AM by bradleygibson » Logged

Ray
Sr. Member
****
Offline Offline

Posts: 8847


« Reply #22 on: September 17, 2009, 03:11:15 AM »

Quote from: Panopeeper
Think of modern TVs: a contrast ratio of 2000:1 is already normal, and it seems set to reach 10000:1 or more very soon; the number of required levels has to increase too.

Definitely more. Panasonic's current 12th generation plasmas have a claimed contrast ratio of 40,000:1, and a dynamic CR of 2,000,000:1.

There seems to be an implication from Brad that a JPEG XR image can have the processed, punchy appearance of an in-camera conventional jpeg, yet still retain the potential for highlight and shadow detail recovery through floating point arithmetic. How does this work?

I'm reminded of a photographic expedition with my very first digital camera, the Canon D60. I travelled with a laptop. Downloaded all my RAW files to the laptop and converted them to 16 bit tiff using the default settings in Zoombrowser. As my laptop hard drive began to fill, I deleted the RAW files thinking that surely the 16 bit tiffs would contain all the information recorded in the RAW format. What a mistake!
Logged
Panopeeper
Sr. Member
****
Offline Offline

Posts: 1805


« Reply #23 on: September 17, 2009, 11:03:04 AM »

Quote from: Ray
There seems to be an implication from Brad that a JPEG XR image can have the processed, punchy appearance of an in-camera conventional jpeg, yet still retain the potential for highlight and shadow detail recovery through floating point arithmetic. How does this work?
This would not work. It is not a question of accuracy but of the number of retained levels. The raw conversion steps, such as contrast enhancement and particularly the "gamma encoding", drastically reduce the number of tone levels; accuracy does not help with that.
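A rough sketch of the level-count argument, assuming a 12-bit linear raw value, an 8-bit presentation file and a plain 2.2 power-law gamma (all three numbers are illustrative assumptions, not anything specified by JPEG XR):

```python
LINEAR_BITS = 12   # assumed raw bit depth
OUTPUT_BITS = 8    # assumed presentation bit depth
GAMMA = 2.2        # assumed simple power-law encoding

LINEAR_MAX = 2 ** LINEAR_BITS - 1
OUTPUT_MAX = 2 ** OUTPUT_BITS - 1

def gamma_encode(linear_code: int) -> int:
    """Map a linear raw code to a gamma-encoded output code."""
    normalized = linear_code / LINEAR_MAX
    return round(normalized ** (1.0 / GAMMA) * OUTPUT_MAX)

# The brightest stop is the upper half of the linear range.
top_stop_inputs = range(2 ** (LINEAR_BITS - 1), LINEAR_MAX + 1)
top_stop_outputs = {gamma_encode(code) for code in top_stop_inputs}

print(len(top_stop_inputs), "linear levels in the brightest stop")
print(len(top_stop_outputs), "distinct output codes after gamma encoding")
```

Under these assumptions the brightest stop is squeezed from thousands of linear levels into well under a hundred output codes, which is the kind of reduction being described.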

JPEG XR may supersede the currently used lossless JPEG encoding for storing the raw data. However, the presentation version of an image is not an alternative for the raw data.
Logged

Gabor
Jonathan Wienke
Sr. Member
****
Offline Offline

Posts: 5759



WWW
« Reply #24 on: September 17, 2009, 11:17:00 AM »

Quote from: bradleygibson
Ben, one of the differences between JPEG XR and 16-bit TIFF is the fact that JPEG-XR has a specification for dealing with high dynamic range floating point information.  What this means in plain English is that black is 0.0 and white is 1.0 (instead of 0 and 65535 for traditional 16-bit TIFF).  It is possible to write numbers like 1.2 or -3.4 with JPEG XR (there's no standard way to do this with 16-bit TIFF).  This means that IF you choose to edit and rewrite your data in floating point format, it is quite difficult to clip your information.

Pardon me, but your ignorance is showing here.

Floating-point numbers still have maximum and minimum values, and have discrete steps between any given value and the next-highest or next-lowest value that can be represented, just the same as integers. 32 bits can only represent 2^32 discrete values, regardless of whether values are stored in those bits in an integer or floating-point format.

The only difference between floating-point and integer values is the interpretation of the binary data used to represent the numeric value. With integers, the interval between adjacent binary values is defined as 1. With floating-point numbers, the interval between binary values can be less than one or greater than 1. But the number of discrete values is still limited by the number of bits used to represent the numeric value in either case. It makes no difference if you interpret those 32 bits as representing a value between 0 and 1 with an interval of 0.00000000023283064365386962890625 between adjacent discrete values, or a value between 0 and 4294967296 with an interval of 1 between adjacent discrete values. Using a floating-point format to store RAW or RGB values offers no advantage over a standard integer format.

The number of bits per pixel needed to cover the dynamic range of the sensor with acceptable precision is going to be exactly the same regardless of whether an integer or floating-point format is used to store the data. The same is true of image degradation caused by editing; you'll need the same number of bits in either format to prevent level and curve adjustments from causing unacceptable amounts of banding and posterization. Implementing a floating-point image format in camera would increase the hardware cost without offering any performance benefit. This is why no camera manufacturer has done so.
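To put numbers on the quantities in this argument, here is a minimal sketch that measures the gap between adjacent IEEE 754 32-bit floats at a few magnitudes and compares it with the uniform 2^-32 interval quoted above. The helper name `float32_step` is made up for this illustration; only the standard `struct` module is used.

```python
import struct

def float32_step(x: float) -> float:
    """Gap between x (rounded to float32) and the next larger float32; x must be positive."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    this_value = struct.unpack("<f", struct.pack("<I", bits))[0]
    next_value = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_value - this_value

# Uniform interval when 32 bits are read as an integer scaled onto [0, 1).
UINT32_STEP_ON_UNIT_RANGE = 2.0 ** -32

for x in (0.001, 0.5, 1.0, 1000.0):
    print(f"float32 step near {x}: {float32_step(x):.3e}")
print(f"uniform 32-bit integer step: {UINT32_STEP_ON_UNIT_RANGE:.3e}")
```

Both encodings have the same count of bit patterns; what the sketch shows is only how those patterns are distributed along the number line.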
Logged

ejmartin
Sr. Member
****
Offline Offline

Posts: 575


« Reply #25 on: September 17, 2009, 02:23:06 PM »

Quote from: Jonathan Wienke
Pardon me, but your ignorance is showing here.

Floating-point numbers still have maximum and minimum values, and have discrete steps between any given value and the next-highest or next-lowest value that can be represented, just the same as integers. 32 bits can only represent 2^32 discrete values, regardless of whether values are stored in those bits in an integer or floating-point format.

The only difference between floating-point and integer values is the interpretation of the binary data used to represent the numeric value. With integers, the interval between adjacent binary values is defined as 1. With floating-point numbers, the interval between binary values can be less than one or greater than 1. But the number of discrete values is still limited by the number of bits used to represent the numeric value in either case. It makes no difference if you interpret those 32 bits as representing a value between 0 and 1 with an interval of 0.00000000023283064365386962890625 between adjacent discrete values, or a value between 0 and 4294967296 with an interval of 1 between adjacent discrete values. Using a floating-point format to store RAW or RGB values offers no advantage over a standard integer format.

The number of bits per pixel needed to cover the dynamic range of the sensor with acceptable precision is going to be exactly the same regardless of whether an integer or floating-point format is used to store the data. The same is true of image degradation caused by editing; you'll need the same number of bits in either format to prevent level and curve adjustments from causing unacceptable amounts of banding and posterization. Implementing a floating-point image format in camera would increase the hardware cost without offering any performance benefit. This is why no camera manufacturer has done so.

This is simply incorrect; it would pay to check facts before calling someone ignorant.  

There is a big difference between floating point and integer encoding.  In integer encoding, in all parts of the range -- say 0 to 255 for 8-bit -- the spacing between levels is the same, be it the jump from 0 to 1 or the jump from 254 to 255.  The dynamic range, which is the max signal divided by the quantization step at zero signal, is 255~2^8.

In floating point encoding, some number of bits is used to store a number (the mantissa) between zero and one to a certain degree of precision, and the remaining bits are used to store the exponent.  Suppose again we take 8 bits, and devote 5 to the mantissa and 3 to the exponent.  The mantissa is a number from 0 to 31 (2^5 possible values from the 5 bits); we get a number between zero and one by dividing by 32=2^5.  Then the three bits of the exponent tell us to multiply the mantissa by two to the power of the exponent, which in this example can be any number among {1,2,4,8,16,32,64,128}=2^n for n={0,1,2,3,4,5,6,7} (2^3 possible values for the exponent from the three exponent bits).  Thus the gap between levels grows exponentially with the value of the exponent.  At the low end, the spacing of levels is 1/32 between the first two encoded levels, 0 and 1/32; the gap between the two largest levels is 4 ((30/32)*2^7 and (31/32)*2^7).  The dynamic range is the max signal divided by the quantization step at zero signal, which is (31/32*2^7)/(1/32)~2^12.
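The 8-bit example above can be checked by brute force. This is a small sketch of that toy format only (5 mantissa bits, 3 exponent bits, no sign bit, no implied leading 1); it is not the IEEE 754 layout or any JPEG XR pixel format.

```python
import math

MANTISSA_BITS = 5
EXPONENT_BITS = 3

def decode(mantissa: int, exponent: int) -> float:
    """Value = (mantissa / 32) * 2**exponent, as in the toy format above."""
    return (mantissa / 2 ** MANTISSA_BITS) * 2 ** exponent

# Enumerate every bit pattern and keep the distinct values it can represent.
levels = sorted({decode(m, e)
                 for m in range(2 ** MANTISSA_BITS)
                 for e in range(2 ** EXPONENT_BITS)})

smallest_step = levels[1] - levels[0]     # gap at the bottom of the range
largest_step = levels[-1] - levels[-2]    # gap at the top of the range
dynamic_range = levels[-1] / smallest_step

print("smallest step:", smallest_step)
print("largest step:", largest_step)
print("dynamic range in stops:", math.log2(dynamic_range))
```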

Floating point is similar in spirit to Nikon's lookup-table compression (sometimes called 'lossy compression' though I don't think this appellation does justice to the idea), in that the level spacing goes up as the average level goes up.  This is justified in images, because photon shot noise rises with the illumination level, and once the noise is sufficiently larger than the level spacing, it is pointless to make the level spacing finer -- it just digitizes the noise more precisely without adding any accuracy to the data.

Floating point is useful for HDR because the exponent covers an extremely wide range of values, and the fact that level spacing grows at the high end is not so important.

If RAW data were stored in floating point, using 9 bits for the mantissa and 3 for the exponent, 12 bits could accommodate about 16 stops of DR.  The 9 mantissa bits are sufficient to cover the typical S/N ratio of DSLRs in the highest stop at the lowest ISO, which in all current cameras is less than 2^9, and therefore prevent posterization.   But then again, people would have to give up their cherished myth that all those levels in the highest stop of integer encoded data mean anything.  Michael's essay on ETTR is wrong on this point.
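The "about 16 stops from 12 bits" figure follows from the same definition of dynamic range used in the 8-bit example (largest value divided by the step at zero). A minimal sketch, with `dynamic_range_stops` as a made-up helper name:

```python
import math

def dynamic_range_stops(mantissa_bits: int, exponent_bits: int) -> float:
    """Stops of range for a toy float: mantissa in [0, 1), exponent from 0 upward."""
    max_mantissa = 2 ** mantissa_bits - 1
    max_exponent = 2 ** exponent_bits - 1
    largest_value = (max_mantissa / 2 ** mantissa_bits) * 2 ** max_exponent
    smallest_step = 1 / 2 ** mantissa_bits
    return math.log2(largest_value / smallest_step)

print("5+3 bit toy float :", dynamic_range_stops(5, 3))
print("9+3 bit toy float :", dynamic_range_stops(9, 3))
print("12-bit integer    :", math.log2(2 ** 12 - 1))
```

The same 12 bits cover roughly four more stops in the 9+3 split than as a plain integer, at the cost of coarser steps in the bright stops.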
« Last Edit: September 17, 2009, 02:35:04 PM by ejmartin » Logged

emil
bradleygibson
Sr. Member
****
Offline Offline

Posts: 829


WWW
« Reply #26 on: September 17, 2009, 08:04:28 PM »

Thank you for your comprehensive reply, Emil.  We chose s10e5 (one sign bit, 10-bit mantissa and 5-bit exponent) partly for just the reason you mention -- to cover the anticipated DR.  For very demanding applications (i.e. future or very high dynamic range applications), IEEE 754 32-bit float is supported, with 23 bits of mantissa (s23e8).

Quote from: Ray
There seems to be an implication from Brad that a JPEG XR image can have the processed, punchy appearance of an in-camera conventional jpeg, yet still retain the potential for highlight and shadow detail recovery through floating point arithmetic. How does this work?

To deal with the clipping problem, let me use integer numbers, since they are better understood.  The simple answer is to not use all the values from 0 to 65535 (in 16-bit integer) to encode all values from black to white.  If you instead decided to consider, say, 16 to be black and 240 to be white, you would be able to do some limited amount of processing on the file, and numbers going beyond 240 or below 16 would retain their distinct values (i.e. would not clip).  This is effectively what the Fixed Point formats are, although the actual technique is a bit more sophisticated.
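A minimal sketch of the headroom idea, using the 16/240 figures from the paragraph above. For readability it assumes an 8-bit container (0 to 255); the function names are invented for the illustration and this is not the actual JPEG XR fixed point layout.

```python
BLACK_CODE = 16
WHITE_CODE = 240
CODE_MIN, CODE_MAX = 0, 255  # assumed 8-bit container

def to_code(scene_value: float) -> int:
    """Map nominal black (0.0) and white (1.0) to 16 and 240; clip only at the container limits."""
    code = round(BLACK_CODE + scene_value * (WHITE_CODE - BLACK_CODE))
    return max(CODE_MIN, min(CODE_MAX, code))

def to_full_range_code(scene_value: float) -> int:
    """Conventional encoding with black = 0 and white = 255, so there is no headroom."""
    return max(CODE_MIN, min(CODE_MAX, round(scene_value * CODE_MAX)))

# Values pushed slightly past nominal white by an edit.
highlights = [1.00, 1.03, 1.06]

print("with headroom:   ", [to_code(v) for v in highlights])
print("without headroom:", [to_full_range_code(v) for v in highlights])
```

With headroom the three highlight values keep distinct codes; in the conventional encoding they all collapse onto the maximum code.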

Floating point allows us to provide a very large range of values below black and above white for a reasonable economy of bits.  For a 12 or 14-bit ADC, taking into account a real noise floor, the full range of data can be encoded into float16 very well.  For best quality 16-bit ADC applications, float16 would still serve well, but some will undoubtedly want a couple more bits in the mantissa, so for those applications float32 serves very well, and will completely encode all the information captured by foreseeable devices over a HUGE dynamic range (many orders of magnitude larger than any devices available today).
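As a quick illustration of values below black and above white surviving in 16-bit float, the sketch below round-trips a few numbers through IEEE 754 half precision (the s10e5 layout) using Python's `struct` module and its `"e"` format code. The helper name `through_float16` is made up here, and this says nothing about how a JPEG XR codec stores its samples internally.

```python
import struct

def through_float16(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (s10e5)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# 0.0 is nominal black and 1.0 nominal white; the other values lie outside that range.
for sample in (-3.4, 0.0, 0.5, 1.0, 1.2, 65504.0):
    stored = through_float16(sample)
    print(f"{sample:>9} stored as {stored!r} (error {abs(stored - sample):.2e})")
```

65504 is the largest finite half-precision value, so the usable range above nominal white is large even in the 16-bit format.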

How the data beyond black or white might get used is a whole 'nother can of worms, but hardware is available and on the horizon which is able to take astonishing advantage of this information.

At the very least, the information is not clipped and thus can be later recovered and utilized in subsequent edit sessions or when outputting to a different device (e.g. screen vs. printer).
« Last Edit: September 17, 2009, 09:59:17 PM by bradleygibson » Logged

Ray
Sr. Member
****
Offline Offline

Posts: 8847


« Reply #27 on: September 17, 2009, 11:43:25 PM »

Quote from: bradleygibson
How the data beyond black or white might get used is a whole 'nother can of worms, but hardware is available and on the horizon which is able to take astonishing advantage of this information.

At the very least, the information is not clipped and thus can be later recovered and utilized in subsequent edit sessions or when outputting to a different device (eg. screen vs. printer, for example).

Interesting! On my travels I have found that a surprising number of fellow travellers sporting DSLRs shoot in jpeg mode. Once, many years ago on a trip, I found I was quickly running out of compact flash memory and switched from RAW mode to jpeg-fine mode. Despite my deliberately underexposing in order to retain full highlight detail, I found I had many shots, particularly of waterfalls, with blown highlights which I could do nothing about. I was generally disappointed with the results and have used RAW mode ever since.

On occasions I have shot RAW + JPEG just to see if there was any in-camera processing of the RAW image which was more pleasing than what I could achieve easily myself converting in ACR and further processing in Photoshop, but there wasn't, so no point except perhaps a saving of time when one wants a quick result.

A quick result that looks good, but with the potential to recover the full detail recorded by the sensor, would definitely be worth having.
Logged
Jonathan Wienke
Sr. Member
****
Offline Offline

Posts: 5759



WWW
« Reply #28 on: September 18, 2009, 07:01:08 PM »

Quote from: ejmartin
This is simply incorrect; it would pay to check facts before calling someone ignorant.  

There is a big difference between floating point and integer encoding.  In integer encoding, in all parts of the range -- say 0 to 255 for 8-bit -- the spacing between levels is the same, be it the jump from 0 to 1 or the jump from 254 to 255.  The dynamic range, which is the max signal divided by the quantization step at zero signal, is 255~2^8.

In floating point encoding, some number of bits is used to store a number (the mantissa) between zero and one to a certain degree of precision, and the remaining bits are used to store the exponent.  Suppose again we take 8 bits, and devote 5 to the mantissa and 3 to the exponent.  The mantissa is a number from 0 to 31 (2^5 possible values from the 5 bits); we get a number between zero and one by dividing by 32=2^5.  Then the three bits of the exponent tell us to multiply the mantissa by two to the power of the exponent, which in this example can be any number among {1,2,4,8,16,32,64,128}=2^n for n={0,1,2,3,4,5,6,7} (2^3 possible values for the exponent from the three exponent bits).  Thus the gap between levels grows exponentially with the value of the exponent.  At the low end, the spacing of levels is 1/32 between the first two encoded levels, 0 and 1/32; the gap between the two largest levels is 4 ((30/32)*2^7 and (31/32)*2^7).  The dynamic range is the max signal divided by the quantization step at zero signal, which is (31/32*2^7)/(1/32)~2^12.

I'm aware that the exponent can be varied in a floating-point number format. If such a floating-point encoding scheme is used for RAW data, those bits are completely wasted. All ADCs have a fixed dynamic range and an integer output. Converting the RAW output to a floating-point format with a variable exponent is pointless, because there is only one optimal exponent value: 1. If the exponent value is greater than one, then you must either throw away information by mapping multiple input values to a single output value, or else use an unnecessarily large number of bits to encode a RAW value. If the exponent value is less than one, then you are wasting bits specifying values to the right of the decimal point that are always zero--just the same as if you padded 12-bit values with zeroes to make them 16-bits.

You're also ignoring the fact that regardless of how many bits are devoted to the mantissa vs the exponent, you can't escape the fact that 32 bits can only represent 2^32 discrete values, regardless of whether those values are interpreted as integers or floating-point numbers. And as long as ADCs linearly output integer values, converting to a floating-point RAW format is at best a break-even proposition regarding the number of bits required to encode a given dynamic range, and is most likely to be less efficient--more bits needed to encode the same information. What's the point?
Logged

bradleygibson
Sr. Member
****
Offline Offline

Posts: 829


WWW
« Reply #29 on: September 18, 2009, 08:13:52 PM »

Quote from: Jonathan Wienke
I'm aware that the exponent can be varied in a floating-point number format. If such a floating-point encoding scheme is used for RAW data, those bits are completely wasted. All ADCs have a fixed dynamic range and an integer output. Converting the RAW output to a floating-point format with a variable exponent is pointless, because there is only one optimal exponent value: 1. If the exponent value is greater than one, then you must either throw away information by mapping multiple input values to a single output value, or else use an unnecessarily large number of bits to encode a RAW value. If the exponent value is less than one, then you are wasting bits specifying values to the right of the decimal point that are always zero--just the same as if you padded 12-bit values with zeroes to make them 16-bits.

You're also ignoring the fact that regardless of how many bits are devoted to the mantissa vs the exponent, you can't escape the fact that 32 bits can only represent 2^32 discrete values, regardless of whether those values are interpreted as integers or floating-point numbers. And as long as ADCs linearly output integer values, converting to a floating-point RAW format is at best a break-even proposition regarding the number of bits required to encode a given dynamic range, and is most likely to be less efficient--more bits needed to encode the same information. What's the point?
I'm sorry, Jonathan, that is also incorrect.

When writing imaging data in floating point format, one does not simply encode the integer value using floating point format!

Floating point numbers do not work well with padding.  Most implementations of floating point arithmetic (including Intel's processors) will penalize you performance-wise if you try to do this, and in certain circumstances padding may not even work at all.  Floating point numbers should be normalized for best performance and precision.

Allow me to illustrate one common way imaging information is encoded into floating point numbers, applicable to this thread (ie. the way it's done for JPEG-XR floating point formats).  The saturation (max) output of the sensor is mapped to 1.0 (or 0.9999....)  When equating the integer 16-bit values you are familiar with to floating point values you would see the following:  (Note that the exponent takes on a range of values, in particular, negative values.  This does not mean the number is negative, only that the value is specified at very high precision.  This is what gives floating point representations their capability to render shadow detail at exceptionally fine fidelity.)

16-bit integer    Floating point value           Exponent value
65535             ~1.0                           0
32768             0.5 (1/2)                      -1
16384             0.25 (1/4)                     -2
8192              0.125 (1/8)                    -3
...
4                 0.0000610351... (1/16384)      -14
2                 0.0000305175... (1/32768)      -15
1                 0.0000152587... (1/65536)      -16
0                 0.0                            -127

The above works because when working in linear space, as sensors do, double the photons gives you double the numeric value, and half the photons gives you half the numeric value.

I'll leave the topic of why the exponents are these particular values for another discussion--just know that this is an international standard (IEEE 754 if you wish to look it up)--for our purposes here, it is easy to see that a variety of exponents are used to encode normal values from the sensor ranging from normal black to normal white.  There is no padding and the number of values from 0 to 1 is not limited to the number of bits in the mantissa.
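The exponents in the table can be reproduced with a few lines of Python. This sketch assumes the saturation value 65535 is mapped to exactly 1.0 (the table allows either 1.0 or 0.9999...), and the helper name is made up for the illustration; `math.frexp` is the standard library call that splits a float into fraction and binary exponent.

```python
import math

SATURATION = 65535  # assumption: full-scale 16-bit value maps to exactly 1.0

def normalized_and_exponent(raw: int):
    """Return (value in [0, 1], binary exponent in the 1.f * 2**e convention)."""
    value = raw / SATURATION
    if value == 0.0:
        # Zero has no normalized exponent; IEEE 754 uses a reserved encoding for it.
        return value, None
    fraction, exp = math.frexp(value)  # value == fraction * 2**exp, 0.5 <= fraction < 1
    return value, exp - 1              # shift to the 1.f * 2**e convention

for raw in (65535, 32768, 16384, 8192, 4, 2, 1, 0):
    value, exponent = normalized_and_exponent(raw)
    print(f"{raw:>6} => {value:.10f} / {exponent}")
```

Each halving of the sensor value lowers the exponent by one while the mantissa keeps its full precision, which is the behaviour the table illustrates.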

When the manufacturer makes the proper calibrations for non-linearity of the ADC, the noise floor of the sensor, etc., the signal can be encoded quite nicely into a 16-bit floating point value ranging from 0.0 to 1.0 for almost every consumer device, and certainly into a 32-bit floating point value.  And because 0.0 and 1.0 are not the limits of the range of numbers that can be encoded, it becomes very difficult for users to clip their data in the course of performing normal imaging operations.

As I have been saying throughout this thread, giving users the ability to manipulate their data, without needing to understand or worry about side-effects like clipping was one of the design goals of JPEG-XR.
« Last Edit: September 18, 2009, 08:58:08 PM by bradleygibson » Logged

ejmartin
Sr. Member
****
Offline Offline

Posts: 575


« Reply #30 on: September 18, 2009, 08:15:54 PM »

Quote from: Jonathan Wienke
I'm aware that the exponent can be varied in a floating-point number format. If such a floating-point encoding scheme is used for RAW data, those bits are completely wasted. All ADCs have a fixed dynamic range and an integer output. Converting the RAW output to a floating-point format with a variable exponent is pointless, because there is only one optimal exponent value: 1. If the exponent value is greater than one, then you must either throw away information by mapping multiple input values to a single output value, or else use an unnecessarily large number of bits to encode a RAW value. If the exponent value is less than one, then you are wasting bits specifying values to the right of the decimal point that are always zero--just the same as if you padded 12-bit values with zeroes to make them 16-bits.

Apparently my previous post was not sufficiently clear.  If you are hung up on translating integer ADC output to floating point notation, think of it this way.  Returning to my example of partitioning 8 bits into 5 mantissa bits and 3 exponent bits (and correcting some math errors), simply don't divide the mantissa by 32. The idea of floating point is not contingent on the mantissa being a number between zero and one.  One now has the mantissa as an integer from 0 to 31.  Add 32 to get a number from 32 to 63.  Multiply by 2^exponent; the three bit exponent takes values from 0 to 7.  So I have just supplied an algorithm to take 8 bits and encode numbers for which the range is from 32*2^0=32 to 63*2^7=8064, or nearly 13 stops from 8 bits.  One achieves this through the gap in the quantization step growing exponentially in concert with the growth in the exponent.  For instance, the quantization step is one for the encoded values from 32 to 63; two for the encoded values from 64 to 126; four for the encoded values from 128 to 252; and so on.  This is not ideal for photographic image encoding (it does not achieve maximal data compression) but it is better than straight integer encoding, and it is sufficient if in all parts of the encoded range, the noise is sufficiently larger than the quantization step.  Ensuring that this is the case is an issue of how to partition the bits between mantissa and exponent, and the well depth of the pixels whose data is being encoded.  
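The integer version of the scheme is short enough to enumerate directly. A minimal sketch of exactly the algorithm described above (add 32 to the 5-bit mantissa, then shift left by the 3-bit exponent):

```python
def decode(mantissa: int, exponent: int) -> int:
    """(32 + mantissa) * 2**exponent with a 5-bit mantissa and a 3-bit exponent."""
    return (32 + mantissa) << exponent

# All 256 bit patterns, decoded and sorted.
levels = sorted({decode(m, e) for m in range(32) for e in range(8)})

# Quantization step within each exponent band.
steps_by_exponent = {e: decode(1, e) - decode(0, e) for e in range(8)}

print("distinct levels:", len(levels))
print("range:", levels[0], "to", levels[-1])
print("step per exponent:", steps_by_exponent)
```

Every one of the 256 codes decodes to a different value, so no bit pattern is wasted; the codes are simply spread over 32 to 8064 with a step that doubles in each band.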

The exponent is an instruction to do a bit shift by the amount specified by the exponent.  This is fine if the low order bits of large encoded values are irrelevant.  They are in digital imaging, because of photon shot noise.  Floating point encoding sets some number of least significant bits to zero, depending on the overall level, rather than the recording the random values they would take due to photon shot noise.  The difference is insignificant.
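A rough sketch of the bit-shift view, comparing the quantization step with photon shot noise. It assumes one raw count per photoelectron (so shot noise is the square root of the raw value) and truncation rather than rounding; both are simplifications, and the function names are invented for the illustration.

```python
import math

def quantize(raw: int, significant_bits: int) -> int:
    """Keep the top significant_bits bits of raw and zero the low-order bits."""
    shift = max(0, raw.bit_length() - significant_bits)
    return (raw >> shift) << shift

def step_size(raw: int, significant_bits: int) -> int:
    """Spacing between representable values in the neighbourhood of raw."""
    return 1 << max(0, raw.bit_length() - significant_bits)

def shot_noise(raw: int) -> float:
    """Assumed unity gain: standard deviation of photon shot noise in raw counts."""
    return math.sqrt(raw)

# (raw value, significant bits): the 8-bit toy format has 6 significant bits
# (5 stored plus the implied leading 1); the second case uses 10.
for raw, bits in ((8063, 6), (60000, 10)):
    print(raw, "->", quantize(raw, bits),
          "| step", step_size(raw, bits),
          "| shot noise", round(shot_noise(raw), 1))
```

Under these assumptions the 6-bit toy format has a top-end step larger than the shot noise, while 10 significant bits keep the step well below it, which is why the mantissa/exponent split matters.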

Quote
You're also ignoring the fact that regardless of how many bits are devoted to the mantissa vs the exponent, you can't escape the fact that 32 bits can only represent 2^32 discrete values, regardless of whether those values are interpreted as integers or floating-point numbers. And as long as ADCs linearly output integer values, converting to a floating-point RAW format is at best a break-even proposition regarding the number of bits required to encode a given dynamic range, and is most likely to be less efficient--more bits needed to encode the same information. What's the point?

The point, as I tried to make clear, is that those 2^32 nonlinearly encoded values are spread over a much larger linear range.  If the values in the linear range that are skipped over are not discernable due to noise in the system, it is just as well they were left out.  As I pointed out in my initial post, Nikon's "lossy" compression works to the extent that it is properly implemented because of this fact.
« Last Edit: September 19, 2009, 12:19:28 AM by ejmartin » Logged

emil
bradleygibson
Sr. Member
****
Offline Offline

Posts: 829


WWW
« Reply #31 on: September 18, 2009, 10:54:45 PM »

Quote from: Panopeeper
This would not work. It is not a question of accuracy but of the number of retained levels. The raw conversion steps, such as contrast enhancement and particularly the "gamma encoding", drastically reduce the number of tone levels; accuracy does not help with that.

JPEG XR may supersede the currently used lossless JPEG encoding for storing the raw data. However, the presentation version of an image is not an alternative for the raw data.

I'm sorry, Gabor, it is not my intent to be contrarian, but I feel I must correct you here on that bit of misinformation.

It does work--with floating point data, we leave the "gamma encoding" to the presentation profile.  Do a demosaic with a gamma of 1.0, and add your display gamma in on presentation.  The image data retains maximum fidelity and gains benefits from being edited in linear space rather than in gamma 2.2/2.4.
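A minimal sketch of that workflow, assuming a plain 2.2 power-law display gamma and a single pixel value. The function names are hypothetical and this is not the Windows imaging pipeline or any JPEG XR API; it only shows edits applied to linear floating point data with clipping and gamma deferred to presentation.

```python
DISPLAY_GAMMA = 2.2  # assumed simple power-law display encoding

def apply_exposure(linear_value: float, stops: float) -> float:
    """Exposure edit in linear space: a pure multiplication, with no clipping."""
    return linear_value * 2.0 ** stops

def to_display(linear_value: float) -> float:
    """Clip and gamma-encode only at presentation time."""
    clipped = min(1.0, max(0.0, linear_value))
    return clipped ** (1.0 / DISPLAY_GAMMA)

stored = 0.8                                  # linear pixel value kept in floating point
brightened = apply_exposure(stored, +1.0)     # goes above nominal white
restored = apply_exposure(brightened, -1.0)   # a later edit brings it back

# Same two edits, but clipping to [0, 1] after the first one.
clipped_restored = apply_exposure(min(1.0, brightened), -1.0)

print("float pipeline:  ", brightened, "->", restored)
print("clipped pipeline:", min(1.0, brightened), "->", clipped_restored)
print("display value of brightened pixel:", to_display(brightened))
```

The brightened pixel displays as white either way, but only the unclipped float value returns to its original level when the edit is undone.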

In short, it works just fine.
« Last Edit: September 18, 2009, 11:19:03 PM by bradleygibson » Logged

Panopeeper
Sr. Member
****
Offline Offline

Posts: 1805


« Reply #32 on: September 18, 2009, 11:34:55 PM »

Quote from: bradleygibson
It does work--with floating point data, we leave the "gamma encoding" to the presentation profile.  Do a demosaic with a gamma of 1.0, and add your display gamma in on presentation.  The image data retains maximum fidelity and gains benefits from being edited in linear space rather than in gamma 2.2/2.4.
Ray's suggestion was

retain the potential for highlight and shadow detail recovery through floating point arithmetic

The gamma curve is the last one; there is "S" curve, specific contrast enhancement, highlight- and shadow recovery, black point, white point. In order to be able to do that, the entire spectrum of the demosaiced linear data must be retained.

The next step would be not to perform the color space conversion but to leave the data in the camera's color space, and let the presentation do the conversion depending on the medium. This is logical, isn't it? (The gamma curve depends on the color space, but if that is not fixed, then the color space too can be changed). Your image shot in raw could be presented much better on a monitor with ProPhoto capability; I am sure such monitors will come. This would allow even for the color adjustment (like it is being done by Picture Style or Camera Profile). Adjusting saturation and brightness directly at the presentation is only natural.

There are two problems with this:

1. When working on an image for example in PS, previous steps must be "fixed" in order to make certain adjustments. You can't make for example enhancements by curves or on selected colors without knowing the result of the previous steps.

2. I am pretty sure that most photographers would not publish their worthy images in this quasi-raw format, which is fully open for further adjustment.
Logged

Gabor
bradleygibson
Sr. Member
****
Offline Offline

Posts: 829


WWW
« Reply #33 on: September 19, 2009, 08:56:35 AM »

Quote from: Panopeeper
Ray's suggestion was

retain the potential for highlight and shadow detail recovery through floating point arithmetic

The gamma curve is the last one; there is "S" curve, specific contrast enhancement, highlight- and shadow recovery, black point, white point. In order to be able to do that, the entire spectrum of the demosaiced linear data must be retained.
I don't know how to explain it any more clearly than I have.  There is no problem making contrast (or highlight or shadow recovery or black point or...) adjustments in linear space.  One's editing program simply views the results after gamma adjustment, in real time.

Quote from: Panopeeper
The next step would be not to perform the color space conversion but to leave the data in the camera's color space, and let the presentation do the conversion depending on the medium. This is logical, isn't it? (The gamma curve depends on the color space, but if that is not fixed, then the color space too can be changed). Your image shot in raw could be presented much better on a monitor with ProPhoto capability; I am sure such monitors will come. This would allow even for the color adjustment (like it is being done by Picture Style or Camera Profile). Adjusting saturation and brightness directly at the presentation is only natural.
I can think of only two reasons to avoid a color space conversion: i) the destination color space is insufficient for the required purposes; ii) the destination bit format would cause an unacceptable loss in fidelity (precision).  We believe that a floating point representation with the same primaries as sRGB addresses both issues.   If a user disagrees, the traditional pixel formats in which you can leave the data unmolested remain available.

"Only natural"?  I don't really know what to make of 'only natural', but let me say that in practice, the techniques I've described work just fine.  If you've viewed or edited a JPEG image file in Vista or Windows 7 you've already been using the techniques I've described.  This technique has been shipping in Windows for the past 2 1/2 years, worldwide.  I know, because I led the team that replaced the Windows XP imaging pipeline--we spent four years developing the hardware-accelerated floating point imaging pipeline and made sure it would work with JPEG-XR, and even with legacy file formats (ie. JPEG, TIFF).  So it is hard to say that these ideas don't work, because the world is already using them!

Quote from: Panopeeper
There are two problems with this:

1. When working on an image for example in PS, previous steps must be "fixed" in order to make certain adjustments. You can't make for example enhancements by curves or on selected colors without knowing the result of the previous steps.

2. I am pretty sure that most photographers would not publish their worthy images in this quasi-raw format, which is fully open for further adjustment.
1) Yes, when re-editing and baking those changes into the bits you will accumulate error based on the precision of the pixel format.  Of course!  This is true for any non-parametric edit in any format--this is no exception--except that those concerned can use pixel formats with very high precision (low error).  Further note that JPEG-XR does not prevent anyone from using parametric edits, should they wish to, but then portability issues become the primary concern.  There is no free lunch!

2) This argument sounds like you are saying that photographers won't like it because the format is too good!  To paraphrase my understanding of this point: "Having one's digital files available in all their fidelity for others to edit might be too scary for people."  Perhaps, but I don't think most photographers are relying on the limitations of a pixel format to protect their intellectual property--typically, they publish a low-resolution preview.  Indeed, by the amount of FUD being spread in this thread, it is hard to imagine that the technology is well-enough understood by the general population for "security by pixel format" to be the case.
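
To put a rough number on point 1) above, here is a minimal sketch of what "baking" an edit into integer pixels costs at different precisions. It darkens a tone ramp, stores it, then undoes the change and stores it again. The gain of 0.7, the ramp and the function names are arbitrary choices for the illustration, not anything taken from JPEG-XR.

```python
def bake_roundtrip(code, max_code, gain=0.7):
    """Darken by `gain`, store as an integer, then undo it and store again."""
    darkened = round(code * gain)                     # first edit baked in
    restored = min(round(darkened / gain), max_code)  # inverse edit baked in
    return restored


def worst_error(bits, gain=0.7, samples=1000):
    """Largest round-trip error over a tone ramp, as a fraction of full scale."""
    max_code = (1 << bits) - 1
    worst = 0.0
    for i in range(samples + 1):
        code = round(i / samples * max_code)
        restored = bake_roundtrip(code, max_code, gain)
        worst = max(worst, abs(restored - code) / max_code)
    return worst


print("8-bit worst error: ", worst_error(8))
print("16-bit worst error:", worst_error(16))
```

The error never disappears for a baked edit, it just shrinks with the precision of the pixel format, which is the trade-off being described.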

Bottom line is this: People needed to know and understand too much to edit their images without damaging them.  We did our best to develop an improved tool for photographers which could work very well at helping people have i) better quality images and ii) a simpler workflow without any additional effort, knowledge or special training on their part.

No one is claiming the technology saves babies (though there may be applications in the medical field...), solves world hunger or is perfect in any other way.  Simply that special, third party tools should no longer be NECESSARY to get the most out of your files.

At the risk of repeating myself (again): you still want 3rd party tools? Mosaic'ed data?  Parametric edits?  16-bit integer?  Clipping?  8-bit integer for security reasons? No problem--to the best of my knowledge JPEG-XR has not taken away any of your choices, but rather offered additional and, we feel, more effective ones.

The JPEG committee met in Lausanne back in 2007 and seemed to agree--they began the process which has since made this technology an international standard.  That seems to be a pretty good endorsement in my book.

Gabor, I do respect your knowledge and expertise in the integer raw domain.  I have seen you take countless hours to help others understand with graphs, charts, screenshots, in numerous posts to help clarify misunderstandings and put solid information out there (although, at times, I personally wish the presentation was more civil).  But in this thread, there has been far too much supposition and speculation masquerading as fact.

I feel the information is now out there, and you and others may choose to accept it or not, but that is, of course, your/their choice.  For me, though, the theme in this thread that "it won't work" has run its course.
« Last Edit: September 19, 2009, 10:16:54 AM by bradleygibson »

sandymc
« Reply #34 on: September 20, 2009, 08:44:44 AM »

Quote from: bradleygibson
For me, though, the theme in this thread that "it won't work" has run its course.

Yes, let's all agree that it will work. However, "will work" and "is a good idea" are different. I think perhaps you skipped over the floating point issues a little fast - the thing is, floating point is not an efficient way to code information if your information has a fixed range. Practical example, because all the claims and counterclaims above without any examples were giving me a headache:

If you have an image, and you gamma encode to 2.0 (2.0 just to make the math easier), the mid-tone stop of that image is from 0.17 to 0.25 on a 0-1 scale. Using good old fashioned 16-bit integer encoding, you get of the order of 4794 discrete levels within that mid-tone stop.

However, if you use the s10E5 encoding, and assuming I understand it correctly, the exponent doesn't change, so you get about 655 discrete levels. That's about 15% of the resolution in the mid tones for the float encoding vs the integer encoding. Given the choice between more mid-tone resolution, and the ability to encode a pixel value so small it will never exist, I, and I think most photographers, will take the mid tones.
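
For anyone who wants to re-run the arithmetic, here is a minimal sketch that counts the levels directly. It rests on several assumptions: the mid-tone stop is taken as linear 1/32 to 1/16, the gamma 2.0 encoding is a plain square root, s10E5 is treated as IEEE 754 half precision (Python's `struct` format `e`), and the helper name `half_bits` is made up. The counts it prints land in the same ballpark as the figures above, though not exactly on them. It also counts the case where linear data is stored in the half float, which is a different comparison.

```python
import math
import struct


def half_bits(x):
    """Bit pattern of x as an IEEE 754 half-precision float.

    For positive values the bit patterns increase one by one with each
    representable number, so a difference of patterns is a count of levels.
    """
    return struct.unpack("<H", struct.pack("<e", x))[0]


# Assumed mid-tone stop in linear light, then gamma 2.0 (square root) encoded.
lin_lo, lin_hi = 1 / 32, 1 / 16
enc_lo, enc_hi = math.sqrt(lin_lo), math.sqrt(lin_hi)  # about 0.177 and 0.25

# Gamma-encoded values stored as 16-bit unsigned integers.
int16_levels = round(enc_hi * 65535) - round(enc_lo * 65535)

# The same gamma-encoded values stored as half floats.
half_levels_gamma = half_bits(enc_hi) - half_bits(enc_lo)

# Linear values stored as half floats: one stop is one full exponent step.
half_levels_linear = half_bits(lin_hi) - half_bits(lin_lo)

print("16-bit integer, gamma 2.0:", int16_levels)
print("half float, gamma 2.0:    ", half_levels_gamma)
print("half float, linear:       ", half_levels_linear)
```

Either way the half float has far fewer levels in that stop than the 16-bit integer, which is the point of the comparison.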

Now I may have punched the wrong key on my calculator somewhere in there, but the principle applies pretty generally; the DNG format has had the ability to use float encoding since it was introduced. But I've never seen a single DNG image that used float; it's just not an efficient way to use bits for storing images. Float is of course a good way to process images, but that's a different situation.

Sandy
ejmartin
« Reply #35 on: September 20, 2009, 10:57:13 AM »

The most efficient use of levels for RAW encoding would be to perform a gamma transform with gamma=2.0, except for a linear portion at the low end (similar to the linear segment of the transform used in sRGB and Lab), and then use integer encoding.  This linearizes the shot noise so that in each part of the range, the noise dithering is a fixed number of integer levels.  At low levels one wants to account for the change in the dithering due to read noise; and also allow negative levels, again because of the read noise, which would be difficult with the gamma transform.
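
A minimal sketch of why the gamma 2.0 transform linearizes shot noise, under simplifying assumptions: the signal is measured in photoelectrons, the noise is pure Poisson shot noise (sigma equals the square root of the signal), and the read noise and the linear segment at the low end are left out entirely. The function name and the sample signal levels are made up for the illustration.

```python
import math


def encoded_noise_step(signal_electrons):
    """Width of one sigma of shot noise, measured in square-root-encoded units."""
    sigma = math.sqrt(signal_electrons)  # Poisson shot noise in linear units
    return math.sqrt(signal_electrons + sigma) - math.sqrt(signal_electrons)


for s in (100, 1_000, 10_000, 40_000):
    linear_sigma = math.sqrt(s)
    print(s, round(linear_sigma, 1), round(encoded_noise_step(s), 3))
```

In linear units the noise grows with the signal, but after the square root it stays close to half an encoded unit across the whole range, so a fixed integer step size fits the noise everywhere above the read-noise floor.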

Floating point, for which the quantization step grows exponentially with the level, indeed uses too many levels at the low end, as Sandy says.

emil
knweiss
« Reply #36 on: September 20, 2009, 11:02:54 AM »

Quote from: Panopeeper
2. I am pretty sure that most photographers would not publish their worthy images in this quasi-raw format, which is fully open for further adjustment.
And musicians would not publish their music in CD quality. Oh, wait...  
« Last Edit: September 20, 2009, 11:03:22 AM by knweiss »
knweiss
« Reply #37 on: September 20, 2009, 11:22:11 AM »

Quote from: GLuijk
Present JPEG is good enough to show your images on the net and even to print them. Why should users and the industry make any effort to change what already works fine enough?
Two reasons: 1. File size - Even though disk space is cheap these days I would very much appreciate smaller files out of my Canon 5D Mark II. 2. More than 8-bit/channel. So, personally I would really like to have the option to use a better file format.
Panopeeper
« Reply #38 on: September 20, 2009, 03:39:31 PM »

Quote from: knweiss
And musicians would not publish their music in CD quality. Oh, wait...  
The technologies of music and photography have very much in common, so this is a splendid analogy, for sure. Btw, in which shop do you get sheet music (gedruckte Noten) and the master recording with the CD?

Gabor
Panopeeper
« Reply #39 on: September 20, 2009, 05:58:29 PM »

Quote from: bradleygibson
I don't know how to explain it any more clearly than I have.  There is no problem making contrast (or highlight or shadow recovery or black point or...) adjustments in linear space.  One's editing program simply views the results after gamma adjustment, in real time.
There is no need to explain it at all. The technical capability was not the question; if you go back in the posts, you find that I suggested already at the beginning, that JPEG XR be used for the raw data encoding. Furthermore, it is unquestionably better for storing the presentation version than the legacy JPEG.

However, you went very far from that point. What you are suggesting is that a semi-raw format be used for the presentation, instead of the currently used final format. This is not a question of technical possibility; it is a question of basic attitude:

do photographers want to distribute their images in a semi-raw format or do they want to have the final say about their appearance?

If you find many photographers, who want to go that way, then your position was the right one. My opinion is, that we may see JPEG XR two times: once for storing the raw data, and once for storing and distributing the presentation version in final form.

Gabor