Proposal: gain maps for PNG #380
Comments
Explanation of Proposal: gain maps for PNG
What problem is this solving since PNG already supports native HDR imagery?
Good question. PNG supports a single HDR image, yes. This solves (or, their spec claims to solve):
I have not seen any data regarding the third claim.
Here's a question for the PNG WG:
I haven't read the actual spec, so I can't comment on whether gain maps are a good idea for PNG. But I can give some feedback on the gMAP chunk itself: many of the fields are highly constrained, if not outright redundant.

Compression method and filter method both have only a single valid value, and even if more were added in the future, I don't see why defining the gain map to use the same filtering and compression as specified in the IHDR would actually cause any issue. That's what APNG does, after all. The height field in the gMAP chunk is similarly redundant: given the dimensions in the IHDR and the width, there's only one valid value it could hold for the aspect ratios to match. Further, the components and bit depth fields are 8-bit values that each have only two valid values.

Several of the other fields seem like they might be expected to match metadata elsewhere in the image, but without spec access I can't say whether mismatching values could make sense. Regardless, care should be taken that the meaning of metadata here exactly matches elsewhere in the PNG spec. Even using fixed precision in the gAMA chunk and floating point here probably isn't ideal.

Another point that's worth noting, but probably not a huge deal at this point, is that AFAIK this would be the first PNG chunk with a floating point value in it.
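The height-redundancy point above is easy to check concretely: given the `IHDR` dimensions and a gain map width, only one height preserves the aspect ratio. A minimal sketch (function name hypothetical, integer rounding assumed):

```python
def implied_gainmap_height(ihdr_width: int, ihdr_height: int,
                           gmap_width: int) -> int:
    """Return the only gain map height that keeps the IHDR aspect ratio."""
    # The height must satisfy gmap_width / height == ihdr_width / ihdr_height,
    # so it is fully determined by the other three values.
    return round(gmap_width * ihdr_height / ihdr_width)

# A 4000x3000 image with a half-resolution gain map:
# implied_gainmap_height(4000, 3000, 2000) == 1500
```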
My initial thought (I haven't finished reading--will update as I do) is that this is similar to other non-RGB channels. And if we go the new chunk route, it makes me wonder if each new non-RGB channel should have a unique chunk (gMAP & gDAT) or if there should be some shared chunk. (Also, the rules for different resolutions sound great. That could apply to YUV-type images some day.)
Is there time pressure to get an answer back to ISO quickly? |
I have asked that the ISO draft spec be made public to facilitate discussion, but so far there has been no reply. |
These are all good points, and I was trying in this first draft not to close off any extensibility points and to follow the ISO specification literally. "Same as IHDR" could certainly work, and so could "must have the same dimensions as IHDR".
Yes, I am aware. There have been attempts to use float data before, and IIRC they used human-readable strings like "3.14", which seems like an interop nightmare.
None of them are matching data elsewhere in the image. The gamma value is a gamma applied to the gain map data, not the image gamma, for instance. Again, hard to tell without the actual specification, I know; I wish I were allowed to share it.
There is a joint meeting in Tokyo in February 2024 where HDR in general and HDR gain maps at ISO, in particular, will be discussed. I have been asked to report on all W3C HDR-related work at that meeting. |
I just noticed this, published 2024-01-05: |
The UltraHDR format is basically identical to Adobe's white-paper. I think it got released mid-2023. Just as a FYI, there is a proposed update to the 21496-1 draft that outlines what things an image format needs to specify to correctly store a 21496-1 gain map, as well as a recommended binary payload containing the required metadata. The metadata payload is not enough since the image container needs to specify things like which image is the base, what the base and alternate color spaces are and similar. |
It would be really useful to get a copy of that update so I can read it ahead of the Tokyo meeting. |
I just realized that ordering might be an issue. Or it might not.

If gainmap information comes after IDATs, a browser might display an SDR image during download and then have an update pass which applies the gainmap as it is downloaded. This is unlike RGB channels, which are interleaved. If gainmap information comes before IDATs, it would not cause that update pass. But it would delay coloring in pixels, which is an indicator to users that progress is happening.

This problem applies to all channels which will be displayed. Additionally, progressive enhancement could apply, furthering the complication on interleaving.

PNG is built around treating image data as bytes and not pixels. This has a handful of downsides. But one potential upside is it allows adding new interleaved channels.
I didn't notice it before, but you bring up a good point. Regardless of whether gDAT comes before or after the IDAT chunks, I think that gMAP should be required to come before the IDATs. The existence of a gain map is metadata, and so having that requirement would align it with the other metadata chunks. It would also prevent decoders from having to seek past all the IDATs to know whether the image has a gain map.
We might be able to interleave gDAT and IDAT chunks. I need to double-check this against existing decoders and the spec. But right now, I think it would work. But I could imagine a scenario where either the spec or the decoder says it will be a stream of IDATs until an IEND is reached. I think interleaving gDAT and IDAT is our best bet for adding more visible channels beyond RGB. The downside is we aren't interleaving channels on a per-pixel level like RGB data already does. But it is interleaving as much as possible in a backward- & forward-compatible way.
For my own education, can someone share why this might be preferred over another image format which supports gain maps and alpha? Both AVIF and JXL could support those needs, and at much smaller file sizes than PNG. Is there some other important use case I'm overlooking? I could see an argument for backwards compatibility here. However, general AVIF support is now in all major browsers, so that's probably going to be fairly niche by the time we'd have support for PNG gain map rendering. Certainly there is editing software and such, and perhaps the interest in legacy support is higher than I might guess. Just not sure if I'm overlooking some other gap this might address.
Fundamentally, because competition is good.

You mentioned backwards compatibility, and that perhaps it should be given more weight. I might be able to provide some examples to justify more weight. For example, the US Library of Congress lists PNG as a preferred format and will not accept AVIF or JXL. That has knock-on effects for organizations working with the government.

There is also forward compatibility. PNG is VERY forward compatible, in ways almost no file format is. But to be fair, almost no tooling uses this. I should make a ranty blog post about that :)

I think more to your point, AVIF and JXL can produce smaller files currently. For PNG Third Edition, we aren't planning any improvements to compression. But there is a good chance that Fourth Edition will. At which point you could perhaps ask your same question of AVIF & JXL. (This goes back to my "competition is good" point.)

When we talk about compression, we're really talking about three separate things, only one of which is compression. The others are tuning the data for the observer (e.g. JPEG's chroma subsampling, or MP3 dropping sounds we likely wouldn't perceive) and tuning the data for the compressor (e.g. PNG's filtering, or just general quantization). PNG's DEFLATE uses Huffman coding, which is optimal among prefix codes. If we treat "compression" as those three distinct steps, we can provide optimal file sizes, too. The problem right now is that PNG doesn't get chroma subsampling, for example. So it would be a shame for humanity to abandon PNG (or Huffman in general).

Hope this helps :)
@ProgramMax Thank you for the extra context and info! |
Interleaving gDAT and IDAT chunks wouldn't be backwards compatible. The version 1.2 spec says:
Shoot. |
I'd like to know people's thoughts on alternate image metadata. Both the base image (the normal image) and the alternate image (the image that results from fully applying the gain map) can have the full complement of 22028-5 metadata. This would include the color space (as sRGB, iCCP, or cICP), along with mDCv, cLLI, and potentially others (22028-5 includes ccv, ndwt, and reve). What would be the best way to represent these in a PNG? Option A:
Option B:
Option C:
Of note is that for JPEG, the image is stored using MPF (CIPA DC-007), which requires that the gainmap image be stored contiguously and entirely after the base image. This is more like "Option B". (Also, FWIW, I'm trying to get the liaison thing pushed forward.)
IIUC, for Options A & B you're considering nesting chunks, right?

If I could start over from scratch, I think I would make channels more dynamic. Rather than hard-coding RGB/grayscale and then adding an additional channel for gainmaps, I think I would allow channels to be tagged with the data they support. So gainmaps would be just another channel. Decoders support the channels they choose. Then metadata could specify which channels it applies to.

Currently, all chunks are assumed to apply to all channels. If we do end up allowing additional channels, perhaps we could add a chunk that clarifies "the following chunk applies to these channels: 0,1,2" or something similar. That would let us avoid nesting, if we decide that is a good thing. But it does so by requiring additional state to be tracked across chunks. Given that things like color space are already state tracked across chunks, I think that would be okay.
Remind me why gain map metadata is useful in the context of PNG, which is a lossless image format? What use case(s) is this targeting? |
The gain map metadata is key to adapting the image to the display. It provides the information necessary to determine how much weight to give the alternate vs base image. It also describes the encoding of the gain map data (which may use different ranges, gammas, and offsets). The gain map on its own isn't a real image nor a complete set of instructions to render the alternate version of the image. |
This is not my area of expertise so please correct my knowledge gaps. My understanding is the SDR base image is itself a good image to use on SDR displays. Also as I understand it, the data being separate allows for better tone mapping across various HDR display capabilities. (And since it is per-pixel, maybe even allowing the artist's hand to be involved.) |
A gain map specifies 2 images. The base is a real image, and the alternate is derived by applying per-pixel multipliers to the base (which are determined using the gain map and metadata). This is almost like providing two images, but at a significantly smaller total size. At the extremes, the alternate image is handcuffed to the base image to a degree - but with proper encoding it provides tremendous latitude to derive a high-quality alternate version of the same scene (this would not generally be a good approach for combining two unrelated images).

At this time, the base image is SDR and the alternate would be HDR. But we'll see more variation in time. Because the alternate is likely of lower quality than a true image (at least when optimized to reduce file size), it would be ideal to encode the base image as HDR in the future when HDR screens are more common (and gain maps are widely supported, as using an SDR base offers better backwards compatibility with decoders which do not understand gain maps). One could even encode a gain map for two HDR images (such as a +2 stop version and a +6 stop version when we have support for the full 10,000 nit PQ range), in which case an SDR display would have to tone map down from the +2 stop version. So there would be a tradeoff there, but it is a possible use.

Because it is per-pixel, it can be optimized much more than any global tone mapper and can in theory allow significant artistic input to optimize the base image (in practice this depends on the encoding). Additionally (as long as you encode an SDR as base or alternate), no tone mapping is needed. The extreme ends of the display range are explicitly provided, and you eliminate variability in how one display might be tone mapped compared to another. So consistency is improved as well.
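The per-pixel math described above can be sketched in a few lines. This follows the general form used in public gain map descriptions (Adobe's white paper and the UltraHDR documentation): the decoded gain value acts as a log2 multiplier, and small offsets keep values away from zero. The function and parameter names are illustrative, not from the ISO draft:

```python
import math

def apply_gain(base: float, gain_log2: float, weight: float,
               k_base: float = 1 / 64, k_alt: float = 1 / 64) -> float:
    """Apply a (possibly partial) gain to one linear-light base sample.

    base      -- linear-light sample from the base image
    gain_log2 -- decoded gain map value, as a log2 multiplier
    weight    -- application weight in [0, 1]; 0 keeps the base image,
                 1 yields the full alternate image, in between interpolates
    k_base, k_alt -- offsets that avoid numerical issues near zero
    """
    return (base + k_base) * math.pow(2.0, weight * gain_log2) - k_alt

# With equal offsets, weight 0 returns the base sample unchanged:
# apply_gain(0.25, 2.0, 0.0) == 0.25
```

Interpolating `weight` between 0 and 1 is what lets a display with partial HDR headroom render something between the SDR base and the full HDR alternate.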
More info on gain maps from an artistic perspective: https://gregbenzphotography.com/hdr-photos/jpg-hdr-gain-maps-in-adobe-camera-raw/ |
AFAIK no image editing software will use gain maps as their internal representation, i.e., a gain map representation will be generated during export and this will be a lossy step unless it is possible to make the gain map algorithm lossless. Since this step will be lossy, why use PNG and not JPEG, JPEG XL, J2K, etc. which will all result in a smaller image? |
The base image (which could be HDR as the more important version) could be encoded as a lossless image with a gain map. So the alternative (SDR in this example) would be visually (but not completely) lossless - and the base HDR image would be truly lossless. One could of course use another lossless format for archiving as a gain map, assuming PNG is not preferred or the only approved format for a given institution. |
@gregbenz Thanks for confirming. PNG is lossless in people's mind so using in a lossy use case does not seem ideal. |
@palemieux Just to be clear here - This is just my opinion as an artist / independent developer who has worked quite a bit with currently available gain map concepts. I have not done any validation testing to support or disprove my hypothesis. And I do not have a background in archival processes used by governments, museums, etc. |
The lack of a public spec makes it hard to say with certainty, but based on the use of 16-bit floats the gain map should have under 0.1% maximum error. That's far lower than what you'd expect from lossy formats like JPEG and, in fact, is better than any 8-bit format can do.
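The 0.1% bound above can be sanity-checked: binary16 has 11 bits of precision, so round-to-nearest introduces at most a 2^-11 ≈ 0.049% relative error for normal values. A quick check using Python's stdlib half-precision support (the sampling grid here is arbitrary):

```python
import struct

def to_f16(x: float) -> float:
    """Round-trip a value through IEEE 754 binary16."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Worst-case relative rounding error for binary16 normals is 2**-11,
# about 0.0488% -- comfortably under the 0.1% figure.
worst = max(abs(to_f16(x) - x) / x
            for x in (1 + i / 10000 for i in range(10000)))
assert worst <= 2.0 ** -11
```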
IIUC, I agree with @fintelia. Loss due to quantizing floats to ints from the original edit doesn't cross the lossy/lossless bar in my mind. That has always been true even prior to gain maps. If the original tool used ints and if the gain map equation can be perfectly mapped 1:1, there would be no loss. Perhaps the gain map equation isn't 1:1 mappable? |
To be sure, by "lossless", I meant "mathematically lossless". The point is that PNG is used in applications where mathematically lossless is expected. There are many better image formats for applications where loss is tolerable (whether visually detectable or not). So, I am questioning whether adding support for (lossy) gain maps is worth the effort (and the potential confusion). |
If image editing software wanted to save 240.5 into the red channel, it would need to quantize to an int, too. But that seems to be okay for the "lossless" zeitgeist. |
Someone would not save in PNG if they were authoring in floating point. They would instead save as EXR, TIFF, J2K, etc. |
Yeah, same here (thinking aloud). There is also another option I hadn't thought of: Option C
Option D:
I'll try to figure out in the ICC meeting whether they're thinking of rolling more metadata into profiles (it would be nice to have a "one-stop shop" for that stuff).
PNG

For me, the main use-case for PNG is when:
If I need an ultra-portable format for photographic content I would use JPEG. The fact that PNG is lossless has very little to do with whether I choose it or not. (Not saying that this is not important to other use-cases, but it's usually not that important to my use-cases.)

Gain maps

For me, the main use-case for 21496-1 gain maps is twofold:
The reason for adding gain maps to PNG is really to address the following:
JPEG-XL is very much not portable. AVIF is more portable, but not when compared to the portability of PNG. And AVIF may end up compressing worse than PNG for non-photographic content. If gain maps can be added to PNG in such a way that they are backwards compatible, I think it makes a lot of sense to do so. If it can't be made backwards compatible, it doesn't really make much sense to add it, though.
Two private PNG chunks are defined:

* gmAP: contains a binary blob that is an ISO gainmap payload
* gdAT: contains a PNG-encoded image that is the gainmap.

The base image contains both a gmAP and a gdAT chunk; the gmAP chunk only contains ISO versioning (for future-proofing). The gainmap image will contain only a gmAP chunk that actually contains the gainmap metadata. If there's a nested gdAT chunk upon decoding, then we drop it on the floor.

This is pretty much option B described in w3c/png#380 (comment), but we are using privately-defined chunks because the spec has not been agreed on yet :)

Bug: b/329469053
Change-Id: I00da00f241eb02d3f19384b3525bd8650b368a9e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/926765
Reviewed-by: Florin Malita <fmalita@google.com>
Commit-Queue: Alec Mouri <alecmouri@google.com>
Auto-Submit: Alec Mouri <alecmouri@google.com>
Proposal: gain maps for PNG
This proposal has no official standing in the PNG WG and is presented for discussion only. Do not implement.
3 Terms, definitions, and abbreviated terms
Insert:
HDR headroom
the ratio of nominal peak white luminance to reference media white luminance.
4 Concepts
Insert, in a new section after 4.3 Colour spaces:
Baseline and alternate HDR images
Given a baseline image - typically SDR - (which will be the PNG reference image) and an HDR alternate image, a gain map is a space-efficient way to store the information needed to reconstruct the HDR alternate image from the baseline image and gain map, without actually storing the entire HDR alternate image. [ISO_21496-1].
In addition, a gain map provides greater display flexibility. Depending on the available HDR headroom, which varies with display brightness and with viewing conditions, a suitable display image can be computed by scaling the application of the gain map (effectively, interpolating the baseline and alternate images) to produce a result with "some HDR".
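The scaled application described above reduces to an interpolation weight computed from three headrooms. Since this proposal encodes headrooms as log base 2 values, a plausible sketch is the following (the exact formula is defined in [ISO_21496-1], which is not public; names here are illustrative):

```python
def gain_weight(display_headroom: float, base_headroom: float,
                alt_headroom: float) -> float:
    """Fraction of the gain map to apply; all headrooms are log2 values.

    0.0 shows the baseline image, 1.0 the full alternate image; values
    in between interpolate for displays with partial HDR headroom.
    """
    if alt_headroom == base_headroom:
        return 1.0
    w = (display_headroom - base_headroom) / (alt_headroom - base_headroom)
    return min(max(w, 0.0), 1.0)

# An SDR baseline (headroom 0) and a 2-stop alternate, on a display with
# 1 stop of headroom, would apply half of the gain map:
# gain_weight(1.0, 0.0, 2.0) == 0.5
```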
Gain maps consist of two parts: per-pixel data and per-image metadata. These are stored in the `gDAT` and `gMAP` chunks, respectively.

5.6 Chunk ordering
To Table 6, add:
11.3.2.9 `gMAP` Gain map metadata

The four-byte chunk type field contains the hexadecimal values 67 4D 41 50.
If present, the `gMAP` chunk holds gain map metadata. If `gDAT` is also present, an HDR alternate image may be reconstructed.

The `gMAP` chunk contains:

Width and height give the image dimensions in pixels. They are PNG four-byte unsigned integers. Zero is an invalid value. Width and Height must have the same aspect ratio as Width and Height in `IHDR`. They should be the same dimensions, but may be sampled down by a factor of 2 or more.

Bit depth is a single-byte integer giving the number of bits per sample. Valid values are 8 and 16.
Compression method is a single-byte integer that indicates the method used to compress the image data. Only compression method 0 (deflate compression with a sliding window of at most 32768 bytes) is defined in this specification.
Filter method is a single-byte integer that indicates the preprocessing method applied to the image data before compression. Only filter method 0 (adaptive filtering with five basic filter types) is defined in this specification.
Components is a single-byte integer that indicates the number of gain map components. It must be either 3 (separate, per-channel gain maps) or 1 (a single gain map applied to all three channels).
Colour Primaries is a single-byte integer containing an enumerated value from [ITU-T-H.273] which identifies the color space primaries and white point of the reference alternate image.
Application colour space is a single-byte integer which indicates whether the gain map is applied in the colour space of the baseline image or of the alternate image. The value is 0 if the gain map is applied in (a linear-light version of) the alternate image colour space, and 1 if it is applied in (a linear-light version of) the baseline image colour space.
Baseline HDR Headroom is a four-byte IEEE 754 single-precision floating point value in network byte order, which indicates HBaseline, the HDR headroom of the baseline image. It is encoded as a log base 2 number. For an SDR baseline image, the baseline HDR headroom will be zero.
Alternate HDR Headroom is a four-byte IEEE 754 single-precision floating point value in network byte order, which indicates HAlternate, the HDR headroom of the alternate image. It is encoded as a log base 2 number.
Example: If the reference media white is 203 cd/m² and the nominal peak white of the HDR alternate image is 1000 cd/m², the alternate HDR headroom would be log2(1000/203) ≈ 2.3004.
Version is a single-byte integer containing the gain map version.
Gain min values is either one or three (depending on the value of Components) four-byte IEEE 754 single-precision floating point values in network byte order, which indicate min(G), the minimum value for each component of the gain map. It is encoded as a log base 2 number. It is used to normalize and unnormalize the gain map; see [ISO_21496-1] A.3.1 and 6.2.2.
Gain max values is either one or three (depending on the value of Components) four-byte IEEE 754 single-precision floating point values in network byte order, which indicate max(G), the maximum value for each component of the gain map. It is encoded as a log base 2 number. It is used to normalize and unnormalize the gain map; see [ISO_21496-1] A.3.1 and 6.2.2.
Baseline offset is either one or three (depending on the value of Components) four-byte IEEE 754 single-precision floating point values in network byte order, which indicate the baseline offset for each component of the gain map, k_baseline. It is used to avoid numerical issues when computing the gain map; see [ISO_21496-1] A.2 and 6.3.
Alternate offset is either one or three (depending on the value of Components) four-byte IEEE 754 single-precision floating point values in network byte order, which indicate the alternate offset for each component of the gain map, k_alternate. It is used to avoid numerical issues when computing the gain map; see [ISO_21496-1] A.2 and 6.3.
Gamma is either one or three (depending on the value of Components) four-byte IEEE 754 single-precision floating point values in network byte order, which indicate the per-component gamma value applied to each component of the gain as a pre-compression step; see [ISO_21496-1] A.3.2 and 6.2.2.
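Taken together, the field descriptions above imply a fixed binary header followed by five per-component float arrays. A parsing sketch, assuming the fields appear in exactly the order described with no padding (the real layout must come from the final spec text):

```python
import struct
from typing import NamedTuple

class GainMapMeta(NamedTuple):
    width: int
    height: int
    bit_depth: int
    compression: int
    filter_method: int
    components: int
    colour_primaries: int
    application_colourspace: int
    base_headroom: float
    alt_headroom: float
    version: int
    gain_min: tuple
    gain_max: tuple
    base_offset: tuple
    alt_offset: tuple
    gamma: tuple

def parse_gmap(data: bytes) -> GainMapMeta:
    """Parse a hypothetical gMAP payload, fields in the order described above.

    All multi-byte values are big-endian (network byte order), matching PNG.
    """
    fixed = ">IIBBBBBBffB"
    (width, height, bit_depth, comp, filt, n, primaries, app_cs,
     base_h, alt_h, version) = struct.unpack_from(fixed, data, 0)
    off = struct.calcsize(fixed)
    arrays = []
    for _ in range(5):  # gain min/max, base/alt offsets, gamma
        arrays.append(struct.unpack_from(f">{n}f", data, off))
        off += 4 * n
    return GainMapMeta(width, height, bit_depth, comp, filt, n,
                       primaries, app_cs, base_h, alt_h, version, *arrays)
```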
11.3.2.10 `gDAT` Gain map image data

The four-byte chunk type field contains the hexadecimal values 67 44 41 54.

The `gDAT` chunk serves the same purpose for HDR alternate images as the `IDAT` chunk does for baseline images: it holds the compressed per-pixel gain map data. The compressed datastream is the concatenation of the contents of the data fields of all the `gDAT` chunks (noting that data fields may be of zero length). When decompressed, the datastream is the complete per-pixel data as a PNG image, including the filter byte at the beginning of each scanline, similar to the uncompressed data of all the `IDAT` chunks.
The computed gain map data (see [ISO_21496-1] A.2), after any preprocessing (see [ISO_21496-1] A.3), prior to chunking, filtering, and compression, is held as a PNG image with colour type 0 (Greyscale) and bit depth 16, for a one-component gain map; or colour type 2 (Truecolour) and bit depth 16, for a three-component gain map. The 16-bit values are two-byte half-precision float16 values [IEEE 754-2008].
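The datastream assembly described above mirrors IDAT handling. A minimal sketch of collecting and inflating `gDAT` data fields (chunk walking simplified; real code must also validate CRCs and chunk ordering):

```python
import struct
import zlib

def gdat_stream(png: bytes) -> bytes:
    """Concatenate and inflate the data fields of all gDAT chunks.

    Returns the filtered scanline stream (one filter byte per scanline),
    laid out exactly as IDAT data is before unfiltering.
    """
    pos = 8  # skip the 8-byte PNG signature
    compressed = bytearray()
    while pos + 8 <= len(png):
        length, ctype = struct.unpack_from(">I4s", png, pos)
        if ctype == b"gDAT":
            compressed += png[pos + 8 : pos + 8 + length]
        pos += 8 + length + 4  # length + type fields, data, CRC
    return zlib.decompress(bytes(compressed))
```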
NOTE: There is an open question on required precision and quantisation; the ISO draft is unclear.