
Re: [Openexr-devel] OpenEXR 1.5.0, OpenEXR-Images 1.5.0 released


From: Florian Kainz
Subject: Re: [Openexr-devel] OpenEXR 1.5.0, OpenEXR-Images 1.5.0 released
Date: Tue, 19 Dec 2006 14:34:04 -0800
User-agent: Mozilla Thunderbird 1.0 (X11/20041207)

P.S.: I missed a step in my description of B44.  In the latest version
of OpenEXR, a new per-channel pLinear flag indicates if an image channel
is perceptually linear or logarithmic.

The B44 compressor uses this flag to reduce quantization artifacts in
luminance/chroma images.  The luminance channel is perceptually closer
to logarithmic than linear, and quantization steps should be proportional
to the magnitude of the values encoded in the file.  Chroma channels are
perceptually closer to linear than logarithmic; quantization steps should
be roughly the same throughout the range of encoded values.

In order to compensate for the logarithmic behavior of floating-point
numbers, the B44 compressor warps the s[i] values in input blocks for
perceptually linear channels before computing t[i]:

    s[i] = min (exp (s[i] / 8), HALF_MAX)

Note that this operation limits the range of original pixel values that
can be encoded to approximately [-88.7, 88.7], and that only 962 different
warped s[i] values are produced.
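The warp and its inverse can be sketched as plain float arithmetic (my
reading of the formula above, not the library's exact implementation;
HALF_MAX is the largest finite half value, 65504):

    #include <algorithm>
    #include <cassert>
    #include <cmath>

    const float HALF_MAX = 65504.0f; // largest finite half value

    // Warp applied to s[i] for perceptually linear channels
    // before computing t[i].
    float warp (float s)
    {
        return std::min (std::exp (s / 8.0f), HALF_MAX);
    }

    // Inverse warp applied after decoding.
    float unwarp (float s)
    {
        return 8.0f * std::log (s);
    }

    int main ()
    {
        // Values up to about 8 * log(65504), or roughly 88.7,
        // survive the round trip.
        assert (std::fabs (unwarp (warp (10.0f)) - 10.0f) < 1e-4f);

        // Larger values are clamped to HALF_MAX and come back
        // as approximately 88.7.
        assert (std::fabs (unwarp (warp (100.0f))
                           - 8.0f * std::log (HALF_MAX)) < 1e-3f);
        return 0;
    }

The clamp at HALF_MAX is what produces the [-88.7, 88.7] limit:
8 * log(65504) is about 88.72.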

Florian



Florian Kainz wrote:
Simon Green wrote:
Is there any more documentation on how the new B44 compression mode
works? At first glance it looks very DXT-like. Has anybody implemented a
GPU version of the decoder yet?


Hi Simon,

Below is a description of how B44 works.  I think the only
similarity between B44 and DXT is that both methods operate
on blocks of four by four pixels.

Regarding decompression on the GPU, no, I have not tried that.
When I tested playing back image sequences directly from disk,
the bottleneck was file I/O bandwidth, not the speed of the
decompression routine.  On the other hand, when the image files
were already present in the operating system's buffer cache
the bottleneck seemed to be uploading the pixels into the
graphics card.  I guess uploading compressed data would require
less bandwidth, but the GPU would have to uncompress the pixels.

Florian


--------------------------


The B44 compressor works on individual channels.  For multi-channel
images each channel is processed separately.  Only HALF channels are
compressed; FLOAT and UINT channels are stored in the file verbatim,
without compression.  The compression rate for an individual HALF
channel is 32/14, or approximately 2.28:1.  The compression rate for
three-channel color images can be increased to 4.57:1 by storing the
pixels in luminance/chroma format, with a full-resolution luminance
channel and two half-resolution chroma channels.
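The two compression rates can be checked with a little arithmetic: a
4x4 block of HALF pixels is 32 bytes in and 14 bytes out, and the
luminance/chroma layout stores 1 + 2 * (1/4) = 1.5 channels' worth of
data instead of 3:

    #include <cassert>
    #include <cmath>

    int main ()
    {
        // 16 pixels * 2 bytes = 32 bytes in, 14 bytes out.
        double half_channel = 32.0 / 14.0;              // ~2.2857 : 1

        // One full-resolution luminance channel plus two chroma
        // channels subsampled 2x in both directions.
        double luma_chroma = half_channel * 3.0 / 1.5;  // ~4.5714 : 1

        assert (std::fabs (half_channel - 2.2857) < 1e-3);
        assert (std::fabs (luma_chroma - 4.5714) < 1e-3);
        return 0;
    }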

Each HALF channel is split into input blocks of four by four pixels.
Each input block occupies 32 bytes.  Incomplete blocks at right and
bottom edges of a tile or a group of scanlines are filled by repeating
pixels from the rightmost column and bottom row of pixels.

The pixels in a block are labeled s[0], s[1] ... s[15]:

  +------------------------------> X
  |
  | s[0]    s[1]    s[2]    s[3]
  |
  | s[4]    s[5]    s[6]    s[7]
  |
  | s[8]    s[9]    s[10]   s[11]
  |
  | s[12]   s[13]   s[14]   s[15]
  |
  Y

The original pixel values are mapped into sixteen unsigned 16-bit
integers, t[0] ... t[15], such that

    t[i] > t[j]         if s[i] > s[j], and s[i] and s[j] are finite
    t[i] == 0x8000      if s[i] is not finite
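One way to build such an order-preserving mapping directly from the
half bit patterns is sketched below (this matches my reading of the
OpenEXR sources, but treat the details as an assumption): negative
halves have their bits complemented so that more-negative values map
to smaller integers, and positive halves are moved above all of them
by setting the high bit.

    #include <cassert>
    #include <cstdint>

    // Map the bit pattern h of a half to an unsigned 16-bit integer
    // so that the integers are ordered like the half values.
    uint16_t toOrdered (uint16_t h)
    {
        if ((h & 0x7c00) == 0x7c00)
            return 0x8000;      // infinity or NaN
        else if (h & 0x8000)
            return ~h;          // negative: reverse the bit ordering
        else
            return h | 0x8000;  // positive: move above all negatives
    }

    int main ()
    {
        // Half bit patterns: -1.0 -> 0xbc00, 1.0 -> 0x3c00,
        // 2.0 -> 0x4000, +infinity -> 0x7c00.
        assert (toOrdered (0xbc00) < toOrdered (0x3c00));  // -1 < 1
        assert (toOrdered (0x3c00) < toOrdered (0x4000));  //  1 < 2
        assert (toOrdered (0x7c00) == 0x8000);             // +inf
        return 0;
    }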

Now differences between horizontally or vertically adjacent integer
pixel values are computed according to the following diagram,
resulting in 15 signed integers, d[0] ... d[14].

     0--------- 1--------- 2--------- 3
     |     3          7         11
     |
     | 0
     |
     4--------- 5--------- 6--------- 7
     |     4          8         12
     |
     | 1
     |
     8--------- 9---------10---------11
     |     5          9         13
     |
     | 2
     |
    12---------13---------14---------15
           6         10         14

In the diagram,

     5--------- 6
          8

means that d[8] is the difference between t[5] and t[6].
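Written out, the diagram gives the following fifteen differences.  I
am assuming each difference is taken "right minus left" and "bottom
minus top"; the description does not spell out the sign convention.

    #include <cassert>

    // Compute d[0] ... d[14] from t[0] ... t[15] per the diagram.
    void computeDeltas (const int t[16], int d[15])
    {
        d[0]  = t[4]  - t[0];   // first column, vertical
        d[1]  = t[8]  - t[4];
        d[2]  = t[12] - t[8];

        d[3]  = t[1]  - t[0];   // row 0, horizontal
        d[7]  = t[2]  - t[1];
        d[11] = t[3]  - t[2];

        d[4]  = t[5]  - t[4];   // row 1
        d[8]  = t[6]  - t[5];
        d[12] = t[7]  - t[6];

        d[5]  = t[9]  - t[8];   // row 2
        d[9]  = t[10] - t[9];
        d[13] = t[11] - t[10];

        d[6]  = t[13] - t[12];  // row 3
        d[10] = t[14] - t[13];
        d[14] = t[15] - t[14];
    }

    int main ()
    {
        int t[16], d[15];
        for (int i = 0; i < 16; ++i)
            t[i] = 1000 + 10 * i;

        computeDeltas (t, d);

        // Given t[0] and the deltas, every t[i] can be recovered by
        // walking down the first column and then along a row, e.g.
        // t[6] = t[0] + d[0] + d[4] + d[8].
        assert (t[6]  == t[0] + d[0] + d[4] + d[8]);
        assert (t[15] == t[0] + d[0] + d[1] + d[2]
                              + d[6] + d[10] + d[14]);
        return 0;
    }

Note that the deltas form a spanning tree over the sixteen pixels, so
t[0] plus the fifteen differences determine the whole block.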

d[0] ... d[14] are scaled and biased to form fifteen unsigned six-bit
values, r[0] ... r[14],

    r[i] = floor (d[i] * pow (2,-s) + 0.5) + 32

where s is chosen so that for i = 0, 1, ... 14

    -32 <= floor (d[i] * pow (2,-s) + 0.5) <= 31

r[0] ... r[14] can be thought of as a set of fifteen floating-point
numbers with a shared exponent, s.  (Kind of like Greg Ward's RGBE
format for Radiance picture files.)
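A straightforward (if naive) way to pick s is to start at zero and
increase it until every rounded, scaled difference fits into six bits
after the +32 bias.  This sketch uses pow/floor exactly as in the
formulas above; the real compressor presumably uses integer shifts:

    #include <cassert>
    #include <cmath>

    // Smallest shift s such that every scaled d[i] lands in [-32, 31].
    int chooseShift (const int d[15])
    {
        for (int s = 0; ; ++s)
        {
            bool fits = true;
            for (int i = 0; i < 15; ++i)
            {
                int v = (int) std::floor (d[i] * std::pow (2.0, -s) + 0.5);
                if (v < -32 || v > 31)
                    fits = false;
            }
            if (fits)
                return s;
        }
    }

    // Scale and bias d[0..14] into the six-bit values r[0..14].
    void encode (const int d[15], int s, int r[15])
    {
        for (int i = 0; i < 15; ++i)
            r[i] = (int) std::floor (d[i] * std::pow (2.0, -s) + 0.5) + 32;
    }

    int main ()
    {
        int d[15] = {100, -100, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        int s = chooseShift (d);
        int r[15];
        encode (d, s, r);

        assert (s == 2);   // 100 / 4 = 25 fits; 100 / 2 = 50 does not
        for (int i = 0; i < 15; ++i)
            assert (r[i] >= 0 && r[i] <= 63);
        return 0;
    }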

Finally, t[0]+e (16 bits), s (6 bits) and r[0] ... r[14] (90 bits)
are packed into a fourteen-byte output block.

e is a small integer offset which is chosen such that the maximum of the
original sixteen input pixel values (s[0] ... s[15]) will be recovered
exactly when the output block is decoded later.  If the maximum
value in a pixel block is off by a small amount, the error is more
likely to be visible than when some of the smaller values are off.
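The bit budget works out exactly to the fourteen-byte output block:

    #include <cassert>

    int main ()
    {
        // Bit budget of one output block:
        //   t[0] + e : 16 bits
        //   shift s  :  6 bits
        //   r[0..14] : 15 * 6 = 90 bits
        int bits = 16 + 6 + 15 * 6;
        assert (bits == 112);
        assert (bits / 8 == 14);   // the fourteen-byte output block
        return 0;
    }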





