Re: Understanding lzma-302eos

From: Antonio Diaz Diaz
Subject: Re: Understanding lzma-302eos
Date: Wed, 24 Aug 2022 01:45:33 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv: Gecko/20110420 SeaMonkey/2.0.14

Hi Hoël,

Hoël Bézier wrote:
> One thing (amongst many!) that I fail to figure out is why the range
> decoder skips the first five bytes of the lzma stream. This happens in
> the Range_decoder constructor in lzd code:
>
> Range_decoder() : member_pos( 6 ), code( 0 ), range( 0xFFFFFFFFU )
>   { for( int i = 0; i < 5; ++i ) code = ( code << 8 ) | get_byte(); }

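Note that the code above does not "skip the first five bytes"; it shifts 5 bytes into 'code'. Because 'code' is only 32 bits wide, the first byte is shifted out again and only the last 4 remain in 'code' after the shifting. It is equivalent to:

  get_byte();		// read and discard the first byte
  for( int i = 0; i < 4; ++i ) code = ( code << 8 ) | get_byte();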
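
The effect of the 5-byte loop can be checked in isolation. The following is a minimal sketch, assuming 'code' is a 32-bit unsigned integer as in lzd; the helper 'shift_in' and the sample bytes are hypothetical and not part of lzd:

```cpp
#include <cstdint>

// Hypothetical helper: shift the first 'n' bytes of 'buf' into a 32-bit
// 'code', the same way the constructor loop does with get_byte().
inline uint32_t shift_in( const uint8_t * const buf, const int n )
  {
  uint32_t code = 0;
  // Each shift drops the top 8 bits, so at most the last 4 bytes survive.
  for( int i = 0; i < n; ++i ) code = ( code << 8 ) | buf[i];
  return code;
  }
```

Given the 5 bytes 0xAB 0x12 0x34 0x56 0x78, both shift_in( buf, 5 ) and shift_in( buf + 1, 4 ) yield 0x12345678: the first byte is consumed from the stream but leaves no trace in 'code', whatever its value.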

> This is also confirmed by the ietf draft:
> The range encoder produces a first 0 byte that must be ignored by the
> range decoder. This is done by shifting 5 bytes in the
> initialization of 'code' instead of 4.

Note "shifting", not "skipping". BTW, the first 0 byte is the contents of 'cache'. See line 234 of encoder_base.h in the source of lzip-1.23. Whatever value you initialize 'cache' to will be copied into the compressed file as the first byte of the LZMA stream, but it will not affect the decoding.
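
To illustrate where that first byte comes from, here is a simplified sketch of the output stage of a range encoder, assuming the usual LZMA carry-propagation scheme. The names ('Sketch_encoder', 'shift_low', 'ff_count', 'out') are chosen for illustration; this is not a copy of encoder_base.h:

```cpp
#include <cstdint>
#include <vector>

// Simplified range-encoder output stage (illustrative, not the lzip source).
// 'low' holds 32 bits of the current point plus a possible carry in bit 32.
struct Sketch_encoder
  {
  uint64_t low;
  uint8_t cache;		// pending byte; its initial value is written first
  unsigned ff_count;		// pending 0xFF bytes that may still absorb a carry
  std::vector< uint8_t > out;	// stands in for the compressed file

  explicit Sketch_encoder( const uint8_t initial_cache = 0 )
    : low( 0 ), cache( initial_cache ), ff_count( 0 ) {}

  void shift_low()
    {
    if( ( low >> 24 ) != 0xFF )		// top byte can no longer change
      {
      const uint8_t carry = ( low > 0xFFFFFFFFU ) ? 1 : 0;
      out.push_back( static_cast< uint8_t >( cache + carry ) );
      for( ; ff_count > 0; --ff_count )
        out.push_back( static_cast< uint8_t >( 0xFF + carry ) );
      cache = static_cast< uint8_t >( ( low >> 24 ) & 0xFF );
      }
    else ++ff_count;			// delay the byte until the carry is known
    low = ( low & 0x00FFFFFFU ) << 8;
    }

  // Push out the bytes still held in 'cache' and 'low'.
  void flush() { for( int i = 0; i < 5; ++i ) shift_low(); }
  };
```

In this sketch the very first call to shift_low writes the initial value of 'cache' before any byte derived from 'low'. Two encoders that differ only in 'initial_cache' produce output that differs only in that first byte, which is the byte the decoder shifts out of 'code'.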

> On a side note, this code snippet shows that the first five bytes are
> used to update the code, which is the current point in the range,
> according to the ietf draft, but range is not updated.

The constructor initializes 'range' to its initial value, the maximum range possible. It then loads into 'code' the 4 most significant bytes of the initial point in that range, as produced by the encoding of the first bytes of data. From there on, 'range' is narrowed by each decoded bit, and each time it falls below 0x01000000 it is multiplied by 256 and a new compressed byte is shifted into 'code'. (See 'decode' and 'decode_bit').
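
The initialization and the renormalization step can be sketched together as follows. This is a minimal illustration that reads from a memory buffer; 'Sketch_decoder', its members, and the separate 'normalize' function are assumptions made for the example (in lzd the check is part of the decoding functions):

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative decoder state reading from a memory buffer (not the lzd class).
struct Sketch_decoder
  {
  const uint8_t * buf;
  size_t size;
  size_t pos;
  uint32_t code;		// current point within the range
  uint32_t range;		// current width of the range

  Sketch_decoder( const uint8_t * const b, const size_t s )
    : buf( b ), size( s ), pos( 0 ), code( 0 ), range( 0xFFFFFFFFU )
    {
    // Shift in 5 bytes; the first one falls out of the 32-bit 'code'.
    for( int i = 0; i < 5; ++i ) code = ( code << 8 ) | get_byte();
    }

  // Return the next compressed byte, or 0 past the end of the buffer.
  uint8_t get_byte() { return ( pos < size ) ? buf[pos++] : 0; }

  // When 'range' has fallen below 0x01000000, scale it by 256 and
  // shift one more compressed byte into 'code'.
  void normalize()
    {
    if( range <= 0x00FFFFFFU )
      { range <<= 8; code = ( code << 8 ) | get_byte(); }
    }
  };
```

Note that 'range' never receives data from the stream: it starts at 0xFFFFFFFF and is only rescaled here, while 'code' is the only variable fed with compressed bytes.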

Best regards,
