Tue, 23 Aug 2022 22:15:25 +0200
I’ve recently decided to learn the Hare language, and figured that implementing
lzip support for it would be a good way to start.
Reading the IETF draft on the lzip format and the source code of lzd, and
filling the gaps with the Wikipedia page on LZMA, I managed to grasp most
of what goes on when decompressing lzip. Compression will certainly be
another challenge, but I’ll see when I get to it.
One thing (amongst many!) that I fail to figure out is why the range decoder
skips the first five bytes of the LZMA stream. This happens in the
Range_decoder constructor in the lzd code:
  Range_decoder() : member_pos( 6 ), code( 0 ), range( 0xFFFFFFFFU )
    {
    for( int i = 0; i < 5; ++i ) code = ( code << 8 ) | get_byte();
    }
This is also confirmed by the IETF draft:
The range encoder produces a first 0 byte that must be ignored by the
range decoder. This is done by shifting 5 bytes in the
initialization of 'code' instead of 4.
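
To check my reading of that loop, here is a minimal standalone sketch of the initialization. It only mirrors the constructor quoted above; the name Init_sketch, the pos member, and reading from a std::vector instead of lzd's get_byte() are my own assumptions for illustration, not lzd code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for lzd's Range_decoder: it reads from a byte vector
// so that the initialization can be inspected in isolation.
struct Init_sketch
  {
  uint32_t code;
  uint32_t range;
  std::size_t pos;		// number of bytes consumed from the stream

  explicit Init_sketch( const std::vector<uint8_t> & stream )
    : code( 0 ), range( 0xFFFFFFFFU ), pos( 0 )
    {
    assert( stream.size() >= 5 );	// the loop below needs five bytes
    // Same loop as in lzd: five bytes are shifted into a 32-bit variable,
    // so the first byte read is pushed out again by the fifth shift.
    for( int i = 0; i < 5; ++i )
      code = ( code << 8 ) | stream[pos++];
    }
  };
```

Fed the bytes 00 12 34 56 78, this leaves code at 0x12345678 and range at 0xFFFFFFFF. Replacing the leading 00 with any other value gives the same code, because a 32-bit variable only keeps the last four bytes shifted into it.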
This tells me why it should skip five bytes instead of four, but I cannot
understand why we need to skip four bytes in the first place. I guess I’m
missing some more general knowledge about range encoding, which is why I’m
sending this email in the hope that some of you might enlighten me.
On a side note, this code snippet shows that the first five bytes are used to
update code, which according to the IETF draft is the current point in the
range, but range itself is not updated. I don’t understand why, which tells me
I do not properly understand what these variables represent. Any insight on
that matter is welcome too.
- Understanding lzma-302eos,
Hoël Bézier <=