Re: Disarchive database synchronization

From: Ludovic Courtès
Subject: Re: Disarchive database synchronization
Date: Mon, 20 Mar 2023 10:14:41 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Howdy Timothy!

Timothy Sample wrote:

> Ludovic Courtès writes:


>> For the remaining entries, it’s trickier.  Sometimes it’s just the
>> gzip compression parameters that differ, which could be addressed with a
>> little bit more work:
>> $ file ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz \
>>     ../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz
>> ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz: gzip compressed data, max compression, from Unix, original size modulo 2^32 446731
>> ../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz: gzip compressed data, max speed, from Unix, original size modulo 2^32 446731
> I’m not sure getting the compressed files to match matters.

No, it doesn’t matter, for sure; it’s just that it would have made it
easier to check for relevant differences between the two Disarchive
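
Since only the decompressed specification matters, one way to sidestep differing gzip parameters is to compare the payloads rather than the .gz files themselves.  A minimal sketch, assuming a POSIX shell with gunzip, cmp, and mktemp available; the name same_payload is made up for illustration and is not part of Disarchive or Guix:

```shell
# Hypothetical helper: succeed (exit 0) when two .gz files decompress to
# identical bytes, regardless of compression level or gzip header fields.
# Exit 1 means the payloads differ; exit 2 means decompression failed.
same_payload() {
    tmp=$(mktemp -d) || return 2
    if gunzip -c "$1" > "$tmp/a" && gunzip -c "$2" > "$tmp/b"; then
        cmp -s "$tmp/a" "$tmp/b"    # silent byte-for-byte comparison
        status=$?
    else
        status=2
    fi
    rm -rf "$tmp"
    return "$status"
}
```

It would be called with the two copies of an entry, one from each database, for example `same_payload X.gz ../../disarchive/sha256/X.gz && echo same`.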

>> Sometimes it’s trickier:
>> # diff -u <(gunzip -d < 0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz) \
>>     <(gunzip -d < ../../disarchive/sha256/0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz)
>> --- /dev/fd/63  2023-03-14 16:13:21.635733426 +0100
>> +++ /dev/fd/62  2023-03-14 16:13:21.635733426 +0100
>> @@ -1,7 +1,7 @@
>>  (disarchive
>>    (version 0)
>>    (gzip-member
>> -    (name "webview-sys-0.6.2.tar.gz")
>> +    (name "rust-webview-sys-0.6.2.tar.gz")


> The name field is not used for data reconstruction.  It’s for human
> consumption (and it may have made some early examples of use at the
> command line easier to explain).  Here, the difference is based on the
> fact that Crate URIs are weird, and the Preservation of Guix code does
> not keep the origin file name.  Hence, the PoG version extracts the
> Crate name alone from the URI, and the Cuirass version uses the Guix
> package name with the “rust-” prefix.

OK.  Again I was looking at this from the perspective of determining
whether there were “relevant” differences between the two Disarchive
databases.  Looks like it would be quite some work to determine that
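
As a starting point, the informational name field could be filtered out before diffing.  This is a rough sketch only, assuming a POSIX shell with gunzip, awk, diff, and mktemp; spec_diff_ignoring_name is a hypothetical name, and the sketch assumes that the first `(name "...")` line of a specification is the gzip-member name, as in the diff quoted above:

```shell
# Hypothetical helper: diff two gzipped Disarchive specifications while
# skipping the first `(name "...")` line of each.  Assumption: that line is
# the human-oriented gzip-member name, which is not used for reconstruction.
spec_diff_ignoring_name() {
    dir=$(mktemp -d) || return 2
    # awk program: drop only the first line that starts with `(name "`.
    drop='!done && /^ *\(name "/ { done = 1; next } { print }'
    gunzip -c "$1" | awk "$drop" > "$dir/a"
    gunzip -c "$2" | awk "$drop" > "$dir/b"
    diff -u "$dir/a" "$dir/b"
    rc=$?    # 0: no other differences; 1: differences were printed
    rm -rf "$dir"
    return "$rc"
}
```

Only the first match is dropped so that any later name fields deeper in a specification still take part in the comparison.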

>> As Tim pointed out, Disarchive disassembly is not fully deterministic
>> and/or might change a bit over time as Disarchive evolves, and that’s
>> prolly what we’re seeing here.
> I honestly think this is a good thing.  My instincts tell me that we
> should excise all sources of ambiguity, like we’re trying to do in the
> big picture.  However, Disarchive will get better at describing things
> over time.  For instance, it doesn’t handle tar extension headers
> elegantly at the moment.  In the future, if I fix this, I might consider
> creating a “migrate” feature that improves existing specifications
> (e.g., converting the old, verbose representation of extension headers
> into the new representation).  In particular, I’ve left some warts in
> the software in order to ship it, and I would be sad to try and commit
> to those for the rest of time!

That makes a lot of sense!

> We might also add other resolver addresses besides SWHIDs....
> Maybe I’m missing some perspective, but I don’t think trying to commit
> to reproducible outputs for Disarchive makes sense.

Yes, I feel the same.

> P.S., we’ll have to do this dance again shortly, as I just computed
> 2,023 historical bzip2 specifications.  They’re not online yet, but
> they’ll be up when I publish the next PoG report – which should take less
> than a year this time!  :p

Woow, bzip2!  I was just now looking at a concrete disappearing-tarball
issue that involves bzip2:

Thank you!

