Disarchive database synchronization

From: Ludovic Courtès
Subject: Disarchive database synchronization
Date: Tue, 14 Mar 2023 16:55:07 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Hello Guix!

As you may know, there are currently two different Disarchive databases:
the one at <> that Timothy Sample set up a
few years back, and the one at <> that we
set up later, with a continuous integration job to populate it¹.

Timothy’s database has more historical metadata (metadata about
tarballs that older Guix revisions referred to) because Timothy worked
hard to populate it with tarballs from all the packages Guix refers to
starting from 1.0, which is crucial for long-term reproducibility.

Thanks to Timothy, I have now copied things over from his database to
the Guix one.  The stats are as follows.  Timothy’s database had 28,396
entries, of which:

  12,905 (45%) entries were missing from the Guix database;
  15,491 (the rest: 55%) entries were present in both yet different.

Conversely, 3,444 entries of disarchive.guix were missing from
disarchive.ngyro².

I copied over the 12K entries that were missing from the Guix database.
(Note that there are currently only two copies of the database: one
at/in [bB]erlin, and one at/in [Bb]ordeaux.)  It now weighs in at
1.8 GiB for 31,839 entries.
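
For illustration, a comparison of this kind could be sketched as below.
This is not the script that was actually used.  It assumes that both
databases are available as local directory trees with matching relative
file names, and the name `compare_databases` is hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of the raw bytes of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def compare_databases(source: Path, target: Path):
    """Classify every file under `source` relative to `target`.

    Returns three sorted lists of relative paths: entries missing from
    `target`, entries present in both but with different bytes, and
    entries that are byte-identical.  (Hypothetical helper; the layout
    of the real databases is an assumption.)
    """
    missing, different, identical = [], [], []
    for entry in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = entry.relative_to(source)
        other = target / rel
        if not other.is_file():
            missing.append(rel.as_posix())
        elif file_digest(entry) != file_digest(other):
            different.append(rel.as_posix())
        else:
            identical.append(rel.as_posix())
    return missing, different, identical
```

Note that a byte-level comparison like this one counts two entries as
“different” even when only the gzip framing differs, so the “present in
both yet different” bucket overstates how many entries really disagree.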

For the remaining entries, it’s trickier.  Sometimes it’s just the
gzip compression parameters that differ, which could be addressed with a
little bit more work:

--8<---------------cut here---------------start------------->8---
$ file ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz 
  gzip compressed data, max compression, from Unix, original size modulo 2^32 446731
  gzip compressed data, max speed, from Unix, original size modulo 2^32 446731
--8<---------------cut here---------------end--------------->8---
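
As a rough sketch of how such cases could be detected automatically:
‘file’ derives “max compression” and “max speed” from the XFL byte of
the gzip header (RFC 1952), so two entries that differ only in gzip
parameters decompress to the same bytes.  The helper names below are
hypothetical and this is not part of Disarchive.

```python
import gzip

# XFL values defined by RFC 1952 for the deflate method.
XFL_NAMES = {2: "max compression", 4: "max speed"}


def gzip_xfl(data: bytes) -> int:
    """Return the XFL (extra flags) byte of a gzip stream."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    return data[8]


def differ_only_in_compression(a: bytes, b: bytes) -> bool:
    """True if the two gzip streams differ as raw bytes but
    decompress to exactly the same content."""
    return a != b and gzip.decompress(a) == gzip.decompress(b)
```

Entries for which `differ_only_in_compression` holds could be treated as
equivalent; the rest would still need a look at the decompressed
metadata itself.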

Sometimes it’s trickier:

--8<---------------cut here---------------start------------->8---
# diff -u <(gunzip -d < 0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz) <(gunzip -d < 
--- /dev/fd/63  2023-03-14 16:13:21.635733426 +0100
+++ /dev/fd/62  2023-03-14 16:13:21.635733426 +0100
@@ -1,7 +1,7 @@
   (version 0)
-    (name "webview-sys-0.6.2.tar.gz")
+    (name "rust-webview-sys-0.6.2.tar.gz")
@@ -13,7 +13,7 @@
     (footer (crc 1807070134) (isize 121344))
     (compressor zlib-best)
     (input (tarball
-             (name "webview-sys-0.6.2.tar")
+             (name "rust-webview-sys-0.6.2.tar")
@@ -78,7 +78,7 @@
              (padding 0)
              (input (directory-ref
                       (version 0)
-                      (name "webview-sys-0.6.2")
+                      (name "rust-webview-sys-0.6.2")
--8<---------------cut here---------------end--------------->8---

As Tim pointed out, Disarchive disassembly is not fully deterministic
and/or might change a bit over time as Disarchive evolves, and that’s
prolly what we’re seeing here.

The admins among us can see the remaining files in
/gnu/ on berlin.  That directory also contains two
files: ‘files-present-in-both-yet-different.txt’ and

Kudos to Timothy for making it possible.

Feedback welcome!



² Some of these showed up at since I copied the
  database ~16h ago.  Example missing entry is “samplv1-0.9.24.tar.gz”:

Attachment: signature.asc
Description: PGP signature
