From: Jakob Bohm
Subject: Re: [Help-tar] --deterministic option?
Date: Wed, 27 May 2015 16:01:42 +0200
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 27/05/2015 12:44, Jérémy Bobbio wrote:
> Hi!
>
> We are working in Debian (and I know other free software projects
> care) on providing our users with a way to reproduce bit-for-bit
> identical binary packages from the source and build environment. See
> <https://wiki.debian.org/ReproducibleBuilds/About> for some rationale
> and further explanations.
>
> In order to do this, we need to make our build processes as
> deterministic as possible. As you can imagine, Tar is quite involved
> in producing Debian packages. A straightforward call leads to
> multiple issues:
>
>  * Order of files in the archive will depend on the filesystem order.
>  * User and group names are recorded. This can be seen as a privacy
>    leak for the package builder.
>  * Permissions are dependent on the builder umask.
>  * Last modification times of members of files created during the
>    build will be dependent on the build time.
>  * Also, if gzip compression is used, a timestamp will be recorded in
>    the gzip header.
>
> So, we are currently turning calls like:
>
>     tar -zcf archive.tar.gz src
>
> into:
>
>     find src -print0 | LC_ALL=C sort -z |
>         GZIP=-9n tar --null -T - --no-recursion \
>             --owner=root --group=root --numeric-owner \
>             --mode=go=rX,u+rw,a-s \
>             --mtime=debian/changelog \
>             -zcf archive.tar
>
> It would be great to avoid at least some of the boilerplate. Finding a
> generic solution for permissions and modification times might be too
> much, but having a `--deterministic` flag for the rest of the issues
> would be quite helpful already.
>
> What do you think?

Agree in principle. Note that the boilerplate you show looks like it
doesn't handle:

- Creation/Access times (if stored in tar headers).
- Random gzip version dependencies (also affects DAK producing
  different gzipped index files depending on the Debian release
  installed on/near master).
- statoverride integration for suid/sgid binaries and special dir
  flags (mostly in basefiles and /usr/local).
- Adding the .gz extension to archive.tar (probably just a typo).

Which probably makes the real command line even longer.

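For illustration only, the normalisations in the quoted pipeline (sorted
member order, root ownership without names, umask-independent modes, one
fixed mtime, no gzip timestamp) can be sketched with Python's standard
tarfile and gzip modules. This is not part of tar or dpkg: the helper
names normalize and build_archive are made up for this example, and the
mode handling only approximates --mode=go=rX,u+rw,a-s (it drops all
special bits, including sticky).

```python
import gzip
import io
import os
import tarfile


def normalize(info: tarfile.TarInfo, mtime: int) -> tarfile.TarInfo:
    """Strip builder-specific metadata from one archive member."""
    info.uid = info.gid = 0          # like --owner=root --group=root
    info.uname = info.gname = ""     # record no user or group names
    info.mtime = mtime               # one fixed timestamp for every member
    executable = info.isdir() or bool(info.mode & 0o111)
    user = (info.mode & 0o700) | 0o600           # u+rw, keep u+x if set
    shared = 0o044 | (0o011 if executable else 0)  # go=rX
    info.mode = user | shared        # assumption: all special bits dropped
    return info


def build_archive(src: str, mtime: int = 0) -> bytes:
    """Return a .tar.gz of `src` whose bytes do not depend on readdir
    order, the builder's umask, user names, or the build time."""
    parent = os.path.dirname(os.path.abspath(src))
    members = [(os.path.relpath(src, parent), src)]
    for root, dirs, files in os.walk(src):
        for name in dirs + files:
            path = os.path.join(root, name)
            members.append((os.path.relpath(path, parent), path))
    # Bytewise comparison of member names, comparable to LC_ALL=C sort.
    members.sort(key=lambda pair: os.fsencode(pair[0]))

    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for arcname, path in members:
            tar.add(path, arcname=arcname, recursive=False,
                    filter=lambda ti: normalize(ti, mtime))

    out = io.BytesIO()
    # mtime=0 and an empty filename keep the gzip header free of
    # build-specific data, similar in spirit to gzip -n.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb",
                       compresslevel=9, mtime=0) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()
```

Two trees with identical contents but different on-disk mtimes and file
modes should then produce identical output from build_archive(). The
compressed bytes can still differ between zlib versions, which is the
gzip version dependency mentioned in the list above.
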
Also, at least a few versions back, dpkg-source produced the wrong file
timestamps in .diff.gz files, affecting the consistency of source file
timestamps.

Now for tar, I would suggest (as a future feature) three new
determinism options:

--nomode : Short for --owner=root --group=root --numeric-owner
  --mode=go=rX,u+rw, except for suid/sgid entries. Combine with
  --mode=a-s to make all files root:root with no suid/sgid bits. For
  more advanced permission systems (ACLs etc.) --nomode will in
  general archive each entry as if all non-modify permissions are the
  union of those granted to any users, while modify permissions are
  for owner only and any special attributes (sgid/suid/capabilities
  etc.) are preserved.

--sort : Causes the entries in each processed directory to be output
  in Asciibetical order (thus each dir needs to be loaded into memory
  and sorted, using a locale-independent strcmp() variant, but there is
  no need to preload the entire file listing).

--onepass : (not for package builders): If a file changes while being
  archived, the archived file contents, file length and sparse holes
  will all be determined from a single read() pass over the file until
  end of file is reached. This is in contrast to the current two-pass
  logic where length and holes are found on a first pass and the
  contents of non-holes on a second pass. Thus --onepass provides
  guarantees to applications (such as databases) that a restored file
  will have the property that if something in the file indicates that
  something earlier in the file was updated to checkpoint X, then that
  will be true, just as if the backup had been done with cat. The
  kernel/filesystem is responsible for presenting a consistent view of
  each file to all processes/handles (a property already needed for
  ordinary interprocess use of a shared file).

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
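
As an illustration of the per-directory ordering suggested for --sort in
the message, here is a small Python sketch. It is an assumption about
how such an option might traverse a tree, not existing tar behaviour,
and walk_sorted is a hypothetical name. Only one directory's entries
are held and sorted at a time, and names are compared as raw bytes so
the result does not depend on the locale.

```python
import os
from typing import Iterator


def walk_sorted(top: str) -> Iterator[str]:
    """Yield `top` and everything below it, sorting one directory at a time."""
    yield top
    if os.path.isdir(top) and not os.path.islink(top):
        # os.fsencode() turns each name into bytes, so sorted() compares
        # them bytewise, much like strcmp() would.
        for name in sorted(os.listdir(top), key=os.fsencode):
            yield from walk_sorted(os.path.join(top, name))
```

This order is deterministic but not the same as sorting full paths
globally: here src/b and everything under it come before src/b.txt,
whereas a bytewise sort of whole path names puts src/b.txt before
src/b/c.txt because '.' sorts before '/'.
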