Subject: Re: [Help-tar] --deterministic option?
Date: Wed, 27 May 2015 16:01:42 +0200
User-agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0

On 27/05/2015 12:44, Jérémy Bobbio wrote:
> Hi!
>
> We are working in Debian (and I know other free software projects
> care) on providing our users with a way to reproduce bit-for-bit
> identical binary packages from the source and build environment. See
> <https://wiki.debian.org/ReproducibleBuilds/About> for some rationale
> and further explanations.
>
> In order to do this, we need to make our build processes as
> deterministic as possible. As you can imagine, Tar is quite involved
> in producing Debian packages. A straightforward call leads to
> multiple issues:
>
>  * Order of files in the archive will depend on the filesystem order.
>  * User and group names are recorded. This can be seen as a privacy
>    leak for the package builder.
>  * Permissions are dependent on the builder umask.
>  * Last modification times of members of files created during the
>    build will be dependent on the build time.
>  * Also, if gzip compression is used, a timestamp will be recorded in
>    the gzip header.
>
> So, we are currently turning calls like:
>
>     tar -zcf archive.tar.gz src
>
> into:
>
>     find src -print0 | LC_ALL=C sort -z |
>         GZIP=-9n tar --null -T - --no-recursion \
>             --owner=root --group=root --numeric-owner \
>             --mode=go=rX,u+rw,a-s \
>             --mtime=debian/changelog \
>             -zcf archive.tar
>
> It would be great to avoid at least some of the boilerplate. Finding
> a generic solution for permissions and modification times might be
> too much, but having a `--deterministic` flag for the rest of the
> issues would be quite helpful already.
>
> What do you think?

Agree in principle. Note that the boilerplate you
show looks like it doesn't handle:
- Creation/Access times (if stored in tar headers).
- Random gzip version dependencies (also affects DAK
producing different gzipped index files depending on
the Debian release installed on/near master).
- statoverride integration for suid/sgid binaries and
special dir flags (mostly in basefiles and /usr/local).
- Adding .gz extension to archive.tar (probably just
Which probably makes the real command line even longer.
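
For comparison, here is a rough sketch of the same normalizations as the
quoted pipeline, done with Python's standard tarfile and gzip modules
instead of the shell. The function name deterministic_targz is made up
for illustration, the fixed integer mtime stands in for
--mtime=debian/changelog, and permission bits are left as found, so
this is only an approximation of the command line above, not a
replacement for it.

```python
import gzip
import io
import os
import tarfile


def deterministic_targz(src_dir: str, mtime: int = 0) -> bytes:
    """Return a .tar.gz of src_dir whose bytes do not depend on build time,
    builder identity or directory order (hypothetical helper)."""

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        # Same intent as --owner=root --group=root and a fixed --mtime.
        info.uid = info.gid = 0
        info.uname = info.gname = "root"
        info.mtime = mtime
        return info

    # Collect every path first, then sort by raw bytes, which is what
    # "find ... | LC_ALL=C sort -z" does in the shell version.
    base = os.path.dirname(os.path.abspath(src_dir))
    paths = [src_dir]
    for root, dirs, files in os.walk(src_dir):
        paths.extend(os.path.join(root, name) for name in dirs + files)
    paths.sort(key=os.fsencode)

    buf = io.BytesIO()
    # mtime=0 and filename="" keep the timestamp and file name out of the
    # gzip header, similar in spirit to GZIP=-9n.
    with gzip.GzipFile(filename="", mode="wb", compresslevel=9,
                       fileobj=buf, mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w",
                          format=tarfile.GNU_FORMAT) as tar:
            for path in paths:
                # recursive=False mirrors --no-recursion: the sorted list
                # alone decides the member order.
                tar.add(path, arcname=os.path.relpath(path, base),
                        recursive=False, filter=normalize)
    return buf.getvalue()
```

Calling it twice on the same tree should give identical bytes even if
file timestamps changed in between, as long as the same Python and zlib
versions are used; it does nothing about the gzip version dependency
mentioned above.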
Also, at least a few versions back, dpkg-source
produced the wrong file timestamps in .diff.gz
files, affecting the consistency of source file
Now for tar, I would suggest (as a future feature) three
new determinism options:
--nomode : Short for --owner=root --group=root
--numeric-owner --mode=go=rX,u+rw, except
for suid/sgid entries. Combine with
--mode=a-s to make all files root:root with
no suid/sgid bits.
For more advanced permission systems (ACLs
etc.) --nomode will in general archive each
entry as if all non-modify permissions are
the union of those granted to any users, while
modify permissions are for owner only and any
special attributes (sgid/suid/capabilities
etc.) are preserved.
--sort : Causes the entries in each processed
directory to be output in Asciibetical order
(thus each dir needs to be loaded into memory
and sorted, using a locale-independent
strcmp() variant, but no need to preload
entire file listing).
--onepass : (not for package builders): If a file
changes while being archived, the archived
file contents, file length and sparse holes
will all be determined from a single read()
pass over the file until end of file reached.
This is in contrast to the current two-pass
logic where length and holes are found on a
first pass, contents of non-holes on a second
pass, thus --onepass provides guarantees to
applications (such as databases) that a
restored file will have the property that if
something in the file indicates that
something earlier in the file was updated to
checkpoint X, then that will be true, just
as if the backup had been done with cat.
The kernel/filesys is responsible for
presenting a consistent view of each file to
all processes/handles (a property already
needed for ordinary interprocess use of a
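
To make the three proposed options a little more concrete, here is a
rough Python sketch of how each could behave, using the standard
tarfile module. None of these options exist in tar; the names nomode,
sorted_walk and add_onepass are invented for this illustration, and the
handling of suid/sgid entries (left completely untouched) and of the
sticky bit (kept) is only one possible reading of the description
above. Sparse holes and ACLs are not handled.

```python
import os
import shutil
import stat
import tarfile
import tempfile
from typing import Iterator


def nomode(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """--nomode as a tarfile filter (one reading of the proposal)."""
    if info.mode & (stat.S_ISUID | stat.S_ISGID):
        return info  # assumption: suid/sgid entries are left as they are
    searchable = info.isdir() or bool(info.mode & 0o111)
    user = (info.mode & 0o700) | 0o600           # u+rw
    rest = 0o044 | (0o011 if searchable else 0)  # go=rX
    info.mode = (info.mode & stat.S_ISVTX) | user | rest
    info.uid = info.gid = 0                      # root:root
    info.uname = info.gname = ""                 # keep only numeric ids here
    return info


def sorted_walk(top: str) -> Iterator[str]:
    """--sort: load one directory at a time and order it by raw bytes."""
    with os.scandir(top) as it:
        entries = sorted(it, key=lambda e: os.fsencode(e.name))
    for entry in entries:
        yield entry.path
        if entry.is_dir(follow_symlinks=False):
            yield from sorted_walk(entry.path)


def add_onepass(tar: tarfile.TarFile, path: str, arcname: str) -> int:
    """--onepass: contents and length both come from one read to EOF."""
    with open(path, "rb") as src, \
            tempfile.SpooledTemporaryFile(max_size=1 << 20) as spool:
        info = tar.gettarinfo(arcname=arcname, fileobj=src)
        shutil.copyfileobj(src, spool)  # the only read pass over the file
        info.size = spool.tell()        # length is whatever was actually read
        spool.seek(0)
        tar.addfile(info, spool)
        return info.size
```

Two points this sketch brings out. Sorting per directory is not the
same order as sorting full path names: with a directory "a" and a file
"a-b", sorted_walk emits "a", "a/x", "a-b", while a bytewise sort of
full paths gives "a", "a-b", "a/x". And because the tar header carries
the size before the data, a single-pass read needs the contents spooled
somewhere (or a seekable archive) before the header can be written.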

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded