Re: [Help-tar] Reproducibility of tar archives

From:	Jakob Bohm
Subject:	Re: [Help-tar] Reproducibility of tar archives
Date:	Tue, 2 Apr 2019 21:49:55 +0200
User-agent:	Mozilla/5.0 (Windows NT 6.3; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1

On 02/04/2019 20:07, Yann E. MORIN wrote:

Jakob, All,

On 2019-04-01 23:56 +0200, Jakob Bohm spake thusly:

On 01/04/2019 22:15, Yann E. MORIN wrote:

On 2019-04-01 12:12 +0200, Jakob Bohm spake thusly:

On 31/03/2019 14:08, Yann E. MORIN wrote:

So, here's my question: starting with tar-1.32 (the latest release as of
today), is the gnu tar format considered stable now, or is there no
guarantee about the gnu tar format stability?

[--SNIP--]

The 3rd option, consistent with how reproducible builds are
otherwise done, is to treat tar as part of the tool chain, thus
making the exact build or source version of tar part of the list
of exact tool versions needed to reproduce a specific build (just
like there is already a requirement to use exact versions of gcc,
autotools etc.), doing so would also allow the historic hash values
to remain valid, as they are each tied to the tar version they were
historically built with.

The problem is that today, Buildroot uses tar-1.29, so all hashes are
generated with that "gnu-1.29" format, and they eventually percolate to
our source mirror (aka backup): http://sources.buildroot.org/

The problem that I don't understand is this:

In which situations does Buildroot recreate a tar file that doesn't
contain built/generated files and expect it to be exactly the same tar
archive as a different build configuration?

So, a bit of background: Buildroot is a cross-compilation build system,
in the same spirit as OpenEmbedded or OpenWrt. As such, it downloads
the source code from various projects, and compiles it.

I have used earlier versions of Buildroot and was somewhat annoyed at how
those versions made the build environment depend on external sites and their
changes.  So having a consistency checksum of known good versions is an
improvement.

Most packages provide ready-made archives, and that is what we
download. The archive is extracted, the code is built and installed. The
archive is eventually used as-is and copied to a "legal-info" landing
area.

But some packages only have a git, an Hg, an svn, or a cvs repository.
For those, we eventually need to generate the archive that lands in the
"legal-info".

From what I have seen, many upstream git/Hg/svn/cvs repositories require
that people downloading from there run some pre-processing tools such as
GNU autotools (autoconf etc.) to produce their recommended tarball form.
For such repositories, changes in those pre-processing tools would have
even greater effects on reproducibility than the format changes in tar.
Thus treating tar similarly to those other tools would be a natural
extension.

In practice, each source tarball creation Makefile snippet would specify
dependency on specific tar, autotools, patch etc. versions to create the
tarball with the known hash.  As with other such snippets, the values
specified can depend on the source version being processed (for example,
some source versions need autotools version 1.13, others a more current
version, similarly versions that were previously hashed with tar 1.29 would
specify that, while later versions would specify later tar versions).
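
To make the idea above a bit more concrete, here is a minimal sketch of
such a per-version pin table, written in Python rather than as a Makefile
snippet. Everything in it is an assumption for illustration only: the
package name "examplepkg", the table layout, and the helper names
tools_for and check_tools are hypothetical and are not taken from
Buildroot.

```python
# Hypothetical table: (package, source version) -> tool versions that were
# in use when the archive hash was first recorded. Illustrative values only.
PINNED_TOOLS = {
    ("examplepkg", "1.0"): {"tar": "1.29"},
    ("examplepkg", "2.0"): {"tar": "1.32"},
}


def tools_for(package, version):
    """Return the tool versions needed to regenerate this archive."""
    try:
        return PINNED_TOOLS[(package, version)]
    except KeyError:
        raise LookupError(
            f"no pinned tool versions recorded for {package} {version}"
        )


def check_tools(package, version, available):
    """Return the names of tools whose available version differs from the pin."""
    pinned = tools_for(package, version)
    return sorted(
        name for name, wanted in pinned.items() if available.get(name) != wanted
    )
```

With such a table, a source version hashed under tar 1.29 keeps asking for
tar 1.29, so its historic hash stays valid, while newer source versions
can name a newer tar.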

I recall Buildroot already downloading and building the entire gcc source
tree, so adding 2 or 3 versions of tar would be minor in the greater scheme
of things.

This is the case where we need to generate an archive that contains
actual source code, and for which we want reproducibility of the
archive.

Do those situations incorporate other computed files, such as the
result of running autotools on a Configure.in file in an upstream
source?

No, it does not.

  If so, the generated tar content already depends on the
versions of tools (such as autotools) used, and tar would belong to
the same version control as those tools.

Do those situations really need to recreate the tar file instead of
downloading it from sources.buildroot.org and checking the hash?

Hashes are bundled in Buildroot, but people are free not to use our
mirror. Especially, enterprise-class users would typically want to grab
the sources from the real, official upstream, rather than use our
mirror, just in case they are worried archives there would be trojaned.

Still, they want to be sure that what they get from upstream is indeed
what Buildroot expects to build, so they want the hashes to match.
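
For reference, the check described above could look roughly like the
following Python sketch. It assumes the bundled hash is a hex-encoded
SHA-256 digest; the function names sha256_of and matches_expected are
made up for this example and do not come from Buildroot.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's digest equals the bundled (assumed hex) hash."""
    return sha256_of(path) == expected_hex.strip().lower()
```

The comparison is over the bytes of the archive, so a locally regenerated
archive only passes if tar produced exactly the same byte stream as the
one that was originally hashed.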


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  http://www.wisemo.com
Transformervej 29, 2860 Soborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
