Re: Conda environments and reproducibility
From: Simon Tournier
Subject: Re: Conda environments and reproducibility
Date: Mon, 28 Nov 2022 21:46:05 +0100
Hi,
On Mon, 28 Nov 2022 at 17:28, Thibault Lestang <t.lestang@imperial.ac.uk> wrote:
> -----
> @luispedrocoelho
> Me, 6 months ago: I am going to save this conda
> environment with all the versions of all the packages so it can be
> recreated later; this is Reproducible Science!
>
> conda, today: these versions don't work together, lol.
> -----
>
> I simply can't explain how such a behavior can happen.
One factor is link rot. I do not know whether anyone currently measures
it, but we almost certainly underestimate it.
> I understand that conda ships pre-compiled binaries. I see how that's
> bad for reproducibility and provenance tracking since it's not
> straightforward to know how these binaries and dependencies were
> compiled. I'm assuming that, when conda saves an environment, it records
> version tags and "everything else required" to pull the same binaries
> later. Okay - I see how binaries could /technically/ be modified at a
> later stage whilst maintaining the same version tag (provenance tracking
> issue).
As an aside, you are assuming that such binaries are still available. :-)
Another factor, from the days when I used Conda (and I may be wrong),
is, I guess, the SAT solver [1]. Say that 6 months ago you described
your environment with constraints such as:
    1.0 <= foo
    2.0 <= bar <= 3.0
    baz <= 4.0
Since then, foo@1.1, foo@1.2 and foo@2.0 have been released. But
baz <= 4.0 only works with 0.9 <= foo <= 1.2, and the constraint on bar
implies further constraints on foo and/or baz.
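To make that concrete, here is a minimal sketch in Python of a toy resolver. It is not how Conda's solver works; the package indexes THEN and NOW, the "newest first" preference, and the versions of bar and baz are assumptions made up for illustration, and the extra constraints coming from bar are left out. It only shows that the same loose description can resolve differently once new versions appear.

```python
from itertools import product

# Spec from the example above: (lower, upper) bounds, None means unbounded.
# Versions are tuples, e.g. (1, 2) stands for 1.2.
SPEC = {"foo": ((1, 0), None), "bar": ((2, 0), (3, 0)), "baz": (None, (4, 0))}

# Hypothetical package indexes: what existed 6 months ago and what exists now.
THEN = {"foo": [(1, 0)], "bar": [(2, 0), (2, 5)], "baz": [(3, 0), (4, 0)]}
NOW = {"foo": [(1, 0), (1, 1), (1, 2), (2, 0)],
       "bar": [(2, 0), (2, 5)], "baz": [(3, 0), (4, 0)]}


def in_range(version, bounds):
    low, high = bounds
    return (low is None or low <= version) and (high is None or version <= high)


def compatible(pick):
    # Cross-package rule from the example: baz <= 4.0 needs 0.9 <= foo <= 1.2.
    if pick["baz"] <= (4, 0):
        return (0, 9) <= pick["foo"] <= (1, 2)
    return True


def resolve(index):
    """Return the first newest-first assignment satisfying SPEC, or None."""
    names = sorted(SPEC)
    candidates = [
        sorted((v for v in index[n] if in_range(v, SPEC[n])), reverse=True)
        for n in names
    ]
    for combo in product(*candidates):  # exhaustive search over all combinations
        pick = dict(zip(names, combo))
        if compatible(pick):
            return pick
    return None


print(resolve(THEN))  # foo resolves to (1, 0)
print(resolve(NOW))   # same spec, foo now resolves to (1, 2)
```

The description did not change, only the index did, and that is enough to get a different environment.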
The complexity of SAT solving is, IIRC, exponential in the worst case,
in any case really bad. I do not know the state of the art, but I guess
the problem to solve gets worse and worse as time goes by.
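A rough way to see the growth, sketched in Python: a naive exhaustive search has to consider, in the worst case, one assignment per combination of candidate versions. Real solvers prune much better than this; the version counts below are hypothetical and only illustrate the multiplication.

```python
from math import prod


def search_space(versions_per_package):
    """Number of version assignments a naive exhaustive search may visit."""
    return prod(versions_per_package.values())


# Hypothetical counts of candidate versions per package, then and now.
then = {"foo": 1, "bar": 2, "baz": 2}
now = {"foo": 4, "bar": 2, "baz": 2}

print(search_space(then))  # 4
print(search_space(now))   # 16

# With 30 packages offering 10 candidate versions each:
print(search_space({f"pkg{i}": 10 for i in range(30)}))  # 10**30
```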
From my experience, there is only one way to fight against time:
freeze. The question is then how to freeze, and what. :-)
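As a minimal sketch of what "freeze" means here, in Python: record the exact resolved versions instead of ranges, plus a digest so that a later modification is detected. The functions freeze and thaw and the lock layout are hypothetical names invented for this illustration, not the format of Conda or of any other tool.

```python
import hashlib
import json


def freeze(pick):
    """Serialize exact versions (no ranges) together with a SHA-256 digest."""
    pins = json.dumps({name: list(v) for name, v in pick.items()}, sort_keys=True)
    return {"pins": pins, "sha256": hashlib.sha256(pins.encode()).hexdigest()}


def thaw(lock):
    """Return the pinned versions, refusing a lock whose content changed."""
    if hashlib.sha256(lock["pins"].encode()).hexdigest() != lock["sha256"]:
        raise ValueError("lock content does not match its digest")
    return {name: tuple(v) for name, v in json.loads(lock["pins"]).items()}


lock = freeze({"foo": (1, 0), "bar": (2, 5), "baz": (4, 0)})
print(thaw(lock))  # the exact same versions, whatever has been released since
```

Nothing is solved again at restore time, so new releases cannot change the result. What remains is being able to fetch and build those exact versions.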
One way to freeze is the binary container. Another way is to keep a
“summary” that captures the whole (fixed) graph of dependencies. With
Guix, this is usually a file named channels.scm (from guix describe).
The assumptions then become:
1. link rot is solved; this is tackled by Software Heritage,
2. the Linux kernel API stays backward compatible,
3. the hardware stays compatible,
for a rebuild to be possible. If I may, here is some related material: :-)
https://www.nature.com/articles/s41597-022-01720-9
https://simon.tournier.info/posts/2022-11-08-bluehats.html
https://simon.tournier.info/posts/2022-04-15-cafe-guix-long-term.html
Cheers,
simon
[1] https://en.wikipedia.org/wiki/SAT_solver