Re: Conda environments and reproducibility

From: Hugo Buddelmeijer
Subject: Re: Conda environments and reproducibility
Date: Tue, 29 Nov 2022 14:12:55 +0100

Hi Konrad, Thibault and others,

Konrad, is it perhaps possible for you to dig up this broken conda environment file?

First, just like you all, my conclusion is that guix is the answer. The last two paragraphs by Simon capture it succinctly. However, conda seems to work fine for most people. It would therefore be instructive to have concrete 'failure stories' to show people that conda is not enough.

On Tue, 29 Nov 2022 at 11:32, Thibault Lestang wrote:
> That's fair enough. Conda & pip are everywhere around me, and I'd like
> to form an accurate picture of their shortcomings before mentioning
> alternative approaches to people who use these tools every day!

I agree, let me share my perspective.

Konrad Hinsen writes:
> That's in a way what happened in my scenario: rebuilding with a new
> compilation infrastructure produces different packages that share
> version numbers and tags with the prior ones.

Okay - this is an explanation I can understand. A better approach
would have been /not/ to overwrite existing package binaries with new
ones produced from the new infrastructure.

It doesn't seem common to overwrite conda binaries. Conda takes some (not enough?) measures to prevent the scenario Konrad describes. In particular, the filenames include a 'hash' since conda 3 (~2014) [1]:

in the past, we have had things like py27np111 in filenames. This is the same idea, just generalized. Since we can't readily put every possible constraint into the filename, we have kept the old ones, but added the hash as a general solution.

This hash includes information about the compiler used (~2017) [2, 3]:

The build hash will be added to the build string if these are true for any dependency: [...] package uses {{ compiler() }} jinja2 function

That is, "conda env export" should contain entries like "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the dependencies 'that matter', like which compiler is used. What goes into the hash seems rather complicated, and grows over time.
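To make the structure of such an entry concrete, here is a minimal Python sketch that splits a pinned spec into its parts and pulls out the hash. It assumes the layout seen in the example above (an optional prefix such as py39, then "h" followed by 7 hexadecimal characters, then "_" and a build number); that layout is an assumption taken from this one example, not a specification, and the names PinnedSpec and parse_pinned_spec are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class PinnedSpec(NamedTuple):
    """Parts of a 'name=version=build' entry (hypothetical helper type)."""
    name: str
    version: str
    build: str
    build_hash: Optional[str]    # None when the build string carries no hash
    build_number: Optional[int]  # None when the build string is not recognised


# Assumed layout: <prefix>[h<7 hex chars>]_<build number>
_BUILD_RE = re.compile(
    r"^(?P<prefix>.*?)(?:h(?P<hash>[0-9a-f]{7}))?_(?P<number>\d+)$"
)


def parse_pinned_spec(spec: str) -> PinnedSpec:
    """Split a pinned entry such as 'scipy=1.8.0=py39hee8e79c_1'."""
    parts = spec.split("=")
    if len(parts) != 3:
        raise ValueError(f"expected name=version=build, got {spec!r}")
    name, version, build = parts
    match = _BUILD_RE.match(build)
    if match is None:
        # Unknown build string layout: keep it verbatim, extract nothing.
        return PinnedSpec(name, version, build, None, None)
    return PinnedSpec(
        name, version, build, match.group("hash"), int(match.group("number"))
    )
```

Calling parse_pinned_spec("scipy=1.8.0=py39hee8e79c_1") yields the hash "ee8e79c" and build number 1, while an older-style build string such as "py27np111_0" yields no hash at all. Two exports can then be compared entry by entry to see whether only the hash differs between them.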

This hash is a great step forward in reproducibility. But it is too fragile. I can't directly see how, but I can easily assume that this dependency-hash mechanism leads to the problem that Konrad faced even when no files are overwritten. Maybe because a new dependency resolver in conda would have stricter rules on interoperability. (It is still possible that files indeed were overwritten though; it was probably an incident like this that made them change the hashes.)

My realization was that improving these hashes is a wild goose chase and will ultimately lead to horrific things like "Turing-complete yaml files". And at that point it is clear, at least to me, that guix is the answer.

One thing that conda (or actually conda-forge) does well is its bots. I'm a maintainer of some conda packages and once a month or so I get a fully automated pull request to update my package [4], e.g. when the upstream package is updated, or when a dependency is updated. They even have a tracking system for migrating dependencies that are used by many packages, such as compilers. This makes maintaining conda-forge packages a breeze. Having such bots within the guix ecosystem as well would probably help attract developers.

By the way, it is quite hard to use conda in guix, primarily because of how "conda activate myenvironment" sets PS1. Activation goes through a bash function called 'conda'. This function calls the 'conda' executable, which takes PS1, modifies it, and returns it to the function. The function then sets PS1 (and keeps a backup for deactivating the environment again). In guix, however, the conda executable is replaced by a bash script that calls conda_real. Bash scripts drop PS1 (because they run in non-interactive mode), so conda_real gets an empty PS1, fails to modify it, and the bash function then sets PS1 to nothing.

I've got it working properly on my machine, but I don't feel comfortable enough yet with Scheme / guix to provide a proper patch. The simplest fix might be to use another shell for the conda package (because I believe only bash drops PS1); I am not sure whether that is possible in guix. I would rather make guix packages of everything and ditch conda altogether, but supporting conda properly would help more people transition.
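The PS1 behaviour described above can be checked in isolation. The following Python sketch exports PS1 into the environment, starts a non-interactive bash via "bash -c", and returns whatever value of PS1 that child shell sees. The function name ps1_seen_by_noninteractive_bash is hypothetical, and the expectation that the result is empty is an assumption about the installed bash; the sketch reports what it observes rather than asserting it.

```python
import os
import shutil
import subprocess
from typing import Optional


def ps1_seen_by_noninteractive_bash(ps1: str = "(myenv) $ ") -> Optional[str]:
    """Return the PS1 value visible inside a non-interactive bash.

    Returns None when no bash executable is found on PATH.
    """
    bash = shutil.which("bash")
    if bash is None:
        return None
    # Pass PS1 through the environment, as a parent shell would if PS1
    # were exported before calling a wrapper script.
    env = dict(os.environ, PS1=ps1)
    result = subprocess.run(
        [bash, "-c", 'printf "%s" "$PS1"'],  # "bash -c" runs non-interactively
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    seen = ps1_seen_by_noninteractive_bash()
    if seen is None:
        print("bash not found")
    else:
        print(f"PS1 inside the child shell: {seen!r}")
```

If the child reports an empty string even though PS1 was exported, that matches the situation above: a bash wrapper sitting between the 'conda' function and conda_real would hand conda_real an empty prompt.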

(Oh, this reminds me of the problems of activation and deactivation scripts in conda. For another time.)


