[PATCH] Add draft post "CRAN, a practical example for being reproducible

From: Lars-Dominik Braun
Subject: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
Date: Tue, 6 Dec 2022 08:53:22 +0100

Hi Simon, hi all,

attached is my draft post regarding guix-cran.


* drafts/ New file.
 drafts/ | 195 ++++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)
 create mode 100644 drafts/

diff --git a/drafts/ b/drafts/
new file mode 100644
index 0000000..c759b02
--- /dev/null
+++ b/drafts/
@@ -0,0 +1,195 @@
+# CRAN, a practical example for being reproducible at large scale using GNU Guix
+GNU Guix provides scripts (“importers”) to turn packages from
+various language-specific repositories like [PyPI](
+for Python, []( for Rust and
+[CRAN]( for R into Guix package recipes.
+An example workflow for the CRAN package
+[zoid](, which is not available
+in Guix proper, would look like this:
+1. Import the package into a manifest.
+   ```console
+   $ guix import cran -r zoid > manifest.scm
+   ```
+2. Edit `manifest.scm` to import the required modules and return a
+   usable manifest containing the package and R itself.
+   ```scheme
+   (use-modules (guix packages)
+                (guix download)
+                (guix licenses)
+                (guix build-system r)
+                (gnu packages cran)
+                (gnu packages statistics))
+   (define-public r-zoid …)
+   (packages->manifest (list r-zoid r))
+   ```
+3. Run your code.
+   ```console
+   $ guix shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+Although Guix displays hints about which modules are missing when trying to
+use an incomplete manifest, editing the manifest file to include all of
+them can be quite tedious.
+For R specifically, the R package
+[guix.install]( provides
+a way to automate this import. It also uses `guix import`, but references
+dependencies using package specifications like `(specification->package
+"r-bh")`. This way no extra logic to figure out the correct module
+imports is required. It then extends the package search path, including
+the newly written file at `~/.Rguix/packages.scm`, installs the package
+into the default Guix profile at `~/.guix-profile` and adds this profile
+to R’s search path.
+While this approach works well for individual users, Guix installations
+with a larger user-base, for instance institution-wide, would benefit
+from default availability of the entire CRAN package collection with
+pre-built substitutes to speed up installation times. Additionally,
+reproducing environments would involve fewer steps if the package
+recipes were available to anyone by default.
+## Introducing guix-cran
+GNU Guix provides a mechanism called “channels”,
+which can extend the package collection in Guix
+proper. [guix-cran]( does
+exactly that: It provides all CRAN packages missing in Guix proper in
+a channel and has all of the properties mentioned above. It can be
+installed globally via `/etc/guix/channels.scm` and packages can be
+pre-built on a central server.
+As of commit `cc7394098f306550c476316710ccad20a510fa4b` there are 17431
+packages available in guix-cran. 95% of them are buildable and only 0.5%
+of these builds are not reproducible via `guix build --check`.  It is
+also possible to use old package versions via `guix time-machine`, similar
+to what [MRAN](
+offers. However, that time-frame only spans about two months right now.
+Creating and updating guix-cran is [fully
+automated]( and happens
+without any human intervention. Improvements to the already very good
+CRAN importer also improve the channel’s quality. The channel itself
+is always in a usable state, because updates are tested with `guix pull`
+before committing and pushing them. However, some packages may not build
+or work, because (usually undeclared) build or runtime dependencies are
+missing. This could be improved through better auto-detection in the
+CRAN importer.
+Currently, building the channel derivation is very slow, most
+likely due to Guile performance issues. For this reason packages
+are split into files by first letter.  This way they can
+still be referenced deterministically by the first letter of
+their name.  Since the number of loadable modules is [limited to
+creating one module file per package is not possible and putting them
+all into the same file is even slower.
+The channel is not signed, because all changes are automated anyway.
+## Usage
+Using guix-cran requires the following steps:
+1. Create `channels.scm`:
+   ```scheme
+   (cons
+     (channel
+       (name 'guix-cran)
+       (url ""))
+     %default-channels)
+   ```
+2. Create `manifest.scm`:
+   ```scheme
+   (specifications->manifest '("r-zoid" "r"))
+   ```
+3. Run:
+   ```console
+   $ guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e
+   ```
+For true reproducibility it’s necessary to pin the channels to a
+specific commit by running
+```console
+guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
+```
+once and using `channels.pinned.scm` instead of `channels.scm` from there on.
+## Appendix
+Ludovic Courtès, Simon Tournier and Ricardo Wurmus provided valuable
+feedback to the draft of this post.
+The channel statistics above can be reproduced using the following
+channel file (`channels.scm`):
+```scheme
+(list
+  (channel
+    (name 'guix)
+    (url "")
+    (branch "master")
+    (commit
+      "4781f0458de7419606b71bdf0fe56bca83ace910")
+    (introduction
+      (make-channel-introduction
+        "9edb3f66fd807b096b48283debdcddccfea34bad"
+        (openpgp-fingerprint
+          "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
+  (channel
+    (name 'guix-cran)
+    (url "")
+    (branch "master")
+    (commit
+      "cc7394098f306550c476316710ccad20a510fa4b")))
+```
+And the following Scheme code to obtain a list of all packages provided
+by guix-cran (`list-packages.scm`):
+```scheme
+(use-modules (guix discovery)
+             (gnu packages)
+             (guix modules)
+             (guix utils)
+             (guix packages))
+;; Collect every package whose defining module starts with `guix-cran`.
+(let* ((modules (all-modules (%package-module-path)))
+       (packages (fold-packages
+                   (lambda (p accum)
+                     (let ((mod (file-name->module-name (location-file (package-location p)))))
+                       (if (member (car mod) '(guix-cran))
+                         (cons p accum)
+                         accum)))
+                   '() modules)))
+  ;; Print one package name per line.
+  (for-each (lambda (p) (format #t "~a~%" (package-name p))) packages))
+```
+And this Bash script:
+```bash
+guix pull -p guix-profile -C channels.scm
+export GUIX_PROFILE=`pwd`/guix-profile
+source guix-profile/etc/profile
+guix repl list-packages.scm > packages
+cat packages | parallel -j 4 'rm -f builds/{} && guix build --no-grafts --timeout=300 -r builds/{} -q {} 2>&1 && guix build --no-grafts --timeout=300 --check -q {} 2>&1' | tee build.log
+echo "total" && wc -l packages
+echo "success" && sort -u build.log | grep '^/gnu/store' | wc -l
+echo "failure" && sort -u build.log | grep 'failed$' | wc -l
+echo "non-reproducible" && sort -u build.log | grep 'differs$' | wc -l
+```
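
Outside the patch, as a possible follow-up: the four counts printed at the end of the script could be converted into the percentages quoted in the post with a small helper. The following is only a sketch. The function name `summarize_build_log` is hypothetical, and it assumes that `packages` and `build.log` have exactly the format the script above produces.

```shell
# Hypothetical helper, not part of the draft: summarize a build log.
# Usage: summarize_build_log PACKAGES_FILE BUILD_LOG
summarize_build_log() {
    packages_file=$1
    build_log=$2

    # The packages file holds one package name per line.
    total=$(wc -l < "$packages_file")

    # Same patterns as in the script above. grep -c prints 0 but exits
    # non-zero when nothing matches, hence the "|| true".
    success=$(sort -u "$build_log" | grep -c '^/gnu/store' || true)
    failure=$(sort -u "$build_log" | grep -c 'failed$' || true)
    differs=$(sort -u "$build_log" | grep -c 'differs$' || true)

    awk -v t="$total" -v s="$success" -v f="$failure" -v d="$differs" 'BEGIN {
        # Avoid dividing by zero on empty input.
        if (t == 0 || s == 0) { print "nothing to summarize"; exit 1 }
        printf "total %d\n", t
        printf "buildable %d (%.1f%%)\n", s, 100 * s / t
        printf "failed %d (%.1f%%)\n", f, 100 * f / t
        printf "non-reproducible %d (%.1f%% of successful builds)\n", d, 100 * d / s
    }'
}
```

It would be called as `summarize_build_log packages build.log` once the builds have finished. The non-reproducible share is computed relative to successful builds, which matches how the post words that figure.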
