bug-guix

bug#39885: Bioconductor tarballs are not archived


From: Timothy Sample
Subject: bug#39885: Bioconductor tarballs are not archived
Date: Fri, 19 Jan 2024 09:46:21 -0600
User-agent: Gnus/5.13 (Gnus v5.13)

Hello,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> As for past tarballs, #swh-devel comrades say we could send them a list
> of URLs and they’d create “Save Code Now” requests on our behalf (we
> cannot do it ourselves since the site doesn’t accept plain tarballs.)
>
> Any volunteer to write a script that’d generate a list of Bioconductor
> content-addressed URLs (the bordeaux.guix.gnu.org/file ones) for say the
> past couple of years?

Sorry I’m a little late to this party, but I wrote a similar script a
while ago.  It creates a “sources.json” file of all the sources that the
Preservation of Guix (PoG) database analyzed and found missing in SWH.
It only covers what PoG monitors (which is *almost* everything, but not
quite).

  $ git clone https://git.ngyro.com/preservation-of-guix
  $ cd preservation-of-guix
  $ wget https://ngyro.com/pog-reports/latest/pog.db

  [Wait a long time because my server is sloooow.]

  $ guile -L . etc/sources.scm pog.db > missing-sources.json

With some modifications, I used it to generate the attached list of
Bioconductor sources (based on recent, unpublished PoG data).  I’ve
also attached the modifications in case anyone is curious or wants to
make a similar list.  I will publish the PoG database soon (today?), so
maybe wait for that before generating any lists.


-- Tim

Attachment: bioconductor-sources.json.gz
Description: Binary data

diff --git a/etc/sources.scm b/etc/sources.scm
index 71d157d..515cf00 100644
--- a/etc/sources.scm
+++ b/etc/sources.scm
@@ -1,5 +1,5 @@
 ;;; Preservation of Guix
-;;; Copyright © 2022 Timothy Sample <samplet@ngyro.com>
+;;; Copyright © 2022, 2024 Timothy Sample <samplet@ngyro.com>
 ;;;
 ;;; This file is part of Preservation of Guix.
 ;;;
@@ -61,6 +61,7 @@ FROM fods f
 WHERE f.algorithm = 'sha256'
     AND (fr.reference LIKE '\"%'
         OR fr.reference LIKE '(\"%')
+    AND fr.reference LIKE '%bioconductor.org%'
     AND NOT fr.is_error
     AND f.is_in_swh IS NOT NULL
     AND NOT f.is_in_swh")
@@ -85,22 +86,25 @@ Subresource Integrity metadata value."
   (define b64 (base64-encode bv))
   (string-append "sha256-" b64))
 
-(define (web-reference-urls reference)
+(define (web-reference-filename reference)
   (define uris
     (match (call-with-input-string reference read)
       ((urls ...) (map string->uri urls))
       (url (list (string->uri url)))))
-  (append-map (lambda (uri)
-                (map uri->string
-                     (maybe-expand-mirrors uri %mirrors)))
-              uris))
+  (or (any (lambda (uri)
+             (and (string-suffix? "bioconductor.org" (uri-host uri))
+                  (basename (uri-path uri))))
+           uris)
+      (error "Not a 'bioconductor.org' reference" reference)))
 
 (define (record->url-source rec)
   (match-let ((#(digest reference) rec))
-    (let ((urls (web-reference-urls reference))
-          (integrity (nix-base32-sha256->subresource-integrity digest)))
+    (let* ((filename (web-reference-filename reference))
+           (url (string-append "https://bordeaux.guix.gnu.org/file/"
+                               filename "/sha256/" digest))
+           (integrity (nix-base32-sha256->subresource-integrity digest)))
       `(("type" . "url")
-        ("urls" . ,(list->vector urls))
+        ("urls" . ,(vector url))
         ("integrity" . ,integrity)))))
 
 (define (lookup-missing-sources db)
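
To see the URL construction from the patch in isolation, here is a minimal standalone sketch in Guile.  It is not part of the PoG repository: the procedure names `bioconductor-filename' and `content-addressed-url', the package file name, and the all-zero digest are made up for illustration.  It assumes only core Guile plus the (srfi srfi-1) and (web uri) modules.

```scheme
(use-modules (srfi srfi-1)   ; any
             (web uri))      ; string->uri, uri-host, uri-path

;; Base of the content-addressed endpoint used in the patch above.
(define %content-addressed-base "https://bordeaux.guix.gnu.org/file/")

;; Hypothetical helper: return the file name of the first URL in URLS
;; whose host ends in "bioconductor.org", or #f if there is none.
(define (bioconductor-filename urls)
  (any (lambda (url)
         (let ((uri (string->uri url)))
           (and uri
                (uri-host uri)
                (string-suffix? "bioconductor.org" (uri-host uri))
                (basename (uri-path uri)))))
       urls))

;; Hypothetical helper: build the content-addressed URL for FILENAME and
;; DIGEST, where DIGEST is the SHA-256 hash as stored in the database.
(define (content-addressed-url filename digest)
  (string-append %content-addressed-base filename "/sha256/" digest))

;; Made-up inputs: a fake package URL and a placeholder digest.
(define example-urls
  '("https://example.org/mirror/Example_1.0.0.tar.gz"
    "https://bioconductor.org/packages/release/bioc/src/contrib/Example_1.0.0.tar.gz"))
(define example-digest (make-string 52 #\0))

(display (content-addressed-url (bioconductor-filename example-urls)
                                example-digest))
(newline)
```

Unlike the patch, this sketch returns #f instead of raising an error when no URL matches, so a caller has to check the result before building the URL.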
