bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24450: [PATCHv2] Re: pypi importer outputs strange character series


From: Ricardo Wurmus
Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case.
Date: Tue, 28 May 2019 15:21:56 +0200
User-agent: mu4e 1.2.0; emacs 26.2

Next up: Seven of Nine, tertiary adjunct of unimatrix zero one:

> From 37e499d5d5d5f690aa0a065c730e13f6a31dd30d Mon Sep 17 00:00:00 2001
> From: Maxim Cournoyer <address@hidden>
> Date: Thu, 28 Mar 2019 23:12:26 -0400
> Subject: [PATCH 7/9] import: pypi: Include optional test inputs as
>  native-inputs.
>
> * guix/import/pypi.scm (maybe-inputs): Add INPUT-TYPE argument, and use it.
> (test-section?): New predicate.
> (parse-requires.txt): Collect the optional test inputs, and return them as the
> second element of the returned list.

AFAICT parse-requires.txt now returns a list of pairs, but used to
return a plain list before.  Is this correct?

>  (define (parse-requires.txt requires.txt)
> -  "Given REQUIRES.TXT, a Setuptools requires.txt file, return a list of
> -requirement names."
> -  ;; This is a very incomplete parser, which job is to select the 
> non-optional
> -  ;; dependencies and strip them out of any version information.
> +  "Given REQUIRES.TXT, a Setuptools requires.txt file, return a pair of 
> requirements.
> +
> +The first element of the pair contains the required dependencies while the
> +second the optional test dependencies.  Note that currently, optional,
> +non-test dependencies are omitted since these can be difficult or expensive 
> to
> +satisfy."
> +
> +  ;; This is a very incomplete parser, which job is to read in the 
> requirement
> +  ;; specification lines, and strip them out of any version information.
>    ;; Alternatively, we could implement a PEG parser with the (ice-9 peg)
>    ;; library and the requirements grammar defined by PEP-0508
>    ;; (https://www.python.org/dev/peps/pep-0508/).

Does it really return a pair?  Or a list of pairs?  Or is it a
two-element list of lists?

>    (call-with-input-file requires.txt
>      (lambda (port)
> -      (let loop ((result '()))
> +      (let loop ((required-deps '())
> +                 (test-deps '())
> +                 (inside-test-section? #f)
> +                 (optional? #f))
>          (let ((line (read-line port)))
> -          ;; Stop when a section is encountered, as sections contains 
> optional
> -          ;; (extra) requirements.  Non-optional requirements must appear
> -          ;; before any section is defined.
> -          (if (or (eof-object? line) (section-header? line))
> +          (if (eof-object? line)
>                ;; Duplicates can occur, since the same requirement can be
>                ;; listed multiple times with different conditional markers, 
> e.g.
>                ;; pytest >= 3 ; python_version >= "3.3"
>                ;; pytest < 3 ; python_version < "3.3"
> -              (reverse (delete-duplicates result))
> +              (map (compose reverse delete-duplicates)
> +                   (list required-deps test-deps))

Looks like a list of lists to me.  “delete-duplicates” now won’t delete
a name that is in both “required-deps” as well as in “test-deps”.  Is
this acceptable?

Personally, I’m not a fan of using data structures for returning
multiple values, because we can simply return multiple values.

Or we could have more than just strings.  The meaning of these strings
is provided by the bin into which they are thrown — either
“required-deps” or “test-deps”.  It could be an option to collect tagged
values instead and have the caller deal with filtering.

>  (define (parse-wheel-metadata metadata)
> -  "Given METADATA, a Wheel metadata file, return a list of requirement 
> names."
> +  "Given METADATA, a Wheel metadata file, return a pair of requirements.
> +
> +The first element of the pair contains the required dependencies while the 
> second the optional
> +test dependencies.  Note that currently, optional, non-test dependencies are
> +omitted since these can be difficult or expensive to satisfy."
>    ;; METADATA is a RFC-2822-like, header based file.

This sounds like this is going to duplicate the previous procedures.

>    (define (requires-dist-header? line)
>      ;; Return #t if the given LINE is a Requires-Dist header.
> -    (regexp-match? (string-match "^Requires-Dist: " line)))
> +    (string-match "^Requires-Dist: " line))
>
>    (define (requires-dist-value line)
>      (string-drop line (string-length "Requires-Dist: ")))
>
>    (define (extra? line)
>      ;; Return #t if the given LINE is an "extra" requirement.
> -    (regexp-match? (string-match "extra == " line)))
> +    (string-match "extra == '(.*)'" line))

These hunks should be part of the previous patch where they were
introduced.  (See my comments there about “regexp-match?”.)

> +  (define (test-requirement? line)
> +    (let ((extra-label (match:substring (extra? line) 1)))
> +      (and extra-label (test-section? extra-label))))

You can use “and=>” instead of binding a name:

    (and=> (match:substring (extra? line) 1) test-section?)

>    (call-with-input-file metadata
>      (lambda (port)
> -      (let loop ((requirements '()))
> +      (let loop ((required-deps '())
> +                 (test-deps '()))
>          (let ((line (read-line port)))
> -          ;; Stop at the first 'Provides-Extra' section: the non-optional
> -          ;; requirements appear before the optional ones.
>            (if (eof-object? line)
> -              (reverse (delete-duplicates requirements))
> +              (map (compose reverse delete-duplicates)
> +                   (list required-deps test-deps))
>                (cond
>                 ((and (requires-dist-header? line) (not (extra? line)))
>                  (loop (cons (specification->requirement-name
>                               (requires-dist-value line))
> -                            requirements)))
> +                            required-deps)
> +                      test-deps))
> +               ((and (requires-dist-header? line) (test-requirement? line))
> +                (loop required-deps
> +                      (cons (specification->requirement-name 
> (requires-dist-value line))
> +                            test-deps)))
>                 (else
> -                (loop requirements)))))))))
> +                (loop required-deps test-deps))))))))) ;skip line

Could you refactor this so that the other parser can be reused?  If not,
the same comments about “if” and “cond” and the use of pairs/lists
instead of “values” applies here.

>  (define (guess-requirements source-url wheel-url archive)
> -  "Given SOURCE-URL, WHEEL-URL and a ARCHIVE of the package, return a list
> +  "Given SOURCE-URL, WHEEL-URL and an ARCHIVE of the package, return a list
>  of the required packages specified in the requirements.txt file.  ARCHIVE 
> will
>  be extracted in a temporary directory."
>
> @@ -244,7 +289,10 @@ cannot determine package dependencies") (file-extension 
> url))
>             (metadata (string-append dirname "/METADATA")))
>        (call-with-temporary-directory
>         (lambda (dir)
> -         (if (zero? (system* "unzip" "-q" wheel-archive "-d" dir metadata))
> +         (if (zero?
> +              (parameterize ((current-error-port (%make-void-port "rw+"))
> +                             (current-output-port (%make-void-port "rw+")))
> +                (system* "unzip" wheel-archive "-d" dir metadata)))
>               (parse-wheel-metadata (string-append dir "/" metadata))
>               (begin
>                 (warning
> @@ -283,32 +331,41 @@ cannot determine package dependencies") (file-extension 
> url))
>                       (warning
>                        (G_ "Failed to extract file: ~a from source.~%")
>                        requires.txt)
> -                     '())))))
> -          '())))
> +                     (list '() '()))))))
> +          (list '() '()))))

I would like to see cosmetic changes like these three hunks in separate
commits.

>  (define (compute-inputs source-url wheel-url archive)
> -  "Given the SOURCE-URL of an already downloaded ARCHIVE, return a list of
> -name/variable pairs describing the required inputs of this package.  Also
> +  "Given the SOURCE-URL and WHEEL-URL of an already downloaded ARCHIVE, 
> return
> +a pair of lists, each consisting of a list of name/variable pairs, for the
> +propagated inputs and the native inputs, respectively.  Also
>  return the unaltered list of upstream dependency names."
> -  (let ((dependencies
> -         (remove (cut string=? "argparse" <>)
> -                 (guess-requirements source-url wheel-url archive))))
> -    (values (sort
> -             (map (lambda (input)
> -                    (let ((guix-name (python->package-name input)))
> -                      (list guix-name (list 'unquote (string->symbol 
> guix-name)))))
> -                  dependencies)
> -             (lambda args
> -               (match args
> -                 (((a _ ...) (b _ ...))
> -                  (string-ci<? a b)))))
> -            dependencies)))
> +
> +  (define (strip-argparse deps)
> +    (remove (cut string=? "argparse" <>) deps))
> +
> +  (define (requirement->package-name/sort deps)
> +    (sort
> +     (map (lambda (input)
> +            (let ((guix-name (python->package-name input)))
> +              (list guix-name (list 'unquote (string->symbol guix-name)))))
> +          deps)
> +     (lambda args
> +       (match args
> +         (((a _ ...) (b _ ...))
> +          (string-ci<? a b))))))
> +
> +  (define process-requirements
> +    (compose requirement->package-name/sort strip-argparse))
> +
> +  (let ((dependencies (guess-requirements source-url wheel-url archive)))
> +    (values (map process-requirements dependencies)
> +            (concatenate dependencies))))

Giving names to these processing steps is fine and improves clarity, but
I’m not so comfortable about returning ad-hoc pairs *and* multiple
values (like above).

> +            (match guix-dependencies
> +              ((required-inputs test-inputs)
> +               (values
> +                `(package
> +                   (name ,(python->package-name name))
> +                   (version ,version)
> +                   (source (origin
> +                             (method url-fetch)
> +                             ;; Sometimes 'pypi-uri' doesn't quite work due 
> to mixed
> +                             ;; cases in NAME, for instance, as is the case 
> with
> +                             ;; "uwsgi".  In that case, fall back to a full 
> URL.
> +                             (uri (pypi-uri ,(string-downcase name) version))
> +                             (sha256
> +                              (base32
> +                               ,(guix-hash-url temp)))))
> +                   (build-system python-build-system)
> +                   ,@(maybe-inputs required-inputs 'propagated-inputs)
> +                   ,@(maybe-inputs test-inputs 'native-inputs)
> +                   (home-page ,home-page)
> +                   (synopsis ,synopsis)
> +                   (description ,description)
> +                   (license ,(license->symbol license)))
> +                ;; Flatten the nested lists and return the upstream
> +                ;; dependencies.
> +                upstream-dependencies))))))))

I don’t see anything being flattened here?

--
Ricardo





reply via email to

[Prev in Thread] Current Thread [Next in Thread]