bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24450: [PATCHv2] Re: pypi importer outputs strange character series


From: Maxim Cournoyer
Subject: bug#24450: [PATCHv2] Re: pypi importer outputs strange character series in optional dependency case.
Date: Wed, 12 Jun 2019 12:00:50 +0900

Ricardo Wurmus <address@hidden> writes:

> Next up: Seven of Nine, tertiary adjunct of unimatrix zero one:

Ehe! I had to look up the reference; I'm not much of a Star Trek fan
obviously :-P.

>> From 37e499d5d5d5f690aa0a065c730e13f6a31dd30d Mon Sep 17 00:00:00 2001
>> From: Maxim Cournoyer <address@hidden>
>> Date: Thu, 28 Mar 2019 23:12:26 -0400
>> Subject: [PATCH 7/9] import: pypi: Include optional test inputs as
>>  native-inputs.
>>
>> * guix/import/pypi.scm (maybe-inputs): Add INPUT-TYPE argument, and use it.
>> (test-section?): New predicate.
>> (parse-requires.txt): Collect the optional test inputs, and return them as 
>> the
>> second element of the returned list.
>
> AFAICT parse-requires.txt now returns a list of pairs, but used to
> return a plain list before.  Is this correct?

Right, a list of two lists to be technically correct.

>>  (define (parse-requires.txt requires.txt)
>> -  "Given REQUIRES.TXT, a Setuptools requires.txt file, return a list of
>> -requirement names."
>> -  ;; This is a very incomplete parser, which job is to select the 
>> non-optional
>> -  ;; dependencies and strip them out of any version information.
>> +  "Given REQUIRES.TXT, a Setuptools requires.txt file, return a pair of 
>> requirements.
>> +
>> +The first element of the pair contains the required dependencies while the
>> +second the optional test dependencies.  Note that currently, optional,
>> +non-test dependencies are omitted since these can be difficult or expensive 
>> to
>> +satisfy."
>> +
>> +  ;; This is a very incomplete parser, which job is to read in the 
>> requirement
>> +  ;; specification lines, and strip them out of any version information.
>>    ;; Alternatively, we could implement a PEG parser with the (ice-9 peg)
>>    ;; library and the requirements grammar defined by PEP-0508
>>    ;; (https://www.python.org/dev/peps/pep-0508/).
>
> Does it really return a pair?  Or a list of pairs?  Or is it a
> two-element list of lists?

The latter! I've fixed the docstring accordingly.

>>    (call-with-input-file requires.txt
>>      (lambda (port)
>> -      (let loop ((result '()))
>> +      (let loop ((required-deps '())
>> +                 (test-deps '())
>> +                 (inside-test-section? #f)
>> +                 (optional? #f))
>>          (let ((line (read-line port)))
>> -          ;; Stop when a section is encountered, as sections contains 
>> optional
>> -          ;; (extra) requirements.  Non-optional requirements must appear
>> -          ;; before any section is defined.
>> -          (if (or (eof-object? line) (section-header? line))
>> +          (if (eof-object? line)
>>                ;; Duplicates can occur, since the same requirement can be
>>                ;; listed multiple times with different conditional markers, 
>> e.g.
>>                ;; pytest >= 3 ; python_version >= "3.3"
>>                ;; pytest < 3 ; python_version < "3.3"
>> -              (reverse (delete-duplicates result))
>> +              (map (compose reverse delete-duplicates)
>> +                   (list required-deps test-deps))
>
> Looks like a list of lists to me.  “delete-duplicates” now won’t delete
> a name that is in both “required-deps” as well as in “test-deps”.  Is
> this acceptable?

It is acceptable, as this corner case cannot exist given the current
code (a requirement can exist in either required-deps or test-deps, but
never in both). It also doesn't make sense that a run time requirement
would also be listed as a test requirement, so that corner case is not
likely to exist in the future either.

> Personally, I’m not a fan of using data structures for returning
> multiple values, because we can simply return multiple values.

I thought the Guile supported multiple values return value would be
great here as well, but I've found that for this specific case here, a
list of lists worked better, since the two lists contain requirements to
be processed the same, which "map" can readily do (i.e. less ceremony is
required).

> Or we could have more than just strings.  The meaning of these strings
> is provided by the bin into which they are thrown — either
> “required-deps” or “test-deps”.  It could be an option to collect tagged
> values instead and have the caller deal with filtering.

Sounds neat, but I'd rather punt on this one for now.

>>  (define (parse-wheel-metadata metadata)
>> -  "Given METADATA, a Wheel metadata file, return a list of requirement 
>> names."
>> +  "Given METADATA, a Wheel metadata file, return a pair of requirements.
>> +
>> +The first element of the pair contains the required dependencies while the 
>> second the optional
>> +test dependencies.  Note that currently, optional, non-test dependencies are
>> +omitted since these can be difficult or expensive to satisfy."
>>    ;; METADATA is a RFC-2822-like, header based file.
>
> This sounds like this is going to duplicate the previous procedures.

Instead of duplicating the docstring, I'm now referring to that of
PARSE-REQUIRES.TXT for PARSE-WHEEL-METADATA. The two procedures share
the same interface, but implement a different parser, which consist of
mostly a loop + conditional branches. IMHO, it's not worth, or even,
desirable to try to merge those two parsers into one; I prefer to keep
the logic of two distinct parsers separate.

>>    (define (requires-dist-header? line)
>>      ;; Return #t if the given LINE is a Requires-Dist header.
>> -    (regexp-match? (string-match "^Requires-Dist: " line)))
>> +    (string-match "^Requires-Dist: " line))
>>
>>    (define (requires-dist-value line)
>>      (string-drop line (string-length "Requires-Dist: ")))
>>
>>    (define (extra? line)
>>      ;; Return #t if the given LINE is an "extra" requirement.
>> -    (regexp-match? (string-match "extra == " line)))
>> +    (string-match "extra == '(.*)'" line))
>
> These hunks should be part of the previous patch where they were
> introduced.  (See my comments there about “regexp-match?”.)

Done.

>> +  (define (test-requirement? line)
>> +    (let ((extra-label (match:substring (extra? line) 1)))
>> +      (and extra-label (test-section? extra-label))))
>
> You can use “and=>” instead of binding a name:
>
>     (and=> (match:substring (extra? line) 1) test-section?)

Neat! I still don't have the reflex to use "and=>", thanks for reminding
me about it!

>>    (call-with-input-file metadata
>>      (lambda (port)
>> -      (let loop ((requirements '()))
>> +      (let loop ((required-deps '())
>> +                 (test-deps '()))
>>          (let ((line (read-line port)))
>> -          ;; Stop at the first 'Provides-Extra' section: the non-optional
>> -          ;; requirements appear before the optional ones.
>>            (if (eof-object? line)
>> -              (reverse (delete-duplicates requirements))
>> +              (map (compose reverse delete-duplicates)
>> +                   (list required-deps test-deps))
>>                (cond
>>                 ((and (requires-dist-header? line) (not (extra? line)))
>>                  (loop (cons (specification->requirement-name
>>                               (requires-dist-value line))
>> -                            requirements)))
>> +                            required-deps)
>> +                      test-deps))
>> +               ((and (requires-dist-header? line) (test-requirement? line))
>> +                (loop required-deps
>> +                      (cons (specification->requirement-name 
>> (requires-dist-value line))
>> +                            test-deps)))
>>                 (else
>> -                (loop requirements)))))))))
>> +                (loop required-deps test-deps))))))))) ;skip line
>
> Could you refactor this so that the other parser can be reused?  If not,
> the same comments about “if” and “cond” and the use of pairs/lists
> instead of “values” applies here.

Done w.r.t. using only "cond" rather than "if" + "cond". As explained
above, the docstring's body now refer to the documentation of
PARSE-REQUIRES.TXT; otherwise I prefer to keep the parsers separate.

>>  (define (guess-requirements source-url wheel-url archive)
>> -  "Given SOURCE-URL, WHEEL-URL and a ARCHIVE of the package, return a list
>> +  "Given SOURCE-URL, WHEEL-URL and an ARCHIVE of the package, return a list
>>  of the required packages specified in the requirements.txt file.  ARCHIVE 
>> will
>>  be extracted in a temporary directory."
>>
>> @@ -244,7 +289,10 @@ cannot determine package dependencies") (file-extension 
>> url))
>>             (metadata (string-append dirname "/METADATA")))
>>        (call-with-temporary-directory
>>         (lambda (dir)
>> -         (if (zero? (system* "unzip" "-q" wheel-archive "-d" dir metadata))
>> +         (if (zero?
>> +              (parameterize ((current-error-port (%make-void-port "rw+"))
>> +                             (current-output-port (%make-void-port "rw+")))
>> +                (system* "unzip" wheel-archive "-d" dir metadata)))
>>               (parse-wheel-metadata (string-append dir "/" metadata))
>>               (begin
>>                 (warning
>> @@ -283,32 +331,41 @@ cannot determine package dependencies") 
>> (file-extension url))
>>                       (warning
>>                        (G_ "Failed to extract file: ~a from source.~%")
>>                        requires.txt)
>> -                     '())))))
>> -          '())))
>> +                     (list '() '()))))))
>> +          (list '() '()))))
>
> I would like to see cosmetic changes like these three hunks in separate
> commits.

Done for the first two hunks; the last one isn't cosmetic; it changes
the default return from an empty list to a list of two empty lists.

>>  (define (compute-inputs source-url wheel-url archive)
>> -  "Given the SOURCE-URL of an already downloaded ARCHIVE, return a list of
>> -name/variable pairs describing the required inputs of this package.  Also
>> +  "Given the SOURCE-URL and WHEEL-URL of an already downloaded ARCHIVE, 
>> return
>> +a pair of lists, each consisting of a list of name/variable pairs, for the
>> +propagated inputs and the native inputs, respectively.  Also
>>  return the unaltered list of upstream dependency names."
>> -  (let ((dependencies
>> -         (remove (cut string=? "argparse" <>)
>> -                 (guess-requirements source-url wheel-url archive))))
>> -    (values (sort
>> -             (map (lambda (input)
>> -                    (let ((guix-name (python->package-name input)))
>> -                      (list guix-name (list 'unquote (string->symbol 
>> guix-name)))))
>> -                  dependencies)
>> -             (lambda args
>> -               (match args
>> -                 (((a _ ...) (b _ ...))
>> -                  (string-ci<? a b)))))
>> -            dependencies)))
>> +
>> +  (define (strip-argparse deps)
>> +    (remove (cut string=? "argparse" <>) deps))
>> +
>> +  (define (requirement->package-name/sort deps)
>> +    (sort
>> +     (map (lambda (input)
>> +            (let ((guix-name (python->package-name input)))
>> +              (list guix-name (list 'unquote (string->symbol guix-name)))))
>> +          deps)
>> +     (lambda args
>> +       (match args
>> +         (((a _ ...) (b _ ...))
>> +          (string-ci<? a b))))))
>> +
>> +  (define process-requirements
>> +    (compose requirement->package-name/sort strip-argparse))
>> +
>> +  (let ((dependencies (guess-requirements source-url wheel-url archive)))
>> +    (values (map process-requirements dependencies)
>> +            (concatenate dependencies))))
>
> Giving names to these processing steps is fine and improves clarity, but
> I’m not so comfortable about returning ad-hoc pairs *and* multiple
> values (like above).

Using "values" is required by the API of the recursive importer.

>> +            (match guix-dependencies
>> +              ((required-inputs test-inputs)
>> +               (values
>> +                `(package
>> +                   (name ,(python->package-name name))
>> +                   (version ,version)
>> +                   (source (origin
>> +                             (method url-fetch)
>> +                             ;; Sometimes 'pypi-uri' doesn't quite work due 
>> to mixed
>> +                             ;; cases in NAME, for instance, as is the case 
>> with
>> +                             ;; "uwsgi".  In that case, fall back to a full 
>> URL.
>> +                             (uri (pypi-uri ,(string-downcase name) 
>> version))
>> +                             (sha256
>> +                              (base32
>> +                               ,(guix-hash-url temp)))))
>> +                   (build-system python-build-system)
>> +                   ,@(maybe-inputs required-inputs 'propagated-inputs)
>> +                   ,@(maybe-inputs test-inputs 'native-inputs)
>> +                   (home-page ,home-page)
>> +                   (synopsis ,synopsis)
>> +                   (description ,description)
>> +                   (license ,(license->symbol license)))
>> +                ;; Flatten the nested lists and return the upstream
>> +                ;; dependencies.
>> +                upstream-dependencies))))))))
>
> I don’t see anything being flattened here?

Good catch! Seems a remnant of something that is now gone :-).

Thanks for the review.

Maxim





reply via email to

[Prev in Thread] Current Thread [Next in Thread]