guix-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities


From: Liliana Marie Prikler
Subject: [bug#51838] [PATCH v6 03/41] guix: node-build-system: Add JSON utilities.
Date: Fri, 31 Dec 2021 11:18:00 +0100
User-agent: Evolution 3.42.1

Hi,

Am Freitag, dem 31.12.2021 um 00:22 -0500 schrieb Philip McGrath:
> I agree that these functions ultimately belong in their own file, and
> I'd even had the name (guix build json-utils) in mind.
> 
> I put them in (guix build node-build-system) for now because, if they
> were in (guix build json-utils), they would have to be exported, at 
> which point their API would have to be relatively stable, and I
> didn't  want designing them to block, or to be rushed by, the rest of
> this series. Now, maybe consensus on the json-utils will turn out to
> be the easy part! But my high-level question is, in your view, do any
> of the points I'm about to respond to block this patch series?
We can certainly do them inside node-build-system, but I'd much prefer
if they took up less vertical real-estate.  I hope we can agree on
that.

> On 12/30/21 13:18, Liliana Marie Prikler wrote:
> > Having argued for these procedures to be moved into their own file
> > in a separate mail, now it's time to bikeshed stylistic choices.
> > 
> > Am Donnerstag, dem 30.12.2021 um 02:38 -0500 schrieb Philip
> > McGrath:
> > > +(define (jsobject-ref js key failure-result)
> > > +  "Return the value assosciated with KEY in the json object JS. 
> > > If
> > > KEY is not
> > > +found and FAILURE-RESULT is a procedure, it is called in tail
> > > position
> > > with
> > > +zero arguments.  Otherwise, FAILURE-RESULT is returned."
> > > +  ;; TODO: `failure-result` should be optional, but should the
> > > default
> > > +  ;; `failure-result` be #f (like `assoc-ref`), a thunk raising
> > > an
> > > exception,
> > > +  ;; '(@), or something else?  Keep it mandatory until we
> > > discuss and
> > > decide.
> > > +  (match js
> > > +    (('@ . alist)
> > > +     (match (assoc key alist)
> > > +       (#f
> > > +        (if (procedure? failure-result)
> > > +            (failure-result)
> > > +            failure-result))
> > > +       ((_ . value)
> > > +        value)))))
> > We can safely replace failure-result by Guile's DEFAULT and leave
> > error handling to the user.
> 
> I don't care whether we call it DEFAULT or FAILURE-RESULT.
This is not just a question of naming, but also it doesn't really make
sense to call a thunk here.  Just take DEFAULT as a value.

> I agree that it should not raise an exception by default. My current 
> thinking is that '(@) would be a good default DEFAULT: it is useful
> for the common pattern of traversing or transforming nested objects,
> and, as you point out at the end of this email, explicitly typing #f
> (the other useful possibility) looks much more like normal Scheme
> than explicitly typing '(@).
The question here is whether '(@) is a good meaningful default or
whether we just want a different way of writing it, e.g. empty-object
or (empty-object).

> In my experience with [1] and [2] (the purely-functional dictionary 
> libraries I use most often), the special case for a procedure as
> DEFAULT is useful. I feel less strongly about it because it's
> relatively easy to work around for JSON, since you can pick some non-
> JSON signal value, but it also seems to have especially little
> downside for JSON, since (guix build json) will never have a
> procedure in a valid JSON object. In addition to raising exceptions
> and other control flow, it's useful for default values that are
> expensive to produce.
That's all nice and dandy, but given that a valued DEFAULT is simpler
both in terms of interface and implementation, I think it is The Right
Thing here.  For error handling purposes, the Guile way of doing things
is to produce a value outside of the expected range (such as #f,
*unspecified* or a locally defined gensym) and check against it.

> [1]: https://docs.racket-lang.org/reference/hashtables.html
> [2]: https://docs.racket-lang.org/reference/dicts.html
> 
> > 
> > > +(define (alist-pop alist key)
> > > +  "Return two values: the first pair in ALIST with the given KEY
> > > in its
> > > +'car' (or #f, if no such pair exists) and an assosciation list
> > > like (and
> > > +potentially sharing storage with) ALIST, but with no entry for
> > > KEY."
> > > +  (match (assoc key alist)
> > > +    ;; If key isn't present, we don't need to do any allocation
> > > +    (#f
> > > +     (values #f alist))
> > > +    (found
> > > +     (values found
> > > +             ;; Because we have `found`, we can find it more
> > > +             ;; efficiently this time with `eq?`. We avoid using
> > > +             ;; `delq` because it would copy pairs in a shared
> > > +             ;; tail. We assume a sufficiently smart compiler to
> > > +             ;; handle "tail recursion modulo cons" (vid. e.g.
> > > Indiana
> > > +             ;; University Technical Report No. 19, Friedman &
> > > Wise
> > > +             ;; 1975) at least as efficiently as a hand-written
> > > +             ;; tail-recursive implementation with an
> > > accumulator.
> > > +             (let loop ((alist alist))
> > > +               (match alist
> > > +                 ;; We know that `found` is present,
> > > +                 ;; so no need to check for '()
> > > +                 ((this . alist)
> > > +                  (if (eq? this found)
> > > +                      alist
> > > +                      (cons this (loop alist))))))))))
> > I think this can be more efficiently be done in a "single" loop.
> > 
> >    (let loop ((rest alist)
> >               (previous '()))
> >      (match rest
> >        (() (values #f alist))
> >        ((first . rest)
> >         (if (eq? (car first) key)
> >             (values first (reverse! previous rest))
> >             (loop rest (cons first previous))))))
> > 
> 
> I'll admit to a Racket bias, but, having just eliminated the use of 
> 'assoc-set!', I'm loathe to start mutating pairs (even correctly). To
> quote a bit from the SRFI-1 spec for 'append-reverse!', "note that
> this pattern of iterative computation followed by a reverse can
> frequently be rewritten as a recursion, dispensing with the reverse
> and append-reverse steps, and shifting temporary, intermediate
> storage from the heap to the stack, which is typically a win for
> reasons of cache locality and eager storage reclamation." (See how
> 'set-cdr!' can crash safe Chez Scheme! 
> <https://github.com/cisco/ChezScheme/issues/599>)
> 
> IIUC, using SRFI-1's 'span' would lead to the same situation.
For the record, we can use the non-destructive append and reverse here
at the expense of more copying.  If done in terms of SRFI-1 span, we
would not need reverse as far as I understand.

> > Also, I don't think your version is tail-recursive.  (loop alist)
> > is not in tail position from what I can tell.
> 
> Yes, "tail recursion modulo cons" refers to a compiler optimization
> for functions which are _not_ tail recursive. For full details, see
> the Friedman & Wise 1975 tech report I cited at 
> <https://legacy.cs.indiana.edu/ftp/techreports/TR19.pdf> (or various 
> other articles), but, as briefly as I can: The optimization rests on
> the observation that many recursive functions, like the classic
> definition of 'map':
> 
>      (define (map f lst)
>        (match lst
>          (()
>           '())
>          ((this . lst)
>           (cons (f this)
>                 (map f lst)))))
> 
> are nearly tail-recursive, and the only real work remaining to be
> done in the continuation of the recursive call is to fill in the cdr
> of the pair. Thus, a compiler can safely transform this code into a
> truly tail-recursive implementation:
> 
>      (define (map f lst)
>        (match lst
>          (()
>           '())
>          ((this . lst)
>           (define ret (list (f this)))
>           (let loop ((dest ret)
>                      (lst lst))
>             (match lst
>               ((this . lst)
>                (define new (list (f this)))
>                (set-cdr! dest new)
>                (loop new lst))
>               (()
>                ret))))))
> 
> Unlike the Proper Implementation of Tail Calls (so-called "tail-call 
> optimization"), handling "tail recursion modulo cons" truly is an 
> optimization: it does not change the space complexity of the
> function. But it can allow the compiler to generate whatever code it
> thinks will work best with its collector/allocator and
> continuation/"call stack" implementation.
> 
> (The optimizations applies to constructors in general, not just
> 'cons', and a compiler can safely apply it to values that are
> immutable from the perspective of the source language.)
I'm not aware to which extent Guile implements tail recursion modulo
cons and I'd argue neither are you until you dig down into disassembly.
I think it's better here to avoid patterns from Racket that would feel
foreign to Guilers, particularly if you have to explain them with
reference to a paper (we already get hate for referring to Wingo's fold
for XML handling).

In principle, what you're supposing is that a sufficiently smart
compiler could rewrite

  (let ((before after (span PRED mylist))) (append before after))

to (list-copy mylist), which as far as I'm aware Guile currently
doesn't.  It could be argued that it would start doing so once I cast
some magic incantations, but I wouldn't count on it without reading the
disassembly.

> > That's a pretty long comment around something that could be done
> > with call-with-values or SRFI-71 let.  I think one of these two
> > should be preferred.
> > 
> > Note that both our versions of alist-pop only pop the first key (as
> > they should).  This means that alist-delete* should really be
> > called alist-delete-1 as in "remove the first pair in ALIST
> > belonging to KEY".
> > For the larger JSON handling below, this makes no difference
> > however.
> 
> Here I was using '*' to mean "a slightly altered version of", as with
> 'letrec' and 'letrec*', but, yes, since the other functions defined
> here use '*' to mean "zero or more times", the name is confusing: I
> think I'd just call it 'alist-delete' and not import (srfi srfi-1)'s
> version.
That's not the issue here, the issue is that the behaviour also differs
from alist-delete!

> The comment may be unnecessarily long ... the essence of what I was 
> trying to explain is that, in all of these implementations, I've
> tried to avoid unnecessary allocation. Being able to rely on
> 'alist-delete', and more generally 'alist-pop', not to needlessly
> copy the "spine" of the list lets later functions use them
> unconditionally.
Which again would be simpler if we used SRFI-1 span or another
primitive that produced (head item spine) as three outputs.  WDYT?

> Why would you prefer 'call-with-values' or SRFI-71 over 'define-
> values'?  The style guide against which I'm used to working [3]
> generally prefers internal definitions, to avoid rightward drift.
> 
> [3]: 
> https://docs.racket-lang.org/style/Choosing_the_Right_Construct.html#%28part._.Definitions%29
Again, that's a racketism.  To avoid rightward drift, we either
carefully choose where to indent or simplify our definitions to no
longer drift.  A single let in a function spanning three lines is
hardly an issue w.r.t. rightward drift.
+(define (alist-set alist key value)

> > Is order relevant here?  Because we could just as well reimplement
> > our alist-delete* loop and cons the replacement onto the rest. 
> > WDYT?
> 
> Relying on order for JSON objects is non-interoperable, per RFC 8259
> §4. I'm not intending for these alist procedures to be exported, so
> I'm not trying to handle any more general case than that, as I
> explain in the comments at the top of the file.
> 
> I'm not sure what the advantage would be to reimplementing the 
> 'alist-delete' loop here.
Fair enough, the question was however not so much what is required per
RFC, but rather if there is a natural feel of order to package.json
that we ought not disturb.  Particularly, putting dependencies before
name and version could be confusing to whoever needs to debug a delete-
dependencies phase gone wrong.

> So you would make 'jsobject-set*' a macro? When you say, "with
> FIELD1, FIELD2 being identifiers", do you mean that the macro should
> convert them to strings at compile-time? 
Yes to both.

> While, if I were designing a JSON representation, I'd much prefer to
> use symbols for the object keys, I think it would be confusing to use
> strings everywhere else but magic symbols here.
I don't think I agree here.  Assuming that jsobject-set* is a primitive
we need at the moment to be defined (rather than the more generic
jsobject-update*), I think using identifiers for JSON keys in our
rewriters would be a good abstraction.  That being said, this is not
necessarily a blocker, just a weird interface imo.
> 

> > Which alist-update* are you referring to here?  Either way, the
> > failure-result to default argument from above applies, but we could
> > keyword it.
> 
> Ah, I guess read that as, "Plus, making 'default' mandatory helps
> make 'jsobject-update' consistent with 'jsobject-update*'."
Fair enough, it's okay to admit one's typos :)
> > 

> > SRFI-71 let says hi.  Also the ordering question applies.  I'm
> > starting to think we should implement alist-pop, alist-set and
> > alist-update in terms of a single more powerful function producing
> > three values (or SRFI-1 span).
> 
> Same question again re 'define-values'.
Same answer.

> My intent in creating 'alist-pop' was to have a primitive that would 
> work for both 'alist-update' and 'alist-delete', and thereby 
> 'alist-set'. Returning the prefix and the tail separately would
> involve either extra allocation or mutating pairs.  Since order never
> matters in this context, why pay that price?
Again, our consumers are not just machines, but also human readers who
might value that order.  It is nice that you're paying attention to
allocated objects, but I think you are blind to some allocations that
you are accustomed to being reasoned away by Racket.
> 

> > Same default argument.  Cons inside.
> > I think having a single combine function taking (k a b) would be
> > less confusing than having two.  Is there a rationale for the form
> > you chose?
> 
> I based this function in particular on 'hash-union' from
> 'racket/hash' [6], which uses these keywords. (But in 'hash-union',
> collisions trigger an exception by default, and it requires at least
> one argument, because otherwise it would be unclear what key-
> comparison function the result should use.)
> 
> Having '#:combine' in addition to '#:combine/key' is ultimately just
> a convenience, but it is quite a nice convenience in practice. It is
> quite rare, in my experience, for a '#:combine' function to actually
> depend on the key: it might depend on the type of the value, but,
> very often, it  unconditionally applies to all keys. Using
> '#:combine' is particularly nice when using a combination function
> that already exists, like 'append' or '+'.
Fair enough, but what about having #:combine be your #:combine/key and
#:default-combine be your #:combine in that keyword soup?  It takes up
more horizontal real-estate (not that we wouldn't write those
vertically anyway).

Alternatively, this would be a good place to use procedure generators
for once.  (ignore-key combinator) takes a procedure that takes two
arguments (such as 'append' or '+') and produces one that takes three
and ignores the initial key.  This would also help making the interface
and implementation simpler: (json-union combinator obj ...)
> > 

> > > +  (with-atomic-json-file-replacement "package.json"
> > > +    (lambda (pkg-meta)
> > > +      (jsobject-update*
> > > +       pkg-meta
> > > +       "devDependencies" '(@) resolve-dependencies
> > > +       "dependencies" '(@) (lambda (deps)
> > > +                             (resolve-dependencies
> > > +                              (jsobject-union
> > > +                               (jsobject-ref pkg-meta
> > > "peerDependencies" '(@))
> > > +                               deps))))))
> > >     #t)
> > We should probably add a function to our js utils that "generates
> > anempty object", because '(@) is quite confusing to see in these
> > circumstances.  Otherwise LGTM with the aforementioned caveats.
> 
> I'm not sure what to call it: it would have to be short, or people
> (me, at least) might end up writing '(@) anyway.
'none', 'epsilon', '@psilon' would be nice "short" ways of writing
"empty object".

> Also, IIUC Guile doesn't actually prevent you from mutating quoted
> constant pairs, so a function would have to allocate a fresh pair to
> be robust.
That is correct, so we do have to be careful with our other primitives
here.

> It's a somewhat odd idea, but how about this?
> 
>      (define-syntax |{}| (identifier-syntax '(@)))
We can currently do without the bar escapes, but I'd prefer not to do
either, since it'd appear as similar line noise to '(@).  Having a
readable name is in my opinion to be preferred.  That being said,
identifier-syntax is the right direction in general imo.

> Alternatively, if we give up the thunk special case for 'default' 
> values, we could do:
> 
>      (define jsobject-update
>        (case-lambda
>          ((js key updater)
>           (jsobject-update js key '(@) updater))
>          ((js key default updater)
>           ...)))
> [...]
For the record, I think the only JSON primitive we need here is

(rewrite-json-file "package.json"
  ((dependencies) [do stuff with dependencies bound])
  ((devDependencies) [do stuff with devDependencies bound])
  [...])

wherein we can convert dependencies and devDependencies to and from
alists using alist->json-object and json-object->alist.  The rest can
be done using [a]list stuff from Guile core and SRFI-1 with one or two
amendments of our own.  WDYT?

The delete-dependencies phase would then be defined as 
  (define (delete-dependencies . etc)
    (define (%delete-dependencies (obj)
              (let ((alist (json-object->alist obj)))
                (alist->json [actually delete the dependencies]))))
    (rewrite-json-file "package.json"
      ((dependencies) (%delete-dependencies dependencies))
      ((devDependencies) (%delete-dependencies devDependencies))))

Pardon me not writing out etc, but I think you get the idea.

As for what happens if a value is unbound, we could make it 
  ((key [default]) code)
so imagine writing (dependencies none) instead of (dependencies) for
safety.

Cheers 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]