Re: documentation for (web ...)

guile-devel

From:	Neil Jerram
Subject:	Re: documentation for (web ...)
Date:	Thu, 23 Dec 2010 23:51:26 +0000
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

address@hidden (Ludovic Courtès) writes:

> Hello!
>
> Andy Wingo <address@hidden> writes:
>
>> I was looking at documenting the recent web stuff. My idea is to make a
>> new section in the Guile Modules chapter, after POSIX, with the intro
>> that the web is the new POSIX (sorta).
>
> I’m not keen on the comparison to POSIX.  For one, POSIX is for
> operating systems, and the web is no substitute for operating systems.
> In addition, what makes it /look/ like an operating system, i.e., the
> fact that many applications can run “on the web”, in the browser, is
> largely software as a service (SaaS), which I’d rather not promote.
>
> Mind you, I do like the idea of having ready-to-use web tools in Guile,
> and prominent in the manual.
>
> Anyway, my 2 opinionated ¢.  ;-)

I agree with Ludo that the comparison with POSIX isn't quite right.  But
on the other hand I tend to discount that feeling, because I think it's
more fun for multiple "voices" to come through in different parts of the
manual.  Overall this is another lovely piece of doc from you; following
are some thoughts and comments that occurred to me on reading through.

> 7.3 HTTP, the Web, and All That
> ===============================
> 
> When Guile started back in the mid-nineties, the GNU system was still
> focused on producing a good POSIX implementation.  This is why Guile's
> POSIX support is good, and has been so for a while.
> 
>    But times change, and in a way these days the web is the new POSIX: a
> standard and a motley set of implementations on which much computing is
> done.  So today's Guile also supports the web at the programming
> language level, by defining common data types and operations for the
> technologies underpinning the web: URIs, HTTP, and XML.
> 
>    It is particularly important to define native web data types.  Though
> the web is text in motion, programming the web in text is like
> programming with `goto': muddy, and error-prone.  Most current security
> problems on the web are due to treating the web as text instead of as
> instances of the proper data types.

This is an interesting point of view, and I would certainly like more
exposition of it.  I wonder if that will be coming later?

[After reading all through: I'd say you've demonstrated that data types
are good, but haven't shown any link with security problems, so the hook
here remains dangling.]

>    In addition, common web data types help programmers to share code.

Also, I guess it's not totally clear at this point what you mean by web
data types, but perhaps that will become clear as we go on.

[It did]

>    Well.  That's all very nice and opinionated and such, but how do I
> use the thing?  Read on!
> 
> * Menu:
> 
> * URIs::                        Universal Resource Identifiers.
> * HTTP::                        The Hyper-Text Transfer Protocol.
> * HTTP Headers::                How Guile represents specific header values.
> * Requests::                    HTTP requests.
> * Responses::                   HTTP responses.
> * Web Server::                  Serving HTTP to the internet.
> * Web Examples::                How to use this thing.
> 
> 
> File: guile.info,  Node: URIs,  Next: HTTP,  Up: Web
> 
> 7.3.1 Universal Resource Identifiers
> ------------------------------------
> 
> Guile provides a standard data type for Universal Resource Identifiers
> (URIs), as defined in RFC 3986.
> 
>    The generic URI syntax is as follows:
> 
>      URI := scheme ":" ["//" [userinfo "@"] host [":" port]] path \
>             [ "?" query ] [ "#" fragment ]
> 
>    So, all URIs have a scheme and a path. Some URIs have a host, and
> some of those have ports and userinfo. Any URI might have a query part
> or a fragment.
> 
>    Userinfo is something of an abstraction, as some legacy URI schemes
> allowed userinfo of the form `USERNAME:PASSWD'.  Passwords don't belong
> in URIs, so the RFC does not want to condone this, but neither can it
> say that what is before the `@' sign is just a username, so the RFC
> punts on the issue and calls it "userinfo".
> 
>    Also, strictly speaking, a URI with a fragment is a "URI reference".
> A fragment is typically not serialized when sending a URI over the
> wire; that is, it is not part of the identifier of a resource.  It only
> identifies a part of a given resource.

I found that a bit tricky to understand.  I think an example of what you
mean is that a web browser would only request the URI up to and
excluding the #, and process the #... part itself (by scrolling to that
point in the page).  It might help to say that.
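
A rough sketch of that behaviour, in plain Scheme.  `split-fragment' is
a made-up name for illustration, not something in `(web uri)', and the
example URI is arbitrary:

```scheme
;; Hypothetical helper, not part of (web uri): split a URI string at
;; the first `#'.  Returns two values: the part a client would send to
;; the server, and the fragment (or #f) that it keeps for itself.
(define (split-fragment str)
  (let ((i (string-index str #\#)))
    (if i
        (values (substring str 0 i) (substring str (+ i 1)))
        (values str #f))))

(call-with-values
    (lambda () (split-fragment "http://example.org/manual.html#URIs"))
  (lambda (request-part fragment)
    ;; request-part is "http://example.org/manual.html"
    ;; fragment is "URIs"
    (format #t "request ~a, then scroll to ~a~%" request-part fragment)))
```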

>  But it's useful to have a field
> for it in the URI record itself, so we hope you will forgive the
> inconsistency.
> 
>      (use-modules (web uri))
> 
>    The following procedures can be found in the `(web uri)' module.
> Load it into your Guile, using a form like the above, to have access to
> them.
> 
>  -- Function: build-uri scheme [#:userinfo] [#:host] [#:port] [#:path]
>           [#:query] [#:fragment] [#:validate?]

Why is the path arg not mandatory?

>      Construct a URI object. If VALIDATE? is true, also run some
>      consistency checks to make sure that the constructed URI is valid.
> 
>  -- Function: uri? x
>  -- Function: uri-scheme uri
>  -- Function: uri-userinfo uri
>  -- Function: uri-host uri
>  -- Function: uri-port uri
>  -- Function: uri-path uri
>  -- Function: uri-query uri
>  -- Function: uri-fragment uri
>      A predicate and field accessors for the URI record type.
> 
>  -- Function: declare-default-port! scheme port
>      Declare a default port for the given URI scheme.
> 
>      Default ports are for printing URI objects: a default port is not
>      printed.

Does this really belong here?  Seems like mixing a bit of the `model'
into the `view'.  I'd expect a URI without an explicit port to give

  (uri-port uri) => #f

and that if I do

  (set! (uri-port uri) 80)

the :80 would be there in the string representation of the URI.

That's a mostly theoretical point though; I admit I haven't thought
through what is most _useful_.  Although, often what is most useful is
for an API to behave as most programmers would expect it to.

>  -- Function: parse-uri string
>      Parse STRING into a URI object. Returns `#f' if the string could
>      not be parsed.
> 
>  -- Function: unparse-uri uri
>      Serialize URI to a string.

Or uri->string ?  And I guess parse-uri could be string->uri.
Cf. string->number and number->string.

>  -- Function: uri-decode str [#:charset]
>      Percent-decode the given STR, according to CHARSET.
> 
>      Note that this function should not generally be applied to a full
>      URI string. For paths, use split-and-decode-uri-path instead. For
>      query strings, split the query on `&' and `=' boundaries, and
>      decode the components separately.
> 
>      Note that percent-encoded strings encode _bytes_, not characters.
>      There is no guarantee that a given byte sequence is a valid string
>      encoding. Therefore this routine may signal an error if the decoded
>      bytes are not valid for the given encoding. Pass `#f' for CHARSET
>      if you want decoded bytes as a bytevector directly.

So the return value is a bytevector if CHARSET is #f, and a string if
not?
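
To make the bytes-versus-characters distinction concrete, here is a
small stand-alone illustration.  It is not the `(web uri)' code and it
does not settle what uri-decode returns; `percent-decode-bytes' is a
made-up helper that assumes ASCII input and well-formed `%HH' escapes:

```scheme
(use-modules (rnrs bytevectors))

;; Illustration only, not the (web uri) implementation.  Assumes STR is
;; ASCII and that every `%' is followed by two hex digits.
(define (percent-decode-bytes str)
  (let loop ((i 0) (acc '()))
    (cond
     ((>= i (string-length str))
      (u8-list->bytevector (reverse acc)))
     ((char=? (string-ref str i) #\%)
      (loop (+ i 3)
            (cons (string->number (substring str (+ i 1) (+ i 3)) 16)
                  acc)))
     (else
      (loop (+ i 1)
            (cons (char->integer (string-ref str i)) acc))))))

;; Stage 1: bytes.  "%C3%A9" is two bytes, 195 and 169.
(define bv (percent-decode-bytes "caf%C3%A9"))

;; Stage 2: characters, only once an encoding is chosen.  In UTF-8
;; those two bytes are the single character U+00E9.
(define decoded (utf8->string bv))
```

So a five-byte bytevector becomes a four-character string, and a byte
sequence that is not valid in the chosen encoding has no string form at
all, which is presumably where the error case comes from.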

>  -- Function: uri-encode str [#:charset] [#:unescaped-chars]
>      Percent-encode any character not in UNESCAPED-CHARS.

UNESCAPED-CHARS is a vector, a list, ...?

>      Percent-encoding first writes out the given character to a

s/character/string

>      bytevector within the given CHARSET, then encodes each byte as
>      `%HH', where HH is the hexadecimal representation of the byte.
> 
>  -- Function: split-and-decode-uri-path path
>      Split PATH into its components, and decode each component,
>      removing empty components.
> 
>      For example, `"/foo/bar/"' decodes to the two-element list,
>      `("foo" "bar")'.

Presumably this does % decoding too, so it would be good to give another
example to show that.
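
Something like the following, assuming the two path procedures behave
as the draft describes them (the path itself is just an example):

```scheme
(use-modules (web uri))

;; "%20" is a percent-encoded space, so the second component should
;; come back as "bar baz".
(split-and-decode-uri-path "/foo/bar%20baz/")
;; => ("foo" "bar baz")

;; And the other direction, with encode-and-join-uri-path:
(encode-and-join-uri-path '("foo" "bar baz"))
;; => "foo/bar%20baz"
```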

>  -- Function: encode-and-join-uri-path parts
>      URI-encode each element of PARTS, which should be a list of
>      strings, and join the parts together with `/' as a delimiter.
> 
> 
> File: guile.info,  Node: HTTP,  Next: HTTP Headers,  Prev: URIs,  Up: Web
> 
> 7.3.2 The Hyper-Text Transfer Protocol
> --------------------------------------
> 
> The initial motivation for including web functionality in Guile, rather
> than rely on an external package, was to establish a standard base on
> which people can share code.  To that end, we continue the focus on data
> types by providing a number of low-level parsers and unparsers for
> elements of the HTTP protocol.
> 
>    If you want to skip the low-level details for now and move on to
> web pages, *note Web Server::.  Otherwise, load the HTTP module, and
> read on.
> 
>      (use-modules (web http))
> 
>    The focus of the `(web http)' module is to parse and unparse
> standard HTTP headers, representing them to Guile as native data
> structures.  For example, a `Date:' header will be represented as a
> SRFI-19 date record (*note SRFI-19::), rather than as a string.
> 
>    Guile tries to follow RFCs fairly strictly--the road to perdition
> being paved with compatibility hacks--though some allowances are made
> for not-too-divergent texts.
> 
>    The first bit is to define a registry of parsers, validators, and
> unparsers, keyed by header name.  That is the function of the
> `<header-decl>' object.
> 
>  -- Function: make-header-decl sym name multiple? parser validator
>           writer
>  -- Function: header-decl? x
>  -- Function: header-decl-sym decl
>  -- Function: header-decl-name decl
>  -- Function: header-decl-multiple? decl
>  -- Function: header-decl-parser decl
>  -- Function: header-decl-validator decl
>  -- Function: header-decl-writer decl
>      A constructor, predicate, and field accessors for the
>      `<header-decl>' type. The fields are as follows:
> 
>     `sym'
>           The symbol name for this header field, always in lower-case.
>           For example, `"Content-Length"' has a symbolic name of
>           `content-length'.
> 
>     `name'
>           The string name of the header, in its preferred
>           capitalization.
> 
>     `multiple?'
>           `#t' iff this header may appear multiple times in a message.
> 
>     `parser'
>           A procedure which takes a string and returns a parsed value.
> 
>     `validator'
>           A predicate, returning `#t' iff the value is valid for this
>           header.

Maybe say something here about the validator function often being very
similar to the parsing function?

>     `writer'
>           A writer, which writes a value to the port given in the
>           second argument.
> 
>  -- Function: declare-header! sym name [#:multiple?] [#:parser]
>           [#:validator] [#:writer]
>      Make a header declaration, as above, and register it by symbol and
>      by name.

Are the keyword args really optional?  If so, what are the defaults?

A possibly important point: what is the scope of the space in which
these header declarations are made?  My reason for asking is that this
infrastructure looks applicable for other HTTP-like protocols too, such
as SIP.  But the detailed rules for a given header in SIP may be
different from a header with the same name in HTTP, and hence different
header-decl objects would be needed.  Therefore, even though we claim no
other protocol support right now, perhaps we should anticipate that by
enhancing declare-header! so as to distinguish between HTTP-space and
other-protocol-spaces.

[After reading all through, I remain confused about exactly how general
this server infrastructure is intended to be]
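
To show the kind of scoping I mean, here is a sketch in plain Scheme.
Every name in it (`protocol-registries', `declare-header-in!',
`lookup-header-parser') is invented for illustration and none of it
exists in `(web http)'; it only registers a parser, where the real
declaration also carries a validator and a writer:

```scheme
;; Hypothetical sketch: none of these names exist in (web http).
;; PROTOCOL is a symbol such as `http' or `sip'; each protocol gets its
;; own table mapping header symbols to parsers.
(define protocol-registries (make-hash-table))

(define (declare-header-in! protocol sym parser)
  (let ((registry (or (hashq-ref protocol-registries protocol)
                      (let ((new (make-hash-table)))
                        (hashq-set! protocol-registries protocol new)
                        new))))
    (hashq-set! registry sym parser)))

(define (lookup-header-parser protocol sym)
  (let ((registry (hashq-ref protocol-registries protocol)))
    (and registry (hashq-ref registry sym))))

;; Same header name, different rules: an HTTP `Expires' value is a
;; date, while a SIP `Expires' value is a number of seconds.  The HTTP
;; parser here is a stand-in that leaves the string alone.
(declare-header-in! 'http 'expires (lambda (str) str))
(declare-header-in! 'sip 'expires string->number)

((lookup-header-parser 'sip 'expires) "3600")   ; => 3600
```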

>  -- Function: lookup-header-decl name
>      Return the HEADER-DECL object registered for the given NAME.
> 
>      NAME may be a symbol or a string. Strings are mapped to headers in
>      a case-insensitive fashion.
> 
>  -- Function: valid-header? sym val
>      Returns a true value iff VAL is a valid Scheme value for the
>      header with name SYM.

Note slight inconsistency in the two above deffns: "Return" vs
"Returns".

>    Now that we have a generic interface for reading and writing
> headers, we do just that.
> 
>  -- Function: read-header port
>      Reads one HTTP header from PORT. Returns two values: the header
>      name and the parsed Scheme value.

As multiple values?  Is that more helpful than as a cons?

> May raise an exception if the
>      header was known but the value was invalid.
> 
>      Returns #F for both values if the end of the message body was
>      reached (i.e., a blank line).

I'd find #<eof> more intuitive.

>  -- Function: parse-header name val
>      Parse VAL, a string, with the parser for the header named NAME.
> 
>      Returns two values, the header name and parsed value. If a parser
>      was found, the header name will be returned as a symbol. If a
>      parser was not found, both the header name and the value are
>      returned as strings.

Again, multiple values or a cons?
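
For comparison, here is how the two conventions look at the call site.
The two procedures are stand-ins that return fixed data; neither is the
real parse-header:

```scheme
(use-modules (ice-9 receive))

;; Stand-ins for the two possible conventions; neither is the real
;; parse-header.
(define (parse-header/values)
  (values 'content-length 42))

(define (parse-header/pair)
  (cons 'content-length 42))

;; Multiple values need a binding form at every call site...
(receive (name val) (parse-header/values)
  (list name val))                ; => (content-length 42)

;; ...whereas a pair is an ordinary value that can be consed straight
;; onto an alist of headers.
(cons (parse-header/pair) '())    ; => ((content-length . 42))
```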

>  -- Function: write-header name val port
>      Writes the given header name and value to PORT. If NAME is a
>      symbol, looks up a declared header and uses that writer. Otherwise
>      the value is written using DISPLAY.
>
>  -- Function: read-headers port
>      Read an HTTP message from PORT, returning the headers as an
>      ordered alist.

s/Read/Read the headers of/  ?  i.e. Should the caller have already read
the request/response line?

>  -- Function: write-headers headers port
>      Write the given header alist to PORT. Doesn't write the final
>      \r\n, as the user might want to add another header.
> 
>    The `(web http)' module also has some utility procedures to read and
> write request and response lines.
> 
>  -- Function: parse-http-method str [start] [end]
>      Parse an HTTP method from STR. The result is an upper-case symbol,
>      like `GET'.
> 
>  -- Function: parse-http-version str [start] [end]
>      Parse an HTTP version from STR, returning it as a major-minor
>      pair. For example, `HTTP/1.1' parses as the pair of integers, `(1
>      . 1)'.
> 
>  -- Function: parse-request-uri str [start] [end]
>      Parse a URI from an HTTP request line. Note that URIs in requests
>      do not have to have a scheme or host name. The result is a URI
>      object.
> 
>  -- Function: read-request-line port
>      Read the first line of an HTTP request from PORT, returning three
>      values: the method, the URI, and the version.
> 
>  -- Function: write-request-line method uri version port
>      Write the first line of an HTTP request to PORT.
> 
>  -- Function: read-response-line port
>      Read the first line of an HTTP response from PORT, returning three
>      values: the HTTP version, the response code, and the "reason
>      phrase".
> 
>  -- Function: write-response-line version code reason-phrase port
>      Write the first line of an HTTP response to PORT.
> 
> 
> File: guile.info,  Node: HTTP Headers,  Next: Requests,  Prev: HTTP,  Up: Web
> 
> 7.3.3 HTTP Headers
> ------------------
> 
> The `(web http)' module defines parsers and unparsers for all headers
> defined in the HTTP/1.1 standard.  This section describes the parsed
> format of the various headers.
> 
>    We cannot describe the function of all of these headers, however, in
> sufficient detail.

I don't get the point here.

>  The interested reader would do well to download a
> copy of RFC 2616 and have it on hand.
> 
>    To begin with, we should make a few definitions:
> 
> "key-value list"
>      A key-value list is a list of values.  Each value may be a string,
>      a symbol, or a pair.  Known keys are parsed to symbols; otherwise
>      keys are left as strings.  Keys with values are parsed to pairs,
>      the car of which is the symbol or string key, and the cdr is the
>      parsed value.  Parsed values for known keys have key-dependent
>      formats.  Parsed values for unknown keys are strings.
> 
> "param list"
>      A param list is a list of key-value lists.  When serialized to a
>      string, items in the inner lists are separated by semicolons.
>      Again, known keys are parsed to symbols.
> 
> "quality"
>      A number of headers have quality values in them, which are decimal
>      fractions between zero and one indicating a preference for various
>      kinds of responses, which the server may choose to heed.  Given
>      that only three digits are allowed in the fractional part, Guile
>      parses quality values to integers between 0 and 1000 instead of
>      inexact numbers between 0.0 and 1.0.
> 
> "quality list"
>      A list of pairs, the car of which is a quality value.
> 
> "entity tag"
>      A pair, the car of which is an opaque string, and the cdr of which
>      is true iff the entity tag is a "strong" entity tag.

A bit of a conceptual stack has built up at this point.  i.e. I have no
idea why you're telling me this....

> 7.3.3.1 General Headers
> .......................
> 
> `cache-control'
>      A key-value list of cache-control directives. Known keys are
>      `max-age', `max-stale', `min-fresh', `must-revalidate',
>      `no-cache', `no-store', `no-transform', `only-if-cached',
>      `private', `proxy-revalidate', `public', and `s-maxage'.
> 
>      If present, parameters to `max-age', `max-stale', `min-fresh', and
>      `s-maxage' are all parsed as non-negative integers.
> 
>      If present, parameters to `private' and `no-cache' are parsed as
>      lists of header names, represented as symbols if they are known
>      headers or strings otherwise.

... but this is pretty quickly justifying the stuff above, so I think
the stack is actually OK.
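
A tiny worked example of the quality representation might help the
definitions land.  This is a sketch only: `quality-string->integer' is
a made-up helper, not the parser in `(web http)', and the charsets are
arbitrary:

```scheme
;; Hypothetical helper: turn a quality string such as "0.8" into an
;; exact integer between 0 and 1000.  The "#e" prefix makes
;; string->number read the decimal exactly, so no rounding is needed.
(define (quality-string->integer str)
  (* 1000 (string->number (string-append "#e" str))))

(quality-string->integer "0.8")    ; => 800
(quality-string->integer "1")      ; => 1000

;; A quality list as the draft defines it: pairs whose car is the
;; quality.  Sorting by car gives the sender's order of preference.
(define accept-charset '((800 . "iso-8859-1") (1000 . "utf-8")))
(sort accept-charset (lambda (a b) (> (car a) (car b))))
;; => ((1000 . "utf-8") (800 . "iso-8859-1"))
```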

> `connection'
>      A list of connection tokens.  A connection token is a string.
> 
> `date'
>      A SRFI-19 date record.
> 
> `pragma'
>      A key-value list of pragma directives.  `no-cache' is the only
>      known key.
> 
> `trailer'
>      A list of header names.  Known header names are parsed to symbols,
>      otherwise they are left as strings.
> 
> `transfer-encoding'
>      A param list of transfer codings.  `chunked' is the only known key.

OK, why a param list rather than key-value?  How are elements in the
second key-value list, say, different from elements in the first
key-value list?
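
For what it's worth, here is how I read the two definitions when
applied to a header with per-item parameters.  This is my guess at the
shape, not checked against the code, and `element-quality' is a made-up
helper:

```scheme
(use-modules (srfi srfi-1))

;; My reading of the draft's definitions, not checked against the code:
;; "Accept: text/html;q=0.8, text/plain" as a param list.  Each inner
;; list is one comma-separated item; its members were separated by
;; semicolons.
(define accept
  '(("text/html" (q . 800))
    ("text/plain")))

;; Hypothetical helper: the quality of one element, defaulting to 1000
;; when no `q' parameter is present.
(define (element-quality elt)
  (let ((q (assq 'q (filter pair? (cdr elt)))))
    (if q (cdr q) 1000)))

(map element-quality accept)    ; => (800 1000)
```

If that reading is right, then the outer list separates items and the
inner lists group each item with its own parameters, which a flat
key-value list could not express.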

> `upgrade'
>      A list of strings.
> 
> `via'
>      A list of strings.  There may be multiple `via' headers in one
>      message.
> 
> `warning'
>      A list of warnings.  Each warning is itself a list of four
>      elements: a code, as an exact integer between 0 and 1000, a host
>      as a string, the warning text as a string, and either `#f' or a
>      SRFI-19 date.
> 
>      There may be multiple `warning' headers in one message.
> 
> 7.3.3.2 Entity Headers
> ......................
> 
> `allow'
>      A list of methods, as strings.  Methods are parsed as strings
>      instead of `parse-http-method' so as to allow for new methods.
> 
> `content-encoding'
>      A list of content codings, as strings.
> 
> `content-language'
>      A list of language tags, as strings.
> 
> `content-length'
>      An exact, non-negative integer.
> 
> `content-location'
>      A URI record.
> 
> `content-md5'
>      A string.
> 
> `content-range'
>      A list of three elements: the symbol `bytes', either the symbol
>      `*' or a pair of integers, indicating the byte range, and either
>      `*' or an integer, for the instance length.
> 
> `content-type'
>      A pair, the car of which is the media type as a string, and the
>      cdr is an alist of parameters, with strings as keys and values.
> 
>      For example, `"text/plain"' parses as `("text/plain")', and
>      `"text/plain;charset=utf-8"' parses as `("text/plain" ("charset" .
>      "utf-8"))'.
> 
> `expires'
>      A SRFI-19 date.
> 
> `last-modified'
>      A SRFI-19 date.
> 
> 
> 7.3.3.3 Request Headers
> .......................
> 
> `accept'
>      A param list.  Each element in the list indicates one media-range
>      with accept-params.  The only known key is `q', whose value is
>      parsed as a quality value.
> 
> `accept-charset'
>      A quality-list of charsets, as strings.
> 
> `accept-encoding'
>      A quality-list of content codings, as strings.
> 
> `accept-language'
>      A quality-list of languages, as strings.
> 
> `authorization'
>      A string.
> 
> `expect'
>      A param list of expectations.  The only known key is
>      `100-continue'.
> 
> `from'
>      A string.
> 
> `host'
>      A pair of the host, as a string, and the port, as an integer. If
>      no port is given, port is `#f'.
> 
> `if-match'
>      Either the symbol `*', or a list of entity tags (see above).
> 
> `if-modified-since'
>      A SRFI-19 date.
> 
> `if-none-match'
>      Either the symbol `*', or a list of entity tags (see above).
> 
> `if-range'
>      Either an entity tag, or a SRFI-19 date.
> 
> `if-unmodified-since'
>      A SRFI-19 date.
> 
> `max-forwards'
>      An exact non-negative integer.
> 
> `proxy-authorization'
>      A string.
> 
> `range'
>      A pair whose car is the symbol `bytes', and whose cdr is a list of
>      pairs. Each element of the cdr indicates a range; the car is the
>      first byte position and the cdr is the last byte position, as
>      integers, or `#f' if not given.
> 
> `referer'
>      A URI.
> 
> `te'
>      A param list of transfer-codings.  The only known key is
>      `trailers'.
> 
> `user-agent'
>      A string.
> 
> 7.3.3.4 Response Headers
> ........................
> 
> `accept-ranges'
>      A list of strings.
> 
> `age'
>      An exact, non-negative integer.
> 
> `etag'
>      An entity tag.
> 
> `location'
>      A URI.
> 
> `proxy-authenticate'
>      A string.
> 
> `retry-after'
>      Either an exact, non-negative integer, or a SRFI-19 date.
> 
> `server'
>      A string.
> 
> `vary'
>      Either the symbol `*', or a list of headers, with known headers
>      parsed to symbols.
> 
> `www-authenticate'
>      A string.

Obviously there's lots of substructure there (in WWW-Authenticate) that
we just don't support yet.  Is there a clear compatibility story for
if/when Guile is enhanced to parse that out?

I guess yes; calling code will just need something like

  (if (string? val)
      ;; An older Guile that doesn't parse authentication fully.
      (do-application-own-parsing)
      ;; A newer Guile that does parse authentication.
      (use-the-parsed-authentication-object))


> 
> File: guile.info,  Node: Requests,  Next: Responses,  Prev: HTTP Headers,  
> Up: Web
> 
> 7.3.4 HTTP Requests
> -------------------
> 
>      (use-modules (web request))
> 
>    The request module contains a data type for HTTP requests.  Note that
> the body is not part of the request, but the port is.  Once you have
> read a request, you may read the body separately, and likewise for
> writing requests.
> 
>  -- Function: build-request [#:method] [#:uri] [#:version] [#:headers]
>           [#:port] [#:meta] [#:validate-headers?]
>      Construct an HTTP request object. If VALIDATE-HEADERS? is true,
>      the headers are each run through their respective validators.
> 
>  -- Function: request?
>  -- Function: request-method
>  -- Function: request-uri
>  -- Function: request-version
>  -- Function: request-headers
>  -- Function: request-meta
>  -- Function: request-port
>      A predicate and field accessors for the request type.  The fields
>      are as follows:
>     `method'
>           The HTTP method, for example, `GET'.
> 
>     `uri'
>           The URI as a URI record.
> 
>     `version'
>           The HTTP version pair, like `(1 . 1)'.
> 
>     `headers'
>           The request headers, as an alist of parsed values.
> 
>     `meta'
>           An arbitrary alist of other data, for example information
>           returned in the `sockaddr' from `accept' (*note Network
>           Sockets and Communication::).
> 
>     `port'
>           The port on which to read or write a request body, if any.
> 
>  -- Function: read-request port [meta]
>      Read an HTTP request from PORT, optionally attaching the given
>      metadata, META.
> 
>      As a side effect, sets the encoding on PORT to ISO-8859-1
>      (latin-1), so that reading one character reads one byte. See the
>      discussion of character sets in "HTTP Requests" in the manual, for
>      more information.

That last sentence is OK for a docstring, but strange here _in_ the
manual.

And, where is that discussion?

>  -- Function: write-request r port
>      Write the given HTTP request to PORT.
> 
>      Returns a new request, whose `request-port' will continue writing
>      on PORT, perhaps using some transfer encoding.
> 
>  -- Function: read-request-body/latin-1 r
>      Reads the request body from R, as a string.
> 
>      Assumes that the request port has ISO-8859-1 encoding, so that the
>      number of characters to read is the same as the
>      `request-content-length'. Returns `#f' if there was no request
>      body.
> 
>  -- Function: write-request-body/latin-1 r body
>      Write BODY, a string encodable in ISO-8859-1, to the port
>      corresponding to the HTTP request R.
> 
>  -- Function: read-request-body/bytevector r
>      Reads the request body from R, as a bytevector. Returns `#f' if
>      there was no request body.
> 
>  -- Function: write-request-body/bytevector r bv
>      Write BV, a bytevector, to the port corresponding to the HTTP
>      request R.
> 
>    The various headers that are typically associated with HTTP requests
> may be accessed with these dedicated accessors.  *Note HTTP Headers::,
> for more information on the format of parsed headers.
> 
>  -- Function: request-accept request [default='()]
>  -- Function: request-accept-charset request [default='()]
>  -- Function: request-accept-encoding request [default='()]
>  -- Function: request-accept-language request [default='()]
>  -- Function: request-allow request [default='()]
>  -- Function: request-authorization request [default=#f]
>  -- Function: request-cache-control request [default='()]
>  -- Function: request-connection request [default='()]
>  -- Function: request-content-encoding request [default='()]
>  -- Function: request-content-language request [default='()]
>  -- Function: request-content-length request [default=#f]
>  -- Function: request-content-location request [default=#f]
>  -- Function: request-content-md5 request [default=#f]
>  -- Function: request-content-range request [default=#f]
>  -- Function: request-content-type request [default=#f]
>  -- Function: request-date request [default=#f]
>  -- Function: request-expect request [default='()]
>  -- Function: request-expires request [default=#f]
>  -- Function: request-from request [default=#f]
>  -- Function: request-host request [default=#f]
>  -- Function: request-if-match request [default=#f]
>  -- Function: request-if-modified-since request [default=#f]
>  -- Function: request-if-none-match request [default=#f]
>  -- Function: request-if-range request [default=#f]
>  -- Function: request-if-unmodified-since request [default=#f]
>  -- Function: request-last-modified request [default=#f]
>  -- Function: request-max-forwards request [default=#f]
>  -- Function: request-pragma request [default='()]
>  -- Function: request-proxy-authorization request [default=#f]
>  -- Function: request-range request [default=#f]
>  -- Function: request-referer request [default=#f]
>  -- Function: request-te request [default=#f]
>  -- Function: request-trailer request [default='()]
>  -- Function: request-transfer-encoding request [default='()]
>  -- Function: request-upgrade request [default='()]
>  -- Function: request-user-agent request [default=#f]
>  -- Function: request-via request [default='()]
>  -- Function: request-warning request [default='()]
>      Return the given request header, or DEFAULT if none was present.
> 
>  -- Function: request-absolute-uri r [default-host] [default-port]
>      A helper routine to determine the absolute URI of a request, using
>      the `host' header and the default host and port.

Hmm, I think the provision of this data type needs a bit more
motivation.  It doesn't appear to offer much additional value, compared
with reading or writing the components of a request individually, and on
the other hand it appears to bake in assumptions about charsets and
content length that might not always be true.

> 
> File: guile.info,  Node: Responses,  Next: Web Server,  Prev: Requests,  Up: 
> Web
> 
> 7.3.5 HTTP Responses
> --------------------
> 
>      (use-modules (web response))
> 
>    As with requests (*note Requests::), Guile offers a data type for
> HTTP responses.  Again, the body is represented separately from the
> response.
> 
>  -- Function: response?
>  -- Function: response-version
>  -- Function: response-code
>  -- Function: response-reason-phrase response
>  -- Function: response-headers
>  -- Function: response-port
>      A predicate and field accessors for the response type.  The fields
>      are as follows:
>     `version'
>           The HTTP version pair, like `(1 . 1)'.
> 
>     `code'
>           The HTTP response code, like `200'.
> 
>     `reason-phrase'
>           The reason phrase, or the standard reason phrase for the
>           response's code.
> 
>     `headers'
>           The response headers, as an alist of parsed values.
> 
>     `port'
>           The port on which to read or write a response body, if any.
> 
>  -- Function: read-response port
>      Read an HTTP response from PORT, optionally attaching the given
>      metadata, META.
> 
>      As a side effect, sets the encoding on PORT to ISO-8859-1
>      (latin-1), so that reading one character reads one byte. See the
>      discussion of character sets in "HTTP Responses" in the manual,
>      for more information.

As above.

>  -- Function: build-response [#:version] [#:code] [#:reason-phrase]
>           [#:headers] [#:port]
>      Construct an HTTP response object. If VALIDATE-HEADERS? is true,
>      the headers are each run through their respective validators.
> 
>  -- Function: extend-response r k v . additional
>      Extend an HTTP response by setting additional HTTP headers K, V.
>      Returns a new HTTP response.

What does the ADDITIONAL arg mean?

>  -- Function: adapt-response-version response version
>      Adapt the given response to a different HTTP version. Returns a
>      new HTTP response.
> 
>      The idea is that many applications might just build a response for
>      the default HTTP version, and this method could handle a number of
>      programmatic transformations to respond to older HTTP versions
>      (0.9 and 1.0). But currently this function is a bit heavy-handed,
>      just updating the version field.

Interesting, and adds more value to the idea of the response object.
Why not for the request object too - are you assuming that Guile will
usually be acting as the HTTP server?  (Which I'm sure is correct, but
"usually" is not "always".)

>  -- Function: write-response r port
>      Write the given HTTP response to PORT.
> 
>      Returns a new response, whose `response-port' will continue writing
>      on PORT, perhaps using some transfer encoding.
> 
>  -- Function: read-response-body/latin-1 r
>      Reads the response body from R, as a string.
> 
>      Assumes that the response port has ISO-8859-1 encoding, so that the
>      number of characters to read is the same as the
>      `response-content-length'. Returns `#f' if there was no response
>      body.
> 
>  -- Function: write-response-body/latin-1 r body
>      Write BODY, a string encodable in ISO-8859-1, to the port
>      corresponding to the HTTP response R.
> 
>  -- Function: read-response-body/bytevector r
>      Reads the response body from R, as a bytevector. Returns `#f' if
>      there was no response body.
> 
>  -- Function: write-response-body/bytevector r bv
>      Write BV, a bytevector, to the port corresponding to the HTTP
>      response R.
> 
>    As with requests, the various headers that are typically associated
> with HTTP responses may be accessed with these dedicated accessors.
> *Note HTTP Headers::, for more information on the format of parsed
> headers.
> 
>  -- Function: response-accept-ranges response [default=#f]
>  -- Function: response-age response [default='()]
>  -- Function: response-allow response [default='()]
>  -- Function: response-cache-control response [default='()]
>  -- Function: response-connection response [default='()]
>  -- Function: response-content-encoding response [default='()]
>  -- Function: response-content-language response [default='()]
>  -- Function: response-content-length response [default=#f]
>  -- Function: response-content-location response [default=#f]
>  -- Function: response-content-md5 response [default=#f]
>  -- Function: response-content-range response [default=#f]
>  -- Function: response-content-type response [default=#f]
>  -- Function: response-date response [default=#f]
>  -- Function: response-etag response [default=#f]
>  -- Function: response-expires response [default=#f]
>  -- Function: response-last-modified response [default=#f]
>  -- Function: response-location response [default=#f]
>  -- Function: response-pragma response [default='()]
>  -- Function: response-proxy-authenticate response [default=#f]
>  -- Function: response-retry-after response [default=#f]
>  -- Function: response-server response [default=#f]
>  -- Function: response-trailer response [default='()]
>  -- Function: response-transfer-encoding response [default='()]
>  -- Function: response-upgrade response [default='()]
>  -- Function: response-vary response [default='()]
>  -- Function: response-via response [default='()]
>  -- Function: response-warning response [default='()]
>  -- Function: response-www-authenticate response [default=#f]
>      Return the given response header, or DEFAULT if none was present.
> 
> 
> File: guile.info,  Node: Web Server,  Next: Web Examples,  Prev: Responses,  
> Up: Web
> 
> 7.3.6 Web Server
> ----------------
> 
> `(web server)' is a generic web server interface, along with a main
> loop implementation for web servers controlled by Guile.
> 
>      (use-modules (web server))
> 
>    The lowest layer is the `<server-impl>' object, which defines a set
> of hooks to open a server, read a request from a client, write a
> response to a client, and close a server.  These hooks - `open',
> `read', `write', and `close', respectively - are bound together in a
> `<server-impl>' object.  Procedures in this module take a
> `<server-impl>' object, if needed.
> 
>    A `<server-impl>' may also be looked up by name.  If you pass the
> `http' symbol to `run-server', Guile looks for a variable named `http'
> in the `(web server http)' module, which should be bound to a
> `<server-impl>' object.  Such a binding is made by instantiation of the
> `define-server-impl' syntax.  In this way the run-server loop can
> automatically load other backends if available.
> 
>    The life cycle of a server goes as follows:
> 
>   1. The `open' hook is called, to open the server. `open' takes 0 or
>      more arguments, depending on the backend,

How is that possible?  (immediate thought... perhaps it will be
explained later)

> and returns an opaque
>      server socket object, or signals an error.
> 
>   2. The `read' hook is called, to read a request from a new client.
>      The `read' hook takes one argument, the server socket.  It should

It feels surprising for the infrastructure to pass the server socket to
the read hook.  I'd expect the infrastructure to do the `accept' itself
and pass the client socket to the read hook.

Also, does the infrastructure assume that each client socket will only
be used for one request and response, and then closed?  Would it be hard
to remove that assumption, so that the <server-impl> idea is more
general?

>      return three values: an opaque client socket, the request, and the
>      request body. The request should be a `<request>' object, from
>      `(web request)'.  The body should be a string or a bytevector, or
>      `#f' if there is no body.
> 
>      If the read failed, the `read' hook may return #f for the client
>      socket, request, and body.
> 
>   3. A user-provided handler procedure is called, with the request and
>      body as its arguments.  The handler should return two values: the
>      response, as a `<response>' record from `(web response)', and the
>      response body as a string, bytevector, or `#f' if not present.  We
>      also allow the reponse to be simply an alist of headers, in which

s/reponse/response

>      case a default response object is constructed with those headers.

What about response status?  (perhaps represented as a "status" header,
a la modlisp)

>   4. The `write' hook is called with three arguments: the client
>      socket, the response, and the body.  The `write' hook returns no
>      values.
> 
>   5. At this point the request handling is complete. For a loop, we
>      loop back and try to read a new request.
> 
>   6. If the user interrupts the loop, the `close' hook is called on the
>      server socket.
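
To check my understanding of the six steps, here is a toy model of the
loop.  Everything in it is hypothetical: `<toy-impl>', `toy-run' and the
event log are my own names, the "server socket" is just a list of
pending requests, and the loop stops when `read' returns #f, where the
real one would presumably carry on until interrupted.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for <server-impl>: four hooks in a record.
(define-record-type <toy-impl>
  (make-toy-impl open read write close)
  toy-impl?
  (open toy-impl-open)
  (read toy-impl-read)
  (write toy-impl-write)
  (close toy-impl-close))

;; Record which hooks ran, most recent first.
(define events '())
(define (note! x) (set! events (cons x events)))

(define toy
  (make-toy-impl
   ;; open: backend-specific arguments in, opaque server object out.
   (lambda requests (note! 'open) (list requests))
   ;; read: server in; client, request and body out (#f if none left).
   (lambda (server)
     (let ((pending (car server)))
       (if (null? pending)
           (values #f #f #f)
           (begin
             (set-car! server (cdr pending))
             (values 'client (car pending) #f)))))
   ;; write: client, response and body in; no values out.
   (lambda (client response body)
     (note! (list 'write response body))
     (values))
   ;; close: server in.
   (lambda (server) (note! 'close))))

(define (toy-run impl handler . open-params)
  (let ((server (apply (toy-impl-open impl) open-params)))
    (let lp ()
      (call-with-values (lambda () ((toy-impl-read impl) server))
        (lambda (client request body)
          (when client
            (call-with-values (lambda () (handler request body))
              (lambda (response response-body)
                ((toy-impl-write impl) client response response-body)))
            (lp)))))
    ((toy-impl-close impl) server)))

;; Two fake "requests", each answered by the same handler.
(toy-run toy
         (lambda (request body)
           (values '((content-type . ("text/plain")))
                   (string-append "Hello, " request)))
         "alice" "bob")
```

Afterwards `(reverse events)' lists one `open', one `write' per request,
and a final `close', which is the shape I would expect from the
description above.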
> 
>    A user may define a server implementation with the following form:
> 
>  -- Function: define-server-impl name open read write close
>      Make a `<server-impl>' object with the hooks OPEN, READ, WRITE,
>      and CLOSE, and bind it to the symbol NAME in the current module.
> 
>  -- Function: lookup-server-impl impl
>      Look up a server implementation. If IMPL is a server
>      implementation already, it is returned directly. If it is a
>      symbol, the binding named IMPL in the `(web server IMPL)' module is
>      looked up. Otherwise an error is signaled.
> 
>      Currently a server implementation is a somewhat opaque type,
>      useful only for passing to other procedures in this module, like
>      `read-client'.
> 
>    The `(web server)' module defines a number of routines that use
> `<server-impl>' objects to implement parts of a web server.  Given that
> we don't expose the accessors for the various fields of a
> `<server-impl>', indeed these routines are the only procedures with any
> access to the impl objects. 

How general is <server-impl> hoping to be?  Correspondingly, is the (web
server) module name appropriate?

To me, "web" => "http", so (web server http) is a tautological name.
And in fact it sounds like you intend <server-impl> to cover more than
just web/HTTP, so I suppose it should be in a module like (server),
rather than (web server).

It seems we could do with some more server impls in order to validate
that the infrastructure is all defined correctly.  Time permitting, I'd
like to play with writing modlisp support for this new system, analogous
to what I did already in guile-www.

>  -- Function: open-server impl open-params
>      Open a server for the given implementation. Returns one value, the
>      new server object. The implementation's `open' procedure is
>      applied to OPEN-PARAMS, which should be a list.
> 
>  -- Function: read-client impl server
>      Read a new client from SERVER, by applying the implementation's
>      `read' procedure to the server. If successful, returns three
>      values: an object corresponding to the client, a request object,
>      and the request body. If any exception occurs, returns `#f' for
>      all three values.

I think there's a one-request-per-connection assumption here, isn't
there?

>  -- Function: handle-request handler request body state
>      Handle a given request, returning the response and body.
> 
>      The response and response body are produced by calling the given
>      HANDLER with REQUEST and BODY as arguments.
> 
>      The elements of STATE are also passed to HANDLER as arguments, and
>      may be returned as additional values. The new STATE, collected
>      from the HANDLER's return values, is then returned as a list. The
>      idea is that a server loop receives a handler from the user, along
>      with whatever state values the user is interested in, allowing the
>      user's handler to explicitly manage its state.
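
The state threading took me a moment, so here is a small sketch of how I
read it.  `toy-handle-request', `counting-handler' and `serve-n' are
made-up names, and the request is just a symbol; the only point is how
the extra return values become the STATE list for the next call.

```scheme
;; Hypothetical stand-in for handle-request: call HANDLER with the
;; request, the body and the elements of STATE, then split its return
;; values into response, body and the new state list.
(define (toy-handle-request handler request body state)
  (call-with-values (lambda () (apply handler request body state))
    (lambda (response response-body . new-state)
      (values response response-body new-state))))

;; A handler whose single piece of state is a hit counter.
(define (counting-handler request body count)
  (values '((content-type . ("text/plain")))
          (string-append "hit number " (number->string (+ count 1)))
          (+ count 1)))

;; Serve N fake requests, feeding each new state into the next call.
(define (serve-n n state)
  (if (zero? n)
      state
      (call-with-values
          (lambda ()
            (toy-handle-request counting-handler 'request #f state))
        (lambda (response body new-state)
          (serve-n (- n 1) new-state)))))

(define final-state (serve-n 3 '(0)))
```

Starting from the state `(0)', three requests leave `final-state' as
`(3)'.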
> 
>  -- Function: sanitize-response request response body
>      "Sanitize" the given response and body, making them appropriate
>      for the given request.
> 
>      As a convenience to web handler authors, RESPONSE may be given as
>      an alist of headers, in which case it is used to construct a
>      default response. Ensures that the response version corresponds to
>      the request version. If BODY is a string, encodes the string to a
>      bytevector, in an encoding appropriate for RESPONSE. Adds a
>      `content-length' and `content-type' header, as necessary.
> 
>      If BODY is a procedure, it is called with a port as an argument,
>      and the output collected as a bytevector. In the future we might
>      try to instead use a compressing, chunk-encoded port, and call
>      this procedure later, in the write-client procedure. Authors are
>      advised not to rely on the procedure being called at any
>      particular time.
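
A small illustration of why the string body has to be encoded before the
`content-length' can be known.  This is not the real
`sanitize-response'; `toy-encode-body' is a hypothetical helper, and it
assumes UTF-8 where the real procedure picks an encoding appropriate for
the response.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: encode a string body as UTF-8 and derive the
;; content-length header from the number of bytes, not characters.
(define (toy-encode-body headers body)
  (let ((bv (string->utf8 body)))
    (values (acons 'content-length (bytevector-length bv) headers)
            bv)))

;; "hello" with an e-acute: five characters, but six bytes in UTF-8.
(define accented (string #\h #\xe9 #\l #\l #\o))

(define accented-length
  (call-with-values (lambda () (toy-encode-body '() accented))
    (lambda (headers bv) (assq-ref headers 'content-length))))
```

Here `(string-length accented)' is 5 while `accented-length' is 6, which
is the same character-versus-byte distinction that the latin-1 variants
of the body procedures sidestep.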
> 
>  -- Function: write-client impl server client response body
>      Write an HTTP response and body to CLIENT. If the server and
>      client support persistent connections, it is the implementation's
>      responsibility to keep track of the client thereafter, presumably
>      by attaching it to the SERVER argument somehow.

Ah, interesting, I guess this is what removes the
one-request-per-connection assumption.

>  -- Function: close-server impl server
>      Release resources allocated by a previous invocation of
>      `open-server'.
> 
>    Given the procedures above, it is a small matter to make a web
> server:
> 
>  -- Function: serve-one-client handler impl server state
>      Read one request from SERVER, call HANDLER on the request and
>      body, and write the response to the client. Returns the new state
>      produced by the handler procedure.
> 
>  -- Function: run-server handler [impl] [open-params] . state
>      Run Guile's built-in web server.
> 
>      HANDLER should be a procedure that takes two or more arguments,
>      the HTTP request and request body, and returns two or more values,
>      the response and response body.
> 
>      For example, here is a simple "Hello, World!" server:
> 
>            (define (handler request body)
>              (values '((content-type . ("text/plain")))
>                      "Hello, World!"))
>            (run-server handler)
> 
>      The response and body will be run through `sanitize-response'
>      before sending back to the client.
> 
>      Additional arguments to HANDLER are taken from STATE.  Additional
>      return values are accumulated into a new STATE, which will be used
>      for subsequent requests. In this way a handler can explicitly
>      manage its state.
> 
>      The default server implementation is `http', which accepts
>      OPEN-PARAMS like `(#:port 8081)', among others. See "Web Server"
>      in the manual, for more information.

Last sentence should be removed from the manual version of the
docstring.

> 
> File: guile.info,  Node: Web Examples,  Prev: Web Server,  Up: Web
> 
> 7.3.7 Web Examples
> ------------------
> 
> Well, enough about the tedious internals.  Let's make a web application!
> 
> 7.3.7.1 Hello, World!
> .....................
> 
> The first program we have to write, of course, is "Hello, World!".
> This means that we have to implement a web handler that does what we
> want.

The thunder here has been somewhat stolen by the fact that you already
presented this example above!

>    Now we define a handler, a function of two arguments and two return
> values:
> 
>      (define (handler request request-body)
>        (values RESPONSE RESPONSE-BODY))
> 
>    In this first example, we take advantage of a short-cut, returning an
> alist of headers instead of a proper response object. The response body
> is our payload:
> 
>      (define (hello-world-handler request request-body)
>        (values '((content-type . ("text/plain")))
>                "Hello World!"))
> 
>    Now let's test it, by running a server with this handler. Load up the
> web server module if you haven't yet done so, and run a server with this
> handler:
> 
>      (use-modules (web server))
>      (run-server hello-world-handler)
> 
>    By default, the web server listens for requests on `localhost:8080'.
> Visit that address in your web browser to test.  If you see the string,
> `Hello World!', sweet!
> 
> 7.3.7.2 Inspecting the Request
> ..............................
> 
> The Hello World program above is a general greeter, responding to all
> URIs.  To make a more exclusive greeter, we need to inspect the request
> object, and conditionally produce different results.  So let's load up
> the request, response, and URI modules, and do just that.
> 
>      (use-modules (web server)) ; you probably did this already
>      (use-modules (web request)
>                   (web response)
>                   (web uri))
> 
>      (define (request-path-components request)
>        (split-and-decode-uri-path (uri-path (request-uri request))))
> 
>      (define (hello-hacker-handler request body)
>        (if (equal? (request-path-components request)
>                    '("hacker"))
>            (values '((content-type . ("text/plain")))
>                    "Hello hacker!")
>            (not-found request)))
> 
>      (run-server hello-hacker-handler)
> 
>    Here we see that we have defined a helper to return the components of
> the URI path as a list of strings, and used that to check for a request
> to `/hacker/'. Then the success case is just as before - visit
> `http://localhost:8080/hacker/' in your browser to check.
> 
>    You should always match against URI path components as decoded by
> `split-and-decode-uri-path'. The above example will work for
> `/hacker/', `//hacker///', and `/h%61ck%65r'.
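
A quick way to see this at the REPL, assuming a Guile that ships the
`(web uri)' module described in this draft (the `paths' and `decoded'
names are mine):

```scheme
(use-modules (web uri))

;; The three spellings mentioned above.
(define paths '("/hacker/" "//hacker///" "/h%61ck%65r"))

;; Each path is split on slashes, empty components are dropped, and the
;; remaining components are percent-decoded.
(define decoded (map split-and-decode-uri-path paths))
```

If the claim above holds, every element of `decoded' is the
one-element list containing "hacker".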
> 
>    But we forgot to define `not-found'!  If you are pasting these
> examples into a REPL, accessing any other URI in your web browser will
> drop your Guile console into the debugger:
> 
>      <unnamed port>:38:7: In procedure module-lookup:
>      <unnamed port>:38:7: Unbound variable: not-found
> 
>      Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
>      scheme@(guile-user) [1]>
> 
>    So let's define the function, right there in the debugger.  As you
> probably know, we'll want to return a 404 response.
> 
>      ;; Paste this in your REPL
>      (define (not-found request)
>        (values (build-response #:code 404)
>                (string-append "Resource not found: "
>                               (unparse-uri (request-uri request)))))
> 
>      ;; Now paste this to let the web server keep going:
>      ,continue

Cool, I didn't know Guile could do that!

>    Now if you access `http://localhost:8080/foo/', you get this error
> message.  (Note that some popular web browsers won't show
> server-generated 404 messages, showing their own instead, unless the 404
> message body is long enough.)
> 
> 7.3.7.3 Higher-Level Interfaces
> ...............................
> 
> The web handler interface is a common baseline that all kinds of Guile
> web applications can use.  You will usually want to build something on
> top of it, however, especially when producing HTML.  Here is a simple
> example that builds up HTML output using SXML (*note sxml simple::).
> 
>    First, load up the modules:
> 
>      (use-modules (web server)
>                   (web request)
>                   (web response)
>                   (sxml simple))
> 
>    Now we define a simple templating function that takes a list of HTML
> body elements, as SXML, and puts them in our super template:
> 
>      (define (templatize title body)
>        `(html (head (title ,title))
>               (body ,@body)))
> 
>    For example, the simplest Hello HTML can be produced like this:
> 
>      (sxml->xml (templatize "Hello!" '((b "Hi!"))))
>      -|
>      <html><head><title>Hello!</title></head><body><b>Hi!</b></body></html>
> 
>    Much better to work with Scheme data types than to work with HTML as
> strings. Now we define a little response helper:
> 
>      (define* (respond #:optional body #:key
>                        (status 200)
>                        (title "Hello hello!")
>                        (doctype "<!DOCTYPE html>\n")
>                        (content-type-params '(("charset" . "utf-8")))
>                        (content-type "text/html")
>                        (extra-headers '())
>                        (sxml (and body (templatize title body))))
>        (values (build-response
>                 #:code status
>                 #:headers `((content-type
>                              . (,content-type ,@content-type-params))
>                             ,@extra-headers))
>                (lambda (port)
>                  (if sxml
>                      (begin
>                        (if doctype (display doctype port))
>                        (sxml->xml sxml port))))))
> 
>    Here we see the power of keyword arguments with default
> initializers. By the time the arguments are fully parsed, the `sxml'
> local variable will hold the templated SXML, ready for sending out to
> the client.
> 
>    Instead of returning the body as a string, here we give a
>    procedure,

Insert "Also, " before "Instead"?  Otherwise this reads as moving onto a
new example.

> which will be called by the web server to write out the response to the
> client.
> 
>    Now, a simple example using this responder, which lays out the
> incoming headers in an HTML table.
> 
>      (define (debug-page request body)
>        (respond
>         `((h1 "hello world!")
>           (table
>            (tr (th "header") (th "value"))
>            ,@(map (lambda (pair)
>                     `(tr (td (tt ,(with-output-to-string
>                                     (lambda () (display (car pair))))))
>                          (td (tt ,(with-output-to-string
>                                     (lambda ()
>                                       (write (cdr pair))))))))
>                   (request-headers request))))))
> 
>      (run-server debug-page)
> 
>    Now if you visit any local address in your web browser, you will
> finally see some HTML.
> 
> 7.3.7.4 Conclusion
> ..................
> 
> Well, this is about as far as Guile's built-in web support goes, for
> now.  There are many ways to make a web application, but hopefully by
> standardizing the most fundamental data types, users will be able to
> choose the approach that suits them best, while also being able to
> switch between implementations of the server.  This is a relatively new
> part of Guile, so if you have feedback, let us know, and we can take it
> into account.  Happy hacking on the web!

Thanks, a fun read!

     Neil