Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html


From: Bela Lubkin
Subject: Re: [Lynx-dev] Cannot open: https://m.medicalxpress.com/page2.html
Date: Wed, 14 Aug 2019 22:17:21 -0700

Mouse wrote:

> 2396 does specifically say that
>
>    URI that are hierarchical in nature use the slash "/" character for
>    separating hierarchical components.  For some file systems, a "/"
>    character (used to denote the hierarchical structure of a URI) is the
>    delimiter used to construct a file name hierarchy, and thus the URI
>    path will look similar to a file pathname.  This does NOT imply that
>    the resource is a file or that the URI maps to an actual filesystem
>    pathname.
>
> So speaking of /./ as "a reference to the current directory" is, at
> least, misleading; path components in URIs/URLs do not need to bear any
> relationship to directory structure anywhere.  I also have not found
> any indication that . or .. components are special in absolute
> URIs/URLs; again, perhaps that's just because I haven't found the right
> reference.

It looks like RFC3986 is the current state of the art, and specifically
https://tools.ietf.org/html/rfc3986#section-5.2.4 for this.  This is
part of section 5:

| 5. Reference Resolution
|
|    This section defines the process of resolving a URI reference
|    within a context that allows relative references so that the result
|    is a string matching the <URI> syntax rule of Section 3.

-- which doesn't really say *who* is supposed to be doing this, but I
believe it's meant to be understood as 'whenever manipulating URIs'.
That is, both the client (Lynx) & the server (Apache) should be
modifying '/./' => '/'.  Both are at fault.
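
For concreteness, here is a rough Python sketch of the
remove_dot_segments procedure as I read section 5.2.4.  The function
name comes from the RFC; the list-of-segments bookkeeping is my own
shortcut, not anything taken from the Lynx or Apache sources.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4 (not taken from any real client)."""
    output = []  # each entry is one segment, including its leading "/" if any
    while path:
        if path.startswith("../"):          # step A
            path = path[3:]
        elif path.startswith("./"):         # step A
            path = path[2:]
        elif path.startswith("/./"):        # step B: "/./x" -> "/x"
            path = path[2:]
        elif path == "/.":                  # step B
            path = "/"
        elif path.startswith("/../"):       # step C: drop the last output segment
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":                 # step C
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):           # step D
            path = ""
        else:                               # step E: move one segment to output
            cut = path.find("/", 1)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


print(remove_dot_segments("/./page2.html"))      # /page2.html
print(remove_dot_segments("/a/b/c/./../../g"))   # /a/g
```

The second call is one of the RFC's own worked examples; the first is
the '/./' case from this thread.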

The RFC never mentions HTTPS and uses HTTP all over the place, but I
think this is simply because HTTP is being used as a standard example
scheme, and URIs are meant to be uniform across schemes.

> So I think lynx is at fault for not handling relative path resolution
> correctly.  Depending on what I've failed to find, the webserver may
> also be at fault - does anyone have any pointers to the RFC(s) I've
> missed?

Does this suffice?

I add another quote from 3986 (sec. 1.2.3):

|    It is often the case that a group or "tree" of documents has been
|    constructed to serve a common purpose, wherein the vast majority
|    of URI references in these documents point to resources within the
|    tree rather than outside it.  Similarly, documents located at a
|    particular site are much more likely to refer to other resources at
|    that site than to resources at remote sites.  Relative referencing
|    of URIs allows document trees to be partially independent of their
|    location and access scheme.  For instance, it is possible for a
|    single set of hypertext documents to be simultaneously accessible
|    and traversable via each of the "file", "http", and "ftp" schemes
|    if the documents refer to each other with relative references.
|    Furthermore, such document trees can be moved, as a whole, without
|    changing any of the relative references.

This seems to make it clear that (1) the designers of the whole concept
of 'URI schemes' are strongly thinking of them mapping to filesystems
and (2) that they really believe in the cross-scheme concordance of URIs.
So this applies to HTTPS whether or not HTTPS is mentioned or even
existed at the time of 3986 publication.
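
To illustrate the cross-scheme point, a small sketch using Python's
standard urllib.parse.  The example.org base URIs and the '/docs/...'
layout are made up for illustration; only the path part of each result
is compared, so that the schemes' different authority syntax doesn't
get in the way.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical copies of the same document tree under four schemes.
BASES = [
    "http://example.org/docs/a/index.html",
    "https://example.org/docs/a/index.html",
    "ftp://example.org/docs/a/index.html",
    "file:///docs/a/index.html",
]


def resolved_paths(reference: str) -> dict:
    """Resolve one relative reference against every base; keep only the path."""
    return {
        urlsplit(base).scheme: urlsplit(urljoin(base, reference)).path
        for base in BASES
    }


for scheme, path in resolved_paths("../b/./page.html").items():
    print(scheme, path)   # every scheme ends up at /docs/b/page.html
```

The same relative reference lands on the same path regardless of
scheme, which is the behaviour the quoted paragraph describes.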

|    A relative reference (Section 4.2) refers to a resource by
|    describing the difference within a hierarchical name space between
|    the reference context and the target URI.  The reference resolution
|    algorithm, presented in Section 5, defines how such a reference is
|    transformed to the target URI.

This bit *could* be taken as an oblique suggestion that only the client
(Lynx), who is composing the relative reference onto the base URI of the
source document, is responsible.  I don't believe it's meant that way.
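
For what it's worth, this is what the client-side step looks like with
an existing resolver, here Python's urllib.parse.urljoin.  The first
base and the three references are the RFC's own examples from section
5.4.1; the './page2.html' reference against the site root is only my
guess at the kind of link involved, not something I have checked in
the page source.

```python
from urllib.parse import urljoin

RFC_BASE = "http://a/b/c/d;p?q"   # base URI used by RFC 3986 section 5.4


def resolve_all(base: str, references: list) -> list:
    """Resolve each relative reference against the base URI."""
    return [urljoin(base, ref) for ref in references]


print(resolve_all(RFC_BASE, ["./g", "../g", "../../g"]))
# ['http://a/b/c/g', 'http://a/b/g', 'http://a/g']

# Assumed link of the form './page2.html' on the site's front page:
print(urljoin("https://m.medicalxpress.com/", "./page2.html"))
# https://m.medicalxpress.com/page2.html
```

No dot-segment survives in any of the results, so a client doing this
would never send '/./' to the server in the first place.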

|    All URI references are parsed by generic syntax parsers when used.

-- this seems like a clumsy way of saying 'thou shalt run the
canonicalization code whenever operating on a URI'; '/./' should never
be present in the final output.  The next sentence reiterates the use of
that assumption:

|    However, because hierarchical processing has no effect on an absolute
|    URI used in a reference unless it contains one or more dot-segments
|    (complete path segments of "." or "..", as described in Section 3.3),
|    URI scheme specifications can define opaque identifiers by
|    disallowing use of slash characters, question mark characters, and
|    the URIs "scheme:." and "scheme:..".

>Bela<
