lynx-dev

Re: LYNX-DEV /foo/.. ?


From: Klaus Weide
Subject: Re: LYNX-DEV /foo/.. ?
Date: Mon, 4 Nov 1996 16:50:19 -0600 (CST)

On Mon, 4 Nov 1996, Wayne Buttles wrote:

> Klaus Weide (address@hidden) wrote:
> 
> > On Sat, 2 Nov 1996, Foteos Macrides wrote:
> 
> >>       For DOS, you should do it "right", as I did for VMS, and
> >> use an overt URL_symbolic_path <-> local_file_system translation
> >> function.
> 
> > I don't know what happens with such paths on DOS, I think multiple
> > backslashes in arbitrary places are illegal there.
> 
> Is there a rule here?  I couldn't find rules for file:// from a quick
> search.  

What kind of rule do you mean?   RFC 1738 talks about file: URLs, and
RFC 1808 talks about resolution of relative URLs (in general).

The syntax for relative URLs doesn't apply to all URL schemes.
But we definitely want it to apply for file: URLs, so that relative
URLs (e.g. in an HTML doc found locally in a file: location, or in the
generated HTML for a directory listing) can be resolved using the same
rules as for http: and other schemes.  So then, you have got lots of
rules to look at :)

IMHO it would be best to first specify a standard way of expressing
local filenames and paths as URLs (ie. a mapping from DOS filename
syntax to URL syntax and back in your case), and then consequently
use that standard syntax when filenames have to be expressed as URLs.
That's where the for-unix code goes wrong: in many places, it just
prepends a "file://localhost" to make a filename into a URL (or strips
it from the beginning for the reverse), because the syntax looks 
similar enough.  
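A minimal sketch of such a two-way mapping, in C since that is what Lynx is written in. It assumes drive-qualified absolute DOS filenames and the file://localhost/C:/... form proposed further down. The function names are made up for illustration and are not existing Lynx functions, and escaping of unsafe characters is deliberately left out:

```c
#include <stdlib.h>
#include <string.h>

static const char file_prefix[] = "file://localhost/";

/* Hypothetical helper: "C:\LYNX\BOOKMARK.HTM" -> "file://localhost/C:/LYNX/BOOKMARK.HTM".
 * Assumes a drive-qualified absolute DOS filename. Only the separator is
 * translated; unsafe characters are NOT escaped here.
 * Returns a malloc'd string, or NULL if malloc fails. */
char *dos_path_to_file_url(const char *dos_path)
{
    size_t plen = sizeof file_prefix - 1;
    size_t n = strlen(dos_path);
    char *url = malloc(plen + n + 1);
    if (url == NULL)
        return NULL;
    memcpy(url, file_prefix, plen);
    for (size_t i = 0; i <= n; i++)          /* i == n copies the NUL */
        url[plen + i] = (dos_path[i] == '\\') ? '/' : dos_path[i];
    return url;
}

/* Hypothetical reverse mapping:
 * "file://localhost/C:/LYNX/BOOKMARK.HTM" -> "C:\LYNX\BOOKMARK.HTM".
 * Returns NULL if the URL is not in the expected form or malloc fails. */
char *file_url_to_dos_path(const char *url)
{
    size_t plen = sizeof file_prefix - 1;
    if (strncmp(url, file_prefix, plen) != 0)
        return NULL;
    const char *path = url + plen;
    size_t n = strlen(path);
    char *dos = malloc(n + 1);
    if (dos == NULL)
        return NULL;
    for (size_t i = 0; i <= n; i++)          /* i == n copies the NUL */
        dos[i] = (path[i] == '/') ? '\\' : path[i];
    return dos;
}
```

Because the prefix is added and checked in exactly one place, the rest of the code never has to know how a filename looks inside a URL.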

>    My current DOS URL bastardization accepts the following (and a
> bunch more variations):

I would suggest accepting a range of variations from the user, but
immediately converting them to the above-postulated standard format
for internal use.  And never generating file: URLs internally in any
but the standard format.
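One way to do that conversion, sketched under the assumption that the standard form is file://localhost/<drive>:/<path>. The function name and the short list of accepted variants are illustrative only; runs of slashes are not touched here:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical normalizer: rewrite a few user-supplied file: URL variants
 * into one internal form, "file://localhost/<drive>:/<path>".
 * Handled here (an assumed, incomplete list):
 *   - an empty host ("file:///...") is treated as "localhost"
 *   - backslashes become forward slashes
 *   - a drive letter written as "c|" becomes "c:"
 * Returns a malloc'd string, or NULL if the input is not a file: URL
 * with an empty or "localhost" host (or if malloc fails). */
char *normalize_file_url(const char *in)
{
    static const char canon[] = "file://localhost/";
    static const char empty_host[] = "file:///";
    const size_t clen = sizeof canon - 1;
    const char *path;

    if (strncmp(in, canon, clen) == 0)
        path = in + clen;
    else if (strncmp(in, empty_host, sizeof empty_host - 1) == 0)
        path = in + (sizeof empty_host - 1);
    else
        return NULL;

    size_t n = strlen(path);
    char *out = malloc(clen + n + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, canon, clen);
    for (size_t i = 0; i <= n; i++) {        /* i == n copies the NUL */
        char c = path[i];
        if (c == '\\')
            c = '/';
        else if (i == 1 && c == '|' && isalpha((unsigned char)path[0]))
            c = ':';                         /* "c|" is a drive letter in disguise */
        out[clen + i] = c;
    }
    return out;
}
```

Both file:///c|/lynx/bookmark.htm and file://localhost/c:\lynx\bookmark.htm come out as file://localhost/c:/lynx/bookmark.htm.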

In my remark about "//" in unix paths, I had the *path* portion of
a URL in mind, i.e. whatever comes after the host+separating slash
if it's an absolute URL.  ("//" should probably never appear there,
and a proper translation function would eliminate superfluous slashes
from local filenames that happen to have them.)
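A sketch of that slash cleanup, assuming it is applied to the path portion only (never to the whole URL, where it would also eat the "//" before the host). The function name is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: collapse runs of '/' in a path into a single '/',
 * e.g. "/lynx//doc///x.htm" -> "/lynx/doc/x.htm".
 * Apply to the path only, not to "scheme://host/...".
 * Returns a malloc'd string, or NULL if malloc fails. */
char *collapse_slashes(const char *path)
{
    char *out = malloc(strlen(path) + 1);
    if (out == NULL)
        return NULL;
    char *o = out;
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/' && o > out && o[-1] == '/')
            continue;                        /* drop a slash that follows a slash */
        *o++ = *p;
    }
    *o = '\0';
    return out;
}
```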


Let me try to apply the RFCs to your examples:

> file://localhost/c:\lynx\bookmark.htm

That is a file URL with an absolute path of "c:\lynx\bookmark.htm".
(It is automatically absolute because it is part of an absolute URL.)
It is using pure DOS syntax for the path part, not forward "/" as
separators as required by RFC 1738.  If a relative URL is resolved
against this as a base, you won't get the intended result (unless
HTParse and lots of other places in the code are made 
backslash-aware, which nobody should seriously consider).
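To see why, here is a deliberately simplified version of the merge step of relative resolution (RFC 1808, section 4, step 6: drop everything after the base path's last "/" and append the relative path). It is a standalone sketch with a hypothetical name, not HTParse's code, and it skips dot-segment removal and the other steps:

```c
#include <stdlib.h>
#include <string.h>

/* Simplified merge step: keep the base path up to and including its
 * last '/', then append the relative path. Only '/' counts as a
 * separator, as in URL syntax. Returns a malloc'd string, or NULL. */
char *merge_paths(const char *base_path, const char *rel_path)
{
    const char *last = strrchr(base_path, '/');
    size_t keep = last ? (size_t)(last - base_path) + 1 : 0;
    size_t rlen = strlen(rel_path);
    char *out = malloc(keep + rlen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, base_path, keep);
    memcpy(out + keep, rel_path, rlen + 1);  /* includes the NUL */
    return out;
}
```

With the backslash base path /c:\lynx\bookmark.htm, the last "/" is the one right after the host, so other.htm resolves to /other.htm; with /c:/lynx/bookmark.htm it resolves to /c:/lynx/other.htm as intended.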

The ":" shouldn't be a problem here, it is allowed in path segments
by both RFCs.  There would be a problem if a *relative* URL started
with "c:" because that would look like the scheme part of an absolute
URL. (Section 5.3 of RFC 1808 has advice on what to do if this is
really wanted).
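A sketch of that check and of the "./" workaround suggested in RFC 1808 section 5.3. The scheme test follows RFC 1808's scheme syntax (one or more letters, digits, "+", "-" or "." followed by ":"); the function names are hypothetical:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if s starts like "scheme:", e.g. "c:/lynx" or "http://...". */
bool looks_like_scheme(const char *s)
{
    const char *p = s;
    while (isalnum((unsigned char)*p) || *p == '+' || *p == '-' || *p == '.')
        p++;
    return p > s && *p == ':';
}

/* Prefix "./" to a relative path whose start would otherwise be read
 * as a scheme. Returns a malloc'd string, or NULL if malloc fails. */
char *protect_relative_path(const char *rel)
{
    size_t n = strlen(rel);
    bool prefix = looks_like_scheme(rel);
    char *out = malloc(n + (prefix ? 2 : 0) + 1);
    if (out == NULL)
        return NULL;
    if (prefix) {
        out[0] = '.';
        out[1] = '/';
        memcpy(out + 2, rel, n + 1);
    } else {
        memcpy(out, rel, n + 1);
    }
    return out;
}
```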


> file:///c|/lynx/bookmark.htm

An absolute URL with an empty host part (which has the same meaning as
the string "localhost" according to RFC 1738).  I suggest using (i.e.
converting to) //localhost/ whenever possible, to guard against some
routines somewhere (that you wouldn't normally think of) that may try
to strip all trailing slashes for some purpose.

Strict adherence to the RFCs would require that an unescaped "|" never
appears in a URL.  Well, the same applies to "~", which normally works
fine, e.g. in http: URLs; but I think "|" is a bad choice as a replacement
for ":" (which shouldn't really be needed).
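If such characters do end up in a path, they can be %-escaped. The sketch below encodes the characters RFC 1738 (section 2.2) lists as unsafe, plus control and non-ASCII octets. It expects a raw, not yet escaped path whose backslashes have already become "/", it does not deal with reserved characters such as ";" or "?", and it is a standalone illustration with a hypothetical name, not Lynx's HTEscape:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: %-encode RFC 1738 "unsafe" characters, control
 * characters and non-ASCII octets in a raw path. '%' itself is encoded,
 * so do not feed it an already-escaped string.
 * Returns a malloc'd string, or NULL if malloc fails. */
char *escape_unsafe(const char *s)
{
    static const char unsafe[] = " <>\"#%{}|\\^~[]`";
    static const char hex[] = "0123456789ABCDEF";
    char *out = malloc(3 * strlen(s) + 1);   /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;
    char *o = out;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p < 0x20 || *p >= 0x7F || strchr(unsafe, *p) != NULL) {
            *o++ = '%';
            *o++ = hex[*p >> 4];
            *o++ = hex[*p & 0x0F];
        } else {
            *o++ = (char)*p;
        }
    }
    *o = '\0';
    return out;
}
```

So "/c|/lynx/bookmark.htm" would become "/c%7C/lynx/bookmark.htm", while "/c:/lynx/bookmark.htm" passes through unchanged.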


> file:////lynx/bookmark.htm

Now here we have one slash too many, illegal according to the syntax
of RFC 1808 (but not RFC 1738).  That extra slash doesn't seem to
fulfil any useful purpose.  (file:///lynx/bookmark.htm should always
refer to the same file IMHO, i.e. the path in a fully specified URL
should always be taken as an absolute path.)


> file:///./\/\/\////\\\bookmark.htm

I am not even sure what that monstrosity is supposed to mean.
I just note that an absolute path starting with "." is somewhat, hmm,
ill-defined.  Is that relative to the root directory?  My reading
of RFC 1808 would say that it has to be interpreted that way, and 
also that it gets simplified away as soon as relative URLs are resolved
w.r.t. this absolute URL.  If this is meant to refer to the "current
directory" (in which sense?  When Lynx was started?), I question the
wisdom of packing this into a full URL (thereby giving it an absolute
form) without first resolving it.   
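For the "." and ".." cases, here is a sketch of dot-segment removal on an absolute path, modeled on step 6 of RFC 1808 section 4. Like RFC 1808's abnormal examples, it keeps a ".." that has nothing left to cancel. The function name is hypothetical:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: remove "." and ".." segments from an absolute path.
 *   - "." segments are dropped
 *   - "<segment>/.." pairs cancel, unless <segment> is itself ".."
 *   - a ".." with nothing left to cancel is kept
 *   - a final "." or a cancelled final ".." leaves a trailing "/"
 * Returns a malloc'd string, or NULL if malloc fails. */
char *remove_dot_segments(const char *path)
{
    size_t len = strlen(path);
    /* A path of len bytes has at most len + 1 segments. */
    const char **seg = malloc((len + 1) * sizeof *seg);
    size_t *seglen = malloc((len + 1) * sizeof *seglen);
    char *out = malloc(len + 2);             /* room for a leading '/' and NUL */
    if (seg == NULL || seglen == NULL || out == NULL) {
        free(seg);
        free(seglen);
        free(out);
        return NULL;
    }

    size_t top = 0;                          /* number of segments kept so far */
    const char *p = (path[0] == '/') ? path + 1 : path;
    for (;;) {
        const char *slash = strchr(p, '/');
        size_t n = slash ? (size_t)(slash - p) : strlen(p);
        bool is_dot = (n == 1 && p[0] == '.');
        bool is_dotdot = (n == 2 && p[0] == '.' && p[1] == '.');
        bool keep = true;

        if (is_dot) {
            keep = false;
        } else if (is_dotdot && top > 0 &&
                   !(seglen[top - 1] == 2 && memcmp(seg[top - 1], "..", 2) == 0)) {
            top--;                           /* "<segment>/.." cancels out */
            keep = false;
        }
        if (keep) {
            seg[top] = p;
            seglen[top] = n;
            top++;
        }
        if (slash == NULL) {
            if (!keep) {                     /* ended on a directory: add "/" */
                seg[top] = p;
                seglen[top] = 0;
                top++;
            }
            break;
        }
        p = slash + 1;
    }

    char *o = out;
    for (size_t i = 0; i < top; i++) {       /* join as "/seg1/seg2/..." */
        *o++ = '/';
        memcpy(o, seg[i], seglen[i]);
        o += seglen[i];
    }
    *o = '\0';
    free(seg);
    free(seglen);
    return out;
}
```

With it, /./bookmark.htm simplifies to /bookmark.htm, and /foo/.. to /.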

Note that this is different from ftp: URLs, where whatever is in the
host component of the URL, including a possible username, (supposedly) 
defines a starting directory.


So you may want to accept all those forms, but get rid of them as soon
as possible...

As a standard URL syntax for DOS, I suggest:

 - file://localhost/c:/subdir/file.htm
   for absolute URLs (convert everything to lowercase (?), HTEscape
   characters not acceptable in URL syntax)

 - /c:/subdir/file.htm
   for absolute file paths (but as URLs these are *relative*).
   (same character conversions as above)

 - subdir/file.htm, file.htm or ./subdir/file.htm, ./file.htm
   for relative paths in relative URLs.

 - c:/something should never appear as beginning of a string that
   could be interpreted as URL syntax.

 - maybe interpret file://localhost/something as synonymous to
   file://localhost/c:/something, and /something as synonymous to
   /c:/something (where something is not a:, b:, c: etc.) so that
   relative URL .. with base URL file://localhost/c:/ or 
   file://localhost/a:/ has a well-defined meaning.
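The last suggestion could look like the sketch below. The default drive is passed in as a parameter (an assumption; the message only uses c:), drive-relative forms are not considered, and the function names are hypothetical:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if an absolute path starts with a drive, e.g. "/c:/..." or "/a:". */
static bool has_drive(const char *path)
{
    return path[0] == '/' && isalpha((unsigned char)path[1]) &&
           path[2] == ':' && (path[3] == '/' || path[3] == '\0');
}

/* Hypothetical helper: treat an absolute path without a drive letter,
 * such as "/lynx/bookmark.htm", as being on the default drive, giving
 * "/c:/lynx/bookmark.htm" for drive 'c'. Relative paths and paths that
 * already name a drive are returned unchanged.
 * Returns a malloc'd string, or NULL if malloc fails. */
char *add_default_drive(const char *path, char drive)
{
    size_t n = strlen(path);
    bool add = path[0] == '/' && !has_drive(path);
    char *out = malloc(n + (add ? 3 : 0) + 1);
    if (out == NULL)
        return NULL;
    if (add) {
        out[0] = '/';
        out[1] = drive;
        out[2] = ':';
        memcpy(out + 3, path, n + 1);
    } else {
        memcpy(out, path, n + 1);
    }
    return out;
}
```

For example, "/" maps to "/c:/", so a path that climbs out of /c:/ via .. and ends up as "/" lands back on the root of c:.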

But I don't know what, hmm, "other vendors" for DOS & Windows have
done that requires compatibility, and I probably have overlooked
lots of things.

Just some, ahem, thoughts.

  Klaus



