Re: [GNU-linux-libre] [gnu.org #1262331] (inactive Linux distributions)

From:	Luke Shumaker
Subject:	Re: [GNU-linux-libre] [gnu.org #1262331] (inactive Linux distributions)
Date:	Thu, 25 Jan 2018 12:09:52 -0500
User-agent:	Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM/1.14.9 (Gojō) APEL/10.8 EasyPG/1.0.0 Emacs/25.3 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)

On Thu, 25 Jan 2018 05:58:18 -0500,
Andrew Nesbit wrote:
> 
> On 25/01/2018 02:38, bill-auger wrote:
> 
> > in the case of the 'www.' sub-domain in 'http://www.foo.com', that 
> > clearly identifies the HTTP "World Wide Web" server of foo.com
>
> As a somewhat relevant side issue, what are the rules or conventions
> regarding URLs with unadorned directory or file components, like
> "http://www.foo.com"?
> 
> After reading up the other day, my understanding is that since a
> trailing slash indicates something like a directory resource depending
> on context, "http://www.foo.com" should canonically be represented as
> "http://www.foo.com/".  The web server will resolve this "directory" to
> "http://www.foo.com/index.html" or something similar.  Do I understand
> correctly?

The "path" part of a URL starts at the first "/" after the initial
"://", and runs up to any "?" (query) or "#" (fragment). (RFC 3986)
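
To make that concrete, here is a quick sketch using Python's urllib.parse (my choice of tool; it follows RFC 3986 closely for URLs like these, though not in every corner case). url_path is just a hypothetical helper name:

```python
from urllib.parse import urlsplit


def url_path(url: str) -> str:
    """Return just the path component of url (hypothetical helper)."""
    # urlsplit() separates scheme, netloc (the authority), path, query and fragment.
    return urlsplit(url).path


print(repr(url_path("http://www.foo.com")))                              # ''
print(repr(url_path("http://www.foo.com/")))                             # '/'
print(repr(url_path("http://www.foo.com/docs/index.html?lang=en#top")))  # '/docs/index.html'
```

Note that the first URL really has an empty path; nothing has been filled in yet.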

If there is no path in an HTTP URL, the default path is "/" (except
when making an OPTIONS request[1]).  That is, without knowing a thing
about the server, the client can normalize "http://www.foo.com" to
"http://www.foo.com/". (RFC 7230)
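
That normalization is mechanical enough to sketch. add_default_path is a hypothetical name, and this only covers the empty-path rule for http and https URLs, not the OPTIONS exception from the footnote:

```python
from urllib.parse import urlsplit, urlunsplit


def add_default_path(url: str) -> str:
    """Replace an empty path with "/" in an http(s) URL; leave everything else alone."""
    parts = urlsplit(url)
    if parts.scheme.lower() in ("http", "https") and parts.path == "":
        parts = parts._replace(path="/")
    # A non-empty path is never touched, so no trailing slash is ever added to it.
    return urlunsplit(parts)


print(add_default_path("http://www.foo.com"))       # http://www.foo.com/
print(add_default_path("http://www.foo.com?q=1"))   # http://www.foo.com/?q=1
print(add_default_path("http://www.foo.com/docs"))  # http://www.foo.com/docs
```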

The path part of a URL is hierarchical in a way similar to your
filesystem; without knowing a thing about the server, the client may
normalize the path by resolving "/foo/../" segments and removing "/./"
segments.  Repeated slashes are not covered by this: "a//b" and "a/b"
are distinct paths, even though some servers treat them alike.  And in
this hierarchy, "foo" and "foo/" are not necessarily the same resource
either; the client may not add or remove a trailing slash. (RFC 3986,
incorporated by RFC 7230).
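
RFC 3986 (section 5.2.4) spells out dot-segment removal as a small algorithm; here is a sketch of it in Python. The function name is my own, it works on the path alone (no query or fragment), and it leaves trailing and repeated slashes exactly as it found them:

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments from a URL path (RFC 3986, section 5.2.4)."""
    inp = path
    out: list[str] = []  # output segments, each carrying its leading "/" if it had one
    while inp:
        if inp.startswith("../"):        # rule A: drop a leading "../"
            inp = inp[3:]
        elif inp.startswith("./"):       # rule A: drop a leading "./"
            inp = inp[2:]
        elif inp.startswith("/./"):      # rule B: "/./" becomes "/"
            inp = inp[2:]
        elif inp == "/.":                # rule B: trailing "/." becomes "/"
            inp = "/"
        elif inp.startswith("/../"):     # rule C: "/../" becomes "/", drop last output segment
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":               # rule C: trailing "/.." becomes "/", drop last segment
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):         # rule D: a lone "." or ".." disappears
            inp = ""
        else:                            # rule E: move the first segment to the output
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


print(remove_dot_segments("/a/b/c/./../../g"))  # /a/g
print(remove_dot_segments("/docs/../docs/"))    # /docs/
print(remove_dot_segments("/a//b"))             # /a//b
```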

HTTP has no concept of directories or files, just a hierarchy of
resources.  Of course, because of the similarity, an obvious
implementation strategy for HTTP servers is to have those HTTP
resources stored as files in a traditional *nix filesystem.  That has
a few consequences to consider:
 - "What to send when the path given is a directory?"  As `cat` will
   tell you on most *nixen, there's no obvious flat representation of
   a directory.  So, a *convention* for server implementations is to
   look for an "index.html" file in that directory, and to assume that
   that file is a reasonable HTML representation of the directory.
 - In the underlying *nix filesystem, a trailing slash on a directory
   name makes no difference; the filesystem will give the server the
   same thing either way.  But the trailing slash does affect how the
   client resolves relative URLs in the resource's HTML representation.
   So, to simplify writing relative URLs, when an HTTP server receives
   a request for a directory without a trailing slash, it will
   typically respond with a 301 redirect to the version with a
   trailing slash (but some servers do it the other way around!).
   (This also helps with SEO; it's generally bad to have two different
   URLs serving the same content.)
But these are both server implementation details, and are not general
truths about HTTP.
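
Both conventions fit in a tiny, hypothetical request-to-file mapper. resolve() is not taken from any real server: it assumes Python 3.9+, assumes the path has already been percent-decoded, and ignores everything else real servers care about (permissions, MIME types, directory listings). The urljoin() lines at the end show why the redirect matters to the client:

```python
import tempfile
from pathlib import Path
from urllib.parse import urljoin


def resolve(docroot: Path, url_path: str) -> tuple[int, str]:
    """Map a decoded request path to (status, value) using the usual conventions.

    (200, file) -> send that file
    (301, path) -> redirect the client to that path
    (404, "")   -> nothing there
    """
    root = docroot.resolve()
    target = (root / url_path.lstrip("/")).resolve()
    if not target.is_relative_to(root):  # refuse paths that escape the docroot
        return (404, "")
    if target.is_dir():
        if not url_path.endswith("/"):
            # Redirect to the trailing-slash form so relative links resolve inside the directory.
            return (301, url_path + "/")
        # Treat index.html as the directory's representation, if there is one.
        index = target / "index.html"
        return (200, str(index)) if index.is_file() else (404, "")
    if target.is_file():
        return (200, str(target))
    return (404, "")


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "docs").mkdir()
    (root / "docs" / "index.html").write_text('<a href="intro.html">Intro</a>')
    print(resolve(root, "/docs"))   # (301, '/docs/')
    print(resolve(root, "/docs/"))  # status 200 plus the path of docs/index.html

# The client resolves relative links against everything up to the last "/":
print(urljoin("http://www.foo.com/docs", "intro.html"))   # http://www.foo.com/intro.html
print(urljoin("http://www.foo.com/docs/", "intro.html"))  # http://www.foo.com/docs/intro.html
```

Without the redirect, the "intro.html" link in that index page would send the browser to the wrong place.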

[1]: OPTIONS requests are a special case: for an OPTIONS request,
     having no path means "tell me things about the entire server"
     (sent as the request target "*"), which is different from a "/"
     path, which means "tell me things about the '/' resource".
     (RFC 7231; the "*" form is defined in RFC 7230.)
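
The difference shows up in the request line the client sends. A sketch, with a hypothetical request_target() helper; it only produces the plain path form and the "*" form (no proxies, no CONNECT):

```python
from urllib.parse import urlsplit


def request_target(method: str, url: str) -> str:
    """Request target a client would send for url: a path, or "*" for server-wide OPTIONS."""
    parts = urlsplit(url)
    if parts.path == "" and parts.query == "":
        # No path: OPTIONS asks about the whole server ("*"); any other method gets "/".
        return "*" if method == "OPTIONS" else "/"
    target = parts.path or "/"  # a query with no path still needs the "/"
    if parts.query:
        target += "?" + parts.query
    return target  # the fragment is never sent to the server


print(request_target("OPTIONS", "http://www.foo.com"))   # *
print(request_target("OPTIONS", "http://www.foo.com/"))  # /
print(request_target("GET", "http://www.foo.com"))       # /
```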

> What are the history and rules regarding this?  Is there an RFC or some
> other authoritative resource that explains it?

Modern HTTP/1.1 is defined in the RFC 723X series; RFC 7230 (Message
Syntax and Routing) defines the "http" and "https" URI schemes, but
their interpretation is deferred to RFC 7231 (Semantics and Content).
Generic URI syntax is RFC 3986.

-- 
Happy hacking,
~ Luke Shumaker
