lynx-dev
Re: LYNX-DEV BUG? Problem with suggested name for downloaded compressed


From: Foteos Macrides
Subject: Re: LYNX-DEV BUG? Problem with suggested name for downloaded compressed files
Date: Mon, 22 Sep 1997 14:39:12 -0500 (EST)

WWW server manager <address@hidden> wrote:
>Using lynx 2.7.1 + fotemods.zip dated 21 Sep 1997, on Sun Solaris 2.5.1
>(SPARC) with Sun's C compiler ...
>
>Fetching compressed tar files (xxx.tar.Z) from web servers (i.e. HTTP not 
>FTP) by selecting the relevant links explicitly using the d(ownload) command
>in lynx seems to behave oddly, and certainly differently from older versions
>of lynx (though I don't know when it changed).
>
>Where there is a well-defined filename available from the URL, I would 
>expect lynx to offer that as the default for saving downloaded files (and
>that's what happened with the older versions I've used), but with the
>version described above it sometimes omits the compression suffix (for .Z, I
>presume .gz would be treated similarly), although the file *is* saved in
>its original form. With e.g. lynx 2.4.1, the name from the URL is used
>unaltered.
>
>Limited testing suggests that .Z (presumably also .gz, but I wasn't able to 
>test that) is dropped when the server does not send a Content-Encoding
>header, and retains .Z (and .gz) if the encoding is specified by the server,
>though in all cases the file was saved in its original form, with no change
>to the encoding (thus making the proposed filename misleading when .Z was 
>dropped). In the cases I examined, the files (xxx.tar.Z and xxx.tar.gz) were
>all served with Content-Type application/octet-stream.
>
>While it is reasonable to make use of Content-Encoding to uncompress a file
>for display, and then to strip the suffix implying compression from the
>default file name for saving the results via p(rint), it seems totally
>inappropriate (=wrong) for lynx to change the filename from the one in the
>URL (where that is recognisable and usable) when the file is being 
>saved unmodified (i.e. it has not been uncompressed).
>
>Is this change to the handling of compression suffixes intentional, or a bug
>(perhaps related to the documented change in handling of filenames for
>p(rint))? It seems especially odd for it to happen in the case where lynx
>does *not* know (from HTTP headers) that the file is compressed!
>
>Suggesting misleading filenames in this way is potentially *very* confusing
>for users (doubly so if the text around the link did not name the file, so
>they don't know what to expect). I was hoping to upgrade lynx before the
>imminent start of term (not least to fix the security issues, so it has to be 
>2.7.1 + subsequent updates), but that may not be an option unless this problem
>can be fixed.

        The HTCheckFnameForCompression() function which I added in
GridText.c based on discussions about Content-Disposition header
handling in the IETF's HTTP-WG had a bug such that it wasn't taking
a Content-Type of application/octet-stream into account.  That's
fixed in the fotemods.zip I just updated at slcc.  As Klaus has
reported and is discussing, he added that function to the devel
code.  However, he changed its logic in ways I don't intend to
reproduce in the fotemods (i.e., the Lynx code that I actually
use seriously), so keep those discussions distinct from ones about
the fotemods.  Here's how it's intended to work in the fotemods.
If it doesn't, continue reporting bugs via the lynx-dev list.

        For ftp, no headers are available, so the suffix mappings
in HTInit.c (supplemented by any SUFFIX: mappings in lynx.cfg,
and/or replaced with ones in a global and/or personal mime.types
file) are used.  If there's a tail match for a suffix (need not
begin with a dot) mapped to gzipped or Unix compressed files,
that's temporarily stripped and a suffix check is done again to
guess the Content-Type.  If that guess is one which Lynx can
display or has been mapped (via VIEWER:, HTInit.c, or mailcap
entries) to a helper app, the anchor is loaded with that
Content-Type and a Content-Encoding of gzip or compress.
Otherwise, or if a 'd'ownload command was used, such that the
file would not be uncompressed and a D)ownload or C)ancel prompt
should be invoked or the downloading proceed automatically, the
Content-Type is set to application/gzip or application/compress
and no Content-Encoding is set.  The HTCheckFnameForCompression()
function then tweaks the default output filename appropriately
for the platform and security concerns.
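
        The ftp decision described above can be sketched roughly
as follows.  This is a minimal illustration, not the code in
HTInit.c or GridText.c: the tables, the type names and guess_ftp()
are hypothetical, the "handled" flag stands in for "Lynx can
display it or a helper app is mapped", and matching is a plain
case-sensitive tail comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy tables; the real mappings come from HTInit.c,
 * lynx.cfg SUFFIX: lines and mime.types files. */
typedef struct {
    const char *suffix;
    const char *encoding;      /* Content-Encoding to report */
    const char *archive_type;  /* Content-Type when left compressed */
} enc_map;

typedef struct {
    const char *suffix;
    const char *type;
    bool handled;              /* displayable or mapped to a helper app */
} type_map;

static const enc_map ENCODINGS[] = {
    { ".gz", "gzip",     "application/gzip" },
    { ".Z",  "compress", "application/compress" },
};

static const type_map TYPES[] = {
    { ".html", "text/html",         true  },
    { ".txt",  "text/plain",        true  },
    { ".tar",  "application/x-tar", false },
};

typedef struct {
    const char *content_type;
    const char *content_encoding;  /* NULL when none is set */
} guess;

/* Tail match against the first len characters of name. */
static bool tail_match(const char *name, size_t len, const char *suffix)
{
    size_t s = strlen(suffix);
    return s < len && strncmp(name + len - s, suffix, s) == 0;
}

static const type_map *lookup_type(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof TYPES / sizeof TYPES[0]; i++)
        if (tail_match(name, len, TYPES[i].suffix))
            return &TYPES[i];
    return NULL;
}

guess guess_ftp(const char *name, bool download_cmd)
{
    size_t len = strlen(name);
    guess g = { "application/octet-stream", NULL };

    for (size_t i = 0; i < sizeof ENCODINGS / sizeof ENCODINGS[0]; i++) {
        const enc_map *e = &ENCODINGS[i];
        if (!tail_match(name, len, e->suffix))
            continue;
        /* Temporarily ignore the compression suffix and check again. */
        const type_map *inner = lookup_type(name, len - strlen(e->suffix));
        if (inner != NULL && inner->handled && !download_cmd) {
            g.content_type = inner->type;        /* will be uncompressed */
            g.content_encoding = e->encoding;
        } else {
            g.content_type = e->archive_type;    /* saved as is */
        }
        return g;
    }

    const type_map *plain = lookup_type(name, len);
    if (plain != NULL)
        g.content_type = plain->type;
    return g;
}
```

With these toy tables, guess_ftp("notes.txt.gz", false) reports
text/plain with a gzip encoding, while the same name with the
'd'ownload flag set, or xxx.tar.Z in either case, is reported as
application/gzip or application/compress with no encoding.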

        For http (or https if available), the HTTP headers are
used.  If a Content-Disposition header (see the HTTP/1.1 draft
via the online 'h'elp) with a suggested default output file
name is present, that is used as the starting point.  Otherwise,
the last element of the URL is used as a starting point.  Then
the above logic is applied (plus as of today a check for a
Content-Type header with application/octet-stream) to concoct
a default output filename appropriately for the platform.
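
        The choice of starting point for http can be sketched as
below.  Again this is only an illustration under stated
assumptions: starting_name() and copy_span() are hypothetical
names, the Content-Disposition filename is assumed to have been
parsed out of the header already, and dropping everything from
the first '?' or '#' in the URL is my assumption rather than
something taken from the Lynx code.  The later suffix handling
and platform/security tweaks are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy n characters of s into out; fail on empty or oversized names. */
static bool copy_span(char *out, size_t outsz, const char *s, size_t n)
{
    if (n == 0 || n >= outsz)
        return false;
    memcpy(out, s, n);
    out[n] = '\0';
    return true;
}

/* Pick the starting point for the default output filename:
 * the Content-Disposition suggestion if present, otherwise the
 * last element of the URL.  Returns false if no name is found. */
bool starting_name(const char *url, const char *disp_name,
                   char *out, size_t outsz)
{
    if (disp_name != NULL && *disp_name != '\0')
        return copy_span(out, outsz, disp_name, strlen(disp_name));

    /* Assumption: ignore any query string or fragment. */
    size_t end = strcspn(url, "?#");
    size_t start = end;
    while (start > 0 && url[start - 1] != '/')
        start--;
    return copy_span(out, outsz, url + start, end - start);
}
```

A URL ending in xxx.tar.Z with no Content-Disposition name yields
xxx.tar.Z as the starting point; a URL ending in a slash yields
no name at all, and the caller would have to fall back on
something else.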

        If the http/https server (or its CGI script) sends
headers which do not correctly describe the resource, then
the HTCheckFnameForCompression() function, which lacks ESP,
may concoct a default output filename which needs to be
edited, based on one's understanding of the HTTP protocol
and diagnostic procedures available in Lynx, or descriptions
of the resource in the document from which a download attempt
was made (if they should give a better indication than what
a clueless WebMaster or script writer caused the server to
return in its HTTP headers :).

                                Fote

=========================================================================
 Foteos Macrides            Worcester Foundation for Biomedical Research
 address@hidden         222 Maple Avenue, Shrewsbury, MA 01545
=========================================================================
