Re: [Bug-wget] bad filenames (again)


From: Tim Ruehsen
Subject: Re: [Bug-wget] bad filenames (again)
Date: Fri, 21 Aug 2015 13:31:45 +0200
User-agent: KMail/4.14.2 (Linux/4.1.0-1-amd64; KDE/4.14.2; x86_64; ; )

On Friday 21 August 2015 13:00:34 Andries E. Brouwer wrote:
> On Fri, Aug 21, 2015 at 12:07:56PM +0200, Tim Ruehsen wrote:
> > The charset is *not* determined (guessed) from the URL string, be it hex
> > encoded or not. We take the locale setup as default, but it can be
> > overridden by --local-encoding. Right now, Wget does not have the ability
> > to have different encodings for file input (--input-file) and input via
> > STDIN (when used at the same time). But that is another issue...
> 
> It seems to me that I keep saying the same thing. We are not communicating.
Yes, I am also under this impression :-(

> You talk about locale and local-encoding but that is not the point.
Sorry, exactly that seems to be the point.

> There is a remote site.
> Nothing is known about this remote site.
Wrong. For HTTP(S), we know exactly which encoding each downloaded HTML 
and CSS document has (that's what I call 'remote encoding'). These are the only 
types of (downloaded) files we scan when going recursive.
If the server (or document) states a wrong encoding (e.g. *saying* it is 
Japanese/EUC-JP encoded when in fact it is iso-8859-1 encoded), either we 
have to use escaping or the user has to pass --remote-encoding to override the 
wrong server/document statement.

But setting these misconfigured servers aside as a special case, we are fine.

You might take a look at http://www.w3.org/TR/html4/charset.html#h-5.2.2 which 
describes how servers and clients should work regarding HTML character 
encoding (there should be something for CSS as well out there).
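
To make the lookup order concrete, here is a minimal sketch in Python. It is 
not Wget's actual code. It only illustrates the first two priorities from the 
HTML 4 section above (the charset parameter of the HTTP Content-Type header, 
then a META declaration in the document), with a user override checked first 
in the spirit of --remote-encoding. The function names and the iso-8859-1 
fallback are assumptions made for this illustration.

```python
import re

def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not value:
        return None
    m = re.search(r'charset\s*=\s*"?([^\s;"]+)', value, re.IGNORECASE)
    return m.group(1).lower() if m else None

def charset_from_meta(body):
    """Return the charset declared in a META tag near the start of the body, or None."""
    # Only the first 1024 bytes are inspected (an arbitrary limit for this sketch).
    m = re.search(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
                  body[:1024], re.IGNORECASE)
    return m.group(1).decode("ascii").lower() if m else None

def remote_encoding(content_type, body, override=None, default="iso-8859-1"):
    """Pick the remote encoding: user override, HTTP header, META tag, fallback."""
    if override:
        # Hypothetical stand-in for a user-supplied --remote-encoding value.
        return override.lower()
    return (charset_from_content_type(content_type)
            or charset_from_meta(body)
            or default)  # assumed fallback, not taken from Wget
```

With this, a server that wrongly announces EUC-JP is handled by calling 
remote_encoding(header, body, override="iso-8859-1"), so that the stated 
encoding is ignored in favour of the user's choice.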

Andries, if you still have the impression that we are not communicating, I 
suggest that you put together a simple test case to show your problem (and 
please excuse me for being kinda dumb/blind). Maybe two small HTML files with 
references to each other to demonstrate your point. (I can put them on my 
server and run wget/wget2 on them to see if it works or not.)

Regards, Tim

Attachment: signature.asc
Description: This is a digitally signed message part.