Re: [Bug-wget] bad filenames (again)
From: Tim Ruehsen
Subject: Re: [Bug-wget] bad filenames (again)
Date: Fri, 21 Aug 2015 13:31:45 +0200
User-agent: KMail/4.14.2 (Linux/4.1.0-1-amd64; KDE/4.14.2; x86_64; ; )
On Friday 21 August 2015 13:00:34 Andries E. Brouwer wrote:
> On Fri, Aug 21, 2015 at 12:07:56PM +0200, Tim Ruehsen wrote:
> > The charset is *not* determined (guessed) from the URL string, be it hex
> > encoded or not. We take the locale setup as default, but it can be
> > overridden by --local-encoding. Right now, Wget does not have the ability
> > to have different encodings for file input (--input-file) and input via
> > STDIN (when used at the same time). But that is another issue...
>
> It seems to me that I keep saying the same thing. We are not communicating.
Yes, I am also under this impression :-(
> You talk about locale and local-encoding but that is not the point.
Sorry, but that seems to be exactly the point.
> There is a remote site.
> Nothing is known about this remote site.
Wrong. Regarding HTTP(S), we know exactly the encoding of each downloaded HTML
and CSS document (that's what I call 'remote encoding'). It is only these types
of (downloaded) files that we scan when going recursive.
If the server (or document) states a wrong encoding (e.g. *saying* it has
Japanese/EUC-JP encoding when in fact it is iso-8859-1 encoded), either we
have to use escaping or the user has to pass --remote-encoding to override the
wrong server/document statement.
But leaving these misconfigured servers aside as a special case, we are fine.
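
To make the mis-declared case concrete, here is a minimal Python sketch. It
is not Wget's code; `link_to_filename` and its parameters are names made up
for illustration. It decodes the raw bytes of a link with the declared
encoding, lets a user-supplied override win, and falls back to percent-escaping
when the bytes do not fit the encoding.

```python
from urllib.parse import quote


def link_to_filename(raw, declared, override=None):
    """Turn the raw bytes of a link into a file name (illustrative only).

    raw      -- link bytes exactly as found in the downloaded document
    declared -- encoding stated by the server or the document
    override -- encoding forced by the user, playing the role that
                --remote-encoding plays in the text above
    """
    encoding = override or declared
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # The bytes are invalid in that encoding (or the encoding name is
        # unknown), so escape them instead of guessing.
        return quote(raw)


# 0xE9 is 'é' in iso-8859-1, but an incomplete sequence in EUC-JP.
raw_link = b"caf\xe9.html"
escaped = link_to_filename(raw_link, "euc-jp")
fixed = link_to_filename(raw_link, "euc-jp", override="iso-8859-1")
```

Note that a wrong declaration does not always fail loudly like this: many
byte sequences decode without error into the wrong characters, which is why
the override has to come from the user.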
You might take a look at http://www.w3.org/TR/html4/charset.html#h-5.2.2 which
describes how servers and clients should work regarding HTML character
encoding (there should be something for CSS as well out there).
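
As a rough Python sketch of the first two priority steps from that section
(HTTP "Content-Type" charset parameter first, then a META http-equiv
declaration in the document), with a user override placed on top: the
function and class names are assumptions of this sketch, not Wget's
implementation, and the third step (a charset attribute on the linking
element) is left out.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a Content-Type value."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


class MetaCharsetParser(HTMLParser):
    """Remember the charset of the first META http-equiv Content-Type."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() == "content-type":
            self.charset = charset_from_content_type(attributes.get("content"))


def remote_encoding(http_content_type, body, user_override=None):
    """Pick the encoding of a downloaded HTML document, or None if unknown."""
    if user_override:
        return user_override.lower()
    charset = charset_from_content_type(http_content_type)
    if charset:
        return charset
    parser = MetaCharsetParser()
    # latin-1 maps every byte to a character, so scanning cannot fail;
    # only the ASCII markup matters here. The 4096-byte limit is arbitrary.
    parser.feed(body[:4096].decode("latin-1"))
    return parser.charset


page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=ISO-8859-1"></head></html>')
from_header = remote_encoding("text/html; charset=EUC-JP", page)
from_meta = remote_encoding("text/html", page)
```

The sketch returns None rather than a default when nothing is declared,
leaving that decision to the caller.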
Andries, if you still have the impression that we are not communicating, I
suggest that you make up a simple example test case to show your problem (and
please excuse me for being kinda dumb/blind). Maybe two small HTML files with
references to each other to demonstrate your point. (I can put them on my
server and run wget/wget2 on them to see if it works or not.)
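
Such a pair could look roughly like what this Python sketch produces. The
file names, the iso-8859-1 charset and the helper `build_test_pages` are all
just assumptions for illustration; the real test case should of course show
whatever fails for you.

```python
def build_test_pages(charset="iso-8859-1"):
    """Return two small HTML pages that link to each other, as bytes.

    One page has a non-ASCII name, so the link to it is stored in the
    given charset, which is also declared in a META element.
    """
    names = ("index.html", "caf\u00e9.html")
    template = (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset={cs}">'
        "</head><body>"
        '<a href="{target}">{target}</a>'
        "</body></html>\n"
    )
    pages = {}
    for name, target in (names, names[::-1]):
        html = template.format(cs=charset, target=target)
        pages[name] = html.encode(charset)
    return pages


pages = build_test_pages()
```

Writing the two byte strings to a server directory is left out on purpose,
since how the non-ASCII file name ends up on disk depends on the local
file system and locale.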
Regards, Tim