bug-wget

Re: [Bug-wget] IDN and IRI tests fail on MS-Windows with wget 1.16.1


From: Tim Rühsen
Subject: Re: [Bug-wget] IDN and IRI tests fail on MS-Windows with wget 1.16.1
Date: Sat, 27 Dec 2014 13:57:21 +0100
User-agent: KMail/4.14.2 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )

On Saturday, 27 December 2014, 10:39:25, Eli Zaretskii wrote:
> > From: Tim Rühsen <address@hidden>
> > Date: Thu, 25 Dec 2014 15:43:27 +0100
> > 
> > >      FAIL: Test-idn-headers.px
> > >      FAIL: Test-idn-meta.px
> > >    
> > >    These use EUC-JP-encoded file names but do not state
> > >    --local-encoding on the wget command line, so the non-ASCII
> > >    characters get mangled by Windows (because Windows tries to convert
> > >    non-Unicode non-ASCII strings to the current system codepage).
> > >    Test-idn-* tests that do state --local-encoding do succeed.  Is it
> > >    possible that the tests assume something about the local encoding,
> > >    like that it's UTF-8?
> > 
> > Let's start with 'Test-idn-meta'.
> > No non-ASCII filename will be written to disk; the Content-Type is stated
> > correctly. --local-encoding sets the encoding used when reading a local file
> > or the command line, so it shouldn't influence this test. And I can't
> > reproduce the stated behavior.
> > 
> > Please send me the --debug output of this test with and without
> > --local-encoding given.
> 
> The output is attached.  I collected it by redirecting the test
> script's stderr to a file; I hope that's what you meant.
> 
> I noticed that the output says:
> 
>   converted 'http://<bunch of octal escapes>/' (CP1255) -> 'http://<another bunch of octal escapes>/' (UTF-8)
> 
> So I tried to use --local-encoding=EUC-JP, and that made the test
> succeed.  The third attachment below is from that successful run.

Thanks, Eli.

Your tests helped me to reproduce the problem:
- install (and set) a non-UTF-8 and non-C/POSIX locale
- use this locale for testing, e.g.:
  TESTS_ENVIRONMENT="address@hidden" make check TESTS=Test-idn-meta

From what I see in the logs, Wget has a severe problem.
When loading a saved (HTML) document, Wget parses it with the local encoding
instead of the encoding stated by the server (or the document itself). Of
course this can't work, and it is the reason why your third test succeeds: it
sets the local encoding to the actual encoding of the document.
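A minimal sketch of the mismatch, in Python purely for illustration (Wget itself is written in C, and none of its actual code paths are shown here). The sample text is hypothetical, not taken from the test file; the point is only that the same EUC-JP bytes round-trip when decoded with the document's declared encoding but come out mangled when decoded with an unrelated local encoding such as the CP1255 seen in the --debug output above.

```python
# Hypothetical sample text; not the content of Test-idn-meta's document.
original = "日本語"
raw = original.encode("euc_jp")  # bytes as a server would send them

# Decoding with the encoding stated by the server/document round-trips.
correct = raw.decode("euc_jp")

# Decoding with a mismatched local encoding yields unrelated characters;
# errors="replace" substitutes U+FFFD for bytes CP1255 leaves undefined.
wrong = raw.decode("cp1255", errors="replace")

print(correct == original)  # True
print(wrong == original)    # False
```

Once text has been decoded this way, converting it to UTF-8 only preserves the wrong characters, which is consistent with a later, correctly decoded pass still tripping over conversions cached from the first one.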

After the 400 server response, Wget loads the document again, now with the 
correct encoding. But Wget 'remembers' some incorrect conversions from the 
first try and thus fails again.


I would expect Wget to load the document with the correct encoding in the first
place... but it looks like this 'double loading' was done on purpose.

Can anyone shed some light on this before I fix Wget's behavior, please?

Tim


