[Bug-wget] [bug #50383] --local-encoding isn't used when converting a re
From: Tim Ruehsen
Subject: [Bug-wget] [bug #50383] --local-encoding isn't used when converting a relative link in a recursive download
Date: Thu, 23 Feb 2017 05:47:13 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0
Update of bug #50383 (project wget):
Status: None => Confirmed
_______________________________________________________
Follow-up Comment #1:
Two problems here:
1. The command-line URL is converted by 'remote_to_utf8()' in url_parse().
This is wrong; locale_to_utf8() must be used.
In many locales this makes no difference for the tilde, but I noticed it
while tracing wget.
2. After dequeuing (before download), wget converts the complete URL with
remote_to_utf8(). This is wrong: only the part that came from the remote
should be converted (~foo came from local input).
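To illustrate why decoding local input with the remote charset goes wrong,
here is a minimal Python sketch. It is not wget code; the two encodings
(UTF-8 locally, Shift_JIS remotely) and the sample string are assumptions
chosen only for illustration.

```python
# Illustration only: none of this is wget code.
LOCAL_ENCODING = "utf-8"        # assumed value of --local-encoding
REMOTE_ENCODING = "shift_jis"   # assumed charset of the remote page

# A path segment typed on the command line, as bytes in the local encoding.
local_bytes = "café".encode(LOCAL_ENCODING)

# Wrong: treat locally supplied bytes as if they came from the remote side.
mis_decoded = local_bytes.decode(REMOTE_ENCODING, errors="replace")

# Right: decode locally supplied bytes with the local encoding.
correct = local_bytes.decode(LOCAL_ENCODING)

print(repr(mis_decoded), repr(correct))
```

With pure ASCII input the two results would usually coincide, which is
probably why this kind of mistake is easy to miss.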
Suggested fix:
The charset conversion to utf-8 should take place whenever input is taken
(from command line or from remote). Internally, wget should work with utf-8
only. That is what Wget2 already does.
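A rough Python sketch of the "convert at the input boundary" idea follows.
It is only an illustration, not wget or Wget2 code: the helper names
local_to_text(), remote_to_text() and resolve_link() are hypothetical, and
the encodings and URLs are made-up sample values.

```python
from urllib.parse import urljoin


def local_to_text(raw: bytes, local_encoding: str) -> str:
    """Hypothetical helper: decode command-line input once, on entry."""
    return raw.decode(local_encoding)


def remote_to_text(raw: bytes, remote_encoding: str) -> str:
    """Hypothetical helper: decode a link found in a remote page, on entry."""
    return raw.decode(remote_encoding)


def resolve_link(base: str, link: str) -> str:
    """Both arguments are already text, so no charset conversion is needed."""
    return urljoin(base, link)


# Assumed sample input: the base URL arrives in ISO-8859-1 from the command
# line, the relative link arrives in Shift_JIS from the downloaded page.
base = local_to_text(b"http://example.com/caf\xe9/", "iso-8859-1")
link = remote_to_text("日本.html".encode("shift_jis"), "shift_jis")

resolved = resolve_link(base, link)
print(resolved)
```

Because each piece is decoded with the charset of its own origin, the joined
URL never has to be re-converted as a whole later on.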
I am attaching my Python test script that reproduces this issue, in case
someone wants to work on it. Copy it to testenv/ and run it manually, or add
it to Makefile.am.
(file #39814)
_______________________________________________________
Additional Item Attachment:
File name: Test-link-shiftjis.py Size:1 KB
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?50383>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/