
From: Tim Ruehsen
Subject: [Bug-wget] [bug #50383] --local-encoding isn't used when converting a relative link in a recursive download
Date: Thu, 23 Feb 2017 05:47:13 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:51.0) Gecko/20100101 Firefox/51.0

Update of bug #50383 (project wget):

                  Status:                    None => Confirmed              

    _______________________________________________________

Follow-up Comment #1:

Two problems here:

1. The command-line URL is converted by 'remote_to_utf8()' in url_parse().
This is wrong; locale_to_utf8() must be used instead.
In many locales this makes no difference for a tilde, but I just
noticed it when tracing wget.

2. After dequeuing (before download), wget converts the complete URL with
remote_to_utf8(). This is wrong: only the part that came from the remote side
should be converted (~foo came from local input).
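
To illustrate problem 2, here is a minimal Python sketch (not wget code). The
encodings, the URL, and the helper to_utf8() are assumptions chosen for the
example; it only shows why bytes from two origins cannot be converted with a
single charset.

```python
# Hypothetical setup: the local side uses ISO-8859-1, the remote page Shift_JIS.
LOCAL_ENCODING = "iso-8859-1"   # assumed value of --local-encoding
REMOTE_ENCODING = "shift_jis"   # assumed charset of the fetched page


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode raw bytes from source_encoding to UTF-8 (illustrative helper)."""
    return raw.decode(source_encoding, errors="replace").encode("utf-8")


# The base part came from local input, the file name from a remote link.
local_part = "http://localhost/café/".encode(LOCAL_ENCODING)
remote_part = "テスト.html".encode(REMOTE_ENCODING)

# Converting the complete URL as if all of it were remote garbles the local part.
whole_as_remote = to_utf8(local_part + remote_part, REMOTE_ENCODING)

# Converting each part with the charset of its origin gives the intended URL.
per_origin = to_utf8(local_part, LOCAL_ENCODING) + to_utf8(remote_part, REMOTE_ENCODING)

expected = "http://localhost/café/テスト.html".encode("utf-8")
print(per_origin == expected)
print(whole_as_remote == expected)
```

The byte 0xE9 ('é' in ISO-8859-1) is not a valid standalone character in
Shift_JIS, so the whole-URL conversion cannot reproduce the intended path.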

Suggested fix:
The charset conversion to utf-8 should take place whenever input is taken
(from command line or from remote). Internally, wget should work with utf-8
only. That is what Wget2 already does.
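
A rough Python sketch of that idea, under assumptions: the function names
from_command_line() and from_remote() are hypothetical, and the encodings and
URLs are made up for illustration. Each input is decoded once at the point
where it enters the program, and everything after that works on one internal
representation.

```python
from urllib.parse import urljoin


def from_command_line(raw: bytes, local_encoding: str) -> str:
    """Decode a command-line argument using the local encoding (hypothetical)."""
    return raw.decode(local_encoding)


def from_remote(raw: bytes, remote_encoding: str) -> str:
    """Decode a link found in a downloaded document (hypothetical)."""
    return raw.decode(remote_encoding)


# Input boundary 1: URL given on the command line, in the local encoding.
base = from_command_line(
    "http://localhost/~foo/index.html".encode("iso-8859-1"), "iso-8859-1"
)

# Input boundary 2: relative link parsed from a Shift_JIS page.
link = from_remote("テスト.html".encode("shift_jis"), "shift_jis")

# From here on only decoded text is handled; no further charset conversion
# is needed when the link is resolved and queued.
queued = urljoin(base, link)
print(queued.encode("utf-8"))
```

With this layout the queue never holds a URL of mixed encodings, so no
conversion is needed after dequeuing.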

I am attaching my Python test script to reproduce this issue, in case someone
wants to work on it. Copy it to testenv/ and run it manually or add it to
Makefile.am.

(file #39814)
    _______________________________________________________

Additional Item Attachment:

File name: Test-link-shiftjis.py          Size:1 KB


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?50383>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/
