[Bug-wget] wget fails to encode spaces in URLs


From: Volker Kuhlmann
Subject: [Bug-wget] wget fails to encode spaces in URLs
Date: Sun, 05 Jun 2011 12:38:24 +1200
User-agent: KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; )

 > wget --version
GNU Wget 1.12 built on linux-gnu.

To reproduce:

Go to any SourceForge project and download a file whose URL contains a
space. Copy the "direct link" from the download page and paste it into
wget -i-

Run wireshark and press ^D in the wget input stream.

If an upstream proxy strips spaces (e.g. squid, the default setting in
pfSense), the download goes round in circles.

The bug does not exist in wget when passing the URL on the command line.
I always use -i- because of all the shell crud in URLs.

I am using the openSUSE 11.4 version, but the only source code change is
additional support for libproxy.


Problem:

Looking at the source, main.c calls url_parse() for each URL given on
the command line. For -i, main.c calls retrieve_from_file().

retrieve_from_file() (in retr.c) reads a list of URLs from the given
file. It then calls url_parse() only if IRI is enabled (which in my
version of wget is not even compiled in).
Hence the URL is never parsed and never encoded before being downloaded
with retrieve_url().
That's a bug.

The fix is probably to always call url_parse() in retrieve_from_file(),
and not only when IRI is turned on.
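
To illustrate the idea, here is a minimal, self-contained sketch in C. It
is not wget's code: prepare_url(), toy_encode() and the always_parse flag
are hypothetical names made up for this example, and the real url_parse()
does far more than encode spaces. The sketch only models the control flow:
with the encode step gated on IRI, a URL read from the input file goes
out verbatim; with the gate removed, it is always encoded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for the parse/encode step (NOT wget's url_parse()):
   copies `in` to `out`, writing each space as "%20". Output is
   truncated if it does not fit in `outsz` bytes. */
static void toy_encode(const char *in, char *out, size_t outsz)
{
    size_t o = 0;

    if (outsz == 0)
        return;
    for (; *in != '\0'; in++) {
        size_t need = (*in == ' ') ? 3 : 1;
        if (o + need >= outsz)      /* keep one byte for the terminator */
            break;
        if (*in == ' ')
            memcpy(out + o, "%20", 3);
        else
            out[o] = *in;
        o += need;
    }
    out[o] = '\0';
}

/* Hypothetical model of handling one line from the -i input.
   always_parse == false models the reported behaviour (encode only
   when IRI is enabled); always_parse == true models the proposed fix. */
static void prepare_url(const char *line, bool iri_enabled, bool always_parse,
                        char *out, size_t outsz)
{
    if (always_parse || iri_enabled)
        toy_encode(line, out, outsz);
    else
        snprintf(out, outsz, "%s", line);   /* passed through unencoded */
}
```

Calling prepare_url() with iri_enabled and always_parse both false leaves
the space in the URL, which matches what shows up on the wire; setting
always_parse to true yields the %20 form regardless of the IRI setting.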


If this goes to a mailing list, please cc me on replies, I am not
subscribed.

Thanks,

Volker

-- 
Volker Kuhlmann
http://volker.dnsalias.net/