bug-wget

Re: [Bug-wget] -m --iri unnecessarily modifies double-escapes incorrectly


From: Juaristi Álamos, Ander
Subject: Re: [Bug-wget] -m --iri unnecessarily modifies double-escapes incorrectly, whereas -m --no-iri works
Date: Mon, 28 Sep 2015 08:19:48 +0000

Hi there,

I'm afraid I cannot reproduce it in the latest git snapshot.

The resulting link is exactly the same on the website (online) and in
the downloaded content:

http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html

vs

file:///home/aja/codebase/wget/www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html

When I open 'reference.html' in my browser and click on the link, it's
true that the browser itself converts it from %2F to %252F, but I
didn't get a 404 in any case. What's more, if the downloaded content
looks exactly the same as the online one, I don't think we can consider
this a bug. Additionally, we had a similar problem a while ago, which
was (apparently) resolved in commit
b0820d553b6bef4400c493474d38930fee461b45. However, those changes have
not been released yet. So, which Wget version are you using? Could you
please confirm that the issue persists in the latest git snapshot?
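
For reference, here is a minimal sketch of the escaping levels being
discussed. It uses Python's urllib.parse purely as an illustration; it
is not Wget's code, and the variable names are made up for the example.
It only shows how %2F, %252F and a literal '/' relate to each other:

```python
from urllib.parse import quote, unquote

# File name exactly as it appears in the link discussed above.
name = "XLAT%2FXLATB.html"

# Escaping the name once more turns '%' into '%25',
# which yields the double-escaped form.
double_escaped = quote(name, safe="")

# Decoding once brings back the original, single-escaped name.
once = unquote(double_escaped)

# Decoding a second time turns '%2F' into a real path separator,
# so the name now points at a different resource.
twice = unquote(once)

print(double_escaped)
print(once)
print(twice)
```

A client that decodes twice ends up with the third form, which would
explain a request for a path that does not exist on the server.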

Thanks.
- AJ

On Sun, 2015-09-27 at 14:29 -0700, Barry Allard wrote:
> # skips all double-encoded [ui]ris because it reinterprets them, outside
> url.c:reencode_escapes(), probably in iri.c.
> wget --iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
> 
> # works
> wget --no-iri -mr http://www.liteirc.net/mirrors/siyobik.info/reference.html
> 
> Correct [ui]ri: 
> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%252FXLATB.html 
> (200)
> Incorrect [ui]ri:
> http://www.liteirc.net/mirrors/siyobik.info/instruction/XLAT%2FXLATB.html 
> (404)
> # pcnt_decode(pcnt_decode("%252F") -> "%2F") -> "/"
> 
> Simple-but-incomplete hackaround: use --no-iri
> 
> To improve compatibility with mirroring international sites, the iri code 
> path could approximate behavior of url.c/url_parse() by avoiding unnecessary 
> modification to --mirror extracted [ui]ris, possibly around the time it 
> adds/dequeues them to/from the queue.
> 
> Best,
> Barry Allard

