Re: [Bug-wget] Tilde issue with recursive download when IRI is enabled a
From: Eli Zaretskii
Subject: Re: [Bug-wget] Tilde issue with recursive download when IRI is enabled and a page uses Shift JIS
Date: Fri, 17 Feb 2017 11:10:40 +0200
> From: William Prescott <address@hidden>
> Date: Fri, 17 Feb 2017 03:34:20 -0500
>
> > I would also like to note that, even when the document's links don't
> > contain a tilde, Wget will still fail to fetch the pages as long as there
> > is a tilde in the URL that Wget was called with.
>
> Let's consider the (UTF-8) URL "http://example.com/~foo/bar.html"
> bar.html is Shift_JIS encoded and contains:
> <meta http-equiv="Content-Type" content="text/html;charset=Shift_JIS">
> <a href="baz.html">Baz</a>
>
> (this time, bar.html is perfectly valid Shift_JIS and doesn't have a tilde)
>
> A recursive download will fail, because the relative URL appears to get
> processed as
> sjis_to_utf8(utf8_to_sjis("http://example.com/~foo/") + sjis("baz.html"))
> resulting in
> http://example.com/‾foo/baz.html
>
> I would have expected
> utf8("http://example.com/~foo/") + sjis_to_utf8("baz.html")
> resulting in
> http://example.com/~foo/baz.html
How should wget know that "http://example.com/~foo/bar.html" comes
from a UTF-8 encoding? Where should that piece of information come
from?
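
For illustration only, the two processing orders quoted above can be compared with a toy model in Python. This is a sketch under stated assumptions, not Wget's actual code: the helper names utf8_to_sjis and sjis_to_utf8 are borrowed from the quoted pseudocode, and the model assumes a converter that follows the strict JIS X 0201 reading, where byte 0x7E decodes to OVERLINE (U+203E) rather than TILDE. Only ASCII-range input is handled.

```python
# Toy model of the conversion orders discussed in this thread.
# Assumption: the converter reads byte 0x7E as OVERLINE (U+203E), as in
# the strict JIS X 0201 Roman set. This is not a full Shift_JIS codec.

OVERLINE = "\u203e"


def utf8_to_sjis(text: str) -> bytes:
    # Assumption: ASCII characters keep their byte values, so '~' -> 0x7E.
    return text.encode("ascii")


def sjis_to_utf8(data: bytes) -> str:
    # Assumption: byte 0x7E comes back as OVERLINE instead of TILDE.
    return data.decode("ascii").replace("~", OVERLINE)


base = "http://example.com/~foo/"  # URL given on the command line
link = b"baz.html"                 # href bytes taken from the Shift_JIS page

# Order that appears to be used: convert base and link together.
observed = sjis_to_utf8(utf8_to_sjis(base) + link)

# Order the reporter expected: convert only the link, leave the base alone.
expected = base + sjis_to_utf8(link)

print(observed)
print(expected)
```

Under these assumptions the first order turns the tilde in the base URL into an overline, while the second leaves the base URL untouched. Whether the second order is actually safe depends on knowing the encoding of the base URL, which is the question raised above.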