Re: [Bug-wget] Problem with ÅÄÖ and wget
From: Ángel González
Subject: Re: [Bug-wget] Problem with ÅÄÖ and wget
Date: Thu, 03 Oct 2013 02:04:05 +0200
User-agent: Thunderbird
On 24/09/13 10:38, Tim Ruehsen wrote:
> Just for completeness: these guessing steps called "encoding sniffing
> algorithm" are described in 12.2.2.2.
> But only "In some cases, it might be impractical to unambiguously determine
> the encoding before parsing the document.".
Yes, it allows parsing to start with one encoding, then abort and switch
to a different one.
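A minimal Python sketch of that restart behaviour, assuming a simplified regex prescan rather than the full HTML parser: decode with a tentative encoding, and if a `<meta>` declaration names a different one, throw the result away and decode again. The function name, the regex and the windows-1252 starting point are illustrative assumptions, not wget code.

```python
import re

# Hypothetical, simplified pattern for a <meta ... charset=...> declaration.
META_CHARSET = re.compile(
    r'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)

def decode_with_restart(body, tentative="windows-1252"):
    """Decode `body` tentatively; restart if the page declares another encoding."""
    text = body.decode(tentative, errors="replace")
    match = META_CHARSET.search(text)
    if match:
        declared = match.group(1).lower()
        if declared != tentative:
            try:
                # Abort the first pass and decode again from the start.
                return body.decode(declared, errors="replace"), declared
            except LookupError:
                pass  # unknown label: keep the tentative result
    return text, tentative
```

The first pass only needs to get the ASCII markup right, which holds for any ASCII-compatible tentative encoding.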
> I found this iso-8859-1 / windows-1252 issue mentioned on the Wikipedia
> 'windows-1252' page, but couldn't find it on the HTML Living Standard pages.
> Could you give me a pointer, please?
It's at the beginning of the HTML parsing section: it lists several
encodings a page may declare and the encoding you should actually use to
parse each of them, saying it is a willful violation.
> What do you think, how can we address the iso / windows encoding issue (should
> we?)? As I understood, it is only valid for HTML5...
It's just a matter of comparing the input encoding with a well-known
list and replacing it.
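As a rough illustration of that lookup, here is a minimal Python sketch. The table is only a small subset chosen for the example (the full list lives in the specification), and the names `LABEL_OVERRIDES` and `effective_encoding` are hypothetical.

```python
# Illustrative subset of declared-label -> actual-decoder overrides.
# Not the complete list from the specification.
LABEL_OVERRIDES = {
    "iso-8859-1": "windows-1252",
    "latin1": "windows-1252",
    "us-ascii": "windows-1252",
    "iso-8859-9": "windows-1254",
}

def effective_encoding(declared):
    """Return the encoding to decode with, given the label the page declared."""
    label = declared.strip().lower()
    return LABEL_OVERRIDES.get(label, label)
```

The override matters for bytes in the 0x80-0x9F range: `b"\x80".decode(effective_encoding("iso-8859-1"))` gives the euro sign, whereas a strict iso-8859-1 decode gives a C1 control character.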
> Is there a practical need for the sniffing algorithm?
If we want to deal with the "ÅÄÖ links" properly, we should do encoding
detection.
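A deliberately simplified sketch of such detection in Python, assuming this order of precedence: byte order mark, then the charset from the HTTP `Content-Type` header, then a `<meta>` declaration in the first 1024 bytes, then a default. The real sniffing algorithm has more steps; the function name, the regex and the windows-1252 default are assumptions made for the example.

```python
import codecs
import re

# Hypothetical, simplified pattern for a <meta ... charset=...> declaration.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)

def detect_encoding(body, http_charset=None, default="windows-1252"):
    """Guess the encoding of an HTML document given as raw bytes."""
    # 1. A byte order mark is the strongest signal.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if body.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # 2. Charset taken from the HTTP Content-Type header, if the caller has one.
    if http_charset:
        return http_charset.strip().lower()
    # 3. Prescan the start of the document for a <meta> declaration.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing found: fall back to the assumed default.
    return default
```

Once the encoding is known, a link target such as the bytes `b"\xc5\xc4\xd6"` in an iso-8859-1 page can be decoded to "ÅÄÖ" and re-encoded consistently before it is requested.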
> Do you know any real web sites / pages where the encoding is ambiguous?
I consider those web sites broken. But I don't have numbers.