Re: [Bug-wget] Unexpected character on a downloaded page

From:	Angel Tsankov
Subject:	Re: [Bug-wget] Unexpected character on a downloaded page
Date:	Mon, 16 Jun 2014 14:08:01 +0300
User-agent:	Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 06/15/14 23:28, Ángel González wrote:

On 14/06/14 20:31, Angel Tsankov wrote:

Why does wget 1.15 (and 1.12) insert Â in several places in the copy
it makes of the following page:

http://www.helloquizzy.com/results/helen-fisher-personality-type-test/?var_Explorer=1&var_Negotiator=1&var_Director=1&var_Builder=1&fromCGI=1

Short answer: because that's what is at that page.

Long answer: That page contains several non-breaking spaces (code point
160, U+00A0), which when encoded as UTF-8 result in the bytes C2 A0. If
you read the page as if it were ISO-8859-1, you will instead see the byte
C2 as the glyph Â (the A0 byte still renders as a non-breaking space).
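
To make the byte-level explanation above concrete, here is a minimal Python sketch. It is only an illustration: the variable names are made up for this example and nothing here comes from wget itself.

```python
# U+00A0 (NO-BREAK SPACE) encoded as UTF-8 is the two bytes C2 A0.
nbsp = "\u00a0"
utf8_bytes = nbsp.encode("utf-8")
print(utf8_bytes.hex())

# Decoding the same bytes as ISO-8859-1 yields two characters:
# U+00C2 (the glyph Â) followed by U+00A0 (an invisible no-break space).
misread = utf8_bytes.decode("iso-8859-1")
print([f"U+{ord(ch):04X}" for ch in misread])

# Decoding as UTF-8 recovers the original single character.
roundtrip = utf8_bytes.decode("utf-8")
print(roundtrip == nbsp)
```

Windows-1252 maps the bytes C2 and A0 to the same two characters, so decoding with "cp1252" instead of "iso-8859-1" should give the same result.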

The page correctly states it's in utf-8:
Content-Type: text/html; charset=utf-8
so it should be read in utf-8 mode.
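
As a rough sketch of what "reading in utf-8 mode" means for a client, the snippet below pulls the charset out of a Content-Type value and uses it to decode some bytes. The helper name charset_from_content_type and the sample body are hypothetical, chosen just for this illustration; it uses the standard library's email.message.Message to parse the header value.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# The header value quoted above.
charset = charset_from_content_type("text/html; charset=utf-8")
print(charset)

# Hypothetical sample body: 'a', a UTF-8 encoded no-break space, 'b'.
body = b"a\xc2\xa0b"
text = body.decode(charset)
print(len(text))  # three characters, no stray glyph
```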

(wget is doing nothing here; it's just receiving bytes and storing them
in the file as-is)

Indeed, the browser (Firefox 27.0.1) displays the original page in UTF-8 and the downloaded page in Windows-1252 (which turned out to be the fallback encoding for pages that do not declare their encoding). But if "wget is doing nothing here", why does the browser think that only the original page declares its encoding?



Regards,

Angel Tsankov
