bug-wget

Re: [Bug-wget] Unexpected character on a downloaded page


From: Angel Tsankov
Subject: Re: [Bug-wget] Unexpected character on a downloaded page
Date: Tue, 17 Jun 2014 22:43:24 +0300
User-agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

On 06/17/14 00:10, Ángel González wrote:
> On 16/06/14 13:08, Angel Tsankov wrote:
>> Indeed, the browser (Firefox 27.0.1) displays the original page in
>> UTF-8 and the downloaded page in Windows-1252 (which turned out to be
>> the fallback encoding for pages that do not declare their encoding).
>> But if "wget is doing nothing here", why does the browser think that
>> only the original page declares its encoding?
>
> That's because it does so in the server headers:
>   HTTP/1.1 200 OK
>   Server: cloudflare-nginx
>   Date: Mon, 16 Jun 2014 20:57:44 GMT
>   Content-Type: text/html; charset=utf-8
>   Transfer-Encoding: chunked
>   Connection: keep-alive
>   Set-Cookie: __cfduid=d013cee3c290f7e90e20da6d064d43b7b1402952264442; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.helloquizzy.com; HttpOnly
>   Cache-control: private
>   X-OKWS-Version: OKWS/3.1.27.0
>   P3P: CP="NOI CURa ADMa DEVa TAIa OUR BUS IND UNI COM NAV INT", policyref="http://www.helloquizzy.com/w3c/p3p.xml";
>   X-XSS-Protection: 1; mode=block
>   Set-Cookie: guest=13646369309274507826; Expires=Tue, 16 Jun 2015 20:57:44 GMT; Path=/; Domain=helloquizzy.com; HttpOnly
>   CF-RAY: 13b9ebe4ce25024c-CDG
>
> When you save the page contents, the headers are not available*. The
> page might also have declared the encoding in a <meta> tag in the
> <head>, in which case Firefox would have correctly detected the
> encoding of the local copy,

Indeed, other tools such as HTTrack add a <meta> tag declaring the encoding (presumably taken from the HTTP headers), so Firefox detects the correct encoding (UTF-8 in this case) instead of falling back to a default one. I wish wget could do this as well.
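To illustrate the idea, here is a rough sketch of such a post-processing step: take the charset parameter from the Content-Type response header and, if the saved page has no <meta> charset declaration of its own, insert one right after the opening <head> tag. This is not wget's or HTTrack's code. The function names are hypothetical, it assumes the page is available as raw bytes and the Content-Type value was captured separately, and it uses simple regular expressions rather than a real HTML parser.

```python
import re
from email.message import Message
from typing import Optional

# Matches an existing declaration, either <meta charset="..."> or
# <meta http-equiv="Content-Type" content="text/html; charset=...">.
_HAS_META_CHARSET = re.compile(rb"<meta\b[^>]*\bcharset\s*=", re.IGNORECASE)
# Matches <head> or <head ...>, but not <header>.
_HEAD_OPEN = re.compile(rb"<head\b[^>]*>", re.IGNORECASE)
# Conservative whitelist so the value can be embedded safely in an attribute.
_SAFE_CHARSET = re.compile(r"[A-Za-z0-9._:-]+")


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the (lowercased) charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def add_meta_charset(html: bytes, charset: str) -> bytes:
    """Hypothetical helper: declare `charset` in the page unless it already declares one."""
    if _HAS_META_CHARSET.search(html):
        return html  # the page already declares its encoding; leave it alone
    if not _SAFE_CHARSET.fullmatch(charset):
        return html  # unexpected header value; do not risk writing bad markup
    m = _HEAD_OPEN.search(html)
    if m is None:
        return html  # no <head> found; this sketch does not guess where to insert
    meta = b'<meta charset="' + charset.encode("ascii") + b'">'
    return html[: m.end()] + meta + html[m.end():]
```

For the page discussed here, `charset_from_content_type("text/html; charset=utf-8")` gives `"utf-8"`, and passing that to `add_meta_charset` on the saved bytes would make the local copy self-describing, so a browser opening it from disk would no longer need to fall back to a default encoding.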

With thanks,

Angel Tsankov



