On 14/06/14 20:31, Angel Tsankov wrote:
Why does wget 1.15 (and 1.12) insert Â in several places in the copy
it makes of the following page:
http://www.helloquizzy.com/results/helen-fisher-personality-type-test/?var_Explorer=1&var_Negotiator=1&var_Director=1&var_Builder=1&fromCGI=1
Short answer: because that's what is at that page.
Long answer: That page contains several non-breaking spaces (U+00A0,
code point 160, which is outside ASCII). Encoded as UTF-8, each one
becomes the two bytes C2 A0. If you read the page as if it were
ISO-8859-1, the byte C2 is shown as the glyph Â, and the byte A0 that
follows it is shown as a non-breaking space.
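To see the effect for yourself, here is a minimal Python sketch (not anything wget does, just an illustration; the variable names are made up for this example). It encodes one non-breaking space as UTF-8 and then decodes the same bytes both ways:

```python
# Illustration only: one non-breaking space, encoded and then decoded two ways.
nbsp = "\u00a0"                      # NO-BREAK SPACE, code point 160

utf8_bytes = nbsp.encode("utf-8")    # the two bytes C2 A0
print(utf8_bytes.hex(" "))           # c2 a0

# Wrong: treat the UTF-8 bytes as ISO-8859-1, one character per byte.
misread = utf8_bytes.decode("iso-8859-1")
print(repr(misread))                 # 'Â\xa0'  (Â followed by a non-breaking space)

# Right: decode with the charset the server declared.
correct = utf8_bytes.decode("utf-8")
print(correct == nbsp)               # True
```

The bytes on disk are identical in both cases; only the interpretation differs.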
The page correctly states it's in utf-8:
Content-Type: text/html; charset=utf-8
so it should be read in utf-8 mode.
(wget is doing nothing here; it just receives the bytes and stores them
in the file as-is.)
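If you post-process the saved file yourself, one way to honour the declared charset is sketched below in Python. The helper name decode_body and the ISO-8859-1 fallback are assumptions made for this example, not part of wget; the charset is pulled out of the Content-Type value with the standard library's email.message.Message.

```python
from email.message import Message


def decode_body(raw: bytes, content_type: str, fallback: str = "iso-8859-1") -> str:
    """Decode raw bytes using the charset named in a Content-Type value.

    Hypothetical helper for illustration; falls back to `fallback`
    when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback
    return raw.decode(charset)


# Bytes as they would sit in the saved file: "a", a non-breaking space, "b".
saved = b"a\xc2\xa0b"

text = decode_body(saved, "text/html; charset=utf-8")
print(repr(text))      # 'a\xa0b'  (three characters, no stray Â)

# Without a charset the fallback is used and the stray Â appears.
garbled = decode_body(saved, "text/html")
print(repr(garbled))   # 'aÂ\xa0b'
```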