wget prints out information in unicode characters where ASCII could suffice

bug-wget


From:	ah
Subject:	wget prints out information in unicode characters where ASCII could suffice
Date:	Sat, 21 Mar 2020 14:40:40 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1

Hello,

When wget gets a page successfully (consider for example: wget www.gnu.org), it reports something like this:


...output omitted...
2020-03-21 14:00:41 (1.43 MB/s) - ‘index.html’ saved [1114171/1114171]

Please notice that the two apostrophes enclosing the fetched filename are in Unicode (U+2018 and U+2019, I guess?), whereas the ASCII apostrophe character ' is completely sufficient.
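The guess about the code points can be checked from the shell. This is a minimal sketch, assuming the terminal and the script file pass UTF-8 through unchanged; `hex_bytes` is a helper name made up for this illustration, not part of wget.

```shell
# hex_bytes is a hypothetical helper: print the bytes of its argument in hex.
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

hex_bytes '‘'; echo   # UTF-8 encoding of U+2018: e28098
hex_bytes '’'; echo   # UTF-8 encoding of U+2019: e28099
hex_bytes "'"; echo   # ASCII apostrophe: 27
```

Each curly quote occupies three bytes in UTF-8, while the ASCII apostrophe is the single byte 0x27.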


What implications does that have, apart from polluting the terminal?

For one, when a user tries to copy+paste the fetched filename (e.g. index.html) from wget's output, either the apostrophes are copied into the buffer, which messes up further commands, or the apostrophes are not copied and the user needs to add them manually when pasting. E.g. try


ls ‘index.html’

it fails with

ls: cannot access '‘index.html’': No such file or directory
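The command fails because the shell treats only the ASCII characters ' and " as quoting; the curly quotes are ordinary characters and become part of the filename argument. As a rough sketch of what a user has to do to the pasted text by hand, the hypothetical helper `strip_curly_quotes` (a name invented here) removes one leading U+2018 and one trailing U+2019:

```shell
# strip_curly_quotes is a hypothetical helper: remove one leading U+2018
# and one trailing U+2019 from its argument, leaving everything else alone.
strip_curly_quotes() {
    name=$1
    name=${name#‘}
    name=${name%’}
    printf '%s\n' "$name"
}

pasted='‘index.html’'          # the text as copied from wget's output
strip_curly_quotes "$pasted"   # prints: index.html
```

With the ASCII apostrophes, no such step would be needed: the pasted text would already be valid shell quoting.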

However, the single (ASCII) quotes are very important for a lot of users in the case where filenames contain spaces or other characters that the shell does not like and that need escaping. So it's a good idea to have them, but who would have thought that the devil is idle and decided to replace all apostrophes in GNU software with Unicode!

So, ideally (AFAIC) wget, on successful completion, should have printed this:


2020-03-21 14:00:41 (1.43 MB/s) - 'index.html' saved [1114171/1114171]

(notice the single ASCII apostrophe for opening AND closing the filename)

and then the user could just copy that string, apostrophes included, and paste it into further commands.

I understand that there is danger in copy+paste-ing information from a program's output. But this is not relevant here, as it is none of wget's business to deter users from copy-pasting its output. If that's a real concern, then consider printing the filename in hex or as an image, or call the copy-paste police and snitch on the user when he/she attempts to use it.

But copy-paste is not the real issue here. There is another issue, far more important: shell scripts processing wget's output.

That brings us to yet another case in point where this behaviour of wget makes our lives more difficult: using wget's output in a shell script in order to find out the name of the fetched file. Now, all of a sudden, our shell scripts must deal with Unicode characters too. This is a no-go scenario in many industrial places. A shell script may be classified as sub-standard if it has to deal with Unicode, because of the can of worms that opens.
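To illustrate what such a script ends up doing, here is a minimal sketch that pulls the filename out of a "saved" line while accepting either quote style. It assumes the English message format shown earlier in this mail, which is not a stable interface; `saved_name` is a hypothetical helper name. sed runs under LC_ALL=C so that the curly quotes are matched as plain byte sequences.

```shell
# saved_name is a hypothetical helper: read wget's log on stdin and print the
# file name from each "... saved [...]" line, accepting either quote style.
# Assumes the English "DATE TIME (RATE) - 'NAME' saved [N/M]" layout.
saved_name() {
    LC_ALL=C sed -n -E \
        "s#^[0-9-]+ [0-9:]+ \([^)]*\) - (‘|')(.*)(’|') saved \[[0-9/]+\]\$#\2#p"
}

line="2020-03-21 14:00:41 (1.43 MB/s) - ‘index.html’ saved [1114171/1114171]"
printf '%s\n' "$line" | saved_name   # prints: index.html
```

A related point, stated here as an assumption to verify locally rather than as documented behaviour: the quoting style of GNU tools generally follows the locale, so running wget itself with LC_ALL=C may produce ASCII quotes in the first place.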

In conclusion, my opinion is that this bug is one of the most unpleasant and dangerous bugs in wget, as it pollutes the terminal with Unicode characters when ASCII characters are more than enough to convey the information to the user. It opens not one but a tonne of cans of worms and can have serious side effects on script processing in industry.

I would therefore URGE you to reconsider the use of Unicode characters for mere aesthetic reasons, especially when ASCII characters can be used for the same purpose. Aesthetics is a very subjective criterion, as you know.

There must be serious reasons to give the KISS principle the capital punishment. Is this what GNU has come to?

On a parallel note, please accept my congratulations for the otherwise very good software that wget is. I am using it daily and I thank you (I too have contributed to public domain software and to software under GNU licensing, spreading the karma of GNU).

bw,
