Re: [gnu.org #1363250] ASCII maintain.txt is no longer ASCII

From: John Darrington
Subject: Re: [gnu.org #1363250] ASCII maintain.txt is no longer ASCII
Date: Wed, 27 Feb 2019 08:29:43 +0100
User-agent: NeoMutt/20170113 (1.7.2)

Before we go off on too many tangents, here is a bit of background for
this discussion.

* ASCII is a well defined standard, and all ASCII is UTF-8 (but the
  converse is not true).

* The command iconv -f UTF-8 -t ASCII file will fail unless all the
  characters in file are already ASCII.  Hence it isn't very useful
  here.

* The coding standards say that we should prefer ASCII wherever
  possible.  If it is not possible, then we should use UTF-8.
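
To make the first two points concrete, here is a minimal Python sketch
(not the iconv implementation; the helper names to_ascii_strict and
is_ascii are made up for illustration).  It shows that pure-ASCII text
has the same bytes under both encodings, and that a strict UTF-8 to
ASCII conversion raises an error as soon as one non-ASCII character
turns up.

```python
# ASCII text encodes to identical bytes under both codecs,
# which is why every ASCII file is also a valid UTF-8 file.
ascii_text = "GNU maintainers"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")


def to_ascii_strict(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as ASCII.

    Raises UnicodeEncodeError on the first non-ASCII character,
    so it only succeeds on input that was already ASCII.
    """
    return data.decode("utf-8").encode("ascii")


def is_ascii(data: bytes) -> bool:
    """Return True if every byte in data is in the ASCII range."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True
```

In other words, a strict conversion is really a check for ASCII-only
input rather than a way of producing it.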

I think that Therese is saying that there are some files which are using
UTF-8 when ASCII would have sufficed.
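
As a rough illustration of "ASCII would have sufficed", the following
Python sketch replaces a few typographic characters with ASCII
stand-ins and reports whatever is left.  The replacement table and the
names asciify and remaining_non_ascii are my own assumptions, not
something taken from the coding standards.

```python
# Hypothetical replacement table: the choice of ASCII stand-ins
# below is an assumption made for this example.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2014": "--",   # em dash
    "\u2022": "*",    # bullet
    "\u00a9": "(C)",  # copyright sign
})


def asciify(text: str) -> str:
    """Replace the characters listed above; leave everything else alone."""
    return text.translate(ASCII_EQUIVALENTS)


def remaining_non_ascii(text: str) -> list:
    """Return the sorted set of characters still outside ASCII."""
    return sorted({ch for ch in text if ord(ch) > 127})
```

Anything that remaining_non_ascii still reports after asciify (an
accented letter, for instance) is a character for which ASCII does not
suffice and UTF-8 is genuinely needed.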


On Tue, Feb 26, 2019 at 12:06:56PM -0500, Alfred M. Szmidt wrote:
        > I have noticed that maintain.txt and maintain.info[1] are no longer 
        > ASCII, but in UTF-8. In particular they contain lots of easily 
        > UTF-8 quoting characters (single and double quotes) that break 
        > displaying them in non-UTF-8 terminals. This is a pity because the 
        > use of such simple formats is to be displayed in simple terminals.
     I'm not sure what is the definition of "ASCII" here, are you talking
     about "printable" characters?  In that case, the Info format has
     always contained non-printable/non-ASCII characters, most notably #o37
     for section splitting, the "#o0 #o10 [" sequence for images, etc.  So
     these files have never been very readable on "simple text terminals"
     (what do you mean by that more exactly? VT100 dumb terminal?).
     For the text files, I think it still makes more sense to use UTF-8,
     the default locale these days on GNU/Linux is UTF-8, and many of the
     command line tools will output UTF-8 style quoting characters if that
     is so.  
     Could you run your files through iconv and convert them from UTF-8 to
     ASCII?  Maybe,
        iconv -f UTF-8 -t ASCII file...
        > Given that there is just one letter out of the ASCII range in 
        > maintain.{txt,info} (the 'é' in 'risqué'), could it be possible to 
        > these files as pure ASCII? Thanks.
     990 matches in 490 lines for "[^[:ascii:]]" in buffer: maintain.txt
     988 matches in 489 lines for "[^[:ascii:]]" in buffer: maintain.info
     These are mostly quotes, but you have bullets and copyright, em-dashes
     as well.
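
Counts like the "matches in lines" figures quoted above can be
approximated outside an editor.  This is a minimal Python sketch; the
function name count_non_ascii is made up, and it treats "\n" as the
only line separator, so its totals may not agree exactly with the
numbers in the quoted message.

```python
import re

# Any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def count_non_ascii(text: str) -> tuple:
    """Return (matches, lines): the number of non-ASCII characters
    and the number of lines that contain at least one of them."""
    matches = 0
    lines = 0
    for line in text.split("\n"):
        found = len(NON_ASCII.findall(line))
        if found:
            matches += found
            lines += 1
    return matches, lines


# Example use on a file (path is an assumption):
# with open("maintain.txt", encoding="utf-8") as f:
#     print(count_non_ascii(f.read()))
```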

