Re: [Nano-devel] nano and mixed encodings


From: Benno Schulenberg
Subject: Re: [Nano-devel] nano and mixed encodings
Date: Tue, 21 Jul 2015 21:45:23 +0200

On Tue, Jul 21, 2015, at 05:08, Mike Frysinger wrote:
> On 20 Jul 2015 22:02, Benno Schulenberg wrote:
> > [1] https://lists.gnu.org/archive/html/nano-devel/2009-02/msg00018.html
> 
> man that was a while ago ... i'd completely forgotten about it :).

I came across that email while looking through the mailing-list archive
for the discussion that led to 'is_file_writable()' (this in relation to
https://savannah.gnu.org/bugs/?29312).  And anything Unicode catches my
attention.


> > Vim apparently autoconverts the file when it finds bytes in there
> > that are not valid UTF-8 and then assumes it to be ISO-8859-x.
> 
> i wouldn't mind nano doing charset conversion if it was controllable, but
> i wonder how useful it is in practice anymore.  as time moves on, i feel
> like non-UTF8 encodings are getting more and more uncommon

I agree.  And thus also agree that it isn't worth the effort to make
nano able to convert encodings.


> > Running 'emacs foo' shows this:
> > 
> >   \303ş  \303ĵ  ú  ü
> > 
> > (That is in my Esperanto locale; in other UTF-8 locales it will show
> > the same as vim.)  Searching finds, of course, just one ú or ü.
> 
> esperanto isn't an encoding (jokes aside ;]).  what did you have locale
> set to exactly there ?  LANG=xx_yy.ZZZ ?

$ locale
LANG=eo.utf8
LANGUAGE=eo:nl
LC_CTYPE="eo.utf8"
LC_NUMERIC="eo.utf8"
LC_TIME="eo.utf8"
LC_COLLATE="eo.utf8"
LC_MONETARY="eo.utf8"
LC_MESSAGES="eo.utf8"
LC_PAPER="eo.utf8"
LC_NAME="eo.utf8"
LC_ADDRESS="eo.utf8"
LC_TELEPHONE="eo.utf8"
LC_MEASUREMENT="eo.utf8"
LC_IDENTIFICATION="eo.utf8"
LC_ALL=


> > So... when nano wants to be like Pico, it should find only the
> > validly encoded ú and ü.  The patch attached to the following
> > rereported bug (https://savannah.gnu.org/bugs/?45579) does this.
> 
> the replacement character depends somewhat on the terminal encoding.
> unicode has a nice replacement character specifically for this [1]
> which would be used for each invalid incoming byte.  so the display
> (when the terminal is using utf8) ideally would be:
>   ú  ü  �  �

That is already what current nano is showing when running in
a UTF-8 locale, so we're good there.  :)
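
To illustrate the display rule discussed above (one U+FFFD per invalid
incoming byte), here is a minimal Python sketch.  It is not nano's code
(nano is written in C); the function name decode_lenient and the sample
bytes are assumptions made up for this example.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def decode_lenient(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD for each byte that cannot
    start a valid sequence at its position (hypothetical helper)."""
    out = []
    i = 0
    while i < len(data):
        # Try the shortest valid UTF-8 sequence starting at offset i.
        for length in (1, 2, 3, 4):
            chunk = data[i:i + length]
            try:
                out.append(chunk.decode("utf-8"))
                i += len(chunk)
                break
            except UnicodeDecodeError:
                continue
        else:
            # No valid sequence starts here: replace this single byte.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)

# Assumed sample: UTF-8 "ú" and "ü", then the same letters as Latin-1 bytes.
sample = b"\xc3\xba  \xc3\xbc  \xfa  \xfc"
shown = decode_lenient(sample)
```

With this sample, the two valid characters come through unchanged and
each stray Latin-1 byte becomes a single replacement character.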


> for other encodings that lack such a dedicated character, i guess
> a plain question mark is the best we can do ?

Currently nano can handle only (a) UTF-8, and (b) single-byte encodings.
And in single-byte encodings, as far as I know, all bytes are valid codes,
so it would never be necessary to show a question mark that isn't a real
question mark.  (Nano shows control codes as a ^ followed by the
character whose code is 0x40 higher, for example ^A for 0x01.)
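
The caret notation described above can be sketched in a few lines of
Python.  This is only an illustration, not nano's implementation; the
name caret_form is hypothetical, and the sketch covers only the codes
0x00 through 0x1F, since the rule as stated is "0x40 higher".

```python
def caret_form(byte: int) -> str:
    """Return the caret form of a C0 control code (0x00-0x1F),
    or the character itself for other values (hypothetical helper)."""
    if 0x00 <= byte <= 0x1F:
        # A control code is shown as ^ plus the character 0x40 higher.
        return "^" + chr(byte + 0x40)
    return chr(byte)

examples = [caret_form(b) for b in (0x01, 0x09, 0x1B, 0x41)]
```

Here 0x01 becomes ^A, a tab (0x09) becomes ^I, and escape (0x1B)
becomes ^[, while a printable byte such as 0x41 is returned as is.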


> wrt matching, i think what you're going for with that patch is what
> we should be doing -- only match valid encoded bytes.

Okay.
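
As a rough sketch of "only match validly encoded bytes" (not the patch
attached to the bug report, which is C code inside nano), the Python
below decodes with the standard "surrogateescape" error handler.  Each
invalid byte then becomes a lone surrogate code point, which can never
compare equal to a real character, so a search for ú or ü skips the
stray bytes.  The function name find_valid and the sample bytes are
assumptions for this example.

```python
def find_valid(data: bytes, needle: str) -> list[int]:
    """Return character offsets where needle occurs among validly
    encoded UTF-8 characters only (hypothetical helper)."""
    # Invalid bytes map to U+DC80..U+DCFF and so never match real text.
    text = data.decode("utf-8", errors="surrogateescape")
    hits = []
    pos = text.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = text.find(needle, pos + 1)
    return hits

# Assumed sample: UTF-8 "ú" and "ü", then the same letters as Latin-1 bytes.
sample = b"\xc3\xba  \xc3\xbc  \xfa  \xfc"
u_acute_hits = find_valid(sample, "\u00fa")
u_umlaut_hits = find_valid(sample, "\u00fc")
```

Each search finds just one occurrence, the validly encoded one; the
Latin-1 bytes 0xFA and 0xFC are not reported as matches.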


Benno

-- 
http://www.fastmail.com - Or how I learned to stop worrying and
                          love email again



