Re: [Nano-devel] nano and mixed encodings
From: Benno Schulenberg
Subject: Re: [Nano-devel] nano and mixed encodings
Date: Tue, 21 Jul 2015 21:45:23 +0200
On Tue, Jul 21, 2015, at 05:08, Mike Frysinger wrote:
> On 20 Jul 2015 22:02, Benno Schulenberg wrote:
> > [1] https://lists.gnu.org/archive/html/nano-devel/2009-02/msg00018.html
>
> man that was a while ago ... i'd completely forgotten about it :).
I came across that email while looking through the mailing-list archive
for the discussion that led to 'is_file_writable()' (this in relation to
https://savannah.gnu.org/bugs/?29312). And anything Unicode catches my
attention.
> > Vim apparently autoconverts the file when it finds bytes in there
> > that are not valid UTF-8 and then assumes it to be ISO-8859-x.
>
> i wouldn't mind nano doing charset conversion if it was controllable, but
> i wonder how useful it is in practice anymore. as time moves on, i feel
> like non-UTF8 encodings are getting more and more uncommon
I agree. And thus also agree that it isn't worth the effort to make
nano able to convert encodings.
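
For reference, the fallback described for Vim above can be sketched in a few lines of Python. This is only an illustration, not Vim's actual logic: the function name `decode_with_fallback` is made up, and ISO-8859-1 stands in for the unspecified "ISO-8859-x".

```python
def decode_with_fallback(raw: bytes, fallback: str = "iso-8859-1"):
    """Return (text, encoding_used): strict UTF-8 first, then the fallback.

    Hypothetical helper; the fallback encoding is an assumption.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # At least one byte is not valid UTF-8, so treat the whole
        # buffer as a single-byte encoding instead.
        return raw.decode(fallback), fallback


# "ú ü " encoded as UTF-8, followed by "ú ü" encoded as ISO-8859-1.
mixed = "ú ü ".encode("utf-8") + "ú ü".encode("iso-8859-1")
text, used = decode_with_fallback(mixed)
print(used)
print(text)
```

With a mixed file, the whole buffer falls back to the single-byte reading, so the validly encoded UTF-8 characters come out as two characters each.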
> > Running 'emacs foo' shows this:
> >
> > \303ş \303ĵ ú ü
> >
> > (That is in my Esperanto locale; in other UTF-8 locales it will show
> > the same as vim.) Searching finds, of course, just one ú or ü.
>
> esperanto isn't an encoding (jokes aside ;]). what did you have locale
> set to exactly there ? LANG=xx_yy.ZZZ ?
$ locale
LANG=eo.utf8
LANGUAGE=eo:nl
LC_CTYPE="eo.utf8"
LC_NUMERIC="eo.utf8"
LC_TIME="eo.utf8"
LC_COLLATE="eo.utf8"
LC_MONETARY="eo.utf8"
LC_MESSAGES="eo.utf8"
LC_PAPER="eo.utf8"
LC_NAME="eo.utf8"
LC_ADDRESS="eo.utf8"
LC_TELEPHONE="eo.utf8"
LC_MEASUREMENT="eo.utf8"
LC_IDENTIFICATION="eo.utf8"
LC_ALL=
> > So... when nano wants to be like Pico, it should find only the
> > validly encoded ú and ü. The patch attached to the following
> > re-reported bug (https://savannah.gnu.org/bugs/?45579) does this.
>
> the replacement character depends somewhat on the terminal encoding.
> unicode has a nice replacement character specifically for this [1]
> which would be used for each invalid incoming byte. so the display
> (when the terminal is using utf8) ideally would be:
> ú ü � �
That is already what current nano is showing when running in
a UTF-8 locale, so we're good there. :)
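
To make the "one replacement character per invalid byte" rule concrete, here is a minimal Python sketch. It is not nano's code; `decode_per_byte` is a hypothetical name. It is written out by hand because Python's built-in errors="replace" handler may merge a truncated multi-byte sequence into a single U+FFFD.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_per_byte(raw: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD for every byte that is not
    part of a valid sequence. Hypothetical helper for illustration."""
    out = []
    i = 0
    while i < len(raw):
        # A UTF-8 sequence is 1 to 4 bytes long; try each length in turn.
        for length in (1, 2, 3, 4):
            chunk = raw[i:i + length]
            try:
                out.append(chunk.decode("utf-8"))
                i += len(chunk)
                break
            except UnicodeDecodeError:
                continue
        else:
            # No valid sequence starts here: replace this one byte only.
            out.append(REPLACEMENT)
            i += 1
    return "".join(out)


# Valid UTF-8 "ú ü " followed by the ISO-8859-1 bytes for "ú ü".
print(decode_per_byte(b"\xc3\xba \xc3\xbc \xfa \xfc"))
```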
> for other encodings that lack such a dedicated character, i guess
> a plain question mark is the best we can do ?
Currently nano can handle only (a) UTF-8, and (b) single-byte encodings.
And in single-byte encodings, as far as I know, all bytes are valid codes,
so it would never be necessary to show a question mark that isn't a real
question mark. (Control codes nano will show with a ^ plus the character
that is 0x40 higher.)
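
A small Python sketch of both points, under stated assumptions: ISO-8859-1 is used as the example single-byte encoding, `caret_notation` is a made-up name, and only the control codes 0x00 to 0x1F are covered, since those are the ones the "0x40 higher" rule applies to.

```python
def caret_notation(byte: int) -> str:
    """Return the ^X form of a control code in the range 0x00..0x1F."""
    if not 0x00 <= byte <= 0x1F:
        raise ValueError("not a control code in 0x00..0x1F")
    return "^" + chr(byte + 0x40)


print(caret_notation(0x00))  # ^@
print(caret_notation(0x09))  # ^I
print(caret_notation(0x1B))  # ^[

# In ISO-8859-1 every one of the 256 byte values decodes to a character,
# so there is never an invalid byte that would need a placeholder.
all_bytes = bytes(range(256)).decode("iso-8859-1")
print(len(all_bytes))  # 256
```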
> wrt matching, i think what you're going for with that patch is what
> we should be doing -- only match valid encoded bytes.
Okay.
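
As an illustration of matching only validly encoded characters (this is not the patch itself), here is a Python sketch. It relies on the "surrogateescape" error handler, which maps each undecodable byte to a lone surrogate that can never equal a real character. `find_valid` is a hypothetical name.

```python
def find_valid(raw: bytes, needle: str) -> list:
    """Return the character positions where needle occurs, counting only
    validly encoded UTF-8. Hypothetical helper for illustration."""
    # Invalid bytes become lone surrogates (U+DC80..U+DCFF), so they
    # cannot match any character in the needle.
    text = raw.decode("utf-8", errors="surrogateescape")
    positions = []
    start = text.find(needle)
    while start != -1:
        positions.append(start)
        start = text.find(needle, start + 1)
    return positions


# Valid UTF-8 "ú ü " followed by the ISO-8859-1 bytes for "ú ü".
mixed = b"\xc3\xba \xc3\xbc \xfa \xfc"
print(find_valid(mixed, "ú"))  # [0]
print(find_valid(mixed, "ü"))  # [2]
```

Each search finds just one occurrence; the stray 0xFA and 0xFC bytes are skipped.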
Benno