
Re: [vile] problem with 'wide characters' (utf-8) under macosx


From: j. van den hoff
Subject: Re: [vile] problem with 'wide characters' (utf-8) under macosx
Date: Sat, 06 Dec 2014 21:45:04 +0100
User-agent: Opera Mail/12.12 (MacIntel)

On Sat, 06 Dec 2014 18:46:06 +0100, Thomas Dickey <address@hidden> wrote:

On Sat, Dec 06, 2014 at 11:15:36AM +0100, j. van den hoff wrote:
forgot to Cc the list. sorry for the noise, brendan ....

On Sat, 06 Dec 2014 07:49:29 +0100, Brendan O'Dea <address@hidden> wrote:

>On 6 December 2014 at 09:39, j. van den hoff
><address@hidden> wrote:
>>[...] I want to use it in the native `Terminal.app' coming
>>with macos. here's the problem: despite `Terminal.app' being set up for
>>utf-8 character encoding, vile displays
>>non-ascii characters by their hexcode such as \u00E4. [...]
>
>Hi Joerg,

hi brendan,

>
>Could you paste the contents of the buffer produced by :show-variables
>when you are in a file which has such a problem?

sure. I've saved this list for _both_ cases: editing from within urxvt
(where everything is fine) and from within `Terminal.app' (where it
displays the utf-8 hexcodes). I list only the differences here:

urxvt:                           Terminal.app:
======                           =============
$curcol = 1                   |  $curcol = 6
$encoding =                   |  $encoding = UTF-8
$lcols = 9                    |  $lcols = 14
$locale = de_DE               |  $locale = UTF-8
$pagelen = 50                 |  $pagelen = 56
$pagewid = 141                |  $pagewid = 181
$pid = 33249                  |  $pid = 33243
$term-cols = 141              |  $term-cols = 181
$term-encoding = utf-8        |  $term-encoding = locale
$term-lines = 50              |  $term-lines = 56
$wlines = 48                  |  $wlines = 54
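
A difference listing like the one above could also be produced mechanically. The following is a minimal sketch in Python, assuming each saved :show-variables buffer holds one `$name = value` entry per line; the function names are hypothetical and not part of vile.

```python
def parse_vars(text):
    """Parse lines of the form '$name = value' into a dict (assumed format)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("$") or "=" not in line:
            continue  # skip headings and anything that is not a variable
        name, _, value = line.partition("=")
        result[name.strip()] = value.strip()
    return result


def diff_vars(left, right):
    """Return {name: (left_value, right_value)} for entries that differ."""
    a, b = parse_vars(left), parse_vars(right)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Calling `diff_vars(urxvt_dump, terminal_dump)` on the two saved buffers would then leave only the entries worth looking at, such as `$encoding`, `$locale` and `$term-encoding`.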

Testing the port (which seems to be old: it is "9.8", while "9.8o" is current),
I don't see any encoding differences.

most of these differences are obviously irrelevant, but the
encoding-related values differ, too...
I think I will start reading up on what exactly they mean in `vile --help'
...

>
>The output of the locale command from the shell and the value of $TERM
>may also be useful.

the problem might lie in this area. in urxvt I get

LANG=
LC_COLLATE="C"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

I have something comparable in uxterm (started in OSX):

        LANG=
        LC_COLLATE="C"
        LC_CTYPE="en_US.UTF-8"
        LC_MESSAGES="C"
        LC_MONETARY="C"
        LC_NUMERIC="C"
        LC_TIME="C"
        LC_ALL=

However - see below.

where I explicitly set LC_CTYPE to that value in (the equivalent of)
.xinitrc so that it is defined when the x11 window manager starts up (but
is, of course, ignored by Terminal.app...). in Terminal.app I get instead:

LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

I see - I have this in Terminal.app:

        LANG="en_US.UTF-8"
        LC_COLLATE="en_US.UTF-8"
        LC_CTYPE="en_US.UTF-8"
        LC_MESSAGES="en_US.UTF-8"
        LC_MONETARY="en_US.UTF-8"
        LC_NUMERIC="en_US.UTF-8"
        LC_TIME="en_US.UTF-8"
        LC_ALL=

In its preferences ("Advanced" tab), I have
        Character encoding: Unicode (UTF-8)
        Set locale environment variables on startup

I have exactly the same there but end up with the strange `locale' settings
including LC_CTYPE=UTF-8. this definitely is no longer a vile-related
question, but do you have any idea from where Terminal.app derives its
information about _what_ locale environment vars to set? (even in your case
they are not the same as in uxterm, with the lucky exception of LC_CTYPE.)


Generally I don't set locale variables in my shell startup scripts
(for special cases, I set those in scripts around certain programs).

which conforms to what I can select under "character encoding" in the
`preferences' settings of that program. so it's not exactly the same
locale but my (limited) understanding of these things is that "UTF-8"
alone should suffice and the country-specific qualifier (de_DE for me) does
not have much of an influence? (and both terminals identify as xterm-color).

Not exactly.  One might suppose that the names are well-standardized, but
they are not. By itself, for instance, "UTF-8" as a locale setting likely refers to an alias. The names that I'm accustomed to using are those found
using "locale -a".
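
To make the distinction between a bare "UTF-8" and a name such as "de_DE.UTF-8" concrete, here is a small Python sketch. It assumes the common `language_TERRITORY.codeset@modifier` naming convention; as noted above, real locale names are not well standardized, so the output of "locale -a" remains the authority. The helper names are hypothetical.

```python
import re

# Assumed shape: language[_TERRITORY][.codeset][@modifier]
_LOCALE_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:_(?P<territory>[A-Z]{2}))?"
    r"(?:\.(?P<codeset>[A-Za-z0-9-]+))?"
    r"(?:@(?P<modifier>[A-Za-z0-9]+))?$"
)


def split_locale(name):
    """Split a locale name into its parts, or return None if it does not fit."""
    match = _LOCALE_RE.match(name)
    return match.groupdict() if match else None


def is_fully_qualified(name):
    """True only if the name carries language, territory and codeset."""
    parts = split_locale(name)
    return bool(parts and parts["territory"] and parts["codeset"])
```

Under this assumption `is_fully_qualified("de_DE.UTF-8")` holds, while a bare "UTF-8" does not even parse as a language, which leaves nothing from which to infer a matching 8-bit encoding.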

understood -- but I have no idea whatsoever _how_ that `locale' setting
in Terminal.app comes about ...


vile is different from the other editors because it will (if available)
use the "de_DE" to infer a useful value for the "8bit" encoding.
(vile has built-in locale tables, about 70kb, so that it can do this
even if "de_DE" itself is not supplied on your machine.)

I see.


I experimented a little, and see that your locale settings are confusing
vile. You can see this best by ":show-printable" and looking at the bottom
of the page (codes are showing as hexadecimal).

Using "de_DE.UTF-8" throughout (actually LC_CTYPE is the important one),
I don't see the hexadecimal characters in "9.8" or the current version.

I see something similar but not quite: in Terminal.app and
with the `UTF-8' value for LC_CTYPE I get hexcodes for positions 128-159 (\x80 - \x9F)
and a verbatim `?' for positions 160-255. If I then manually set
LC_CTYPE=de_DE.UTF-8 in that Terminal.app window and restart vile I

1) still get the hexcodes for pos. 128-159 (but the same happens in urxvt)
2) get regular chars for 160-255
3) most important: the `??' problem when entering diacritical characters such as ü vanishes
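
One plausible reason why positions 128-159 show as hexcodes in both terminals: in Unicode (and Latin-1) that range holds the C1 control characters, which have no glyphs. The Python sketch below checks this with the standard `unicodedata` module; whether vile applies the same criterion in :show-printable is an assumption, not something confirmed in this thread.

```python
import unicodedata


def is_control(code):
    """True if the code point is in Unicode general category Cc (control)."""
    return unicodedata.category(chr(code)) == "Cc"


# Positions 128-159 (\x80 - \x9F) versus positions 160-255.
c1_controls = [c for c in range(0x80, 0xA0) if is_control(c)]
upper_controls = [c for c in range(0xA0, 0x100) if is_control(c)]

print(len(c1_controls), len(upper_controls))
```

All 32 positions in the first range are control characters and none in the second range is, which matches items 1) and 2).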

only problem: I don't see any way to convince Terminal.app to use a valid (fully qualified) value for LC_CTYPE
in the first place...
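
Until the Terminal.app side is understood, a wrapper around the editor (in the spirit of the "scripts around certain programs" mentioned earlier) might serve as a workaround. This is only a sketch in Python: the helper names are hypothetical, and it assumes that `vile` is on the PATH and that the de_DE.UTF-8 locale is installed.

```python
import os
import subprocess


def env_with_ctype(locale_name, base=None):
    """Return a copy of the environment with LC_CTYPE set to locale_name."""
    env = dict(os.environ if base is None else base)
    env["LC_CTYPE"] = locale_name
    # A non-empty LC_ALL would take precedence over LC_CTYPE, so drop it.
    env.pop("LC_ALL", None)
    return env


def run_vile(path, locale_name="de_DE.UTF-8"):
    """Start vile on path with a fully qualified LC_CTYPE (assumes vile is installed)."""
    return subprocess.run(["vile", path], env=env_with_ctype(locale_name))
```

Only the child process sees the changed value; the shell in the Terminal.app window keeps whatever Terminal.app set.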


This might be related to the "??" problem - I'm not sure.

bingo ;-), see above (thanks!). the whole thing remains confusing for me,
though. for one, I don't understand in which way the LC_CTYPE=UTF-8 setting
is confusing vile, since, as explained, at least after a redraw the entered
`ü' (and similar) are rendered correctly in the buffer (while not being
displayed in the show-printable output). but obviously there'll be some
hidden reason for this. the other thing which remains unclear to me is how
I manage to end up with LC_CTYPE=UTF-8 in Terminal.app in the first place.
but that's probably not a problem for this list...


("xterm-color" is problematic as well - a different topic).

is it? what do you recommend here?


the strange thing is that several other editors
recognize these settings in a way that utf-8 is displayed correctly in
both terminal emulators.





