Re: [Nano-devel] searching for unicode text: bug or feature

From: Mike Frysinger
Subject: Re: [Nano-devel] searching for unicode text: bug or feature
Date: Sun, 15 Feb 2009 00:15:00 -0500
User-agent: KMail/1.11.0 (Linux/2.6.28; KDE/4.2.0; x86_64; ; )

On Saturday 14 February 2009 22:32:27 Chris Allegretta wrote:
> On Sat, Feb 14, 2009 at 1:18 AM, Mike Frysinger wrote:
> > if i open a file that has a mix of iso-8859-1 characters and utf8
> > characters, i see some surprising behavior when my environment is utf8
> > based.  for example, if the file has iso-8859-1 umlauted characters (like
> > u-umlaut: ü) as well as the unicode version of it, then doing a search
> > for ü will match both. i'd expect the search to only match the unicode
> > variant.  i'm wondering if this is intended behavior, or if the search
> > code doesnt handle wide characters properly.
> Hmm, that's a toughy.  I don't even know how to generate keystrokes
> like that to test this out.  Is it something that can be done with an
> ANSI keyboard?

you might be able to cheat and just copy & paste a unicode character.  
personally, i use xmodmap to extend my US keyboard.  i set up my capslock key 
to be a modifier and have it convert normal chars to umlauted ones and such.  you 
can see the file i use here:
then run:
xmodmap Xmodmap
this should allow you to type umlauted chars by holding capslock and pressing 
a vowel.

how exactly this char gets interpreted depends on your locale settings.  
if you run `locale`, you can see your current settings.
LANG=en_US <your terminal program>
<you get iso-8859-1 chars>
LANG=en_US.UTF8 <your terminal program>
<you get unicode chars>
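
to see which encoding a given setting actually selects, `locale charmap` prints 
the character map for the active locale.  a small sketch; the output depends on 
which locales are installed on your system, so no particular result is assumed:

```shell
# print the character encoding implied by the current locale settings
current=$(locale charmap)
echo "current charmap: $current"

# override the locale for a single command only; what this prints
# depends on the system's libc and installed locales
LC_ALL=C locale charmap
```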

to create a testing file, you can do this with bash:
echo $'\374 \303\274' > foo
the first byte is ü in iso-8859-1, while the two bytes after the space are ü in 
utf8 (echo also appends a newline)
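
to double-check that the file really contains both encodings, you can dump its 
bytes.  a minimal sketch, assuming a POSIX shell with od, tr and sed available; 
it uses printf instead of bash's $'...' quoting so it also works outside bash, 
and the scratch directory is only there for illustration:

```shell
# write the same test bytes into a scratch directory (path is arbitrary)
tmp=$(mktemp -d)
printf '\374 \303\274\n' > "$tmp/foo"

# dump the file as hex and squeeze od's padding into single spaces
bytes=$(od -An -tx1 "$tmp/foo" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "$bytes"
# prints: fc 20 c3 bc 0a
```

fc is the lone iso-8859-1 byte, 20 is the space, c3 bc is the utf8 sequence, and 
0a is the trailing newline.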

> I would probably hazard without looking at the code
> that we should try and do what the big boys (vim and emacs) do in this
> instance.

if you do, take them with a grain of salt ... i know vim tries very hard to do 
automatic conversion between iso-8859-* and unicode and such.  best to not 
descend into that rat hole and just force people to use sane environments.  
otherwise we get into crappy workarounds that can never go away and some 
people report "this weird crap works for me" and so the code behavior randomly 
shifts around without any kind of spec to back it up.
