Re: [Nano-devel] searching for certain UTF8 characters also finds ghosts

From: Mark Majeres
Subject: Re: [Nano-devel] searching for certain UTF8 characters also finds ghosts
Date: Sat, 21 Mar 2015 18:08:02 -0700

On 3/21/15, Benno Schulenberg <address@hidden> wrote:
> On Sat, Mar 21, 2015, at 20:54, Mark Majeres wrote:
>> >> > Strangely this happens only for characters in the range
>> >> > U+00A0 to U+00BF, which baffles me.  (I've tested it with
>> >> > ?, ?, ?, ?, ? -- they all get doubly found -- but also
>> >> > with ?, ?, ?, ?, ?, ?, and ? -- they all are found once.)
>> >
>> > Hmm.  Your mail is UTF8-encoded, but these characters haven't
>> > come across properly.  Did you see them okay?
>> They came in the first time just like they look now :)
> How strange.  You are sending now via gmail, but maybe you use
> a non-utf8 locale on your machine?
It is possible that my other email account needs to be configured
for UTF-8.  I funnel it all through Gmail.

>> Yes, shifting one multi-byte char to the left is a little bit of work.
>> It's not very pretty.
> Well, I have a patch for that.  Attached.  I can't notice any
> performance improvement anywhere, though.  It could still
> be made quite a bit faster by looking at the bytes themselves,
> but that requires some thinking and won't look pretty.

How about just providing a better initial starting point?

size_t before = pos-mb_cur_max();
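
To illustrate the idea, here is a minimal sketch of stepping one character to the left from a closer starting point instead of rescanning from the beginning of the line. It is not nano's actual code: step_left(), utf8_len() and UTF8_MAX_LEN are hypothetical names, the buffer is assumed to be UTF-8, and UTF8_MAX_LEN stands in for mb_cur_max(). The start is clamped to zero because pos may be smaller than the maximum character length, in which case the unsigned subtraction would wrap around.

```c
#include <stddef.h>

/* Assumption: UTF-8 text, where a character is at most 4 bytes long.
 * This constant stands in for nano's mb_cur_max(). */
#define UTF8_MAX_LEN 4

/* Length implied by a UTF-8 lead byte.  Continuation bytes and
 * invalid bytes are counted as single-byte characters. */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/* Return the index of the character that ends right before pos
 * (hypothetical helper, not part of nano). */
static size_t step_left(const char *buf, size_t pos)
{
    /* Start at most one maximum-length character before pos,
     * clamped so that the subtraction cannot wrap below zero. */
    size_t before = (pos > UTF8_MAX_LEN) ? pos - UTF8_MAX_LEN : 0;
    size_t i = before;

    /* Walk forward until the next step would reach or pass pos. */
    while (i < pos) {
        size_t len = utf8_len((unsigned char)buf[i]);

        if (i + len >= pos)
            break;
        i += len;
    }

    return i;
}
```

If the starting point lands in the middle of a multibyte character, the stray continuation bytes are stepped over one at a time until the next lead byte is reached, so for valid UTF-8 the result should be the same as a scan from the start of the line.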

