bug#17412: 24.3; Unicode key events broken, not usable in input method

From: Stefan Dorn
Subject: bug#17412: 24.3; Unicode key events broken, not usable in input method
Date: Tue, 6 May 2014 19:38:24 +0100

>> Digging around in keyboard.c, I found that read_char() only passes
>> events with keycode < 256 (line 3050ff) to input-method-function:
> Indeed, this has been in the input-method design from the start.
> I'd be interested to know why.  Handa?

I write a lot of linguistic analysis, and so added common IPA symbols
to my core keyboard layout, like ß, ł or æ. (I could type them through
an input method, but that would be slower and force me to use a
different typing method inside and outside of Emacs, which would slow
me down a lot.)

I recently set up a Cyrillic input method, and was surprised to find
that I could use ß in quail but not ł, simply because ß (U+00DF) falls
below the magic threshold of 256 and ł (U+0142) does not. Unfortunately,
merely turning off the conditional in read_char() is not enough to get
it to work.

More importantly, I also have most combining diacritic characters
(U+0301 ff) on keys and use them a lot. Switching them to some
"similar looking punctuation -> diacritic" input method would be
seriously annoying due to lots of conflicts (quoting a letter vs
umlauting it etc).

Most search features in Emacs don't do Unicode normalization, so ä (a
with umlaut) and ä (a with combining diacritic umlaut) don't match. I
added some normalization hacks to isearch and just force-normalize the
buffer when I save it, but wanted a more universal and clean solution.
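
To illustrate the mismatch, here is a minimal sketch in Python (not
Emacs code; it only assumes the standard unicodedata module). The two
spellings of ä compare unequal until both are brought to the same
normalization form:

```python
import unicodedata

# Precomposed form: a single code point, U+00E4.
precomposed = "\u00e4"
# Decomposed form: "a" followed by U+0308 COMBINING DIAERESIS.
decomposed = "a\u0308"

# A plain comparison (or a naive search) treats them as different strings.
naive_match = precomposed == decomposed

# Normalizing both sides to NFC makes them compare equal.
nfc_match = (unicodedata.normalize("NFC", precomposed)
             == unicodedata.normalize("NFC", decomposed))

print(naive_match, nfc_match)
```

The same idea applies to searching: normalize both the search string
and the text to one form before comparing.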

I thought I could just set up a "letter + combining diacritic" ->
"normalized character" input method to fix most of this, but again
I can't use any of the diacritics in quail, since they all lie above
the same threshold.

>> [322] as key event seems strange to me. The XLib keycode for "ł" (as
>> reported by xev) is 0x1000142. Maybe Emacs cuts off the leading bit?
> 322 = U+0142, so it's really not strange at all: Emacs uses
> Unicode internally.

Ah, cool.
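
For reference, the arithmetic can be sketched in Python. This assumes
the X11 convention that Unicode keysyms are encoded as 0x01000000 plus
the code point; the function name and the constant names are made up
for illustration and are not taken from Emacs or Xlib:

```python
UNICODE_KEYSYM_FLAG = 0x01000000   # assumed X11 marker for Unicode keysyms
INPUT_METHOD_LIMIT = 256           # threshold mentioned above for read_char()

def keysym_to_codepoint(keysym):
    """Return the code point of a Unicode keysym, or None for other keysyms."""
    if keysym & 0xFF000000 == UNICODE_KEYSYM_FLAG:
        return keysym & 0x00FFFFFF
    return None

codepoint = keysym_to_codepoint(0x1000142)   # value reported by xev for ł
print(codepoint, hex(codepoint), chr(codepoint))

# ß (U+00DF) is below the threshold; ł and the combining acute accent are not.
for ch in ("\u00df", "\u0142", "\u0301"):
    print(hex(ord(ch)), ord(ch) < INPUT_METHOD_LIMIT)
```

So nothing is being cut off by accident: dropping the marker leaves
exactly the code point, and 0x142 is 322 in decimal.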

