help-gnu-emacs

word boundaries in Asian languages


From: Eric Abrahamsen
Subject: word boundaries in Asian languages
Date: Mon, 19 Aug 2013 18:26:20 +0800
User-agent: Gnus/5.130008 (Ma Gnus v0.8) Emacs/24.3 (gnu/linux)

I use Emacs for prose more than for programming, and I've been idly
fiddling with making it a better environment for editing other
languages, particularly Asian languages, particularly Chinese prose.

One of the really awkward things about editing Chinese prose in Emacs is
that word boundaries are bound to spaces -- in a language that doesn't
use spaces to delineate words, movement and editing commands are thus
restricted either to per-character, or per-punctuated-phrase. It's
unwieldy.

Accurately identifying word boundaries in Chinese is a subject of
academic research, but a couple of C libraries have emerged (I've pasted
a couple of likely links at the bottom).
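
For a rough idea of what such a library does, here is a minimal sketch of
greedy forward maximum matching, the basic dictionary technique that MMSEG
builds on. It is not MMSEG itself (which adds rules for resolving
ambiguities), and the tiny LEXICON and the function names are made up for
illustration.

```python
# Toy lexicon (an assumption for this sketch); a real segmenter would load
# a dictionary with many thousands of entries.
LEXICON = {"研究", "研究生", "生命", "起源"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def next_word_end(text, pos):
    """Return the index just past the longest lexicon word starting at pos.

    Falls back to a single character when no multi-character entry matches.
    """
    if pos >= len(text):
        return pos
    longest = min(MAX_WORD_LEN, len(text) - pos)
    for length in range(longest, 1, -1):
        if text[pos:pos + length] in LEXICON:
            return pos + length
    return pos + 1


def segment(text):
    """Split text into words by repeatedly taking the longest match."""
    words = []
    pos = 0
    while pos < len(text):
        end = next_word_end(text, pos)
        words.append(text[pos:end])
        pos = end
    return words


print(segment("研究生命起源"))  # ['研究生', '命', '起源']
```

The greedy match grabs 研究生 ("graduate student") even though the
intended reading is 研究 / 生命 / 起源 ("research / life / origin"),
which is a small taste of why accurate segmentation is still a research
problem.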

Given that this level of programming is _way_ above my pay grade, I
raise the following totally hypothetical scenario. How likely is this:

1. I call "forward-word" (or some equivalent word-based command)
2. Emacs checks a variable like use-multilingual-words, or something like
   that, which makes all of the following optional.
3. It's true, so we check the script of the following character, and try
   a lookup in a variable that pairs scripts with C libraries that
   provide word-level commands for those scripts.
4. A library is present! Instead of the usual "forward-word", we now
   call a function from that library to identify the next word boundary.
   Point goes either to that spot, or to the end of a contiguous run of
   characters of the same script that we started in.

So external C libraries would have to be augmented with functions that
did word boundary location in a way that made sense to Emacs, but
presumably the hard work would have already been done. Given my general
ignorance, how unlikely is all of this?

Thanks!
Eric

http://technology.chtsai.org/mmseg/
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.108.8593



