Re: [emacs-bidi] detecting the wrong order of characters


From: Alex Schroeder
Subject: Re: [emacs-bidi] detecting the wrong order of characters
Date: Wed, 07 Nov 2001 16:13:24 +0100
User-agent: Gnus/5.090004 (Oort Gnus v0.04) Emacs/21.1 (i686-pc-linux-gnu)

Eli Zaretskii <address@hidden> writes:

>> If you can read Hebrew PostScript, check this address out:
>> http://www.cs.huji.ac.il/labs/learning/Info/Ps_Files/lecture2.ps
>> But think of ordered pairs of letters.  With SOFIOT in Hebrew, this
>> should work very nicely.

What does the above document say, exactly?

> Is something like that possible with Arabic?

I don't know how Arabic is represented in Unicode.  Picking up my book,
however, I see that there are often three ways of writing a letter --
one for the beginning of words, one for within words, one for the end
of words.  Are these represented as different characters in Unicode
(or ISO 8859-6)?  If so, then that information could be used.  If not,
language-specific letter combinations must be looked for.  :(

This reminds me of the very simple language detection scheme I am
using.  A similar technique might also work to detect visual order --
all we need is a list of common direction-identifying sequences.

(defvar guess-language-rules
  '(("en" . "\\<\\(of\\|the\\|and\\|or\\|how\\)\\>")
    ("de" . "\\<\\(und\\|oder\\|der\\|die\\|das\\|wie\\)\\>") 
    ("fr" . "\\<\\(et\\|ou\\|[ld]es\\|que\\)\\>"))
  "Alist of rules to determine the language of some text.
Each rule has the form (CODE . REGEXP) where CODE is a string to
identify the language (probably according to ISO 639), and REGEXP is a
regexp that matches some very common words particular to that language.
The default language should be listed first.  That will be the language
returned when no REGEXP matches, as would happen for an empty
document.")

(defun guess-buffer-language ()
  "Guess language in the current buffer.
Adapted by Alex.
From: Jean-Philippe Theberge <address@hidden>
Subject: Re: Guessing a language.
Newsgroups: gnu.emacs.help
Date: 03 Mar 2000 16:46:41 +0100"
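  (save-excursion
    (goto-char (point-min))
    ;; Build a list of (COUNT . CODE) pairs, one per rule.
    ;; `count-matches' returns an integer in current Emacs, so no
    ;; `string-to-number' wrapper is needed.
    (let ((count (mapcar (lambda (x)
                           (cons (count-matches (cdr x)) (car x)))
                         guess-language-rules)))
      ;; Return the code with the highest count.  On a tie `assoc'
      ;; finds the earliest rule, so the default language wins.
      (cdr (assoc (apply #'max (mapcar #'car count))
                  count)))))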

(defun guess-language ()
  "Guess language in the current buffer."
  (interactive)
  (message "%s" (guess-buffer-language)))
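
The same counting idea could be turned towards visual order.  What
follows is only a rough sketch, assuming that text stored in visual
order has each Hebrew word reversed in the buffer, so that a final
letter form (kaf, mem, nun, pe, tsadi sofit) shows up as the first
character of a word instead of the last.  The names
`guess-hebrew-final-at-end', `guess-hebrew-final-at-start' and
`guess-hebrew-order' are made up for this example and are not part of
Emacs.  The characters are written as \u escapes so that the source
does not depend on how a bidi display reorders it.

```elisp
;; Sketch only: all three names below are hypothetical.
;; U+05D0..U+05EA are the Hebrew letters; U+05DA, U+05DD, U+05DF,
;; U+05E3 and U+05E5 are the five final forms.

(defconst guess-hebrew-final-at-end
  (concat "[\u05d0-\u05ea]"                      ; any Hebrew letter
          "[\u05da\u05dd\u05df\u05e3\u05e5]"     ; a final form
          "\\(?:[^\u05d0-\u05ea]\\|\\'\\)")      ; non-Hebrew or end of buffer
  "Regexp matching a final form as the last letter of a word.")

(defconst guess-hebrew-final-at-start
  (concat "\\(?:\\`\\|[^\u05d0-\u05ea]\\)"       ; start of buffer or non-Hebrew
          "[\u05da\u05dd\u05df\u05e3\u05e5]"     ; a final form
          "[\u05d0-\u05ea]")                     ; any Hebrew letter
  "Regexp matching a final form as the first letter of a word.")

(defun guess-hebrew-order ()
  "Guess the storage order of Hebrew text in the current buffer.
Return the symbol `logical' if final forms mostly end words, `visual'
if they mostly start words, and nil if the counts are equal."
  (save-excursion
    (goto-char (point-min))
    ;; `how-many' restores point itself, so both counts start at
    ;; the beginning of the buffer.
    (let ((at-end (how-many guess-hebrew-final-at-end))
          (at-start (how-many guess-hebrew-final-at-start)))
      (cond ((> at-end at-start) 'logical)
            ((> at-start at-end) 'visual)
            (t nil)))))
```

To try it, insert a word such as shin-lamed-vav-mem sofit
("\u05e9\u05dc\u05d5\u05dd") into a temporary buffer and call
`guess-hebrew-order'; the same four characters in reverse order should
give the opposite answer.  Text without any final forms gives nil, so
a real detector would want more evidence than this.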


Alex.
-- 
http://www.emacswiki.org/


