Re: How to make emacs auto-recognize utf-8 encoded files upon visiting

From: Charles Muller
Subject: Re: How to make emacs auto-recognize utf-8 encoded files upon visiting
Date: Fri, 27 Sep 2002 23:10:07 +0900 (JST)

Kai wrote:

> Maybe it is sufficient to install Mule-UCS?  I guess that TEI Emacs
> is Emacs with Mule-UCS pre-installed (plus some other packages).

TEI-Emacs does install Mule-UCS, but there must be more to it than that:
I always install Mule-UCS with my Emacs, and it never renders CJK fonts in
Unicode until I install the TEI package. Since all of my internet
publication and data compilation has to be done in Unicode, that's always
the first thing I check. But I don't know enough about Lisp programming to
tell you exactly *what* the TEI package does to get this working. No doubt
Sebastian Rahtz, Christian Wittern, and some of the others who wrote the
package would be happy to tell you what the key routines are.

The first priority of the TEI people is to make sure that the SGML/XML/HTML
modes work more precisely and comprehensively than in the standard Emacs
package, since the target audience is mainly humanities scholars who are
using TEI-XML to mark up literary texts. For example, the way PSGML is set
up in the standard package, it is hard to get it to distinguish between XML
and SGML. They have also added a whole array of DTDs for various purposes,
including separate ones for strict and transitional XHTML. There is also an
added XSLT mode that allows for adjustments and debugging.

Then, on top of that, because there are so many of us working with mixed
international scripts (including CJK), apparently someone decided to figure
out how to get all the fonts properly recognized.

I am guessing that part of the problem facing the standard installation of
Emacs is that with any traditional encoding other than Unicode, such as
Big5, JIS, or KSC, you always have at least one full font set that is
traditionally mapped to the encoding. With Unicode, I don't know of a font
designed to work readily in Linux/Emacs that covers all codepoints (the way
MS Arial Unicode does in Windows, for example). So a function needs to be
added which goes through the document and plugs in the proper font for each
codepoint. I am not well enough versed in the technical end to be able to
explain how they have accomplished this over in Oxford.
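
The idea of plugging in a font per codepoint can be sketched roughly as
follows. This is only an illustration in Python, not Emacs Lisp, and it is
not taken from the TEI package. The range table, the font names, and the
function names are all hypothetical placeholders.

```python
# Hypothetical table: (first codepoint, last codepoint, font name).
# The font names are placeholders, not fonts shipped with any package.
FONT_RANGES = [
    (0x0000, 0x024F, "latin-font"),          # Basic Latin through Latin Extended-B
    (0x3040, 0x30FF, "japanese-kana-font"),  # Hiragana and Katakana
    (0x4E00, 0x9FFF, "cjk-ideograph-font"),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3, "hangul-font"),         # Hangul Syllables
]
FALLBACK_FONT = "fallback-font"


def font_for(char):
    """Return the font name whose range contains this character."""
    codepoint = ord(char)
    for first, last, font in FONT_RANGES:
        if first <= codepoint <= last:
            return font
    return FALLBACK_FONT


def assign_fonts(text):
    """Split text into runs of consecutive characters that share a font."""
    runs = []
    for char in text:
        font = font_for(char)
        if runs and runs[-1][0] == font:
            # Same font as the previous character: extend the current run.
            runs[-1] = (font, runs[-1][1] + char)
        else:
            runs.append((font, char))
    return runs
```

Calling assign_fonts on a mixed-script string yields a list of (font, text)
pairs, one per run, which is roughly the information a display engine needs
in order to pick a font for each stretch of text.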

I noticed a good bit of negative reaction toward TEI-Emacs when I first
mentioned it, where people expressed alarm about the TEI people not caring
about the GPL and not reporting to the GNU development team. I think that
these concerns come from people not really checking into what the package
is and what it does. It is not a new version of Emacs, such as XEmacs. It
is simply an add-on that contains mode enhancements and some new modes of
its own--just the way people are accustomed to adding on calendar modes,
e-mail packages, or whatever.

I know many of the dedicated people in the Text Encoding Initiative very
well, and there is no group around that is more concerned about free
software and donating code. But when you really need to get a certain type
of application set up for a certain use, I don't think you can just write to
the GNU development team and then wait for it to be implemented in a future
version. And after all, the Lisp code for TEI-Emacs is just as openly
available as any other development based on Emacs. They have also made a
concerted effort to have their add-on support GNU Emacs, rather than XEmacs,
so I really see it as a very positive development that should be learned
from, rather than disparaged.


Charles Muller  <address@hidden>
Faculty of Humanities,  Toyo Gakuen University
Digital Dictionary of Buddhism and CJKV-English Dictionary 
Mobile Phone: 090-9310-1787
