Emacs and Unicode (was Re: Combining German Umlauts...)

From: Gordon Deane
Subject: Emacs and Unicode (was Re: Combining German Umlauts...)
Date: Wed, 03 Dec 1997 23:16:43 +1100

On 25 Nov 1997, Valeriy E. Ushakov wrote:
> On Mon, Nov 24, 1997 at 10:55:35PM +0700, Victor Sudakov wrote:
[lout discussion snipped]
>   German @Language  { SomeLatin1Font   } @Font { ...german text... }
>   Russian @Language { SomeCyrillicFont } @Font { ...russian text... }
> It *is* a problem for a text editor you'll use to prepare Lout file.
> Usually text editor uses single font to display edited text.  This
> font is either latin1 (western european) or koi8-r (cyrillic).  Thus
> either your umlauts will be displayed as russian letters or your
> russian letters will be displayed as latin with diacritic marks.
> * Emacs can do that.
> You *can* teach Emacs to use different fonts for passages in different
> languages.  A good enhancement for a lout-mode would be a change
> language command that will put lout @Language command and switch the
> font (registry+encoding suffix in XLFD).  Of course, parsing @Language
> and switching fonts when file is found into a buffer and displayed is
> also necessary.

Trying to do this by sprinkling fonts about from Lisp is really hard.

Fortunately, both Emacs 20 and XEmacs 20 can be compiled to use Unicode
internally.  Both are available now, although they are still somewhat beta.
This is a much better solution to the problem.

The way it works is that you edit a buffer of Unicode.  This is displayed
using 'font sets': each is a set of fonts with different encodings that
between them cover most of the characters you need.  The idea is that you
don't need a single font to hold all glyphs if you have all of them spread
across different fonts.

In particular you could have a 'font set' called Arial with both Cyrillic and 
Roman Arial-like fonts.

You can change the alphabet the keyboard maps to on the fly.

This should work pretty well.  But until Lout can use Unicode, it raises the
following problems:

1) Saving:  Turning Unicode into valid Lout.  Do you automatically insert font
changes (which requires you to mess about with the markup), or do you just
warn about character set mismatches?  Perhaps offer both as options.

2) Loading:  Reading 8-bit Lout documents into Unicode so that the umlauts
display correctly amid the Cyrillic :-)
This should be possible in lout-mode eventually.  Unfortunately, you want this
transformation to be non-destructive and preferably reversible, which is hard.

3) Disk formats:  You need to be able to save 8-bit Lout for now, but it would
also be nice to save (and work in) UTF-8 encoded Unicode.  Then the above two
problems disappear for saving/loading (UTF-8 is fully reversible etc.), but
you still have problem (1) when formatting.
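
As a rough illustration of problems (2) and (3), here is a minimal Python
sketch.  It is not anything lout-mode does today.  The sample words and the
choice of Latin-1 and KOI8-R as the two 8-bit encodings are assumptions made
for illustration; the codecs are the ones in Python's standard library.

```python
# Hypothetical sample passages, stored as 8-bit text in two encodings.
german_bytes = "Käse".encode("latin-1")
russian_bytes = "сыр".encode("koi8_r")

# Loading (problem 2): each passage must be decoded with its own encoding.
german = german_bytes.decode("latin-1")
russian = russian_bytes.decode("koi8_r")
mixed = german + " / " + russian

# Decoding with the wrong table silently yields different text.
misread = german_bytes.decode("koi8_r")

# Disk format (problem 3): UTF-8 round-trips the mixed text losslessly.
utf8_roundtrip = mixed.encode("utf-8").decode("utf-8")


def fits(text, encoding):
    """Return True if every character of text exists in the 8-bit encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(misread != german)
print(utf8_roundtrip == mixed)
print(fits(mixed, "latin-1"), fits(mixed, "koi8_r"))
```

Neither 8-bit encoding can hold the mixed text on its own, which is why saving
as 8-bit Lout needs some per-passage handling, while UTF-8 on disk does not.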

Would it be worth writing a simple UTF-8 stripper along the lines of

while input {
  decode the next UTF-8 sequence into a 16-bit char x
  is x in some Lout mapping Y?
  yes:  output the 8-bit representation of x in Y, i.e. Y(x), which
        will print correctly in a font matching Y.
  no:   print an error
}

This probably works well for documents in a single encoding:

  unix$ utfstrip < myfile.lout | lout -

or do it in a Makefile.  (I know this is not ideal...)
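
For the single-encoding case, the stripper could be sketched in Python roughly
as follows.  This is only a sketch under assumptions: the function names, the
default of Latin-1, and the error message format are all hypothetical, and the
"Lout mapping Y" is stood in for by one of Python's standard 8-bit codecs
rather than by Lout's own mapping files.

```python
import sys


def utfstrip(text, encoding):
    """Encode text in one 8-bit encoding (the mapping Y).

    Raises ValueError naming the first character that Y cannot represent.
    """
    out = bytearray()
    for lineno, line in enumerate(text.splitlines(keepends=True), 1):
        for col, ch in enumerate(line, 1):
            try:
                out += ch.encode(encoding)  # Y(x)
            except UnicodeEncodeError:
                raise ValueError(
                    f"line {lineno}, column {col}: U+{ord(ch):04X} "
                    f"has no {encoding} representation"
                ) from None
    return bytes(out)


def main(argv):
    """Filter UTF-8 on stdin to 8-bit text on stdout; return an exit status."""
    encoding = argv[1] if len(argv) > 1 else "latin-1"
    text = sys.stdin.buffer.read().decode("utf-8")
    try:
        sys.stdout.buffer.write(utfstrip(text, encoding))
    except ValueError as err:
        sys.stderr.write(f"utfstrip: {err}\n")
        return 1
    return 0

# To use this as a filter, finish the script with: sys.exit(main(sys.argv))
```

Stopping at the first unmappable character is a design choice made here for
simplicity; a real tool might prefer to collect every mismatch and report them
all in one pass.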

This could be confusing where the choice of Y is not unique.  I think this is 
mostly x < 128 (ASCII) where Y=identity is probably equivalent to all the 
others anyway.  Is this the case?
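
One way to check that guess is to compare the candidate mappings directly.
The sketch below is an assumption-laden illustration: it uses Python's
standard Latin-1 and KOI8-R codecs as stand-ins for two Lout mappings, and the
helper names are hypothetical.  It lists the characters that exist in more
than one mapping but at different byte values, which are the ones where the
choice of Y actually matters.

```python
def encodings_of(ch, encodings):
    """Map each encoding that can represent ch to the bytes it uses."""
    found = {}
    for enc in encodings:
        try:
            found[enc] = ch.encode(enc)
        except UnicodeEncodeError:
            pass  # ch is not in this mapping
    return found


def conflicts(encodings, limit=0x10000):
    """Code points present in two or more encodings with differing bytes."""
    result = []
    for cp in range(limit):
        found = encodings_of(chr(cp), encodings)
        if len(found) > 1 and len(set(found.values())) > 1:
            result.append(cp)
    return result


mappings = ["latin-1", "koi8_r"]
clashing = conflicts(mappings)
ascii_is_safe = all(cp >= 128 for cp in clashing)
print(ascii_is_safe, [f"U+{cp:04X}" for cp in clashing])
```

For these two codecs the ASCII range agrees, as guessed, but a few symbols
above 127 (the copyright sign, for one) sit in both tables at different
positions, so for those Y would still have to be chosen from context.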

I only speak English (or at least Australian ;-) but I hope I can build some 
of this into my Emacs mode eventually.



   Gordon Deane     |'You know you could find yourself charged with 
Engineering/Science | being a dominant species while under the      
Australian National | influence of impulse-driven consumerism, don't
    University      | you?'                - Alien Cop, Good Omens
