Re: [Devel] Re: Linux Console in UTF-8

freetype-devel

From:	Antoine Leca
Subject:	Re: [Devel] Re: Linux Console in UTF-8 - current state
Date:	Wed, 09 Oct 2002 12:06:35 +0200

Vadim Plessky wrote:
> 
> |  > > And presumably FreeType2 will have, or acquire, the smarts for
> |  > > rendering the Arabic and Indic scripts properly.
> 
> I am wondering *how important* those Arabic and Indic scripts are?
> While there is a certain number of people living in those countries, I doubt
> that they have a lot of computers, and the number of *Linux* users among
> them is questionable, too.
> And when those things happen to change - we will see some people willing to
> contribute to free fonts for those languages. But this won't happen
> tomorrow...

Some information about these is in order.

First, they are two quite different problems.

Arabic basically needs an engine to reorder the glyphs (as Hebrew does); this
is not a great problem, but as Werner says it has to sit (at least) one level
higher. I assume this point is solved (if it is not, it will be before the
"5 years" term from Edward.)
Beyond that, Arabic needs an engine that selects glyphs according to their
position inside the word (initial, medial, final); again, such an engine has
existed for years for the simple case (Naskh style, no ligature other than l+a).
Such an engine is in theory complicated by two factors, which add up:
ligatures that may be present in the font, and, in the case of the Nastaliq
style (used for Urdu, but not normally for the Arabic language), the fact that
words are written obliquely, so potentially complex relative positioning has
to be anticipated. This is where we lack software, and free software in
particular. But as you may infer from the above:
 a) we are talking of a very reduced part of the usage of Arabic; I believe
for example that even Urdu could be written in Naskh in the context of a
console (since it would take less vertical space)
 b) we really need to be (some) levels higher than the present use of FreeType
(for example, to delimit word boundaries), which perhaps calls for a different
interface, a not very trivial problem as I see things...
Also, the fixed-width problem, which is not specific to Arabic (on the
contrary, you can even write ugly fixed-width Arabic, while you can't with
Thai), is another point to take into account.
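
The positional selection mentioned above (initial, medial, final, plus the
isolated form) can be sketched in a few lines of Python. This is only a
minimal illustration of the simple Naskh case with no ligatures: the function
name and the tiny joining table are assumptions made for the example, and a
real engine would take its joining data from Unicode's ArabicShaping.txt.

```python
# Hand-written joining classes for a few letters only (an assumption made for
# this sketch; complete data lives in Unicode's ArabicShaping.txt).
DUAL = {   # letters that can connect to both neighbours
    "\N{ARABIC LETTER BEH}", "\N{ARABIC LETTER TEH}", "\N{ARABIC LETTER KAF}",
    "\N{ARABIC LETTER LAM}", "\N{ARABIC LETTER MEEM}",
}
RIGHT = {  # letters that connect only to the preceding letter
    "\N{ARABIC LETTER ALEF}", "\N{ARABIC LETTER DAL}",
    "\N{ARABIC LETTER REH}", "\N{ARABIC LETTER WAW}",
}


def positional_forms(word):
    """Return 'isolated', 'initial', 'medial' or 'final' for each letter.

    `word` is in logical (typing) order, as stored in memory.
    """
    forms = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        # A letter connects backwards only if the previous letter joins forward.
        joins_prev = prev in DUAL and (ch in DUAL or ch in RIGHT)
        # Only dual-joining letters connect forward, and only to a joining letter.
        joins_next = ch in DUAL and (nxt in DUAL or nxt in RIGHT)
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# kaf, teh, alef, beh ("kitab"), in logical order
KITAB = ("\N{ARABIC LETTER KAF}\N{ARABIC LETTER TEH}"
         "\N{ARABIC LETTER ALEF}\N{ARABIC LETTER BEH}")
print(positional_forms(KITAB))
# ['initial', 'medial', 'final', 'isolated']
```

Note that the function needs to see the neighbours of each letter, which is
exactly why this step cannot live at the one-character-at-a-time level.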

As a result, I do not see Arabic as a very big problem, but rather as
something of the same order of magnitude as other "different" scripts, such as
Thai (and Lao) or Hebrew; certainly, it has a lower priority than CJK(V),
but it will happen.


Indic scripts are quite different. There is no shortcut, unless you sidestep
the problem by using English, which is the "solution" used by the vast
majority of Indian computer users! By the way, don't forget that
even if the number of computers in India (or in Arabic-speaking countries)
may be low, they are by far not the only ones that may use their "native"
scripts: for example, there is a laaaaarge number of Indians who live in
the USA (and other countries), work as scientists of various kinds, and own
computers, plenty of them running Linux... Another point is the increasing
use of Arabic (because of emigrants) in various administrations,
particularly in Europe (including perhaps Russia); and the current trend is
clearly that they are "invited", for economic reasons, to switch to Linux...

Back to Indic scripts: first, there are not two main styles, as for Arabic,
but rather a dozen (including Sinhala and Burmese); the most widely used,
(Deva)Nagari, used for the Hindi language, is probably one of the most
complex, and quite certainly the one that uses the largest number of
ligatures, which inherently restricts the availability of fonts, and even
more so of good-quality fonts. To render it correctly, not only do you need
a mechanism to deal with ligatures, but there are also a number of quite
complex reordering rules: as a simple example, the word "Hindi" is written as
          i h diacritic_for_n d i
And when it comes to the use of r, things get much more complex. As a
result, it is _very_ important to have an upper level, which means using
a different interface, quite distinct from the current character=glyph=space
paradigm that is often seen in use (I do not know first-hand how it is
interfaced in the current state of affairs.)
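
To make the "i h diacritic_for_n d i" example concrete, here is a minimal
Python sketch of the single simplest reordering rule: the short vowel sign I
is stored after its consonant but drawn before it. The function name is an
assumption made for the example, and the sketch deliberately ignores the
harder cases (r in particular), so it is an illustration and not a usable
shaping engine.

```python
VOWEL_SIGN_I = "\N{DEVANAGARI VOWEL SIGN I}"   # stored after, drawn before
VIRAMA = "\N{DEVANAGARI SIGN VIRAMA}"          # glues consonants into a cluster


def is_consonant(ch):
    # Basic Devanagari consonants KA..HA occupy U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"


def visual_order(text):
    """Move each vowel sign I in front of the consonant cluster it follows."""
    out = []
    cluster_start = 0  # index in `out` where the current cluster begins
    for ch in text:
        if ch == VOWEL_SIGN_I:
            out.insert(cluster_start, ch)
            continue
        # A consonant opens a new cluster unless a virama joins it to the
        # previous consonant.
        if is_consonant(ch) and not (out and out[-1] == VIRAMA):
            cluster_start = len(out)
        out.append(ch)
    return out


# "Hindi" in logical order: h, i, diacritic for n, d, long i
HINDI = ("\N{DEVANAGARI LETTER HA}\N{DEVANAGARI VOWEL SIGN I}"
         "\N{DEVANAGARI SIGN ANUSVARA}\N{DEVANAGARI LETTER DA}"
         "\N{DEVANAGARI VOWEL SIGN II}")
print([hex(ord(c)) for c in visual_order(HINDI)])
# ['0x93f', '0x939', '0x902', '0x926', '0x940']
```

Even this one rule needs to look back over a whole cluster, which a
one-character, one-glyph, one-cell interface cannot express.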

To add further complexity, there is currently no agreement about the way to
encode Indic fonts. Besides proprietary glyph-based encodings (which clearly
do not scale), the Apple scheme looks like a dead end, so the only
"solution" I see is the OpenType scheme, which fits more or less with
Unicode (but lags about 6-8 years behind), and was initiated (and as I see
things, is still currently "owned") by Microsoft, something that is not really
welcome in the Linux community ;-).
Another way is a repertoire of presentation forms. This work has been done
for Tamil (which is about the simplest script), but I do not see further
advances for the "other" scripts, probably because people do not agree
about which ligatures are required and which are not.
Then, the console requirement clearly adds the problem of width, which is a
real issue for Indic scripts (which have quite a number of "diacritics",
beginning with a number of vowels); another problem is that some glyphs are
drawn outside the box, that is, they extend before or after the "limit" of the
character: obviously, displaying this on a text terminal is likely to be
difficult...
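
As a small illustration of the width problem, the following Python sketch
compares a naive "one code point, one cell" count with a count that skips
nonspacing marks, using the general categories from the standard unicodedata
module. The function name is an assumption made for the example, and neither
number is claimed to be the right console width; the point is only that the
two already disagree.

```python
import unicodedata


def cell_counts(text):
    """Return (naive, without_nonspacing) cell counts for `text`.

    naive: one cell per code point.
    without_nonspacing: code points of category Mn (nonspacing mark)
    are assumed to take no cell of their own.
    """
    naive = len(text)
    nonspacing = sum(1 for ch in text if unicodedata.category(ch) == "Mn")
    return naive, naive - nonspacing


# "Hindi": h, vowel sign i, anusvara (the n diacritic), d, vowel sign ii
HINDI = "\u0939\u093f\u0902\u0926\u0940"
print(cell_counts(HINDI))
# (5, 4)
```

The two vowel signs are spacing marks (category Mc), so they still count
here, yet on paper they attach to their consonant; how many cells the word
should really occupy is precisely the open question.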

As a result, I do not believe that efforts for the Indic scripts are likely
to succeed in the next few years: this is probably more of a
long-term project; consequently, I believe that Indians will continue to
use English when speaking with computers for a few years...

BTW: I will be very happy to be proven wrong; and if some people need a hand
in the Indic scripts area, I can perhaps help; but I found it too difficult to
drive it myself.

And the bottom line is: be careful with the interface model, if you do not
want to be forced to tear down the whole castle in a few years.


Antoine
