[Freecats-Dev] Re: Trados/other CAT, Python/Java, German/English
From: Henri Chorand
Subject: [Freecats-Dev] Re: Trados/other CAT, Python/Java, German/English
Date: Tue, 25 Feb 2003 01:05:45 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20021003
Dear Charles,
> Congratulations on the excellent work you are doing as project
> leader. FreeCATS has acquired a great deal of momentum in its
> short period of existence, and I think most of the credit for
> that can be put at your door.
Well, thanks... what I have tried to do is mainly information search &
communication.
Up to now (I had read about it, but it's striking to see how true it
proves), I'm amazed at how much of this project is about:
- looking for existing projects
- assessing them (especially compatibility between them and with
Free CATS's goals)
- trying to make people with very different backgrounds cooperate.
> (...)
> So import and export filters for TMX are a must. Are there any
> descriptions of this file format available?
As found in the link list on the last page of the Development Roadmap document:
http://www.lisa.org/tmx/
Judging from Yves Champollion's latest feedback, not all CAT tools
handle its latest flavours well, but to date it remains one of the few
readily implemented open standards that allow true interoperability
between existing tools, for instance between Trados & Wordfast.
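To give an idea of what an import filter has to deal with, here is a minimal sketch in Python that reads translation units from a TMX document. It assumes the usual TMX layout (a body of tu elements, each holding one tuv per language with a seg inside). The sample document and the name read_tmx are made up for illustration, and inline markup inside segments is simply flattened to text.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample, just large enough to show the structure.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="0.1"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The cat sat on the mat.</seg></tuv>
      <tuv xml:lang="fr"><seg>Le chat était assis sur le tapis.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_tmx(text):
    """Return a list of {language: segment text} dicts, one per <tu>."""
    root = ET.fromstring(text.encode("utf-8"))
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            # Fall back to a plain "lang" attribute, which some older
            # files use instead of xml:lang.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang is not None and seg is not None:
                # itertext() flattens any inline markup inside <seg>.
                unit[lang] = "".join(seg.itertext())
        units.append(unit)
    return units


units = read_tmx(SAMPLE_TMX)
```

An export filter would do the reverse: build one tu per stored segment pair and write the tree back out.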
(...)
>>> - I am more worried about lock-in, especially wrt. the
>>> Python programming language. It is an excellent tool
>>> for doing quick hacks that need OO, but it behaves almost
>>> completely unlike any other programming language in its
>>> semantics. If the reference implementation is Python,
>>> we will find it difficult to support programmers who loathe
>>> Python (and they do exist).
>> Of course - that's the trouble with having to choose a language ;-)
> Certainly, but it doesn't follow that all languages are equal.
> The point I wish to make is that Python is something of an outlier
> in the family of languages, and while it is quite intuitive and
> flexible, not all the claims that the Python language enthusiasts
> make for it should be taken at face value.
Sure. Tcl also claims similar advantages, but I ended up feeling it was
too lightweight and a bit weird. Perhaps these drawbacks also apply to
Python, to a lesser extent.
One of the (now extinct) languages I programmed in the most was Pick's
Basic. I remember how handy it was for manipulating strings, but I have
nothing against a statically typed language like C.
> It has one of the strangest approaches to variable extent that I
> have ever seen, one that is often misdescribed as `lexical scoping'
> but would be better described as `a lexical tower of dynamic scopes';
> the approach is also quite expensive in terms of demanding many
> run-time dictionary lookups, and while good results have been
> achieved with a ruthlessly-optimising Python compiler, I think a
> real performance penalty will be paid if we adopt Python as our
> scripting language as opposed to Tcl or Perl (Tcl has an excellent
> compiler, and AFAIK Perl's compiler gets much better results than
> Python's).
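A small illustration of the run-time lookups mentioned above, offered as one reading of the remark and not as a benchmark: a global name used inside a Python function is resolved in the module's dictionary each time the function runs, not when the function is defined. The names greeting and greet are invented for the example.

```python
greeting = "hello"


def greet():
    # 'greeting' is not bound inside the function; it is looked up in the
    # module's namespace (a dictionary) every time the function is called.
    return greeting


first = greet()

# Rebinding the global changes what the already-defined function sees.
greeting = "goodbye"
second = greet()

# The module namespace really is a dictionary.
stored = globals()["greeting"]
```

Here first holds the old value and second the new one, because the lookup happens at call time.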
>>> - I am all for a Java implementation. Java has excellent
>>> libraries, and many PLs can target the JVM, including
>>> Python, Tcl and Scheme.
>> Anyway, I strongly hope the Free CATS team will start work from an existing
>> project (OmegaT is the current favourite), and if that is the case, then
>> we'll simply continue along the same lines, and the language will have
>> been already chosen.
>> As OmegaT is written in Java, that would settle it and you would be
>> happy with this option ;-)
> Java isn't my favourite language, but agreeing on the Java runtime
> doesn't preclude coding in another language. An option is to use
> Jython (the Python-on-JVM implementation) to code quick hacks and
> rewrite in Java. The language I am most productive in, Scheme, has
> an excellent JVM implementation, namely SISC. I don't know of a good
> Perl implementation on the JVM, but that may just be my ignorance.
Jython would be handy in that we should be able to quickly test
a number of things in Python, while keeping the bulk of the code in
Java. I won't delve further into the more detailed background info you
provide, because I'm not competent to judge it.
(...)
> I would like to code, but I have a rather full timetable over the next
> two months and another free software project commitment which has
> priority for me at the moment, so it is difficult for me to promise
> anything definite in advance. Also how much contribution I make will
> depend on which language is adopted. Despite my reservations above, I
> would be willing to code in Python.
(...)
> ...but if *I* am the one to start coding, I will almost certainly code
> in SISC (i.e. the Scheme-on-JVM I mentioned before), and I will not be
> starting in the next few weeks.
As I see it, none of us should start coding now. My personal time is
also limited, running a small & busy company takes at least 50 hours a
week - and I also happen to be the happy father of 3 girls aged 11, 7 &
7 (the latter being twins, as you will have guessed).
The most important task seems to me to continue assessing other free
software projects in order to determine which one we may start from.
If we are careful enough about what our prerequisites really are and
what may be improved later, and if we keep a modular approach, I believe
we can't make blatant mistakes.
Apart from the technical features, openness of mind and willingness
to cooperate in order to achieve the best possible solution are of prime
importance, and I consider the nice & high quality feedback provided, for
instance, by Keith Godfrey and Yves Champollion to be very promising. If we
can establish true cooperation, then we can be sure we'll make a great
product, and it may require only a quite reasonable amount of NEW coding, at
least for a start. You'll remember what I said: we only have to begin
with what's available and make it work on a modular basis, after which
lots of other volunteers will come much more easily.
(...)
>>> - I am assembling an argument that we will need to handle
>>> hierarchical structure to get results with German->English
>>> translation. More to follow, not necessarily all that soon.
>> Great - at least somebody kept working today while all of us talked.
>> I'm not sure what you mean by "hierarchical structure" here, but I
>> suppose I'll just have to wait a little and I'll see it. I hope it
>> nicely fits in the picture as a clever indexing feature on top of raw
>> segment storage in a TM.
> Hope to send a message on this later this week. By hierarchical
> structure I mean the parse trees that linguists represent using
> X-bar grammars, e.g.:
>                   Sentence
>                 /          \
>      Noun phrase             Verb phrase
>        /      \               /       \
>  Determiner   Noun        Verb     Noun phrase
>      |         |           |         /      \
>     The       cat         sat     Prep    Noun phrase
>                                    |         /   \
>                                    on     Det     Noun
>                                            |       |
>                                           the     mat
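The same tree can also be written down as a nested data structure, which is roughly what a program handling hierarchical structure would have to store. This is only a sketch: the tuple layout and the helpers leaves and depth are assumptions for illustration, and the node labels are copied from the diagram as drawn.

```python
# Each node is (label, child, child, ...); a leaf child is a plain string.
TREE = (
    "Sentence",
    ("Noun phrase",
        ("Determiner", "The"),
        ("Noun", "cat")),
    ("Verb phrase",
        ("Verb", "sat"),
        ("Noun phrase",
            ("Prep", "on"),
            ("Noun phrase",
                ("Det", "the"),
                ("Noun", "mat")))),
)


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def depth(node):
    """Number of labelled levels above the deepest word."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(child) for child in node[1:])


sentence = " ".join(leaves(TREE))
```

Reading the leaves from left to right gives back the flat segment that a translation memory would normally store.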
Thanks for your feedback. Am I right in recognizing what I call a
semantic layer?
I know this domain a bit, as I worked around two years altogether on
[the documentation of] SPIRIT software - you might have heard about it,
because it was a pioneer in electronic document management systems.
One of the lessons I learnt is that, while very interesting, this kind of
approach represents a huge amount of work for each single natural
language added to such a system.
To the best of my limited knowledge, it also requires building up a
dictionary for each such language, in order to be able to recognize
words and therefore to assign each of them a category (noun, verb,
etc.). Things also tend to get worse with technical documentation, in
that you keep adding new entries to these dictionaries nearly all the
time (for each flavour of technical jargon you happen to be translating).
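As a toy illustration of the dictionary problem (not a description of how SPIRIT or any real system works), consider a tagger that can only categorise words it finds in a hand-built lexicon. The lexicon contents and the function name tag_words are invented for the example.

```python
# Hypothetical hand-built lexicon: word -> grammatical category.
LEXICON = {
    "the": "determiner",
    "cat": "noun",
    "mat": "noun",
    "sat": "verb",
    "on": "preposition",
}


def tag_words(sentence, lexicon):
    """Pair each word with its category, or 'unknown' if it is not listed."""
    return [(word, lexicon.get(word.lower(), "unknown"))
            for word in sentence.split()]


general = tag_words("The cat sat on the mat", LEXICON)
technical = tag_words("The servomotor sat on the flange", LEXICON)

# Words the lexicon cannot categorise until someone adds them by hand.
missing = [word for word, category in technical if category == "unknown"]
```

Every new subject area produces more such unknown words, which is why these dictionaries keep growing.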
I think this kind of approach may come at a later stage, and I hope
that whatever the Free CATS server ends up as can be compatible with such a
system, but I believe most team members will agree that it should be
done later on. I hope I don't sound pessimistic on this issue. Let me
know if I missed your point.
Of course, the question of taking into account the various forms of a
given word (conjugation for verbs, singular/plural forms for nouns &
adjectives) might make us end up with such a *simplified* semantic
layer. This is what we wanted to avoid by using N-grams (sub-strings
of words), relying on the brute force of statistics ("cat" being a
substring of "cats") instead of "intelligent" processing.
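To make the N-gram idea concrete, here is a minimal sketch that compares two words by the character trigrams they share, using the Dice coefficient. The choice of trigrams, the padding with spaces and the function names are assumptions made for this example; they are not taken from any existing Free CATS or OmegaT code.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded with spaces."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Dice coefficient over n-gram sets: 1.0 identical, 0.0 nothing shared."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# "cat" and "cats" share the trigrams at the start of the word,
# while "cat" and "dog" share none.
close = similarity("cat", "cats")
far = similarity("cat", "dog")
```

No dictionary or grammar is involved: the singular and the plural score as similar purely because their sub-strings overlap.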
I hope Keith will soon explain to us in detail how he did things with OmegaT.
Cheers,
Henri