[Freecats-Dev] Following Y.S.' first review of specification docs

From: Kemper DOC (Nerim)
Subject: [Freecats-Dev] Following Y.S.' first review of specification docs
Date: Mon, 23 Jun 2003 12:58:32 +0200

Hi all,

I quite agree with Yves Savourel's first comments. A few things I can say
while keeping it short:

> possible third type of match (..)
> 'perfect match' (I'm not sure how to call it) for the case
> where it's an exact match and you are able, somehow,
> to detect that the context is also identical.

I'll be naive here - this seems a bit theoretical to me. Yves (or anybody
else) may come up with a first draft of how to implement it, based on some
rules, and we'll certainly try to see how our preliminary (alpha) version
can benefit from it...

So even if I feel the idea goes in a good direction, I might be a bit
sceptical at first, but why not?
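
To make the question more concrete, here is a minimal sketch of one possible
rule set for telling the three match types apart. It assumes that "context"
simply means the neighbouring source segments, which is an assumption and not
something defined in the specification docs. The names TMEntry and classify
and the 0.75 fuzzy threshold are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class TMEntry:
    """One translation unit plus the source segments that surrounded it.

    None for a neighbour stands for the start or end of a document.
    """
    source: str
    target: str
    prev_source: Optional[str]
    next_source: Optional[str]


def classify(entry, segment, prev_segment, next_segment, fuzzy_threshold=0.75):
    """Return 'perfect', 'exact', 'fuzzy' or 'none' for one TM entry.

    Assumed rule: a perfect match is an exact match whose previous and
    next source segments are also identical to the stored ones.
    """
    if entry.source == segment:
        same_context = (entry.prev_source == prev_segment
                        and entry.next_source == next_segment)
        return "perfect" if same_context else "exact"
    # Character-level similarity stands in for a real fuzzy-match score.
    ratio = SequenceMatcher(None, entry.source, segment).ratio()
    return "fuzzy" if ratio >= fuzzy_threshold else "none"
```

Because the context test is a separate step, a client could switch it off
and fall back to plain exact and fuzzy matches.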

As a rule, I will only insist on modularity. My day-to-day work experience
shows that a number of agency customers (including very large ones) tend to
request that we don't review 100% matches on existing projects (so that they
don't pay for them), even though in some of these projects, existing
translations are not exactly perfect.

> (...)
> In my opinion a large part of what TMs do is just a patch, a
> remedy for the symptom of the real problem: the fact that
> you don't know what part of the document has changed
> between version 1 and 2. An updater module would be a
> huge step forward: a way to compare source doc version
> 1, source doc version 2, translated doc version 1, and
> create the translated version 2 with the delta left to edit
> or translate (and then, at that point the translator+TM
> takes over).

Well, pre-translating a new version of a document with the previous
version's TM looks like a possible way to achieve this to me.
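
A minimal sketch of that pre-translation idea follows. It assumes that
version 1 of the source and its translation are already segmented and aligned
one to one, and that only exact matches are reused. The function names
build_tm and pretranslate are hypothetical.

```python
def build_tm(source_v1, target_v1):
    """Build a TM (source segment -> translation) from aligned version 1 lists."""
    if len(source_v1) != len(target_v1):
        raise ValueError("version 1 source and target must be aligned one to one")
    return dict(zip(source_v1, target_v1))


def pretranslate(source_v2, tm):
    """Pre-translate version 2 with the version 1 TM.

    Returns (draft, todo): draft holds one entry per version 2 segment,
    either the reused translation or None; todo lists the indexes of the
    segments left for the translator, i.e. the delta.
    """
    draft = [tm.get(segment) for segment in source_v2]
    todo = [i for i, translation in enumerate(draft) if translation is None]
    return draft, todo
```

With version 1 made of "Open the file." and "Click OK.", and a version 2 that
inserts "Select a printer." between them, only index 1 ends up in todo.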

> A note on the TM server repository: You seem to look into
> XML-databases with XML-based indexing engine. It's certainly
> a possibility, but don't discard simpler classic databases as
> well.

Sure. In fact, at one stage, I thought about flat text files and custom
index files. We also identified several free native XML databases - Apache
Group's one plus another one:

Other XML stuff at:

And I also found a free text database, Berkeley DB:

> Berkeley DB is distributed under an open source license that
> permits its use in open source applications at no charge.
(it seems they also sell an XML version built on top of it...)

The thing is, we only want:
- to store strings (preferably Unicode)
- to build custom indexes (preferably ones that don't need to be regularly
regenerated when a database grows in order to keep performance high, like
- FAST search of fuzzy & perfect matches (well, as reasonably fast as is
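
As an illustration of these requirements, here is a minimal in-memory sketch
of an index that is updated as each string is added, so nothing has to be
regenerated when the database grows. Using character trigrams for candidate
lookup and difflib's ratio for scoring is an assumption made for the example,
not a design decision of the project. The class name TrigramIndex and the
0.7 threshold are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


class TrigramIndex:
    """Incrementally updated trigram index over Unicode strings."""

    def __init__(self):
        self.segments = []                # segment id -> stored string
        self.postings = defaultdict(set)  # trigram -> ids of segments containing it

    @staticmethod
    def _trigrams(text):
        # Pad so that short strings and word starts still yield trigrams.
        padded = f"  {text.lower()} "
        return {padded[i:i + 3] for i in range(len(padded) - 2)}

    def add(self, segment):
        """Store a segment and update the index in place."""
        seg_id = len(self.segments)
        self.segments.append(segment)
        for gram in self._trigrams(segment):
            self.postings[gram].add(seg_id)
        return seg_id

    def search(self, query, min_score=0.7, limit=5):
        """Return up to `limit` (score, segment) pairs, best first."""
        # Only segments sharing at least one trigram are scored.
        candidates = set()
        for gram in self._trigrams(query):
            candidates |= self.postings.get(gram, set())
        scored = []
        for seg_id in candidates:
            stored = self.segments[seg_id]
            score = SequenceMatcher(None, query, stored).ratio()
            if score >= min_score:
                scored.append((score, stored))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return scored[:limit]
```

A score of 1.0 corresponds to an identical string; anything between min_score
and 1.0 would be reported as a fuzzy match.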

> Something like MySQL for example is free and performs
> very well.

Why not. Choosing the server's development language might also influence the
choice of a DBMS.

> Related to fuzzy matching: I've also attached an old article
> ( that explains one way to create a simple TM
> engine.

Well, that's for Tim. Thanks for providing this link.

> You probably have heard of T-Remote, a product from
> Telelingua, which is basically the same thing as your TM
> server. They have developed an interface and the
> 'connectors' to plug their workbench client to various
> existing TM suites.

Yes - only recently, after launching our own project, so for me, it was
merely an encouragement: other people had the same idea.

> Maybe they would be interested in some form of collaboration.

Well, of course, we could suggest that they go open source ;-)
More seriously, this has to be done. Any volunteer around?

> Philippe Mercier's article in one of the LISA newsletters

Yes, he wants to sell the concept :-)
Of course, to some extent, it might be possible to achieve this via VPNs and
a classical Wordfast / Trados / whatever, but this is not really what we

> One other minor thing: Getting files in SXW format made me
> smile.

I plead guilty. I chose it because I have a Linux PC at home and it was
really convenient to switch between it and my office's Windows PC. The only
"offended" people should be the Mac fans, as OO is (was?) still at beta stage
on Macs. Other than that, it has a rather small footprint (compared to M$
stuff at least), is very stable and no less usable than Word.

> but I would have expected an open source project to have
> documentation output in a very common format like HTML.

Sure, we could also use it to publish HTML. Will be done some day.

> This actually made me think about a possible problem of open
> source projects. Many are a little biased toward Linux, Java, etc.,
> often in reaction against Microsoft. But the mainstream of
> possible users are on Windows, and expect Windows-like
> applications.

Well, we know we will have to spend time and effort on cross-platform
issues (client side). This is why implementing a translation client within
OO Writer could make a lot of sense (assuming the issues are solved
satisfactorily). Web-based solutions are probably worse as far as productive
GUIs are concerned (see Marc Prior's recent comments).
Or, hopefully, some cross-platform (Win / Linux / Mac) GUI builders (like
wxWindows) are now mature enough, and some things can be parameterized in a
platform-dependent way - we could hope to obtain something suitable, even if
it's not as polished as native GUIs.
Or, if we are modular enough, it might be conceivable to build two or three
GUIs on top of our libraries.



Attachment: Waikoloa
Description: Zip compressed data
