
From: Peter Gossner
Subject: [Tetum-translators] Re: Carrion Marinade
Date: Wed, 25 Feb 2004 06:22:41 +1030

On Wed, 25 Feb 2004 09:25:39 +1030  from a terminal far far away
<cromwell/>  wrote:
>Dear Peter and Lev, i'm not too proud to admit it, it's going to take
>me awhiles to absorb all this, it all looks pretty legit though from a
>brief scan.  Working on it.  

Der Wodka ist stark aber das Fleisch ist schwach
(the vodka is strong but the flesh is weak)

Hey Crommers..
Sorry if I gave the impression that I DO understand it .. 
I don't really. But I've never let something like that stop me before so
.. :)
I am starting to absorb the stuff Lev sent. (the links)
I have some good pdfs I can attach if you like (or URL)

I guess we need to identify the "types" first and agree on a naming
scheme / interpretation. The algorithms they tend to use are really
just abstracted methods / shorthand concept scopes, if ya like.

Much has been achieved in this area, for example WordNet, MBROLA and
Festival. I think there is a difference though (well, those three
aren't really concerned with translation at all.. they just get the box
to say stuff or find associated words / phonemes).
Machine translation of complex instruction sets, though: WOW, now
that's a big area. We just need to map in a possible set of
translations. Lev's original concept was to use parts of speech as the
"selection" method (well, that's how I understand it).
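To make the parts-of-speech idea concrete, here is a minimal sketch of
what "selection by part of speech" could look like. The lexicon, the
words and the glosses are all invented placeholders, not real project
data:

```python
# Hypothetical sketch: pick among candidate translations using the
# part of speech as the selection key. All entries are made-up examples.
LEXICON = {
    # word -> {part_of_speech: candidate translation}
    "book": {"noun": "livru", "verb": "rejista"},
    "file": {"noun": "arkivu", "verb": "arkiva"},
}

def translate(word, pos):
    """Return the translation selected by part of speech, or None."""
    return LEXICON.get(word, {}).get(pos)

print(translate("book", "noun"))   # -> livru
print(translate("book", "verb"))   # -> rejista
```

The point is only that the part of speech, not the word alone, narrows
the space of possible translations.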

Nothing I have read leads me to think that Lev's concept is any less
good than when I first read it. Forget about "strange attractors" for
now; it will just complicate matters. I still reckon it's a good
approach, but there is a lot of other stuff to do first.

I have not yet seen an example of an empirical approach that works,
though I really hope I am wrong!

I guess my design concept is to use human intervention to make the
cultural choices (and teach things like parts of speech in context).
I would see this working at two levels:
1/ a default (and upgradable) set
2/ a user-specific set of preferences
(which optionally could be exported back to the mother ship and added
to the default set)
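A rough sketch of that two-level scheme, with the user's preferences
overlaying the shipped defaults. The keys and values here are invented
examples, not any real preference set:

```python
# Sketch: a shipped default set, overlaid by user-specific preferences.
# All keys and values are invented placeholders.
DEFAULTS = {"greeting": "bondia", "farewell": "adeus"}

def effective_prefs(user_prefs):
    """User choices win; anything unset falls back to the defaults."""
    merged = dict(DEFAULTS)
    merged.update(user_prefs)
    return merged

prefs = effective_prefs({"farewell": "ate logu"})
print(prefs["greeting"])   # -> bondia (default survives)
print(prefs["farewell"])   # -> ate logu (user override wins)
```

Exporting back to the mother ship would then just mean shipping the
user layer, never the merged result.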

This avoids using an (albeit elegant) algorithmic approach beyond
"simple" mapping of syntax and spell checking. We can develop and
include such algorithms as we get happy with them. If we try to scope
all that stuff about Markov and friends we will run into conflicting
idea spaces.

Which is not to say that we should avoid them altogether.

Perhaps a non specific example may help here:

You probably have the (ancient) tool diction installed
(1997 is probably not that ancient :)

Run man diction and have a play.
Essentially it's just a grammar checker; note how badly I use the word
"just" here.

If you have wordnet installed try wnb (wordnet browser)
or even good old ding (which has an interface to wordnet as well)

I am not saying these are tools we can or should directly use.
They just live in a similar space.

I am installing a thing called Malaga, which says it does language
analysis as well. May be useful (couldn't find a home page yet).

>If you're going to pull yourself up by your own bootstraps then you'll
>need fingers, or something like them, so the types concept is pretty 
>important and as good a place as anywhere to start making mistakes, by 
>my wreckoning (all puns strictly intended -ed.).  Methinks the
>db/engine is going to have to be *comfortably* capable of building
>itself, it's going to have to be a veritable spider to generate all the
>mappings / relations / connections / etc required.

Building itself is not such an issue.
Checking that it's building itself right is!
But yep. :)
(and not easy, but doable I think)
>Context is hard.  I've looked at it a little and i'm not even sure what
> it means when i do it.  Nails look positively flaccid in comparison.  
>Sentance by sentence (see puns -ed.) sounds a good start, hasten 
>slowly as the goat-riders quote.
Yeah some proof of concept test stuff ..
Must stay modularised must stay modularised (my new chant)

>I honestly doubt the binary mode techniques are up to it.  It might be 
>time to re-review some of the tristate logics of the Poles and the 
>"maybe" logic of the Indians.  Brace yourselves. 

Hmm, perhaps not.. though why not :)
A hashed database (like Sleepycat) could do this (not a binary-mode
search or bubble-up or anything like that). Better is a full-power,
records-based approach like Postgres, which should be well capable of
handling it.

Postgres will look where we tell it, or it will get sent to its room
with no telly :) Point is, the DB is not so critical; it's the search
and write recipes that will matter. Could do this with Python
dictionaries (or any associative arrays) for example.. but man, would
you need a lot of RAM.
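One way around the RAM problem is a dbm-style on-disk store, which is
the same hashed key -> value idea as Sleepycat / Berkeley DB. A minimal
sketch using Python's standard shelve module (the words and glosses are
invented placeholders):

```python
import os
import shelve
import tempfile

# Sketch: a word -> candidate-translations mapping kept on disk via
# shelve (a dbm-backed store, similar in spirit to Sleepycat/Berkeley
# DB) instead of an all-in-RAM dict. Entries are invented examples.
path = os.path.join(tempfile.mkdtemp(), "lexicon")

with shelve.open(path) as db:
    db["house"] = ["uma"]    # hashed key -> list of candidates
    db["water"] = ["bee"]

# Re-open and look up: the data comes from disk, not from RAM.
with shelve.open(path) as db:
    print(db["house"])
```

The search and write recipes stay identical to the dictionary version;
only the storage underneath changes, which is rather the point.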

>Authoritative would be ok for NewSpeak and that's about it.  Regarding
>the whole strange attractors, been reading that material, fascinating,
>i don't think i was ready for it when i first came across it but it's
>really firing some neurons now.

Yeah, it's all cool and seems to be "common sense".
Feels right intuitively and all that hippy stuff.

>As you say per XML, DTDs and their ilk, (really just glorified CSVs and
>everyone was doing their own tagging back in my uni days) are going to
>be the tools of trade, but to tackle context, let me rephrase that, if
>you're going to tackle CONTEXT, something like a living DTD will be
>required, a Pooh-Bear of a DTD that can go with the flow and even tell
>you when you shouldn't push it too far.  AI is probably another area to
>be explored.

Well yeah, AI is .. (gulp) we all have Lisp on board ... as long as
the thing can learn and remember ... I am fast coming to the belief
that there is no such thing as AI.. but that's another matter.

As to the DTDs:
1/ maybe it's not the best approach but it does give us a direct and
solid path to walk on
2/ we could use XML Schemas instead (DTDs on steroids) or even both
3/ whichever way, I was imagining that there would be one for each
language that "scoped" the DEFAULT language structure.
4/ Then you could include others that are either prewritten or
dynamically generated to scope contexts.
5/ By saving and building (somehow sensible) combinations of them you
could evolve an "Active DTD" that was sensitive to the case at hand.
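For the flavour of it, a toy per-language "default structure" DTD might
look something like this. The element names and the context attribute
are purely invented for illustration, not a proposed format:

```dtd
<!-- Invented toy example: a per-language DEFAULT-structure DTD. -->
<!ELEMENT sentence (clause+)>
<!ELEMENT clause   (subject?, verb, object?)>
<!ELEMENT subject  (#PCDATA)>
<!ELEMENT verb     (#PCDATA)>
<!ELEMENT object   (#PCDATA)>
<!-- A context attribute is one place a "scoping" DTD could hook in. -->
<!ATTLIST clause context CDATA "default">
```

The context-scoping DTDs of point 4 would then tighten or loosen these
content models rather than replace them.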

The DTD would define the syntax >> the syntax would describe the space
for possible translations >> the user gets offered a best fit and some
weighted options >> the options selected get learnt from (sent back to
/ merged with the contexts/word-lists) ==> saved to some config
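The offer-then-learn end of that chain can be sketched in a few lines:
present a best fit plus weighted alternatives, then bump the weight of
whatever the user actually picks. The candidate words and starting
weights are invented placeholders:

```python
# Sketch of the feedback loop: offer a best fit plus weighted
# alternatives, then learn from the user's selection.
# Candidates and weights are made-up examples.
weights = {"obrigadu": 3, "agradese": 1}

def offer(candidates):
    """Best fit first, alternatives ordered by learnt weight."""
    return sorted(candidates, key=lambda w: -weights.get(w, 0))

def learn(choice):
    """Merge the user's selection back into the weights."""
    weights[choice] = weights.get(choice, 0) + 1

options = offer(["agradese", "obrigadu"])
print(options[0])      # -> obrigadu (current best fit)
learn("agradese")      # user picked the alternative; it gains weight
```

Saving `weights` to some config is then exactly the "==> save" step at
the end of the chain.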

A schema "fits" better in lots of ways, as DTDs really try to be
absolute.

The great advantage of this approach from the coding point of view is
that we don't need to worry about parsers (they already exist for lots
of languages: Java, Perl, Python, C).
>I'll have to take a serious look at Wordnet, it sounds most impressive,
>as per sleepycat.  All in my copious free time.  This is not so much a
>trip to the local store, more liken to space, i could get vertigo just
>considering what i do know, let alone such things as designing objects
>that have to work out what kind of objects to design, and i don't think
>that's much of an exaggeration.  It'll take an hive of classes with
>some pretty good communications skills all of their own.  Maybe i'm
>sounding a little pessimistic. let me know how far off the track you
>think i am.  I'm still keen to see that next dia.

Don't worry about the Sleepycat thing.. it's a database engine..
that's all (and every time they update it, stuff breaks!)

Yeah, I should do a new set.
Give me a few days to catch up and absorb Lev's new stuff.
Is the approach clear to you.. diagrammatically, I mean?

I am now thinking a little differently (already) and so I guess I will
see some sort of light soon.. 

Apparently the EEC are able to automate all their translation stuff
already, so I will see what I can find out about that...

>Very best of regards,
>> >http://www.cs.byuh.edu/research/nakkhongkham/bac.html
(read and got it !)
>> >Summary of research on machine translation.
>> >
>> >www.cs.nyu.edu/~yap/classes/theory/02s/lect/l3/l.pdf 
(have now but yet to read)
>> >Excellent lecture on context-free languages and translation
>> >difficulties.
>> >
>> >www.cis.ohio-state.edu/~dwang/teaching/cis730/NLP.pdf 
(have but can't remember :(
>> >
>> >www.cs.brown.edu/people/ec/papers/aimag97.ps 
(downloading now)

>> >
>> >Classes/papers on natural language processing.
>> >
>> >nlp.cs.jhu.edu/~cschafer/papers/riloff-schafer-yarowsky_coling2002.pdf
have but yet to read properly

>> >Worth looking at as a journal of activity.
>> >http://www.translation.langenberg.com/
>> >
Cool, great resource (nobody does Tetun :)

>> >Multiple language translation engines.
>> >
>> >http://www.cs.colorado.edu/~martin/slp.html
>> >http://www.cs.colorado.edu/~martin/SLP/slp-toc.html
>> >http://www.cs.colorado.edu/~martin/SLP/slp-web-resources.html
>> >

This site is cool. I "broke in" and got lots of goodies ... wget:
don't! I have way too much stuff from there now. :)
The docs on HAL and the Wumbles (sic) thing are interesting.
>> >Speech and Language Processing Book. Table of contents and first
>> >chapter only (fairly complete in it's own right!).
Yep read and understand about half of it :)
The intro and first sections are very cool.

>> >Chapter-by-chapter resources are pretty useful.
>> >
>> >Companion Website to Foundations of Statistical Natural Language
>> >Processing
>> >http://nlp.stanford.edu/fsnlp/
bookmarked :)

>> Ok.. had a look around all the sites above and am downloading all of
>the> colorado site:
already did that :)
<snipped guff>
Lev's mailbox is full so I am sending this to the Tetum list for
archiving .. (bloody spammers, I bet)


Today's fortune:
<chrchr> datazone-work: Some people dominate the world because they
can't hold down a regular job and like the flexible hours that world
domination offers.     
< http://www.gnu.org/software/tetum/ >
< http://bigbutton.com.au/~gossner >
< address@hidden >
