aramorph-users

Re: [Aramorph-users] XML tables


From: Pierrick Brihaye
Subject: Re: [Aramorph-users] XML tables
Date: Thu, 11 Aug 2005 11:51:51 +0200
User-agent: Mozilla/5.0 (Windows; U; Win98; fr-FR; rv:1.7.8) Gecko/20050511

Hi,

Ahmed El-dawy wrote:

> How are you? Hope you are fine. I don't know why you are not
> responding to any emails

... because I'm just back from holidays :-)

> and I hope you are all right.

I am all right, thank you.

So... a few (quick) notes about what you've sent:

Prefixes:

<entry>
  <unvocalized>f</unvocalized>
  <vocalized>fa</vocalized>
  <morphological-category>Pref-Wa</morphological-category>
  <glosses>
    <gloss>and;so</gloss>
  </glosses>
  <grammatical-categories>
    <grammatical-category>fa/CONJ</grammatical-category>
  </grammatical-categories>
</entry>

Here, you actually have *two* glosses, which should be separate elements:
<gloss>and</gloss>
<gloss>so</gloss>
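If the current tables pack several glosses into one string separated by ";", the conversion could split them while building the DOM. A minimal sketch using only the JDK's DOM and identity Transformer (the class and method names are hypothetical, not part of AraMorph):

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class GlossSplitter {

    /** Builds a <glosses> element with one <gloss> child per ';'-separated value. */
    static Element buildGlosses(Document doc, String packed) {
        Element glosses = doc.createElement("glosses");
        for (String part : packed.split(";")) {
            String gloss = part.trim();
            if (gloss.isEmpty()) {
                continue; // skip empty fragments such as the middle of "and;;so"
            }
            Element g = doc.createElement("gloss");
            g.setTextContent(gloss); // the serializer escapes & and < itself
            glosses.appendChild(g);
        }
        return glosses;
    }

    /** Serializes a node without an XML declaration or indentation. */
    static String serialize(Node node) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element glosses = buildGlosses(doc, "and;so");
        doc.appendChild(glosses);
        System.out.println(serialize(glosses));
    }
}
```

Letting the DOM do the serialization also avoids hand-escaping glosses that contain "&" or "<".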

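Regarding:
  <grammatical-categories>
    <grammatical-category>>bi/PREP</grammatical-category>
  </grammatical-categories>

... the ">" character should be escaped as "&gt;". Strictly speaking, a bare ">" is still well-formed XML, but "<" and "&" are not, and the Buckwalter transliteration uses all three (for the hamza forms), so please escape them systematically.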
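A small sketch of both points, using the plain JDK (the helper names and the sample value "<ilY/PREP", a Buckwalter-style string starting with "<", are hypothetical): escape the three characters when writing the tables, and check well-formedness by actually parsing the result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlText {

    /** Escapes &, < and > for use in XML character data. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break; // mandatory
                case '<': sb.append("&lt;"); break;  // mandatory
                case '>': sb.append("&gt;"); break;  // optional, but harmless and consistent
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns true if the string parses as a well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder b = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            b.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            b.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String hamza = "<ilY/PREP"; // hypothetical Buckwalter value starting with '<'
        System.out.println(isWellFormed("<c>>bi/PREP</c>"));              // true
        System.out.println(isWellFormed("<c>" + hamza + "</c>"));         // false
        System.out.println(isWellFormed("<c>" + escape(hamza) + "</c>")); // true
    }
}
```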

Stems:

<root materila="Ab">
is inconsistent with (note the misspelled attribute name):
<!ATTLIST root material CDATA #REQUIRED>
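A validating parser catches this kind of mismatch automatically. A minimal sketch with the JDK's DocumentBuilderFactory and an internal DTD subset reduced to the declaration above (the class name and the EMPTY content model are assumptions made for the example):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class DtdCheck {

    // Internal DTD subset mirroring the ATTLIST declaration quoted above.
    static final String DTD =
        "<!DOCTYPE root [\n"
        + "  <!ELEMENT root EMPTY>\n"
        + "  <!ATTLIST root material CDATA #REQUIRED>\n"
        + "]>\n";

    /** Parses with DTD validation on and returns the validity errors found. */
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setValidating(true); // check against the DOCTYPE, not only well-formedness
        DocumentBuilder b = f.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        b.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                errors.add(e.getMessage()); // validity errors are reported here
            }
        });
        b.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DTD + "<root materila=\"Ab\"/>")); // non-empty list
        System.out.println(validate(DTD + "<root material=\"Ab\"/>")); // []
    }
}
```

Running such a check over every converted table is a cheap way to answer "is it valid?" before loading anything.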

Furthermore, I suggest you stay consistent with the prefix dictionary (I mean, share the document types as much as possible).

So, I'd prefer

<entry root="Ab">

(and declare the "root" attribute as #REQUIRED).

<!ATTLIST lemma lemma-id CDATA #REQUIRED>

An "id" attribute should be enough.

Well, that's all for my *quick* answer, but remember that the dictionary format may be complex (see: http://www.nongnu.org/aramorph/english/dictionaries.html).

> So far I have converted all the dictionaries and tables to XML and
> also converted them to Arabic script.

Is your XML valid? Given the snippets above, I doubt it...

> I have also written an
> XMLDictionaryHandler which parses the XML tables, using Digester from
> Jakarta Commons, and loads them into memory.

What does Digester add to a standard XML parser?
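As far as I know, Digester is a rule layer on top of SAX: it maps element patterns to object creation and method calls. With the JDK alone you write that state handling yourself, roughly like this (a sketch; the Entry class, the wrapping root element and the index by unvocalized form are hypothetical, while the element names follow the prefix sample above):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PrefixTableLoader extends DefaultHandler {

    /** Hypothetical in-memory record for one prefix entry. */
    static final class Entry {
        String unvocalized, vocalized, category;
        final List<String> glosses = new ArrayList<>();
        final List<String> grammaticalCategories = new ArrayList<>();
    }

    // Entries indexed by unvocalized form; one form may have several entries.
    final Map<String, List<Entry>> byForm = new HashMap<>();
    private Entry current;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("entry")) current = new Entry();
        text.setLength(0); // collect only this element's character data
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) return;
        String value = text.toString().trim();
        switch (qName) {
            case "unvocalized": current.unvocalized = value; break;
            case "vocalized": current.vocalized = value; break;
            case "morphological-category": current.category = value; break;
            case "gloss": current.glosses.add(value); break;
            case "grammatical-category": current.grammaticalCategories.add(value); break;
            case "entry":
                byForm.computeIfAbsent(current.unvocalized, k -> new ArrayList<>()).add(current);
                current = null;
                break;
            default: break;
        }
        text.setLength(0);
    }

    static Map<String, List<Entry>> load(String xml) throws Exception {
        PrefixTableLoader h = new PrefixTableLoader();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), h);
        return h.byForm;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entries><entry><unvocalized>f</unvocalized><vocalized>fa</vocalized>"
            + "<morphological-category>Pref-Wa</morphological-category>"
            + "<glosses><gloss>and</gloss><gloss>so</gloss></glosses>"
            + "<grammatical-categories><grammatical-category>fa/CONJ</grammatical-category>"
            + "</grammatical-categories></entry></entries>";
        Entry e = load(xml).get("f").get(0);
        System.out.println(e.vocalized + " " + e.glosses); // fa [and, so]
    }
}
```

If Digester only saves this kind of boilerplate, the extra dependency may or may not be worth it; that is the trade-off I'd like to understand.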

> There's a small problem: I have changed some code in the
> Solution class and other classes which romanize words. Now we don't have to
> romanize words because the dictionaries are in Arabic.

Yes :-) That was in the TODO list.

> The problem is that
> InMemoryDictionaryHandler will not work unless it romanizes the input
> text before searching the dictionaries.
> If you need the new changes, please let me know, and I will send them ASAP.
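Meanwhile, if InMemoryDictionaryHandler keeps using Buckwalter-keyed dictionaries, the input only has to be romanized before the lookup. A minimal sketch of a Buckwalter romanizer (the class name and the map-based lookup are hypothetical; this is not AraMorph's own transliteration code):

```java
import java.util.HashMap;
import java.util.Map;

public class BuckwalterRomanizer {

    // Arabic character -> Buckwalter character (hamza forms, letters, diacritics).
    private static final String[][] TABLE = {
        {"\u0621", "'"}, {"\u0622", "|"}, {"\u0623", ">"}, {"\u0624", "&"},
        {"\u0625", "<"}, {"\u0626", "}"}, {"\u0627", "A"}, {"\u0628", "b"},
        {"\u0629", "p"}, {"\u062A", "t"}, {"\u062B", "v"}, {"\u062C", "j"},
        {"\u062D", "H"}, {"\u062E", "x"}, {"\u062F", "d"}, {"\u0630", "*"},
        {"\u0631", "r"}, {"\u0632", "z"}, {"\u0633", "s"}, {"\u0634", "$"},
        {"\u0635", "S"}, {"\u0636", "D"}, {"\u0637", "T"}, {"\u0638", "Z"},
        {"\u0639", "E"}, {"\u063A", "g"}, {"\u0640", "_"}, {"\u0641", "f"},
        {"\u0642", "q"}, {"\u0643", "k"}, {"\u0644", "l"}, {"\u0645", "m"},
        {"\u0646", "n"}, {"\u0647", "h"}, {"\u0648", "w"}, {"\u0649", "Y"},
        {"\u064A", "y"}, {"\u064B", "F"}, {"\u064C", "N"}, {"\u064D", "K"},
        {"\u064E", "a"}, {"\u064F", "u"}, {"\u0650", "i"}, {"\u0651", "~"},
        {"\u0652", "o"}, {"\u0670", "`"}, {"\u0671", "{"},
    };

    private static final Map<Character, Character> MAP = new HashMap<>();
    static {
        for (String[] pair : TABLE) {
            MAP.put(pair[0].charAt(0), pair[1].charAt(0));
        }
    }

    /** Romanizes Arabic text; characters outside the table are kept unchanged. */
    static String romanize(String arabic) {
        StringBuilder sb = new StringBuilder(arabic.length());
        for (char c : arabic.toCharArray()) {
            Character r = MAP.get(c);
            sb.append(r != null ? r.charValue() : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical dictionary keyed by Buckwalter forms.
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("f", "fa/CONJ");
        System.out.println(romanize("\u0643\u062A\u0628"));   // ktb
        System.out.println(prefixes.get(romanize("\u0641"))); // fa/CONJ
    }
}
```

Keeping both paths (romanized lookup and Arabic-script dictionaries) would let the old handler work until the new one is merged.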

Your patch will be welcome.

Cheers,

--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:address@hidden
+33 (0)2 99 29 67 78



