Re: [Aramorph-users] XML tables
From: Pierrick Brihaye
Subject: Re: [Aramorph-users] XML tables
Date: Thu, 11 Aug 2005 22:26:32 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; fr-FR; rv:1.7) Gecko/20040608
Hi,
Ahmed El-dawy wrote:
... because I'm just back from holidays :-)
Welcome back.
Thanks !
Here, you have *two* glosses :
<gloss>and</gloss>
<gloss>so</gloss>
But I understood from the code that glosses are separated by (+),
not (;). See this line of code:
array = gloss.split("\\+");
Oooops ! Sorry, you're right : one gloss with 2 words here.
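To make the point about the separator concrete, here is a minimal Java sketch built around the quoted split("\\+") call. The class name, the helper method and the sample gloss string are hypothetical and are not taken from the AraMorph sources or dictionaries.

```java
import java.util.Arrays;
import java.util.List;

final class GlossSplitDemo {
    // Splits on the literal '+' character. The regex "\\+" escapes '+',
    // which would otherwise be read as a quantifier.
    static List<String> splitOnPlus(String gloss) {
        return Arrays.asList(gloss.split("\\+"));
    }

    public static void main(String[] args) {
        // Hypothetical gloss string, invented for this example.
        String gloss = "and;so+he+wrote";
        // The ';' is not a separator here, so "and;so" stays in one piece.
        System.out.println(splitOnPlus(gloss));
    }
}
```

With this split, a ";" inside a gloss is ordinary text, which matches the "one gloss with 2 words" reading.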
... it is not well-formed XML (because of the ">" character that must be
escaped).
This error is there because I wrote this XML snippet by hand. Of course,
when I use an XML Document to write the XML file, all these special
characters are escaped automatically by the XML serializer.
Fine !
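As an illustration of that automatic escaping, here is a small sketch using the standard JAXP DOM and Transformer classes. The class and method names are made up for the example, and the real writer code may well be organised differently.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class GlossSerializerDemo {
    // Builds a one-element document <gloss>text</gloss> and serializes it.
    // The serializer, not the caller, escapes markup characters in the text.
    static String serializeGloss(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element gloss = doc.createElement("gloss");
        gloss.setTextContent(text);
        doc.appendChild(gloss);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical gloss text containing characters that need escaping.
        System.out.println(serializeGloss("a < b & c"));
    }
}
```

The "<" and "&" characters in the text come out as entity references, so the resulting file stays well-formed without any manual escaping.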
Stems :
<root materila="Ab">
is inconsistent with :
<!ATTLIST root material CDATA #REQUIRED>
This is another error caused by writing the XML snippet by hand.
Fine. No problem...
So, I'd prefer
<entry root="Ab">
(mark the "root" attributes as required).
Do you mean that all entries are marked with the root attribute?
Only in the stems dictionary (even though some prefixes may be
linguistically derived from roots).
So what about the hierarchy of the stems dictionary? Please give me more
information on this point.
See : http://www.nongnu.org/aramorph/english/dictionaries.html#Stems.
The root "ktb" (";--- ktb" in the file) has *many* lemmas. However,
keeping a trace of it may help in writing a root analyzer (useful for
linguists ;-).
<!ATTLIST lemma lemma-id CDATA #REQUIRED>
an id attribute should be enough.
That's an easy one.
... and a good practice ;-)
Is your XML valid ? Given the code above, it is doubtful...
I hope it is valid.
An XML parser would complain if not.
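A rough sketch of how a validating parser reports this kind of mistake, using the ATTLIST and the misspelled attribute quoted earlier in this message. The surrounding DOCTYPE, the EMPTY content model and the class name are assumptions made only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdValidationDemo {
    // Internal DTD built around the quoted ATTLIST; the EMPTY content
    // model is an assumption for this sketch.
    static final String DTD =
            "<!DOCTYPE root [<!ELEMENT root EMPTY>"
            + "<!ATTLIST root material CDATA #REQUIRED>]>";

    // Parses DTD + body with validation on and returns the validity errors.
    static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> errors = new ArrayList<String>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DTD + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Misspelled attribute name, as in the hand-written snippet.
        for (String message : validate("<root materila=\"Ab\"/>")) {
            System.out.println(message);
        }
    }
}
```

Note that the misspelled document is still well-formed, so a non-validating parser would accept it silently; only validation against the DTD catches the error.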
> I have also made an
> XMLDictionaryHandler which parses XML tables, using digester from
> Jakarta commons, and loads them into memory.
What does the digester add to a standard XML parser ?
Digester is event based. This is faster and requires less memory when
the XML file is parsed only once. The dictStems.xml file is about 32
MB!!! It would certainly cause an out-of-memory error if it were all
loaded in memory.
Eeeer... the Java *standard* SAX parser does it, doesn't it ? A SAX
parser is really the thing we need here : big file, poor structure.
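For what it is worth, a minimal sketch of the plain SAX approach with the standard JAXP API could look like this. The <entry root="..."> element follows the form suggested above; the <stems> wrapper element, the class name and the counting logic are hypothetical and only there to make the example runnable.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class StemCountDemo {
    // Counts <entry> elements per "root" attribute value. The handler sees
    // one element at a time, so the document is never held in memory as a tree.
    static Map<String, Integer> countEntriesByRoot(InputStream in) throws Exception {
        final Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("entry".equals(qName)) {
                    String root = attrs.getValue("root");
                    Integer seen = counts.get(root);
                    counts.put(root, seen == null ? 1 : seen + 1);
                }
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the real dictionary file; <stems> is hypothetical.
        String xml = "<stems><entry root=\"ktb\"/><entry root=\"ktb\"/>"
                + "<entry root=\"Ab\"/></stems>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countEntriesByRoot(in)); // prints {ktb=2, Ab=1}
    }
}
```

For the real file, the same handler could be fed a FileInputStream; memory use then depends on what the handler keeps, not on the size of the XML.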
I will send it to you after fixing the points you have mentioned.
Great !
BTW, still as a quick answer : I think that the 3 compatibility tables
may be merged in one single file.
Best regards,
p.b.