Re: [Aramorph-users] XML tables

From:	Pierrick Brihaye
Subject:	Re: [Aramorph-users] XML tables
Date:	Thu, 11 Aug 2005 22:26:32 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; fr-FR; rv:1.7) Gecko/20040608

Hi,

Ahmed El-dawy wrote:

    ... because I'm just back from holidays :-)

Welcome back.


Thanks !

    Here, you have *two* glosses :
    <gloss>and</gloss>
    <gloss>so</gloss>
But I understood from the code that glosses are separated by (+), not (;). See this line of code:
array = gloss.split("\\+");


Oooops ! Sorry, you're right : one gloss with 2 words here.
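To illustrate the line quoted above, here is a minimal sketch of that split. String.split takes a regular expression, which is why the "+" has to be escaped. The class name and the sample gloss strings are hypothetical, not taken from the AraMorph sources.

```java
import java.util.Arrays;

class GlossSplitDemo {
    // Split a gloss field on the literal '+' character.
    // String.split expects a regex, so "+" must be written as "\\+".
    static String[] splitGloss(String gloss) {
        return gloss.split("\\+");
    }

    public static void main(String[] args) {
        // Hypothetical sample values, for illustration only.
        System.out.println(Arrays.toString(splitGloss("and+by")));
        // A ';' is not a separator here, so this stays a single gloss.
        System.out.println(Arrays.toString(splitGloss("and;so")));
    }
}
```

With this split, a string containing only ";" stays a single gloss made of several words.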

    ... it is not well-formed XML (because of the ">" character that must be
    escaped).
This error is because I wrote this XML snippet by hand. Of course, when I use an XML Document to write the XML file, all these special characters are escaped automatically by the XML serializer.


Fine !
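As a rough sketch of the automatic escaping mentioned above, the standard DOM and Transformer APIs can be used as follows. The class name, the "gloss" element and the sample text are assumptions for illustration; this is not the code that was used to produce the tables.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EscapeDemo {
    // Build a one-element document and let the serializer do the escaping.
    static String serializeGloss(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element gloss = doc.createElement("gloss"); // hypothetical element name
        gloss.setTextContent(text);                 // raw, unescaped text
        doc.appendChild(gloss);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Markup characters in the text come out as entity references.
        System.out.println(serializeGloss("a < b & c"));
    }
}
```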

    Stems :

    <root materila="Ab">
    is inconsistent with :
<!ATTLIST root material CDATA #REQUIRED>

This is another error caused by writing the XML snippet by hand.


Fine. No problem...

    So, I'd prefer

    <entry root="Ab">

    (mark the "root" attributes as required).
Do you mean that all entries are marked with the root attribute?

Only in the stems dictionary (even though some prefixes may be linguistically derived from roots).

So what about the hierarchy of the stems dictionary? Please give me more information on this point.

See : http://www.nongnu.org/aramorph/english/dictionaries.html#Stems. The root "ktb" (";--- ktb" in the file) has *many* lemmas. However, keeping a trace of it may help in writing a root analyzer (useful for linguists ;-).

    <!ATTLIST lemma lemma-id CDATA #REQUIRED>

    an id attribute should be enough.

That's an easy one.


... and a good practice ;-)

    Is your XML valid ? Given the code above, it is doubtful...

I hope it is valid.


An XML parser would complain if not.
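For instance, a validating parser reports the misspelled attribute discussed earlier. The following is only a sketch under stated assumptions: the "stems" wrapper element and the inline DTD are made up for the example, and only the ATTLIST line comes from this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidationDemo {
    // Hypothetical inline DTD; only the ATTLIST line is from the thread.
    static final String DTD =
          "<!DOCTYPE stems [\n"
        + "  <!ELEMENT stems (root*)>\n"
        + "  <!ELEMENT root EMPTY>\n"
        + "  <!ATTLIST root material CDATA #REQUIRED>\n"
        + "]>\n";

    // Parse with DTD validation on and collect the validity errors.
    static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        final List<String> problems = new ArrayList<String>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored in this sketch */ }
            public void error(SAXParseException e) { problems.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DTD + body)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate("<stems><root material=\"Ab\"/></stems>"));
        System.out.println(validate("<stems><root materila=\"Ab\"/></stems>"));
    }
}
```

Note that validity errors go to the ErrorHandler; without one, a well-formed but invalid file may load without an exception being thrown.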

     > I have also made an
     > XMLDictionaryHandler which parses XML tables, using digester from
     > Jakarta commons, and loads them into memory.

    What does the digester add to a standard XML parser ?
Digester is event-based. This is faster and requires less memory when the XML file is parsed only once. The dictStems.xml file is about 32 MB! It would certainly cause an out-of-memory error if it were all loaded in memory.

Eeeer... the Java *standard* SAX parser does it, doesn't it ? A SAX parser is really the thing we need here : big file, poor structure.
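A minimal sketch of what this could look like with the standard SAX API, assuming "entry" elements carrying a "root" attribute as proposed earlier in this message. The class name, the method name and the "stems" wrapper element are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxStemsDemo {
    // Stream through the XML and collect the "root" attribute of each entry.
    // Only the collected values are kept in memory, never the whole tree.
    static List<String> collectRoots(String xml) throws Exception {
        final List<String> roots = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so match on qName.
                if ("entry".equals(qName)) {
                    roots.add(atts.getValue("root"));
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return roots;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<stems><entry root=\"Ab\"/><entry root=\"ktb\"/></stems>";
        System.out.println(collectRoots(sample));
    }
}
```

For the real file, the InputSource would wrap a file stream instead of a string, so the 32 MB document is never held in memory as a whole.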

I will send it to you after fixing the points you have mentioned.


Great !

BTW, still as a quick answer : I think that the 3 compatibility tables may be merged into one single file.


Best regards,

p.b.
