[lmi] [Fwd: Astonishingly, xmlwrapp has vanished; use libxml++ instead?]


From: Greg Chicares
Subject: [lmi] [Fwd: Astonishingly, xmlwrapp has vanished; use libxml++ instead?]
Date: Tue, 09 Aug 2005 17:00:06 +0000
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

[We were talking about replacing xmlwrapp, whose author no longer supports
it and has actually taken it off his website, so that over time it will
become difficult to find the version that lmi uses. The libxml++ library
seems like a plausible candidate. Large census files take a long time to
load, so speed is important. In that context, Vadim sent me the following
email, which I quote here with his permission.]

Sorry for a potentially stupid question but why use DOM if speed is really
important? The quality of DOM implementations may vary but I'd be surprised
if even the fastest DOM model could rival a SAX library, especially
for big libraries. Also, according to everything I've heard, the fastest XML
parser is expat, not libxml2.
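
[Editorial sketch, not part of the original email: the SAX/DOM distinction
raised above, shown in Python only because its standard library ships expat
bindings (xml.parsers.expat) and a small DOM (xml.dom.minidom). The element
names and the generated SAMPLE document are hypothetical stand-ins, not lmi's
census format, and nothing here measures speed.]

```python
import xml.dom.minidom
import xml.parsers.expat

# Hypothetical stand-in for a census file; lmi's real format is not shown here.
SAMPLE = "<census>" + "".join(
    f'<cell id="{i}"><age>{30 + i % 40}</age></cell>' for i in range(1000)
) + "</census>"


def count_cells_sax(text):
    """Count <cell> elements via expat callbacks; no document tree is built."""
    count = 0

    def start_element(name, attrs):
        nonlocal count
        if name == "cell":
            count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(text, True)  # True marks this as the final chunk
    return count


def count_cells_dom(text):
    """Count <cell> elements after building the whole document tree."""
    document = xml.dom.minidom.parseString(text)
    return len(document.getElementsByTagName("cell"))
```

[Both functions give the same count. The callback version only ever holds
the current event, while the DOM version materializes every node first,
which is the cost the question above is pointing at.]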

GC> An original goal for xmlwrapp was to be able to use other C xml libraries
GC> than libxml2, but I thought Peter Jones abandoned that goal after he'd
GC> discussed it on his xmlwrapp mailing list and no one seemed interested.

 There is a project called Arabica which seems quite interesting from this
point of view: http://www.jezuk.co.uk/cgi-bin/view/arabica. It builds both
SAX and DOM APIs on top of any of expat, libxml, xerces or, under
Windows only, the MSXML parser (which is, BTW, known to be quite fast).

GC> Years ago, I think I looked into the available C libraries and formed
GC> an impression that nothing would be much faster than libxml2, though
GC> now I can't say how good my analysis was.

 There are some benchmarks at http://xmlbench.sourceforge.net/index.php
which seem to support my claim about expat above. I.e. expat is the fastest
one, then libxml and, farther behind, xerces.

GC> But I doubt that a C++ wrapper has much effect on speed.

 If we're to believe these benchmarking results (e.g. see
http://xmlbench.sourceforge.net/results/benchmark200402/index.html) it can:
expat is the fastest parser on its own but expat+arabica is by far the
slowest one. I'm quite surprised about this but I didn't want to spend time
on rerunning the benchmarks myself unless you're really interested in this.

[...]

 Here is the result of this. I've looked at the 3 "classic" XML parsers:
expat (and a C++ wrapper for it), libxml and xerces, as well as another one
with a good reputation (TinyXML) and the already mentioned Arabica which
builds on top of the 3 others. I've also included xmlwrapp for reference.
Here are the details of all these projects if you want to check something
yourself:

expat       http://expat.sourceforge.net/
expatpp     http://www.oofile.com.au/xml/expatpp.html
libxml      http://www.xmlsoft.org/
libxml++    http://libxmlplusplus.sourceforge.net/
xerces      http://xml.apache.org/xerces-c/index.html
TinyXML     http://www.grinninglizard.com/tinyxmldocs/index.html
Arabica     http://www.jezuk.co.uk/cgi-bin/view/arabica


 Please use fixed width font and 4-space tabs to view the tables below:

[GWC replaced tabs with spaces when quoting this email]

Table 1: overview

Parser      Popularity  Debian  Used by   Last release  Activity
----------------------------------------------------------------------------
expat       very high   Yes     Python,   2005-01-28    moderate
                                Perl,
                                Mozilla
expatpp     very low                      2003-07-26    very low
libxml      high        Yes     GNOME     2005-07-10    high
libxml++    average     Yes               2005-02-13    low
xerces      high        Yes     Apache    2004-09-29    low
TinyXML     low                           2004-05-20    low
Arabica     very low                      2004-02-26    high
xmlwrapp    low                           2004-03-19    frozen


Table 2: technical comparison

Parser      Lang    Performance  Size    Portability  Features
----------------------------------------------------------------------------
expat       C            best    tiny    excellent    basic
expatpp     C++                          good [3]     basic
libxml      C            good    avg[2]  good         kitchen sink included
libxml++    C++          ????    ????    good [3]     same as above
xerces      "C+"[1]      good    big     excellent    extensive
TinyXML     C++          good    tiny    good         very basic
Arabica     C++          poor    avg     good [3]     extensive
xmlwrapp    C++          ????    avg     good         basic

Notes:

[1] "C+" means that they use so-called portable subset of C++, i.e. no
    exceptions, no templates, no STL -- more like "C with classes" than
    modern C++
[2] libxml is not big under Unix but it builds into 1.5MB DLL under Windows
    by default (the size can be reduced by almost half by omitting unneeded
    options)
[3] untested, according to web site information only

 My recommendations: if the size is important at all (e.g. any chance of
porting lmi to PDA-class devices (not joking, I seriously consider such a
possibility)) or if the speed is *really* important _and_ if only basic XML
parsing is required (and not anything more like XPath, XSchema and
validation, XPointer, XInclude &c), then expat would be the best choice.
It's robust, used in many many high profile projects, very fast and
extremely small. Unfortunately there is no decent C++ wrapper for it as
expatpp seems to be abandoned and Arabica appears to be done very poorly
performance-wise. So if you want to use it we'd need to write our own C++
wrapper, just as we did in wxWidgets for XRC.
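
[Editorial sketch, not part of the original email: the general shape of a
thin object-oriented wrapper over expat's callback API, written in Python
(via the standard xml.parsers.expat bindings) purely as an illustration. The
wrapper proposed above would be C++. The names ExpatWrapper, AgeCollector and
the on_* hooks are hypothetical, and so is the <age> element.]

```python
import xml.parsers.expat


class ExpatWrapper:
    """Hypothetical thin wrapper: subclasses override the on_* hooks."""

    def __init__(self):
        self._parser = xml.parsers.expat.ParserCreate()
        # Deliver each run of character data in one callback where possible.
        self._parser.buffer_text = True
        self._parser.StartElementHandler = self.on_start
        self._parser.EndElementHandler = self.on_end
        self._parser.CharacterDataHandler = self.on_text

    def parse(self, text):
        self._parser.Parse(text, True)  # True marks the final chunk

    def on_start(self, name, attrs):
        pass

    def on_end(self, name):
        pass

    def on_text(self, data):
        pass


class AgeCollector(ExpatWrapper):
    """Example client: collects the integer content of every <age> element."""

    def __init__(self):
        super().__init__()
        self.ages = []
        self._chunks = None  # None means "not inside <age>"

    def on_start(self, name, attrs):
        if name == "age":
            self._chunks = []

    def on_text(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def on_end(self, name):
        if name == "age":
            self.ages.append(int("".join(self._chunks)))
            self._chunks = None
```

[A client would then write something like
collector = AgeCollector(); collector.parse(xml_text)
and read collector.ages, never touching the raw callback registration.]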

 Otherwise and by default, the best choice is libxml2: it's very fast,
widely used, supports just about everything and is very actively developed.
It's not clear how much all this applies to libxml++ but, at the very
least, it seems to be still developed and even though its popularity is
much lower than that of libxml itself (which is itself lower than expat)
it's still a successful project.

 None of the other projects has any noticeable advantages. Certainly xerces
seems like a solid library and, being used by Apache, it can be supposed to
be well-engineered but it's really not a "real" C++ library (of course, the
same could be said about wxWidgets but then we luckily have fewer
competitors ;-). TinyXML doesn't seem to be much tinier than expat and I don't
see why we would use it. As for Arabica, it is the example of what I'd have
done myself as it seems the most elegant solution from the engineering
point of view (separate XML parsing itself from SAX/DOM API which can be
implemented on top of it) and, indeed, there is a possibility that one day
we do something like this in wxWidgets where we'd definitely support
multiple backends in plugins. But I don't think you're really interested in
being able to switch between XML parsing backends and so it hardly presents
enough advantages to offset the risk of relying on yet another not very
well deployed library.

 So after spending 2 hours on exploring all the alternatives I can only
come up with 2 proposals: (a) use expat with our own C++ wrapper around it
or (b) do what you initially proposed and just go with libxml++.



