On Fri, Mar 03, 2006 at 09:04:48AM +0000, Mike Taylor wrote:
Hmm. Well, compared with the previous truly astonishing time of 40604
seconds, that's a better than fivefold improvement, which is not a bad
start. But at more than one second a record, we still have
_plenty_ of scope for improvement here.
How busy is your disk now?
It's a remote machine ... do you have suggestions for a utility that
measures disc usage on the fly?
So it's definitely better without the search, but there is still the
question of XML ... being able to import raw MARC (which would only
take a few seconds) would be really nice ...
I agree with Seb that the XML is unlikely to be the culprit here: the
actual indexing is the only thing I can think of that would show the
pattern you see of taking longer as the database grows.
OK ... but if you look back at that benchmark, the majority of our
time is now spent converting from MARC21 to MARCXML (it seems the
most processor-intensive part of this is the conversion from MARC-8
encoding to UTF-8). So even if Zebra is quite fast indexing XML,
we still have quite a bit of overhead getting the records into
XML. I suppose I should do a test where I pre-process the records
(convert from MARC to XML) and _then_ import. Whadya think?
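
For what it's worth, here is a rough sketch of how that test could be split into two separately timed phases. It is only an illustration in Python using the standard library, not the actual import script: `marc_to_xml`, `build_record`, `convert_all`, `import_all` and `timed` are hypothetical names, `import_all` is a placeholder for the real indexing step, and the converter assumes the record data is already UTF-8, so the expensive MARC-8 to UTF-8 transcoding is deliberately not shown.

```python
import time
import xml.etree.ElementTree as ET

# ISO 2709 delimiters: field terminator, record terminator, subfield delimiter
FT, RT, SF = b"\x1e", b"\x1d", b"\x1f"

def marc_to_xml(raw: bytes) -> str:
    """Convert one ISO 2709 record to a MARCXML string.

    Assumption: field data is already UTF-8 (no MARC-8 transcoding here).
    """
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])        # base address of data (leader 12-16)
    directory = raw[24:base - 1]     # directory ends with a field terminator
    record = ET.Element("record", xmlns="http://www.loc.gov/MARC21/slim")
    ET.SubElement(record, "leader").text = leader
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        # Slice out the field, dropping its trailing field terminator
        data = raw[base + start: base + start + length - 1]
        if tag < "010":              # 001-009 are control fields
            ET.SubElement(record, "controlfield", tag=tag).text = data.decode("utf-8")
        else:
            field = ET.SubElement(record, "datafield", tag=tag,
                                  ind1=chr(data[0]), ind2=chr(data[1]))
            for chunk in data[2:].split(SF)[1:]:
                sub = ET.SubElement(field, "subfield", code=chr(chunk[0]))
                sub.text = chunk[1:].decode("utf-8")
    return ET.tostring(record, encoding="unicode")

def build_record(fields):
    """Build a tiny ISO 2709 record from (tag, bytes) pairs, for the demo only."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FT
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(body) + 1     # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d}   4500".encode("ascii")
    return leader + directory + FT + body + RT

def convert_all(raw_records):
    """Phase 1: pre-process every record into MARCXML."""
    return [marc_to_xml(r) for r in raw_records]

def import_all(xml_records):
    """Phase 2: placeholder for the real import/index step (hypothetical)."""
    return len(xml_records)

def timed(label, func, *args):
    """Run func(*args) and print how long it took."""
    start = time.perf_counter()
    result = func(*args)
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

if __name__ == "__main__":
    sample = build_record([("001", b"12345"), ("245", b"10\x1faTitle")])
    xml_records = timed("convert", convert_all, [sample] * 1000)
    timed("import", import_all, xml_records)
```

Timing the two phases separately like this should show whether the conversion or the indexing dominates, without one masking the other.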