koha-zebra


[Koha-zebra] Re: Import Speed


From: Sebastian Hammer
Subject: [Koha-zebra] Re: Import Speed
Date: Fri, 03 Mar 2006 10:49:17 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)

Joshua Ferraro wrote:

> On Fri, Mar 03, 2006 at 09:04:48AM +0000, Mike Taylor wrote:
>> Hmm.  Well, compared with the previous truly astonishing time of 40604
>> seconds, that's a better than fivefold improvement, which is not a bad
>> start.  But, still -- more than one second a record, we still have
>> _plenty_ of scope for improvement here.
>>
>> How busy is your disk now?
>
> It's a remote machine ... do you have suggestions for a utility that
> measures disk usage on the fly?
>
>>> So it's definitely better without the search, but there is still the
>>> question of XML ... being able to import raw MARC (which would only
>>> take a few seconds) would be really nice ...
>>
>> I agree with Seb that the XML is unlikely to be the culprit here: the
>> actual indexing is the only thing I can think of that would show the
>> pattern you see of taking longer as the database grows.
>
> OK ... but if you look back at that benchmark, the majority of our
> time is now spent converting from MARC21 to MARCXML (it seems the
> most processor-intensive part of this is the conversion from MARC-8
> encoding to UTF-8). So even if Zebra is quite fast at indexing XML, we
> still have quite a bit of overhead getting the records into XML. I
> suppose I should do a test where I pre-process the records (convert
> from MARC to XML) and _then_ import. Whadya think?
If that really is the case, we should probably look more aggressively into enabling Zebra to import MARC directly via the network interface. I don't know what the issues are, but it must be doable.

That being said, I find it nearly incomprehensible that a mapping from MARC to MARCXML or MARC8 to UTF-8 should be as demanding as these numbers indicate.
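To make the point about the MARC-to-MARCXML mapping concrete, here is a minimal sketch, not the converter used in the benchmark above. Assuming the record data is already UTF-8 (leader position 09 = `a`), `marc21_to_marcxml` turns one ISO 2709 record into MARCXML in a single pass over the leader, directory, and fields. The toy `ansel_subset_to_unicode` covers only ASCII plus four MARC-8 combining marks and ignores MARC-8 escape sequences entirely; it is there only to show the extra step MARC-8 needs, namely that diacritics precede their base letter and must be moved after it for Unicode. All function names, including the test helper `build_marc21`, are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET

SUBFIELD_DELIMITER = b"\x1f"
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
MARCXML_NS = "http://www.loc.gov/MARC21/slim"


def marc21_to_marcxml(raw: bytes) -> str:
    """Map one ISO 2709 MARC21 record to MARCXML (assumes UTF-8 field data)."""
    leader = raw[:24].decode("ascii")
    base_address = int(leader[12:17])           # leader/12-16: start of field data
    directory = raw[24:base_address - 1]        # drop the directory's terminator

    # xmlns as a plain attribute: children serialize unprefixed in the default ns.
    record = ET.Element("record", xmlns=MARCXML_NS)
    ET.SubElement(record, "leader").text = leader

    for i in range(0, len(directory), 12):      # 12-byte directory entries
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        offset = base_address + start
        data = raw[offset:offset + length - 1]  # field data minus its terminator

        if tag < "010":                          # tags 001-009 are control fields
            ET.SubElement(record, "controlfield", tag=tag).text = data.decode("utf-8")
            continue

        field = ET.SubElement(record, "datafield", tag=tag,
                              ind1=chr(data[0]), ind2=chr(data[1]))
        for chunk in data[2:].split(SUBFIELD_DELIMITER)[1:]:
            if chunk:                            # skip empty chunks from doubled delimiters
                sub = ET.SubElement(field, "subfield", code=chr(chunk[0]))
                sub.text = chunk[1:].decode("utf-8")

    return ET.tostring(record, encoding="unicode")


# A small subset of the MARC-8 (ANSEL) combining marks: grave, acute, circumflex, umlaut.
ANSEL_COMBINING = {0xE1: "\u0300", 0xE2: "\u0301", 0xE3: "\u0302", 0xE8: "\u0308"}


def ansel_subset_to_unicode(data: bytes) -> str:
    """Toy MARC-8 decoder: ASCII plus the four marks above, nothing else."""
    out, pending = [], []
    for byte in data:
        if byte in ANSEL_COMBINING:
            pending.append(ANSEL_COMBINING[byte])  # MARC-8: mark precedes its base
        else:
            out.append(chr(byte))                  # correct only for ASCII bytes
            out.extend(pending)                    # Unicode: mark follows its base
            pending.clear()
    return unicodedata.normalize("NFC", "".join(out))


def build_marc21(fields: list[tuple[str, bytes]]) -> bytes:
    """Hypothetical test helper: assemble an ISO 2709 record from (tag, data) pairs."""
    directory, body = b"", b""
    for tag, data in fields:
        data += FIELD_TERMINATOR
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    total = base_address + len(body) + 1
    leader = f"{total:05d}nam a22{base_address:05d}   4500"  # leader/09 'a' = UTF-8
    return leader.encode("ascii") + directory + body + RECORD_TERMINATOR


if __name__ == "__main__":
    rec = build_marc21([("001", b"12345"),
                        ("245", b"10\x1faZebra indexing /\x1fcIndex Data.")])
    print(marc21_to_marcxml(rec))
    print(ansel_subset_to_unicode(b"Caf\xe2e"))  # prints "Café"
```

Timing `marc21_to_marcxml` over a file of records, separately from the Zebra import, would be one way to run the pre-processing test Joshua proposes. Real MARC-8 data would additionally need the full code tables and escape-sequence handling, which this sketch leaves out.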

--Seb

Cheers,


--
Sebastian Hammer, Index Data
address@hidden   www.indexdata.com
Ph: (603) 209-6853


