
Re: [Koha-zebra] RE: [Koha-devel] Building zebradb


From: Sebastian Hammer
Subject: Re: [Koha-zebra] RE: [Koha-devel] Building zebradb
Date: Sun, 12 Mar 2006 18:22:29 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)

Tümer Garip wrote:

Hi again Sebastian,

You are a gem. You are absolutely right about the <collection> wrapper.
The MARC::File::XML module produces MARCXML with this wrapper. Having
removed the wrapper, I can now get ISO2709 out of Zebra with no problem
at all.
Glad you got that sorted.

I'll play with version 1.4 during this week.

And here is something more for those using Windows platform.
I used to report that sorting does not work. Well, it does work. The
problem was that I had UTF-8 characters in my sort.chr table. Windows
Notepad puts a hidden character at the beginning of a file if it
contains UTF-8 characters. Zebra does not like that and gives a syntax
error. So I used another editor to produce my sort.chr, and everything
is now OK.
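For what it's worth, that hidden character is the three-byte UTF-8 byte-order mark (BOM, EF BB BF) that Notepad prepends to UTF-8 files. A minimal shell sketch of detecting and stripping it in place (the file content here is made up for illustration):

```shell
# Simulate a sort.chr saved by Windows Notepad: UTF-8 BOM + content.
printf '\xEF\xBB\xBFlowercase abc\n' > sort.chr

# Strip the three-byte BOM only if it is actually present.
if [ "$(head -c 3 sort.chr)" = "$(printf '\xEF\xBB\xBF')" ]; then
    tail -c +4 sort.chr > sort.chr.tmp && mv sort.chr.tmp sort.chr
fi
```

After this, the file starts with the real directive and Zebra's sort.chr parser no longer sees the stray bytes.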
:-)

--Seb

I have to get back and start using 1.4 now.

Thanks
Tumer

-----Original Message-----
From: Sebastian Hammer [mailto:address@hidden]
Sent: Sunday, March 12, 2006 5:03 PM
To: Tümer Garip
Cc: address@hidden; Adam Dickmeiss
Subject: Re: [Koha-zebra] RE: [Koha-devel] Building zebradb


Tümer Garip wrote:

Hi,

To clear some issues with Sebastian:


Again, I'd be keen to know which Zebra version you're running?
1- I am using Zebra version 1.3.34


It might be a good idea to take 1.4 for a spin.. there have been some changes to the ISAM system. It should otherwise do everything the current server does, and more.

Why not ask for the records in ISO2709 from Zebra if that's what you
want to work with (when records are loaded in MARCXML, it can spit out
either 2709 or XML)? Or is it just that you want to have them in
MARC::Record? At any rate, ISO2709 is a more compact exchange format,
so it seems to make sense to use it.
2- Definitely correct, and theoretically yes, but I keep saying that
when you feed Zebra with MARCXML you CANNOT get back MARC records. All
you get back is the leader and lots of blank space. I tried it with
yaz-client as well. I have reported this as a bug on the Index Data
list but never got an answer.


I wouldn't be too surprised if this was caused by the <collection> wrapper.. Zebra 1.4 can be configured to look for data at a certain level of the DOM tree -- 1.3 assumes the root element of the document is the root element of the record.
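The workaround on the 1.3 side, sketched as a crude sed transform that strips the optional <collection> wrapper so <record> becomes the document root (file names are hypothetical; a real XML tool would be safer, but this is adequate when each file holds exactly one record):

```shell
# A single MARCXML record wrapped in the optional <collection> element,
# roughly as MARC::File::XML emits it (record content abbreviated).
cat > rec.xml <<'EOF'
<collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>00000nam a2200000 a 4500</leader></record></collection>
EOF

# Move the wrapper's attributes (the MARC21/slim namespace) onto
# <record> and drop the closing </collection>.
sed -e 's/<collection\([^>]*\)><record>/<record\1>/' \
    -e 's#</record></collection>#</record>#' \
    rec.xml > rec.unwrapped.xml
```

The namespace declaration is carried over onto <record> so the unwrapped document stays valid MARCXML.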

I get update times around .5-.7 seconds,
3- .4 or .5 is what we get as well. Oddly enough, write the file to
disk and use zebraidx instead, and you get .04-.09 or even faster
times. What magic zebraidx uses that ZOOM does not know about, I don't
know.


The magic is simply updating multiple records at the same time, in one go -- something that is not possible through ZOOM today.

Now *that* is nasty. Do you have *any* way of consistently recreating
the problem? Even if you don't, I'm sure Adam and the Zebra crew
would
like to see some stack traces of those crashes!!
4- The problem is intermittent. I'll try to provide details when it
reoccurs. What it says is "Fatal error: inconsistent register"; from
then on you have to throw the Zebra database away and rebuild
everything. You cannot fall back to the working state. The shadow
system was supposed to prevent this: if something goes wrong when you
commit, it should not mess up the whole database but just refuse the
commit and let you discard the last update/commit operation. But you
cannot do that. So unless I have missed something, shadow files are
useless.


Well, they are useless when the problem is caused by a bug in Zebra, as
seems to be the case here. Clearly, the error is not detected until
it's too late. It would be very helpful if we could discover some
sequence of updates that reliably recreates this..

But I'd see this as another good reason for taking 1.4 for a spin. As I say, there have been notable changes made to the indexing subsystem -- it may be that this is a problem that has been fixed or eliminated by some other change.

Here is some more to think about regarding Zebra:
1- The MARCXML we feed into Zebra is a <collection><record></record></collection> package. When you get it back, it is still like that, as it is supposed to be. But if you feed Zebra ISO2709 MARC records and ask for XML records back, you get a bare <record></record> package with no <collection> wrapper around it. Although this is not a problem currently, I still do not like Zebra doing that. Its MARC-to-XML conversion should follow the standard rather than create its own.


Check the standard. The <collection> wrapper is optional. For my part,
I never use the <collection> wrapper when I deal with single records. I
don't know if Zebra does the right thing with it, but it seems to work
for you otherwise..

--Seb

Thanks for your quick response
Regards,

Tumer

-----Original Message-----
From: Joshua Ferraro []
Sent: Sunday, March 12, 2006 7:06 AM
To: address@hidden; address@hidden
Subject: [Koha-devel] Building zebradb


Tumer,

Sebastian's been good enough to respond to your post (I forwarded this
to the koha-zebra list). If you get a chance, could you join koha-zebra
(if you're not already on it) and follow up -- I've a feeling it could
prove to be a very productive thread.

Cheers,

Joshua

----- Forwarded message from Sebastian Hammer <address@hidden> -----


Hi Joshua,

Thanks for this feedback, it's very interesting. Clearly some of the
issues you describe (i.e. the lack of stability around updates)
indicate software problems, but there are also some interesting ideas
for possible refinements or new developments which I think would be
really useful to get into the general development plans for the
software... there are more folks at ID involved in Zebra development
whom I'd like to bring into these thoughts... I don't know if a wiki or
just a larger zebra-dev list is in order, but it's something to think
about.

Joshua Ferraro wrote:



----- Forwarded message from Tümer Garip <address@hidden> -----


Hi,
We have now put the zebra into production level systems. So here is
some experience to share.

Building the Zebra database from single records is a veeeeery looong
process. (100K records, 150K items)


Yes, that confirms my expectations. We could think about building some
kind of buffering into Zebra for first-time updating, or else the logic
has to be in the application, as you've seen... the situation is
particularly grim if shadow indexing is enabled during indexing and
every record is committed, since each commit causes a sync of the
disks, which could take up to a whole second.

Also, I'm not sure which version of Zebra you're using? I've been doing
some performance testing of Zebra for the Internet Archive, and noted
quite a difference between 1.3 and 1.4 (the CVS version), which is
really where all the development happens.



Best method we found:

1- Change the zebra.cfg file to include:

iso2709.recordType:grs.marcxml.collection
recordType:grs.xml.collection

2- Write (or hack export.pl) to export all the MARC records as one big
chunk to the correct directory with the extension .iso2709, and then
make the system call "zebraidx -g iso2709 -d <dbnamehere> update records -n".

This ensures that Zebra knows it is reading MARC records rather than
XML, and builds 100K+ records at zooming speed. Your ZOOM module always
uses the grs.xml filter, while you can at any time update or reindex
any big chunk of the database as long as you have MARC records.
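The bulk-load path described above, spelled out as a hedged sketch (the export script, directory layout, and database name are placeholders taken from the message, not verified against any particular Koha install):

```shell
# zebra.cfg additions (as in the message), mapping the .iso2709 file
# extension to the MARC-aware GRS filter:
#
#   iso2709.recordType:grs.marcxml.collection
#   recordType:grs.xml.collection

# 1. Export the whole catalogue as one big ISO2709 chunk into the
#    directory zebraidx scans (export.pl is Koha's export script).
perl export.pl > records/all.iso2709

# 2. Bulk-index it: -g selects the iso2709 record group so Zebra reads
#    MARC rather than XML; -n bypasses the shadow registers, as in the
#    message, so the bulk run avoids per-commit disk syncs.
zebraidx -g iso2709 -d <dbnamehere> update records -n
```

The point of the two recordType lines is that interactive ZOOM updates keep using the grs.xml filter, while offline bulk rebuilds go through the much faster MARC path.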


Good strategy, I think.. but of course it's weird and awkward to have
to use two different formats, especially when they both have
limitations. We really must look into handling ISO2709 from ZOOM.

Mind you, version 1.4 should be able to read multiple
collection-wrapped MARCXML records in one file, but only (AFAIK) in
conjunction with the new XSLT-based index rules. I *would* like to try
to develop a good way to work with bibliographic data in that
framework.



3- We are still using the old API, so we read the XML and use
MARC::Record->new_from_xml( $xmldata ). A note here: we did not have to
upgrade MARC::Record or MARC::Charset at all. Any MARC created within
Koha is UTF-8, and any MARC imported into Koha (old
marc_subfield_tables) was correctly decoded to UTF-8 with char_decode
of biblio.


Why not ask for the records in ISO2709 from Zebra if that's what you
want to work with (when records are loaded in MARCXML, it can spit out
either 2709 or XML)? Or is it just that you want to have them in
MARC::Record? At any rate, ISO2709 is a more compact exchange format,
so it seems to make sense to use it.
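What asking Zebra for ISO2709 looks like in practice, sketched as a yaz-client session against a local zebrasrv (the host, port, and query are placeholders):

```shell
# Connect to a local Zebra server and request records in binary MARC
# (ISO2709) rather than XML; "format usmarc" selects that record syntax.
yaz-client localhost:9999 <<'EOF'
format usmarc
find @attr 1=4 "zebra"
show 1
EOF
```

From Perl the equivalent would be setting the standard ZOOM connection option preferredRecordSyntax to usmarc before fetching, so the raw record can go straight into MARC::Record without an XML round trip.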



4- We modified circ2.pm and the items table to have an item onloan
field and mapped it to MARC holdings data. Now our OPAC search does not
call MySQL except for the branch name.

5- Average updates per day are about 2000 (circulation + cataloguer). I
can say that the speed of the ZOOM search, which slows down during a
commit operation, is acceptable considering the speed gain we have on
the search.


Again, I'd be keen to know which Zebra version you're running?

Because the Internet Archive will be doing similar point-updates for
records (only a *lot* more often than 2000 times per day), I have been
looking a lot at the update speed for these small changes that only
affect a single term in a record (like a circulation code).. In my test
database of 150K records, 10K average size, I get update times around
.5-.7 seconds, which just seems intuitively slower than it should have
to be. I'm going to nudge the coders to see if we can possibly do this
better.



6- Zebra behaves very well with searches but is very temperamental with
updates. A queue of updates sometimes crashes the zebraserver. When the
database crashes we cannot save anything even though we are using
shadow files. I'll be reporting on this issue once we can isolate the
problems.


Now *that* is nasty. Do you have *any* way of consistently recreating
the problem? Even if you don't, I'm sure Adam and the Zebra crew would like to see some stack traces of those crashes!!

--Sebastian



Regards,
Tumer



_______________________________________________
Koha-devel mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/koha-devel

----- End forwarded message -----







--
Sebastian Hammer, Index Data
address@hidden   www.indexdata.com
Ph: (603) 209-6853





