
Re: [Koha-zebra] A few Zebra Questions


From: Sebastian Hammer
Subject: Re: [Koha-zebra] A few Zebra Questions
Date: Wed, 04 Jan 2006 20:51:16 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)

Mike Rylander wrote:

On 1/4/06, Sebastian Hammer <address@hidden> wrote:

[big ol' snip]

Another question that immediately occurs is: _what_ speed issues?
Have you actually seen any?  Do you have any numbers?


I'd like to hear the answer to this too. But my sense is that updating a
single record in a multimillion-record database does take a
significant amount of time -- much more than updating a single row in
an RDBMS, for sure. It matters if you're scaling to a major library with
multiple circulation desks.

Warning: rant follows. :)
Not much of a rant, was it? Good comments, though.  :-)

This is exactly the concern, unless I misunderstand the OP.  With a
centralized system running, say, 250+ libs with more than 1,500 circ
and reference desk clients, it would be one of the primary
speed-related concerns.
Yes indeed. We had this same discussion early on. At that time, I suggested that Zebra wasn't really intended to be a transaction-oriented system.

I believe the desire here is for Koha to both scale to large
installations, and also offer advanced search/filter options.  Keeping
the item status close to the item and record identifiers obviously
increases the flexibility of searches and filters, but it imposes a
much greater maintenance cost.  So with the knowledge that it would be
slower than in an RDBMS, the question becomes "how much slower, and
where is the tipping point?".
Quite. Fortunately, it's dead easy to find the tipping point with the new ZOOM-Perl API.

1) Index a suitably large database -- I'd say a few million bib records.
2) Write a little Perl loop that randomly fetches one record using a random barcode search or similar, changes it slightly (something roughly equivalent to flipping a circ status bit), and updates it again (see the sketch below).
3) Run this, and see how it goes.
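
For instance, something along these lines -- an untested sketch against the ZOOM-Perl API, where the host, port, database name, barcode use attribute (1=1007), barcode shape, and the <status> element are all placeholders for whatever the local setup actually uses:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use ZOOM;
    use Time::HiRes qw(time);

    # Placeholder connection details -- adjust to the local Zebra setup.
    my $conn = new ZOOM::Connection('localhost', 9999,
                                    databaseName => 'biblios');
    $conn->option(preferredRecordSyntax => 'xml');

    # Invented barcode shape, just to have something to search for.
    sub random_barcode { sprintf '39999%08d', int(rand(100_000_000)) }

    my $iterations = 1000;
    my $start = time();
    for (1 .. $iterations) {
        my $barcode = random_barcode();
        my $rs = $conn->search_pqf(qq(\@attr 1=1007 "$barcode"));
        next unless $rs->size() > 0;
        my $xml = $rs->record(0)->raw();

        # Roughly equivalent to flipping a circ status bit; the
        # <status> element is made up for the example.
        $xml =~ s{<status>0</status>}{<status>1</status>};

        # Push the modified record back via extended services.
        my $p = $conn->package();
        $p->option(action => 'specialUpdate');
        $p->option(record => $xml);
        $p->send('update');
        $p->destroy();

        my $c = $conn->package();
        $c->send('commit');    # commit every cycle: the worst case
        $c->destroy();
    }
    my $elapsed = time() - $start;
    printf "%d update cycles in %.1fs (%.1f/sec)\n",
           $iterations, $elapsed, $iterations / $elapsed;

An obvious variable to experiment with is the commit frequency -- committing after every update, as above, is the worst case, and batching commits should amortize a good chunk of the cost.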

You know roughly how fast someone at a checkout counter works, so that will give you a good idea of how fast this needs to be: a few hundred to a thousand circ desks going flat out and scanning barcodes every second or two is going to generate a *lot* of transactions -- on top of the regular OPAC and admin client traffic. Even assuming that each of your 1,500 workstations only generates one event every 30 seconds, that's still 50 events per second, and a circ station produces a lot more traffic than an average OPAC user.

Part of that depends on what the most important filter would be. IMHO, the most important status/state-related item information would
be those variables that affect item visibility to the patron, so I'll
make an example of that.

If you don't want items that are LOST or MISSING, or records that only
have items in those states, to show up in the OPAC (because the
patron, by definition, cannot use them), then that can be condensed
into a "patron visibility" flag on the record.  It may be worth the
cost of updating that flag when it is calculated to have changed, and
not otherwise.  This gives you the functionality from the specific use
case above, but it limits the flexibility of the system.  Staff don't
get to search directly for items that are LOST or MISSING, just
records that wouldn't show up because the constituent items are all in
that state.
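
To make the only-update-when-it-flips idea concrete, the application-side check might look roughly like this (a sketch only -- the status values, the opac_visible field, and push_record_to_zebra() are invented for the example):

    # Recalculate the patron-visibility flag and touch Zebra only
    # when the flag actually changes.
    sub patron_visible {
        my @items = @_;
        for my $item (@items) {
            # One usable item is enough to keep the record visible.
            return 1 unless $item->{status} =~ /^(LOST|MISSING)$/;
        }
        return 0;
    }

    sub on_item_change {
        my ($record, @items) = @_;
        my $new = patron_visible(@items);
        if ($new != $record->{opac_visible}) {
            $record->{opac_visible} = $new;
            # Hypothetical helper: queues a specialUpdate for the record.
            push_record_to_zebra($record);
        }
    }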

The thing to watch out for when denormalizing data to increase speed
is that you'll do it over and over again.  Using the example above,
there are probably 20 flags one could invent to solve specific issues
like that, but then you've got to check, calculate, and possibly
update the value of all of those flags on every item update.  At some
point the denormalization costs too much in the application layer,
and you might as well just move the raw data into the records,
updating at every change.
True.

I don't think that distinguishing between different non-available conditions would, in itself, cause a performance hit... it'd be fine to include the equivalent of a bitmask in the record -- a series of tokens denoting different conditions -- up to a point. Adding a boolean combination of 20 different conditions onto each end-user search might start to stretch things a little if you also want to handle 50-100 searches per second. :-)
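
For instance, if each record carried tokens like st-lost and st-missing under some status index, the OPAC could NOT them out of every query -- roughly like this, where the 1=8011 use attribute and the token names are made up for the example:

    # Title search that excludes records flagged lost or missing.
    my $rs = $conn->search_pqf(
        '@not @attr 1=4 "dinosaurs" '
      . '@or @attr 1=8011 st-lost @attr 1=8011 st-missing');

Each extra condition bolts another operand onto that boolean, which is where 20 of them at 50-100 searches per second starts to add up.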

So, the first step is probably to design some use cases.  If they
seem to be comprehensive and the required data are easily identified,
then tests can be done and a decision can be made as to whether any of
this is worth the update costs inside Zebra, and which plan is
"better".
I think this sounds like a pretty good plan. For the purposes of testing use cases, we can probably help determine which things will impact performance and which things won't. But the bottom line is, the only way to find out is to try it. If a relatively simple test shows whether Zebra, as it is now, can handle these transaction-type things, then that is important information. The results of that test might provide us at ID with some ideas for optimizations, and it might give y'all some inspiration for functionality that is or is not realistic to support.

Right now we really have no information whatsoever -- even our own performance tests have mostly focused on *adding* records, not updating them. If it turns out that doing a minor mod on a record is in fact much, much faster than I've been suggesting, then that changes the discussion a little bit, I think.

--Sebastian






--
Sebastian Hammer, Index Data
address@hidden   www.indexdata.com
Ph: (603) 209-6853



