
Re: [Koha-zebra] A few Zebra Questions


From: Sebastian Hammer
Subject: Re: [Koha-zebra] A few Zebra Questions
Date: Wed, 04 Jan 2006 20:51:16 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)

Mike Rylander wrote:

On 1/4/06, Sebastian Hammer <address@hidden> wrote:

[big ol' snip]

Another question that immediately occurs is: _what_ speed issues?
Have you actually seen any?  Do you have any numbers?


I'd like to hear the answer to this too. But my sense is that updating a
single record in a multimillion-record database does take a
significant amount of time -- much more than updating a single row in
an RDBMS, for sure. It matters if you're scaling to a major library with
multiple circulation desks.

Warning: rant follows. :)
Not much of a rant, was it? Good comments, though.  :-)

This is exactly the concern, unless I misunderstand the OP.  With a
centralized system running, say, 250+ libs with more than 1,500 circ
and reference desk clients, it would be one of the primary
speed-related concerns.
Yes indeed. We had this same discussion early on. At that time, I suggested that Zebra wasn't really intended to be a transaction-oriented system.

I believe the desire here is for Koha to both scale to large
installations, and also offer advanced search/filter options.  Keeping
the item status close to the item and record identifiers obviously
increases the flexibility of searches and filters, but it imposes a
much greater maintenance cost.  So with the knowledge that it would be
slower than in an RDBMS, the question becomes "how much slower, and
where is the tipping point?".
Quite. Fortunately, it's dead easy to find the tipping point with the new ZOOM-Perl API.

1) Index a suitably large database -- I'd say a few million bib records.
2) Write a little Perl loop that randomly fetches one record using a random barcode search or similar, changes it slightly (something roughly equivalent to flipping a circ status bit), and updates it again (see the sketch below).
3) Run this, and see how it goes.
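
For instance, something along these lines -- an untested sketch against the ZOOM-Perl API, where the host, port, database name, barcode use attribute (1=1007), barcode shape, and the <status> element are all placeholders for whatever the local setup actually uses:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use ZOOM;
    use Time::HiRes qw(time);

    # Placeholder connection details -- adjust to the local Zebra setup.
    my $conn = new ZOOM::Connection('localhost', 9999,
                                    databaseName => 'biblios');
    $conn->option(preferredRecordSyntax => 'xml');

    # Invented barcode shape, just to have something to search for.
    sub random_barcode { sprintf '39999%08d', int(rand(100_000_000)) }

    my $iterations = 1000;
    my $start = time();
    for (1 .. $iterations) {
        my $barcode = random_barcode();
        my $rs = $conn->search_pqf(qq(\@attr 1=1007 "$barcode"));
        next unless $rs->size() > 0;
        my $xml = $rs->record(0)->raw();

        # Roughly equivalent to flipping a circ status bit; the
        # <status> element is made up for the example.
        $xml =~ s{<status>0</status>}{<status>1</status>};

        # Push the modified record back via extended services.
        my $p = $conn->package();
        $p->option(action => 'specialUpdate');
        $p->option(record => $xml);
        $p->send('update');
        $p->destroy();

        my $c = $conn->package();
        $c->send('commit');    # commit every cycle: the worst case
        $c->destroy();
    }
    my $elapsed = time() - $start;
    printf "%d update cycles in %.1fs (%.1f/sec)\n",
           $iterations, $elapsed, $iterations / $elapsed;

An obvious variable to experiment with is the commit frequency -- committing after every update, as above, is the worst case, and batching commits should amortize a good chunk of the cost.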

You know roughly how fast someone at a checkout counter works, so that will give you a good idea of how fast this needs to be: a few hundred to a thousand circ desks going flat out and scanning barcodes every second or two is going to generate a *lot* of transactions -- on top of the regular OPAC and admin client traffic. Even assuming that each of your 1,500 workstations only generates one event every 30 seconds, that's still 50 events per second, and a circ station produces a lot more traffic than an average OPAC user.

Part of that depends on what the most important filter would be. IMHO, the most important status/state-related item information would
be those variables that affect item visibility to the patron, so I'll
make an example of that.

If you don't want items that are LOST or MISSING, or records that only
have items in those states, to show up in the OPAC (because the
patron, by definition, cannot use them), then that can be condensed
into a "patron visibility" flag on the record.  It may be worth the
cost of updating that flag when it is calculated to have changed, and
not otherwise.  This gives you the functionality from the specific use
case above, but it limits the flexibility of the system.  Staff don't
get to search directly for items that are LOST or MISSING, just
records that wouldn't show up because the constituent items are all in
that state.
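
To make the only-update-when-it-flips idea concrete, the application-side check might look roughly like this (a sketch only -- the status values, the opac_visible field, and push_record_to_zebra() are invented for the example):

    # Recalculate the patron-visibility flag and touch Zebra only
    # when the flag actually changes.
    sub patron_visible {
        my @items = @_;
        for my $item (@items) {
            # One usable item is enough to keep the record visible.
            return 1 unless $item->{status} =~ /^(LOST|MISSING)$/;
        }
        return 0;
    }

    sub on_item_change {
        my ($record, @items) = @_;
        my $new = patron_visible(@items);
        if ($new != $record->{opac_visible}) {
            $record->{opac_visible} = $new;
            # Hypothetical helper: queues a specialUpdate for the record.
            push_record_to_zebra($record);
        }
    }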

The thing to watch out for when denormalizing data to increase speed
is that you'll do it over and over again.  Using the example above,
there are probably 20 flags one could invent to solve specific issues
like that, but then you've got to check, calculate, and possibly
update the value of all of those flags on every item update.  At some
point the denormalization costs too much in the application layer,
and you might as well just move the raw data into the records,
updating at every change.
True.

I don't think that distinguishing between different non-available conditions would, in itself, cause a performance hit... it'd be fine to include the equivalent of a bitmask in the record -- a series of tokens denoting different conditions -- up to a point. Adding a boolean combination of 20 different conditions onto each end-user search might start to stretch things a little if you also want to handle 50-100 searches per second. :-)
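
For instance, if each record carried tokens like st-lost and st-missing under some status index, the OPAC could NOT them out of every query -- roughly like this, where the 1=8011 use attribute and the token names are made up for the example:

    # Title search that excludes records flagged lost or missing.
    my $rs = $conn->search_pqf(
        '@not @attr 1=4 "dinosaurs" '
      . '@or @attr 1=8011 st-lost @attr 1=8011 st-missing');

Each extra condition bolts another operand onto that boolean, which is where 20 of them at 50-100 searches per second starts to add up.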

So, the first step is probably to design some use cases.  If they
seem to be comprehensive and the required data are easily identified,
then tests can be done and a decision can be made as to whether any of
this is worth the update costs inside Zebra, and which plan is
"better".
I think this sounds like a pretty good plan. For the purposes of testing use cases, we can probably help determine which things will impact performance and which things won't. But the bottom line is, the only way to find out is to try it. If a relatively simple test shows whether Zebra, as it is now, can handle these transaction-type things, then that is important information. The results of that test might provide us at ID with some ideas for optimizations, and it might give y'all some inspiration for functionality that is or is not realistic to support.

Right now we really have no information whatsoever -- even our own performance tests have mostly focused on *adding* records, not updating them. If it turns out that doing a minor mod on a record is in fact much, much faster than I've been suggesting, then that changes the discussion a little bit, I think.

--Sebastian






--
Sebastian Hammer, Index Data
address@hidden   www.indexdata.com
Ph: (603) 209-6853



