[Koha-zebra] Re: Import Speed


From: Sebastian Hammer
Subject: [Koha-zebra] Re: Import Speed
Date: Thu, 02 Mar 2006 16:42:04 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Macintosh/20050923)

Joshua,

Done right, a first-time update of 5000 records ought to take less than a minute, so there is definitely room for improvement.

The big question in my mind is whether the network interface as it stands is suitable for bulk updates; we might need Adam's input on that. The primary problem is not so much the XML as the fact that we are updating records one at a time, which is fine if that's what you mean to do, but terrible if you mean to update things in bulk.
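A back-of-the-envelope sketch of why one-record-at-a-time hurts: the fixed per-request overhead is paid 5000 times instead of being amortized across a batch. The per-request and per-record costs below are invented for illustration, not measurements from Koha or Zebra:

```perl
use strict;
use warnings;

my $records     = 5000;
my $per_request = 0.50;   # assumed fixed round-trip/setup cost per request (s)
my $per_record  = 0.01;   # assumed indexing cost per record (s)

# Total time if we send $batch_size records per update request.
sub total_time {
    my ($batch_size) = @_;
    my $requests = int(($records + $batch_size - 1) / $batch_size);
    return $requests * $per_request + $records * $per_record;
}

printf "one record per request: %.0f s\n", total_time(1);    # 2550 s
printf "batches of 500:         %.0f s\n", total_time(500);  #   55 s
```

The indexing work is identical in both cases; only the number of round trips changes.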

--Seb

Joshua Ferraro wrote:

On Thu, Mar 02, 2006 at 04:40:16PM +0000, Mike Taylor wrote:
Date: Thu, 2 Mar 2006 07:44:22 -0800
From: Joshua Ferraro <address@hidden>

There's your culprit, then.  You're spending 39751 of your 40604
seconds doing needless searches, and 853 seconds (14 minutes) doing
the actual updates.  Rip out the searches and you should get a 47-fold
speed increase.

Why are you doing the search?  So far as I can see, it's just a probe to
see whether the connection is still alive.  But you don't need to do
that: just go ahead and submit the update request, you'll find out
soon enough if the connection's dead and you can re-forge it then if
necessary.
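The arithmetic behind the 47-fold figure is easy to verify from the numbers quoted in the thread:

```perl
use strict;
use warnings;

my $total_s  = 40604;               # total import time reported (s)
my $search_s = 39751;               # time spent in the needless searches (s)
my $update_s = $total_s - $search_s;

printf "update time: %d s (~%.0f minutes)\n", $update_s, $update_s / 60;
printf "speedup without searches: ~%d-fold\n", int($total_s / $update_s);
```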
Here's what the connection manager looks like now:

       if (defined($context->{"Zconn"})) {
               $Zconn = $context->{"Zconn"};
               return $context->{"Zconn"};
       } else {
               $context->{"Zconn"} = &new_Zconn();
               return $context->{"Zconn"};
       }
So ... no search ... if a connection is defined it just gets returned, and if
it's not alive I assume the app will just crash (there's no fault tolerance
built into the script).
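If fault tolerance is wanted later, a lazy-reconnect pattern gives it back without the per-call probe: reuse the cached connection, and only rebuild it when an operation actually fails. A minimal sketch, where new_Zconn is a dummy stand-in for the real ZOOM connection factory:

```perl
use strict;
use warnings;

my $context = {};

# Stand-in for the real connection factory.
sub new_Zconn { return { alive => 1 } }

# Return the cached connection, creating it on first use.
sub get_Zconn {
    my ($ctx) = @_;
    $ctx->{Zconn} = new_Zconn() unless defined $ctx->{Zconn};
    return $ctx->{Zconn};
}

# Run an operation; on failure, drop the cached connection,
# reconnect once, and retry before giving up.
sub with_Zconn {
    my ($ctx, $op) = @_;
    my $result = eval { $op->(get_Zconn($ctx)) };
    if ($@) {
        delete $ctx->{Zconn};                 # forget the dead connection
        $result = $op->(get_Zconn($ctx));     # retry on a fresh one
    }
    return $result;
}

my $calls  = 0;
my $answer = with_Zconn($context, sub {
    die "connection lost\n" if ++$calls == 1; # simulate one dead connection
    return "updated";
});
print "$answer after $calls attempts\n";      # updated after 2 attempts
```

The happy path costs nothing extra: no search, no round trip, just a hash lookup.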

And here's the new benchmark for those 5000 records:

5000 MARC records imported in 7727.84231996536 seconds

dprofpp tmon.out
Exporter::export_ok_tags has -1 unstacked calls in outer
AutoLoader::AUTOLOAD has -1 unstacked calls in outer
Exporter::Heavy::heavy_export has 12 unstacked calls in outer
bytes::AUTOLOAD has -1 unstacked calls in outer
Exporter::Heavy::heavy_export_ok_tags has 1 unstacked calls in outer
POSIX::__ANON__ has 1 unstacked calls in outer
POSIX::load_imports has 1 unstacked calls in outer
Exporter::export has -12 unstacked calls in outer
utf8::AUTOLOAD has -1 unstacked calls in outer
utf8::SWASHNEW has 1 unstacked calls in outer
Storable::thaw has 1 unstacked calls in outer
bytes::length has 1 unstacked calls in outer
POSIX::AUTOLOAD has -2 unstacked calls in outer
Total Elapsed Time = 6617.861 Seconds
 User+System Time = 706.1013 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
21.4   151.3 817.46 103492   0.0001 0.0008  MARC::Charset::marc8_to_utf8
18.0   127.3 416.36 126313   0.0000 0.0000  MARC::Charset::Table::get_code
17.1   121.0 121.08 126295   0.0000 0.0000  Storable::mretrieve
10.9   77.27  0.000 126295   0.0000 0.0000  Storable::thaw
10.1   71.52 71.521 126313   0.0000 0.0000  SDBM_File::FETCH
8.42   59.48 117.80 252590   0.0000 0.0000  Class::Accessor::__ANON__
8.26   58.31 58.317 252590   0.0000 0.0000  Class::Accessor::get
7.21   50.88 467.25 126313   0.0000 0.0000  MARC::Charset::Table::lookup_by_marc8
6.15   43.39 97.718 126295   0.0000 0.0000  MARC::Charset::Code::char_value
4.87   34.35 34.354 126295   0.0000 0.0000  MARC::Charset::_process_escape
2.71   19.10 19.101 126313   0.0000 0.0000  MARC::Charset::Table::db
2.26   15.98 30.245 728288   0.0000 0.0000  MARC::Record::field
2.10   14.79 14.794 802346   0.0000 0.0000  MARC::Field::tag
1.94   13.69 857.27  25241   0.0005 0.0340  MARC::File::XML::record
1.44   10.15 11.456 714137   0.0000 0.0000  MARC::Field::subfields

So it's definitely better without the search, but there is still
the question of XML ... being able to import raw MARC (which would only take a few seconds) would be really nice ...
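Part of why raw (ISO 2709) MARC is so cheap to handle: field offsets come straight from the leader and directory, so locating a field needs no XML parsing and no charset table lookups. A minimal sketch of directory-based field extraction over a hand-built record (the record content is invented for illustration):

```perl
use strict;
use warnings;

my $FT = "\x1E";   # field terminator
my $RT = "\x1D";   # record terminator

# Build a tiny two-field record: 001 (control number) and 245 (title).
my @fields = ( [ '001', "demo-1$FT" ],
               [ '245', "10\x1FaAn invented title$FT" ] );

my ($dir, $data, $pos) = ('', '', 0);
for my $f (@fields) {
    my ($tag, $body) = @$f;
    $dir  .= sprintf '%3s%04d%05d', $tag, length($body), $pos;
    $data .= $body;
    $pos  += length $body;
}
my $base   = 24 + length($dir) + 1;                      # where field data starts
my $record = sprintf('%05d', $base + length($data) + 1)  # leader 0-4: record length
           . ('n' x 7)                                   # leader 5-11: filler
           . sprintf('%05d', $base)                      # leader 12-16: base address
           . (' ' x 7)                                   # leader 17-23: filler
           . $dir . $FT . $data . $RT;

# Extract a field by tag using only leader and directory offsets.
sub get_field {
    my ($rec, $want) = @_;
    my $base = substr($rec, 12, 5);
    my $dir  = substr($rec, 24, index($rec, $FT) - 24);
    while ($dir =~ /(.{3})(\d{4})(\d{5})/g) {
        my ($tag, $len, $start) = ($1, $2, $3);
        return substr($rec, $base + $start, $len - 1) if $tag eq $want;
    }
    return undef;
}

print get_field($record, '001'), "\n";   # demo-1
```

Every lookup is a couple of substr calls over fixed offsets, which is why a raw-MARC bulk load can run in seconds where a per-record XML path takes minutes.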

(Mind you, 14 minutes still seems very slow for 5000 poxy records.  I
think there are bulk-update cache issues going on here as well.)


--
Sebastian Hammer, Index Data
address@hidden   www.indexdata.com
Ph: (603) 209-6853




