ifile-dev

[Ifile-dev] performance improvements


From: Dave Marquardt
Subject: [Ifile-dev] performance improvements
Date: 28 Jan 2003 15:12:00 -0600
User-agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.1 (Cuyahoga Valley)

My .idata database has grown to 251 categories and holds roughly
16,500 words in about 1.7 megabytes, and I've become annoyed with how
slow ifile is on my Sun SPARC Solaris system.

I've come up with some performance improvements that I submit here for
your perusal.  I did some profiling to come up with places to attack.
First, I noticed that ifile_read_db() and ifile_write_db() and their
descendants are the big hitters, not surprisingly.  ifile_write_db()
looks about as efficient as you can make it without a lot of
investment in major rewriting.

I noticed many calls to realloc() in ifile_read_db() (heavily
optimized and inlined by Sun's compiler).  I took a look, and it seems
to me that we could optimize setting up the extendable array for word
frequencies, so I created a macro that would allocate the whole array,
set all the values to 0, and then update the array's metadata.  That
got rid of most of the realloc() calls within ifile.
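
A minimal sketch of the kind of macro described above. The ext_array
type, its field names, and EXT_ARRAY_INIT_N are hypothetical stand-ins
chosen for illustration; ifile's real extendable array and the macro in
the attached diffs may look different.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ifile's extendable array type. */
typedef struct {
    long  *data;  /* element storage */
    size_t size;  /* number of elements allocated */
    size_t used;  /* number of elements in use */
} ext_array;

/* Allocate all n elements at once, zeroed, then update the metadata.
 * This replaces growing the array one element at a time via realloc().
 * Note: n is evaluated more than once, so pass a side-effect-free value. */
#define EXT_ARRAY_INIT_N(arr, n)                              \
    do {                                                      \
        (arr)->data = calloc((n), sizeof *(arr)->data);       \
        (arr)->size = (arr)->data ? (n) : 0;                  \
        (arr)->used = (arr)->size;                            \
    } while (0)

/* Bounds-checked read, included only to show how the metadata is used. */
long ext_array_get(const ext_array *arr, size_t i)
{
    assert(i < arr->used);
    return arr->data[i];
}
```

With the category count known up front, one EXT_ARRAY_INIT_N(&freq, 251)
per word would take the place of a series of incremental growths.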

Secondly, I was concerned with the number of copies of data when
reading and parsing the database.  The current implementation has this
sort of path:

        disk -> stdio buffers -> automatic variable -> small malloc'd space

I replaced stdio with a large malloc'd buffer and a single read, and
stopped using the automatic variable.  Now the path is

        disk -> large malloc'd buffer -> small malloc'd space
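
As a rough illustration of the "large malloc'd buffer and a single read"
step, here is a sketch using POSIX open(), fstat() and read(). The
function name slurp_file is an assumption made for this example and is
not taken from the ifile source. It loops on read() only to cope with a
short read; normally one call returns the whole file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the whole file at path into one malloc'd, NUL-terminated buffer.
 * Returns NULL on failure; on success stores the byte count in *len.
 * The caller parses directly out of the buffer and free()s it afterwards. */
char *slurp_file(const char *path, size_t *len)
{
    struct stat st;
    char *buf = NULL;
    size_t total = 0;
    int fd;

    assert(path != NULL && len != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) == 0 && st.st_size >= 0)
        buf = malloc((size_t)st.st_size + 1);
    while (buf != NULL && total < (size_t)st.st_size) {
        ssize_t n = read(fd, buf + total, (size_t)st.st_size - total);
        if (n < 0) {            /* read error: give up */
            free(buf);
            buf = NULL;
        } else if (n == 0) {    /* unexpected end of file */
            break;
        } else {
            total += (size_t)n;
        }
    }
    close(fd);
    if (buf != NULL) {
        buf[total] = '\0';
        *len = total;
    }
    return buf;
}
```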

The result of this work can be seen by comparing the logfiles of the
"stock" version in CVS and my modified version:

Stock version:

ifile 1.2.1 called
Reading /export/home/davemq/.idata from disk...
Read 251 categories, 16578 words.  Time used: 1.080 sec
Reading message from standard input...
Read 1 message(s).  Time used: 0.010 sec
Trimmed 18 words due to lack of frequency
Writing /export/home/davemq/.idata to disk...
Wrote 251 folders, 16580 words.  Time used: 0.600 sec

My version:

ifile 1.2.1 called
Reading /export/home/davemq/.idata from disk...
Read 251 categories, 16548 words.  Time used: 0.770 sec
Reading message from standard input...
Read 1 message(s).  Time used: 0.000 sec
Trimmed 3 words due to lack of frequency
Writing /export/home/davemq/.idata to disk...
Wrote 251 folders, 16550 words.  Time used: 0.590 sec

Comparing words read per second (word count divided by the database
read time in the logs above), I show a 40% improvement:
        Stock: 15,350 words / sec
        Mine:  21,491 words / sec
        Ratio of Mine/Stock: 1.40005
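
The arithmetic behind those figures, written out as a small helper. The
function name words_per_sec is made up for this example; the inputs are
the word counts and read times from the two logs.

```c
#include <assert.h>

/* Words parsed per second for one database read. */
double words_per_sec(long words, double seconds)
{
    assert(seconds > 0.0);
    return (double)words / seconds;
}

/* Stock: words_per_sec(16578, 1.080) is about 15,350.
 * Mine:  words_per_sec(16548, 0.770) is about 21,491.
 * Dividing the second by the first gives roughly 1.40. */
```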

I'm attaching unified diffs from CVS.

The next thing I'd like to try is mmap() and see if that makes any
improvement.  If I get a reasonable mmap() implementation, I may look
at creating an option to write a machine-specific binary database.
Then I can just mmap() in the database, manipulate it, then close it.
That would dispense with all of the parsing.  Of course, such a
database on a little endian machine wouldn't be usable on a big endian
machine and vice versa, but it should be fast!
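
For reference, a bare-bones sketch of mapping a file with POSIX mmap().
This is not an ifile implementation: map_file is a hypothetical name,
and the mapping here is read-only, so writing changes back would need
PROT_WRITE with MAP_SHARED instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the file at path read-only and store its size in *len.
 * Returns NULL on failure (including an empty file).
 * Release the mapping with munmap() when done. */
const void *map_file(const char *path, size_t *len)
{
    struct stat st;
    void *p = MAP_FAILED;
    int fd;

    assert(path != NULL && len != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) == 0 && st.st_size > 0)
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}
```

Any multi-byte integers read straight out of such a mapping are in the
writing machine's byte order, which is the portability limit noted above.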

I guess another option would be to invent some sort of portable binary
database format, but that's pretty far down on my list right now.

Thanks!

Attachment: DIFFS
Description: DIFFS

-- 
Dave Marquardt
Round Rock, TX
