
[Wp-mirror-list] Performance experiments


From: Jason Skomorowski
Subject: [Wp-mirror-list] Performance experiments
Date: Sat, 24 Nov 2012 15:53:37 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0

I didn't do anything like actual profiling, but I get the impression that
wp-mirror was far from saturating the CPU, RAM, disk, or even the bandwidth
I had available, so I think there is a lot of room to speed things up. I
wonder if anyone has experimented with any of the following:



cURL:

* metalink -- cURL supports the Metalink standard (RFC 5854), which lets
you specify a list of files along with their hashes. That might let us skip
the validation step!

* pipelining -- does wp-mirror/wikix use HTTP/1.1 pipelining for ichunks?
<http://curl.haxx.se/dev/readme-pipelining.html>

* SPDY -- does Wikimedia Commons support SPDY? Header compression,
out-of-order replies on the same connection, etc. could help considerably.
There seems to be an intent to implement this in cURL; in the meantime, see gURL:
<https://github.com/mtourne/gurl>

* less granular synchronization -- rather than relying so heavily on
fine-grained consistency, why not force a flush/sync at the end of each
ichunk/xchunk to mark it complete, and in the event of a crash just restart
the incomplete chunks?

* ensure sequential writes -- some deployments will have plenty of
bandwidth, perhaps more than the drive's random write speed. Maybe some sort
of queue for images coming in from multiple connections would be useful to
build a sequential stream? Then again, perhaps caching would take care of
this if we can make the most of it with an alternate approach to
consistency?
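To make the metalink idea concrete: a Metalink 4 document (RFC 5854) lists each file with one or more <hash> elements, so checking a downloaded image reduces to hashing it and comparing digests. Below is a minimal Python sketch, assuming the images sit in one local directory under the names given in the metalink file. The function names are hypothetical, and this is not how wp-mirror validates images today.

```python
import hashlib
import os
import xml.etree.ElementTree as ET

# XML namespace defined by RFC 5854 (Metalink 4.0).
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}

# Metalink hash type names mapped to hashlib constructors.
HASHES = {"sha-256": hashlib.sha256, "sha-1": hashlib.sha1, "md5": hashlib.md5}


def expected_hashes(metalink_xml):
    """Return {file name: (hash type, hex digest)} from a Metalink 4 document.

    Uses the first hash of a supported type for each <file>; files with no
    supported hash are left out and therefore not checked.
    """
    root = ET.fromstring(metalink_xml)
    result = {}
    for f in root.findall("ml:file", NS):
        for h in f.findall("ml:hash", NS):
            htype = h.get("type")
            if htype in HASHES and h.text:
                result[f.get("name")] = (htype, h.text.strip().lower())
                break
    return result


def verify(directory, metalink_xml):
    """Return the names of files that are missing or fail their hash check."""
    bad = []
    for name, (htype, digest) in expected_hashes(metalink_xml).items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            bad.append(name)
            continue
        h = HASHES[htype]()
        with open(path, "rb") as fh:
            # Hash in 64 KiB blocks so large images need not fit in memory.
            for block in iter(lambda: fh.read(1 << 16), b""):
                h.update(block)
        if h.hexdigest() != digest:
            bad.append(name)
    return bad
```

Only the files returned by verify() would then need to be fetched again.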
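On the pipelining question, the idea is to write several requests on one connection before reading any replies; HTTP/1.1 requires the server to answer them in request order. Here is a minimal Python sketch over a raw socket, assuming plain HTTP on port 80. Server and proxy support for pipelining varies, so this is something to test against Commons rather than assume.

```python
import socket


def build_pipelined_requests(host, paths):
    """Concatenate HTTP/1.1 GET requests so they can be sent in one write.

    The last request asks the server to close the connection, which makes
    the end of the final response easy to detect.
    """
    parts = []
    for i, path in enumerate(paths):
        last = i == len(paths) - 1
        parts.append(
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: {'close' if last else 'keep-alive'}\r\n"
            "\r\n"
        )
    return "".join(parts).encode("ascii")


def send_pipelined(host, paths, port=80, timeout=30.0):
    """Send all requests before reading any reply; return the raw bytes.

    The responses arrive back to back, in request order, and the server
    closes the connection after answering the last one.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_pipelined_requests(host, paths))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

The bytes returned by send_pipelined() still have to be split into individual responses using each one's Content-Length or chunked encoding; that parsing is left out to keep the sketch short.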
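The less granular synchronization idea can be sketched as one sync plus an atomic "done" marker per chunk, with a restart simply skipping chunks that have a marker. This is a minimal Python sketch assuming a POSIX system (os.sync and directory fsync), a directory of marker files, and a hypothetical process(name) callback standing in for the real ichunk/xchunk work; that callback must be safe to repeat, because a crash before the marker means the chunk is redone.

```python
import os


def _fsync_dir(path):
    # Make new directory entries (such as a renamed marker) durable on POSIX.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def mark_complete(chunk_dir, name):
    """Atomically create '<name>.done': write a temp file, fsync, rename."""
    tmp = os.path.join(chunk_dir, name + ".done.tmp")
    with open(tmp, "w") as fh:
        fh.write("ok\n")
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, os.path.join(chunk_dir, name + ".done"))
    _fsync_dir(chunk_dir)


def is_complete(chunk_dir, name):
    return os.path.exists(os.path.join(chunk_dir, name + ".done"))


def run_chunks(chunk_dir, chunks, process):
    """Process each chunk not yet marked done; return the names done this run."""
    done_now = []
    for name in chunks:
        if is_complete(chunk_dir, name):
            continue  # finished before a previous crash or restart
        process(name)
        # One coarse flush of all filesystem buffers per chunk, instead of
        # syncing after every file or row inside the chunk.
        os.sync()
        mark_complete(chunk_dir, name)
        done_now.append(name)
    return done_now
```

The trade-off is that a crash can throw away up to one chunk of work in exchange for far fewer sync points.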
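For the sequential-writes idea, one shape it could take is a bounded queue that many download threads feed and a single writer thread drains, appending every image to one pack file. This is a minimal Python sketch; the pack-file layout and the (offset, length) index are assumptions for illustration, not wp-mirror's storage format.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to finish


class SequentialWriter:
    """Funnel images from many download threads into one append-only file.

    Producers call put(name, data) from any thread; a single writer thread
    writes each payload to the end of the file in the order it dequeues
    them, so the disk sees one sequential stream instead of many scattered
    writes. self.index maps each name to its (offset, length).
    """

    def __init__(self, path, max_pending=256):
        # Bounded queue: producers block when the writer falls behind.
        self._q = queue.Queue(maxsize=max_pending)
        self.index = {}
        self._fh = open(path, "wb")  # creates a new pack file
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, name, data):
        self._q.put((name, data))

    def _run(self):
        while True:
            item = self._q.get()
            if item is _STOP:
                break
            name, data = item
            offset = self._fh.tell()
            self._fh.write(data)
            self.index[name] = (offset, len(data))
        self._fh.flush()

    def close(self):
        """Drain the queue, stop the writer thread, and close the file."""
        self._q.put(_STOP)
        self._thread.join()
        self._fh.close()
```

Whether this beats leaving it to the page cache would need measuring; the sketch only shows the mechanics.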



MySQL:

* I wonder if the doublewrite buffer is involved in some of the concurrency
slowdown you're seeing with compressed tables? It seems to have problems
with compressed pages in the built-in InnoDB:
<http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-downgrading-issues-doublewrite.html>.
You mention

* Queueing with haproxy or similar -- if concurrency is a problem mainly
because of the database, perhaps it would be useful to pipeline queries
(though it's entirely likely they're generated fast enough to keep MySQL
churning). <http://flavio.tordini.org/a-more-stable-mysql-with-haproxy>

* MariaDB -- MariaDB, the fork of MySQL started in the wake of the Oracle
acquisition, has developed some new features and reportedly performs
better, notably via XtraDB, its InnoDB variant.
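The haproxy suggestion amounts to capping how many queries hit MySQL at once and making the rest wait. The linked article does that in a proxy; as an in-process stand-in for the same idea (not haproxy itself), a semaphore around each query gives a similar effect. A minimal Python sketch, where query_fn is a hypothetical placeholder for whatever actually runs the SQL:

```python
import threading


class QueryGate:
    """Cap how many callers may run a database query at the same time.

    Callers beyond the limit block until a slot frees up. `peak` records
    the highest concurrency seen, which can help when tuning the limit.
    """

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()  # protects the counters below
        self.active = 0
        self.peak = 0

    def run(self, query_fn, *args):
        with self._slots:  # waits here while the gate is full
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return query_fn(*args)
            finally:
                with self._lock:
                    self.active -= 1
```

Sweeping max_concurrent and timing a batch of imports would show whether MySQL does better with fewer simultaneous queries.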



Lots of memory (32 GB costs something like $120) -- are there useful ways
to allocate ample memory beyond a large innodb_buffer_pool_size and
innodb_log_file_size?
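As a starting point for those two settings, the arithmetic can be written down as a small helper that emits a [mysqld] fragment. This is a rough Python sketch under stated assumptions: the 75% buffer-pool share for a dedicated host and the 25%-of-pool log budget are rules of thumb chosen for illustration, and the default cap keeps the combined log size just under 4 GB, the limit on older MySQL releases. All of these are parameters to adjust, not recommendations.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def innodb_memory_settings(total_ram, pool_fraction=0.75, log_fraction=0.25,
                           log_files=2, combined_log_cap=4 * GIB - MIB):
    """Suggest InnoDB sizes in bytes for a host dedicated to MySQL.

    pool_fraction, log_fraction and combined_log_cap are assumed rules of
    thumb; sizes are rounded down to whole MiB.
    """
    pool = int(total_ram * pool_fraction) // MIB * MIB
    # Total redo-log budget, split evenly across the log files.
    per_file = min(int(pool * log_fraction), combined_log_cap) // log_files
    per_file = per_file // MIB * MIB
    return {
        "innodb_buffer_pool_size": pool,
        "innodb_log_file_size": per_file,
        "innodb_log_files_in_group": log_files,
    }


def to_my_cnf(settings):
    """Render the settings as a [mysqld] option-file fragment, sizes in MiB."""
    lines = ["[mysqld]"]
    for key, value in settings.items():
        if key.endswith("_size"):
            lines.append(f"{key} = {value // MIB}M")
        else:
            lines.append(f"{key} = {value}")
    return "\n".join(lines)
```

For a 32 GiB machine, to_my_cnf(innodb_memory_settings(32 * GIB)) yields a 24576M buffer pool and two 2047M log files.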


