[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Monotone-devel] Netsync performance improvement patch
From: |
Eric Anderson |
Subject: |
[Monotone-devel] Netsync performance improvement patch |
Date: |
Wed, 10 Aug 2005 00:29:30 -0700 |
This is the patch promised in my previous message that makes it
possible to get through performance test in a reasonable amount of
time.
Summary: The attached patch changes the recieve buffer from a string
to a string_queue. This changes an O(n^2) algorithm to an O(n)
algorithm. The practical effect on a smallish database is a 3.48x
CPU usage reduction on the pull side. On a somewhat extreme case, it
resulted in a 24.7x CPU reduction on the pull side.
This patch will apply cleanly against the 0.22 revision of monotone
with the accounting patch applied. If you don't have the accounting
patch applied, the ChangeLog patch will fail.
The ChangeLog entry (included in the patch):
2005-08-09 Eric Anderson <address@hidden>
* Changes to significantly improve network pull performance *
string_queue.hh: created to store pending data and allow for
efficient removal from the front. The string queue automatically
reduces its buffer size if it is very empty.
* hmac.{cc,hh}: Add in a version of chained_hmac::process that can
operate on a string_queue for use during read.
* netcmd.{cc,hh}: update netcmd::read to use a string_queue rather
than a string, update all the regression tests also. This required
the somewhat ugly creation of a read_string function because the
netcmd read and write functions are no longer using the same type.
* netio.hh: introduce functions for operating on a string_queue. They
are identical to the equivalent string functions except for the type
of the argument.
* netsync.cc: Use a string_queue rather than a string for storing the
input and output buffers.
Detailed discussion:
The current recieve side algorithm appends "packets" to a string as
they come in over the network. Once an entire packet has been
recieved, the packet is removed from the front of the string. If a
number of small packets arrive, and then a larger packet before the
smaller packets are removed, then the net effect will be to copy the
large packet multiple times before the smaller packet can be
processed. In the worst case this makes the amount of copies O(n^2)
where n is the amount of data put into the buffer. The problem seems
to be exacerbated the larger the amount of data that needs to be
transferred.
The solution to the problem is to maintain a string_queue that stores
the data. Instead of copying the data to remove something from the
front, a pointer is simply moved forward. This is almost the standard
implementation of a double-ended queue using an array with rotating
pointers in the array. The only difference is that because a large
amount of the monotone netsync code assumes that the recieve data is
contiguous, when data would be appended that would overflow the end of
the buffer, the remaining data is copied to the front.
In addition, to avoid maintaining a large buffer after processing a
single large file, the string queue will resize itself downward if the
amount of data occupying it is significantly less than the buffer
size.
There is one ugly part of the patch. The regression tests in
netcmd.cc using a string buffer to test writing and reading netcmd's.
To make this work, and to properly exercise the removal from a string
queue function, a function only used by the regression tests was
created which converts the string to a string_queue, uses the existing
function and then converts the string_queue back into a string. This
ugliness could be eliminated by converting the output buffer to being
a string queue instead of a list of string, offset pairs. This change
would probably have no effect on the performance, and would happen to
simplify the output code a small bit because the output code would no
longer have to track the fact that it has multiple output buffers.
The patch also notes a TODO in the netsync code where a large file
will be copied twice in a row; it was not clear what the appropriate
fix would be so the problem was just noted and left alone. The patch
also has two QUESTIONS in it where there were pieces in the existing
code that seemed either redundant or unnecessary, however I didn't
want to remove them because I may have been missing something.
The patch has no performance or memory effect on almost all of the
tests in the performance test because they are either very small (just
the 0.19 version of monotone), or a single file. The resident set
size is effectively unchanged by the patch, and the process size
decreases in a few cases.
For the case of having all the files from 0.10-0.22 in the checkout,
we have a 3.48x CPU usage reduction on the pull side, and 1.08x on the
serve side. The copies and allocations on the pull side dropped by
54x and 39x respectively.
For the case of having all test files in the repository (an extreme
test), we have a 24.7x CPU usage reduction on the pull size, and 1.04x
on the serve side. The copies and alllocations on the pull side
dropped by 192x and 189x respectively. In this case, moving a 279MB
database via netsync caused 1.1 TiB of data to be copied, and took 41
CPU minutes. Note that a 279MiB database is not completely
unreasonable. I have a 117MiB database from real usage right now, and
they only get larger.
--- Performance Before:
monotone 0.22 (base revision: 28058ae3e850229a5d8fae65415cbbf82b435377)
Maximum (MiB) Copied Malloc
*Test* Operation CPU(s) Size Resident (MiB) (MiB)
--------------- --------- ------ ------- ------- ------- -------
zero_small add files 0.02 7.29 2.75 0.3 1.0
zero_small commit 0.28 7.62 4.07 8.5 10.8
zero_small checkout 0.03 7.36 3.52 0.4 1.3
zero_small serve 0.25 9.38 4.78 5.5 10.4
zero_small pull 0.05 9.38 4.71 0.6 2.0
zero_large add files 3.91 420.42 414.09 1447.6 101.0
zero_large commit 8.79 308.04 204.42 711.7 317.0
zero_large checkout 4.45 208.66 204.50 401.1 208.2
zero_large serve 7.23 210.41 205.79 606.6 221.1
zero_large pull 6.76 310.55 205.79 602.6 310.5
random_medium add files 0.25 51.67 45.34 126.3 11.0
random_medium commit 2.79 73.93 69.79 196.8 114.9
random_medium checkout 1.94 56.67 39.18 77.5 64.6
random_medium serve 3.50 58.37 46.64 162.9 107.8
random_medium pull 2.53 85.85 80.96 229.2 150.0
halfzero_large add files 3.77 414.81 409.91 1739.4 101.0
halfzero_large commit 18.07 413.28 409.19 1302.4 660.2
halfzero_large checkout 12.35 345.79 274.10 586.2 413.2
halfzero_large serve 21.20 347.49 307.53 1091.7 583.2
halfzero_large pull 17.14 465.63 460.70 1444.6 868.9
random_large add files 3.75 414.81 409.33 1963.3 101.0
random_large commit 28.71 651.73 647.56 1894.2 1038.4
random_large checkout 20.68 480.88 341.64 771.3 616.2
random_large serve 35.50 482.55 407.45 1577.2 940.2
random_large pull 28.12 753.71 748.77 2286.6 1459.7
monotone add files 0.38 9.23 4.45 26.5 11.1
monotone commit 3.33 12.74 9.06 131.5 109.5
monotone checkout 1.57 19.60 15.68 55.8 45.3
monotone serve 3.04 20.68 15.76 122.4 99.4
monotone pull 3.41 15.18 10.32 138.8 137.0
mt_multiple add files 4.86 12.29 7.40 238.9 103.5
mt_multiple commit 29.04 39.20 35.25 893.7 678.6
mt_multiple checkout 14.84 25.71 21.79 477.1 367.8
mt_multiple serve 10.12 19.99 14.95 320.0 261.0
mt_multiple pull 88.47 53.00 37.57 30291.3 20461.8
mt_bigfiles add files 3.71 73.76 69.06 1318.2 106.8
mt_bigfiles commit 17.61 50.80 42.32 1141.2 466.4
mt_bigfiles checkout 10.28 50.20 41.49 519.6 261.2
mt_bigfiles serve 20.97 52.14 42.58 934.4 404.9
mt_bigfiles pull 17.24 56.63 47.30 1167.3 608.4
everything add files 19.94 500.49 495.10 6707.3 519.4
everything commit 104.94 658.30 654.24 6098.8 2997.4
everything checkout 68.16 497.13 392.59 2831.1 1693.8
everything serve 105.79 497.19 416.21 4905.3 2363.4
everything pull 2480.6 875.84 779.97 1168339 654884
--- Performance After:
Test CPU: Intel(R) Pentium(R) M processor 1700MHz
monotone 0.22 (base revision: 072f38da9450e2e2e406332a480c8c7a50736f8b)
Maximum (MiB) Copied Malloc
*Test* Operation CPU(s) Size Resident (MiB) (MiB)
--------------- --------- ------ ------- ------- ------- -------
zero_small add files 0.02 7.30 2.72 0.3 1.0
zero_small commit 0.27 7.63 4.04 8.3 9.8
zero_small checkout 0.03 7.37 3.49 0.4 1.3
zero_small serve 0.26 9.34 4.81 5.5 12.0
zero_small pull 0.04 9.39 4.74 0.6 2.0
zero_large add files 3.84 420.43 415.03 1447.6 101.0
zero_large commit 8.73 308.08 203.94 711.3 316.4
zero_large checkout 4.42 208.67 204.48 401.1 208.2
zero_large serve 7.28 210.42 205.82 606.5 222.6
zero_large pull 6.72 310.64 205.78 602.4 310.1
random_medium add files 0.26 51.68 43.94 126.3 11.0
random_medium commit 2.80 73.94 69.74 196.9 115.2
random_medium checkout 1.95 56.68 39.16 77.5 64.6
random_medium serve 3.47 58.42 46.66 162.4 105.4
random_medium pull 2.56 85.68 80.99 219.2 137.4
halfzero_large add files 3.77 414.82 407.79 1739.4 101.0
halfzero_large commit 18.00 413.29 409.14 1302.6 660.8
halfzero_large checkout 12.34 345.80 274.08 586.2 413.5
halfzero_large serve 21.67 347.38 307.55 1092.1 583.2
halfzero_large pull 16.94 465.37 460.75 1394.5 806.2
random_large add files 3.61 414.82 409.60 1963.3 101.0
random_large commit 28.50 651.74 647.52 1893.8 1039.7
random_large checkout 20.30 480.88 341.62 771.3 615.6
random_large serve 35.27 482.50 407.48 1577.1 942.3
random_large pull 27.93 753.48 748.80 2186.5 1335.0
monotone add files 0.38 9.24 4.42 26.5 11.1
monotone commit 3.28 12.64 9.03 131.4 108.1
monotone checkout 1.58 19.75 15.72 55.8 45.3
monotone serve 3.02 20.78 15.52 101.5 91.1
monotone pull 3.42 15.52 10.69 130.2 104.4
mt_multiple add files 4.86 12.88 7.56 238.9 103.5
mt_multiple commit 29.05 39.21 35.21 894.0 680.8
mt_multiple checkout 14.73 25.65 21.77 477.9 368.3
mt_multiple serve 9.38 19.80 14.98 319.5 254.0
mt_multiple pull 25.41 42.98 38.43 557.2 524.5
mt_bigfiles add files 3.70 73.77 69.03 1318.2 106.8
mt_bigfiles commit 17.61 50.81 42.28 1141.7 468.8
mt_bigfiles checkout 10.34 50.20 41.46 519.6 261.2
mt_bigfiles serve 21.03 52.16 42.60 934.2 405.8
mt_bigfiles pull 17.29 56.27 47.29 1134.3 566.4
everything add files 19.77 500.50 494.79 6707.3 519.4
everything commit 105.09 658.31 654.20 6098.6 2997.3
everything checkout 66.96 497.14 392.41 2832.1 1694.9
everything serve 101.61 496.60 415.63 4664.8 2237.3
everything pull 100.17 764.49 759.98 6095.8 3469.3
netsync-string-queue.patch
Description: Binary data