[Monotone-devel] Patch to speed up add operation
From: Eric Anderson
Subject: [Monotone-devel] Patch to speed up add operation
Date: Wed, 24 Aug 2005 11:49:47 -0700
Summary: The attached patch changes the function that determines whether
a file is binary so that it takes a filename instead of a string. This
avoids Lua reading the entire file into memory when the first few
characters already show that it is binary. The patch also adds a
function that sets which characters count as "binary", rather than
having that set be completely hardcoded. Adds became roughly 5-6x
faster, with a larger speedup in extreme cases such as a single large
binary file. Memory usage dropped by more than 10x in every test that
contains large files, and dropped slightly for the test with many small
text files. Other operations are unaffected in both memory and CPU usage.
Changelog entry:
2005-08-21 Eric Anderson <address@hidden>
* file_io.cc, file_io.hh, lua.cc, std_hooks.lua: determine if a
file is binary by looking at it incrementally, rather than reading
it in entirely. Prepare for making it possible to control what
characters are considered "binary"
Detailed discussion:
The current method for determining whether a file is binary reads the
entire file into memory in Lua and then calls a C++ function to decide
whether that string is binary. Lua is stunningly inefficient at reading
in a large file (reading a 100MiB file copies 1.4GiB).
Instead of reading the file in Lua, we now pass the filename to the C++
function, which reads the file in chunks and stops as soon as any chunk
is binary.
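A minimal C++ sketch of the chunked, early-exit scan described above follows. The names stream_looks_binary() and file_looks_binary(), the 4096-byte buffer, and the NUL-only test for "binary" are all assumptions for illustration; this is not the code in the patch.

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>   // used by the usage checks
#include <string>
#include <vector>

// Hypothetical stand-in for monotone's guess_binary(): in this sketch a
// chunk counts as binary if it contains a NUL byte (an assumption).
static bool chunk_looks_binary(const char* data, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        if (data[i] == '\0')
            return true;
    return false;
}

// Read the stream in fixed-size chunks and stop at the first binary chunk,
// so a large binary file is usually rejected after only a few KiB.
bool stream_looks_binary(std::istream& in, std::size_t bufsize = 4096)
{
    std::vector<char> buf(bufsize);
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        // gcount() is the number of bytes actually read, which may be
        // less than bufsize on the final (short) read at end of file.
        std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;
        if (chunk_looks_binary(buf.data(), got))
            return true;  // early exit: the rest of the file is never read
    }
    return false;
}

// Filename-based entry point (hypothetical name). Assumption: a file that
// cannot be opened is reported as not binary.
bool file_looks_binary(const std::string& path)
{
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    if (!file)
        return false;
    return stream_looks_binary(file);
}
```

Because the loop only ever holds one buffer, peak memory stays at roughly bufsize bytes regardless of file size, which is the property the patch is after.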
I also changed the guess_binary function to use a boolean array of
"binary characters" rather than a fixed string. The array is
initialized by calling a set_char_is_binary() function. If this seems
like a good change to people, then instead of calling that function
from C++, we can call a Lua hook that sets up the list of binary
characters.
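The lookup-table approach might look roughly like the sketch below. Only the name set_char_is_binary() comes from the description above; its signature, the helper init_default_binary_chars(), and the default set of binary bytes are hypothetical and do not claim to match monotone's actual list.

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>
#include <string>

namespace {
// One flag per possible byte value; true means "this byte suggests binary".
bool char_is_binary[256] = {};
}

// Mark (or unmark) a single byte value as binary. A Lua hook could call
// this once per character it wants treated as binary.
void set_char_is_binary(unsigned char c, bool is_binary)
{
    char_is_binary[c] = is_binary;
}

// Assumed default: NUL and the C0 control bytes are binary, except
// BEL, BS, TAB, LF, VT, FF, CR and ESC, which show up in text files.
void init_default_binary_chars()
{
    for (int c = 0; c < 0x20; ++c)
        set_char_is_binary(static_cast<unsigned char>(c), true);
    const unsigned char text_controls[] = {'\a', '\b', '\t', '\n',
                                           '\v', '\f', '\r', 0x1b};
    for (unsigned char c : text_controls)
        set_char_is_binary(c, false);
}

// Table-driven check: one array lookup per byte instead of searching a
// fixed string of "bad" characters.
bool guess_binary(const std::string& s)
{
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (char_is_binary[static_cast<unsigned char>(s[i])])
            return true;
    return false;
}
```

Casting each char to unsigned char before indexing matters: on platforms where char is signed, bytes >= 0x80 would otherwise produce a negative index.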
Since guess_binary() takes a string, I used the slightly questionable
&string[0] trick to get a writable pointer into the string for
file.read(&buf[0], bufsize) in monotone_guess_binary_filename_for_lua.
Does anyone know of a way to read up to some number of bytes directly
into a string? If not, should guess_binary's signature be changed to
guess_binary(unsigned char *data, int datalen), with a char[bufsize]
array used instead of a string?
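One possible answer to the question above, sketched under assumptions: make the core check take a pointer and a length (a const-correct variant of the proposed signature), keep a std::string overload as a thin wrapper so existing callers still compile, and when a string really is needed, read into a separate buffer and copy only the gcount() bytes that arrived. The helper read_up_to() and the NUL-only test are hypothetical. For what it is worth, C++11 later guaranteed contiguous std::string storage, so &s[0] on a non-empty string is well-defined there; C++03 did not formally promise this.

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>
#include <istream>
#include <sstream>   // used by the usage checks
#include <string>
#include <vector>

// Core check on a raw byte range, so a plain char array can be the read
// buffer. Only NUL counts as binary here (an assumption for brevity).
bool guess_binary(const unsigned char* data, std::size_t datalen)
{
    for (std::size_t i = 0; i < datalen; ++i)
        if (data[i] == 0)
            return true;
    return false;
}

// Thin wrapper so callers holding a std::string keep working.
bool guess_binary(const std::string& s)
{
    return guess_binary(reinterpret_cast<const unsigned char*>(s.data()),
                        s.size());
}

// Read at most n bytes into a std::string without writing through
// &s[0]: fill a vector (contiguous since C++03) and copy only the bytes
// that read() actually delivered, as reported by gcount().
std::string read_up_to(std::istream& in, std::size_t n)
{
    if (n == 0)
        return std::string();
    std::vector<char> tmp(n);
    in.read(tmp.data(), static_cast<std::streamsize>(n));
    return std::string(tmp.data(), static_cast<std::size_t>(in.gcount()));
}
```

The pointer/length form avoids the extra copy that read_up_to() makes, so it is probably the better fit for a hot loop; the string overload mainly preserves source compatibility.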
Performance analysis:
Test CPU: Intel(R) Pentium(R) M processor 1700MHz
monotone 0.22 (base revision: 072f38da9450e2e2e406332a480c8c7a50736f8b)
                                    Maximum (MiB)    Copied   Malloc
Test             Operation CPU(s)    Size Resident    (MiB)    (MiB)
---------------- --------- ------ ------- -------- -------- --------
zero_small       add files    0.0    7.30     2.72        0        1
zero_large       add files    3.9  420.43   413.42     1448      101
random_medium    add files    0.3   51.68    46.88      126       11
random_medium_20 add files    5.0   69.20    64.43     2521      201
halfzero_large   add files    3.8  414.82   408.55     1739      101
random_large     add files    3.8  414.82   407.60     1963      101
monotone         add files    0.4    9.24     4.42       27       11
mt_multiple      add files    5.1   12.30     7.32      239      104
mt_bigfiles      add files    3.7   73.77    69.03     1318      107
mixed_1          add files    1.9  125.50   118.29      676       61
mixed_4          add files    6.9  126.02   120.17     2417      210
mixed_12         add files   19.6  128.62   121.93     7012      587
everything       add files   45.7  502.46   497.21    16111     1297
With the patch applied:
                                    Maximum (MiB)    Copied   Malloc
Test             Operation CPU(s)    Size Resident    (MiB)    (MiB)
---------------- --------- ------ ------- -------- -------- --------
zero_small       add files    0.0    7.48     2.77        0        1
zero_large       add files    0.0    7.33     2.77        0        1
random_medium    add files    0.0    7.33     2.77        0        1
random_medium_20 add files    0.0    7.33     2.78        0        1
halfzero_large   add files    0.0    7.33     2.77        0        1
random_large     add files    0.0    7.33     2.77        0        1
monotone         add files    0.2    7.87     3.25        1       13
mt_multiple      add files    3.4   11.18     6.68       22      126
mt_bigfiles      add files    0.3    7.48     2.78        0        1
mixed_1          add files    0.2    7.87     3.25        1       13
mixed_4          add files    1.0    8.99     4.44        7       48
mixed_12         add files    3.0   11.02     6.43       18      119
everything       add files    7.2   16.04    10.49       41      245
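As a quick check of the summary figures, the sketch below divides the before and after numbers from the two tables for the rows whose patched CPU time is nonzero; rows that round to 0.0 are omitted because the ratio is undefined at that precision. The Row struct, ratio(), and print_ratios() are illustrative helpers, not part of the patch.

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>
#include <cstdio>

struct Row {
    const char* test;
    double cpu_before, cpu_after;            // CPU seconds
    double resident_before, resident_after;  // maximum resident size, MiB
};

// Values copied from the two tables (before patch, after patch).
const Row rows[] = {
    {"monotone",     0.4, 0.2,   4.42,  3.25},
    {"mt_multiple",  5.1, 3.4,   7.32,  6.68},
    {"mt_bigfiles",  3.7, 0.3,  69.03,  2.78},
    {"mixed_1",      1.9, 0.2, 118.29,  3.25},
    {"mixed_4",      6.9, 1.0, 120.17,  4.44},
    {"mixed_12",    19.6, 3.0, 121.93,  6.43},
    {"everything",  45.7, 7.2, 497.21, 10.49},
};
const std::size_t row_count = sizeof(rows) / sizeof(rows[0]);

// Improvement factor: how many times larger the old value was.
double ratio(double before, double after) { return before / after; }

void print_ratios()
{
    for (std::size_t i = 0; i < row_count; ++i) {
        const Row& r = rows[i];
        std::printf("%-12s CPU %5.1fx faster, resident %5.1fx smaller\n",
                    r.test, ratio(r.cpu_before, r.cpu_after),
                    ratio(r.resident_before, r.resident_after));
    }
}
```

Called from main(), print_ratios() shows the everything test at about 6.3x less CPU time and roughly 47x lower peak resident size, while the plain monotone tree improves by only about 2x in CPU time.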
Attachment: incremental-binary-test.patch (binary data)