[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Monotone-devel] roster handling speedups
From: |
Timothy Brownawell |
Subject: |
[Monotone-devel] roster handling speedups |
Date: |
Fri, 12 Feb 2010 22:09:20 -0600 |
User-agent: |
Mozilla-Thunderbird 2.0.0.22 (X11/20091109) |
I've been doing various things to make pulls faster, here's a brief
overview of the recent changes, and some other thoughts further down:
* the roster cache has a minimum size of 3
At least 2 cache slots are required for the cache to be at
all useful, and expanding it to 3 reduces the number of
misses by another 75% or more. Further expansions have much
smaller effects.
* node_map is now a COW type, and shared nodes are copied on write
A large portion of our time was spent copying rosters. Also,
roster_merge can take advantage of this to do a little less work.
* marking_map is now a COW type, and shared markings are also cow
Another large portion of time was spent copying marking_maps.
* make_roster_delta_t can take a changeset as a hint for non-merges
If you've only touched 5 nodes out of 10,000, making a delta
is much faster if you can just look at those 5 nodes instead
of having to scan all 10,000 for differences. This doesn't work
for merge revisions, because for example the node in
a* a*
\ /
a
will not be part of the cset because it still has the same value,
but the marking in the child will be different that the marking
in either parent.
* put_roster_for_revision() doesn't do redundant sanity checking
This would call make_roster_for_revision(), which returns a
sanity-checked roster. Then it would call calculate_ident() on that
roster, which would sanity-check it again. Now calculate_ident()
takes a flag to not do the sanity checking.
* misc speedups
- roster_t::check_sane() was doing 2 loops, now it only does one
- is_{dir,file}_t now have a flag to check, instead of calling
dynamic_cast and checking the result
- dfs_iter:advance_top() had multiple top() calls, that g++ couldn't
optimize into a single call due to being non-const. Amazingly
enough, this is called enough to actually show up on profiles.
- roster_t::print_to() is hand-coded, instead of calling basic_io
* roster cache instrumentation
The roster cache can now log hit/miss information to a file, if given
the --roster-cache-performance-log=<file> option. This was to help
figure out what size to use for the first item here.
There's also a branch net.venge.monotone.tbrownaw.pull-reduced-sanity
with a new option --sanity-check-percent=<0..100> taken by pull and
clone. This makes it run check_sane() and verify the manifest_id on only
that percent of incoming rosters, instead of all of them. I'm guessing
we don't want this in mainline.
There are some further possibilities that I'm not pursuing right now:
* There's a lot of time spent in encode_hexenc, while printing
manifests. Perhaps these could be stored instead of being always
recomputed. Maybe file_node could remember its hexified id, or maybe
doing something with id::symtab and the vocab macros and then
teaching the hexenc/hexdec functions about it would work.
* roster_t::check_sane() checks the following:
- root_dir is set (fast)
- there are no loop in the node tree and
- all nodes are attached (walking the tree takes nodes.size() steps)
- node names match what their parents call then
- file nodes have a content hash
- attr (live?, value) pairs make sense
- nodes know their ID
Which leads to the following ideas:
+ Perhaps attrs could be hidden behind get/set/clear functions that
enforce (live?, value) sanity, so it doesn't have to be checked?
+ Perhaps the node's parent ID and dir_node's child map could be
hidden similarly, so that the parent/child name matching doesn't
have to be checked?
+ print_to() needs to walk the tree, so we could make it it verify
that all nodes are attached and there aren't any loops. Is
check_sane() called in places we don't also print (which includes
anything that checks the manifest_id)?
+ print_to() prints the file hash, so it can just check that it
isn't empty at the same time
+ limit the number of places that nodes are inserted into the
ndoe_map or node::self is touched (and make it private) to one or
two functions, so that this doesn't need to be checked
Basically, rewrite things into encapsulated "obviously no bugs" form,
and then fold the remaining necessary runtime checks into code that
has to look at the stuff being checked anyhow.
* The roster cache doesn't respect the COW. If there were a good,
*fast* (don't just scan every node of every roster) way to figure out
the actual space usage, it could know to hold many more rosters than
it currently does. Maybe just a global count of the number of
existing nodes would be enough? How many non-cached rosters do we
actually generally hold while putting things into the cache (and
remember that copies of things in or going into the cache don't
count, because of the COW)? This would require changing how the
size estimator works for rosters, while not changing how the size
estimator for the delta cache works.
* rosters and nodes (and marking_maps and markings) have a cow_version
field, that increments on new/copied rosters (and marking_maps), and
help identify when a node (or marking) is not shared. Perhaps it
could be handled in such a way that when doing a merge, a
multimap<cow_version, node_id> or similar could be scanned forward
from the lower of the parent's cow_versions and identify all nodes
*and markings* that need to be touched by the merge (rather than
being copied from either parent arbitrarily). Perhaps this could also
be used in the roster_delta code. In the common case that the rosters
being merged are some of the most recent ones, this would work very
well. In the case that one is rather old, it would probably still
work well since many files are more-or-less never touched.
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Monotone-devel] roster handling speedups,
Timothy Brownawell <=