[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [igraph] Efficiently selecting a node in python igraph
From: |
Tamás Nepusz |
Subject: |
Re: [igraph] Efficiently selecting a node in python igraph |
Date: |
Sat, 10 May 2014 21:57:20 +0200 |
Hi,
> - find node object based on keyword attribute
Okay, this is a bit undocumented and must be improved for the next version, but
here’s the trick: the “name” vertex attribute is indexed g.vs.find() is the
right way to go but keep in mind that it has O(n) complexity since node
attributes are not indexed -- except the “name” attribute, which is indexed.
igraph will take advantage of this if you invoke g.vs.find() with a positional
argument only *and* the type of this positional argument is string. (Integers
won’t work because g.vs.find(X) will return the node with internal ID equal to
X). So, this is what I have on my machine:
In [1]: g=Graph.Barabasi(100000)
In [2]: g.vs["name"] = map(str, range(100000))
In [3]: g.vs["another_attribute"] = map(str, range(100000))
In [4]: %timeit g.vs.find(another_attribute="99999")
100 loops, best of 3: 17.8 ms per loop
In [5]: %timeit g.vs.find(name="99999")
100 loops, best of 3: 18.6 ms per loop
but:
In [6]: %timeit g.vs.find("99999")
100000 loops, best of 3: 2.13 us per loop
The first issue here is that this behaviour is undocumented; the second is that
it should work even if you specify g.vs.find(name=“99999”) since the two
operations are essentially equivalent. An additional advantage of using a
“name” attribute with type string is that you can use these names wherever
igraph expects a node or node ID, _without_ having to look up the node
explicitly using find(). However, using the “name” attribute comes at a cost
since its internal index has to be invalidated every time you modify the graph
and then rebuilt the next time you try to look up a node by name.
> - add new node to graph (add_vertex) [since this returns none I need to
> "find" the new node obj]
Here you can take advantage of the fact that the newly added node always
receives the first available integer node ID, so its ID will be g.vcount()-1
after the addition. This eliminates the need for .find() since you can simply
use g.add_edge(paper_id_node, g.vcount()-1).
> Currently this is very slow. Any tips?
As Gabor said: igraph’s data structure is optimized for static graphs, so
adding edges one by one is slower than adding them in batches. Maybe it is
faster to do something like this:
id_gen = UniqueIdGenerator()
edgelist = []
for paper_id_1, paper_id_2 in whatever_generator_generates_your_edges:
edgelist.append((id_gen[paper_id_1], id_gen[paper_id_2]))
g = Graph(edgelist)
g.vs[“name”] = id_gen.values()
--
T.