Re: [igraph] Negative Values in PageRank
From: Tamás Nepusz
Subject: Re: [igraph] Negative Values in PageRank
Date: Sat, 2 Nov 2013 21:09:12 +0100
Hello,
> Actually at each computation, I'm interested in the value of a specific node,
> and the relativeness of this value to the other nodes (I normalize the values
> to 0-1 scale). What I wanted to understand, is whether the values, even
> though negative, represent the "Real" picture – I mean, if the relativeness
> between the nodes is satisfying.
Unfortunately I cannot say; if ARPACK indeed failed to converge, then there is
nothing we can guarantee about the result. ARPACK is sort of a “black box” to
us as well (that’s why we cannot fix this bug easily). Basically, what happens
is that we invoke ARPACK’s eigensolver, which asks us to perform a matrix
multiplication every now and then; we are completely unaware of what is going
on inside ARPACK itself. If we are lucky, ARPACK returns a converged
eigenvector. If we are unlucky, ARPACK fails to converge and we are left with
a partial result, but we don’t know which parts of the vector have converged
and which have not.
There are multiple things that you can do here:
1) First of all, whenever you invoke the pagerank() method of the graph, check
whether ARPACK has converged. This can be done by retrieving the value of
arpack_options.iter, which returns the number of iterations ARPACK took during
the last invocation. If it is equal to or larger than arpack_options.maxiter,
there is reason to believe that ARPACK did not converge properly - so at least
you now have a way of detecting that something went wrong.
2) You can also try increasing arpack_options.maxiter; the default value is
3000, but we have recently found a graph for which it was too small (but ARPACK
managed to compute the converged eigenvector in ~15000 iterations).
3) We are experimenting with replacing ARPACK with the excellent PRPACK package
in the near future; PRPACK is specifically designed to compute PageRank scores
efficiently and it seems to be a lot better at that task than ARPACK. If you
don’t mind compiling the C core of igraph on your own, check out the
0.7-prpack branch of igraph from Github (see
https://github.com/igraph/igraph/tree/0.7-prpack) and see whether it works
better.
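To illustrate what the check in point 1 buys you, here is a self-contained
power-iteration sketch of PageRank (this is not igraph’s ARPACK-based solver;
the toy graph, the tolerance and the iteration cap are made up for
illustration). Hitting the cap without the residual dropping below the
tolerance is the analogue of arpack_options.iter reaching
arpack_options.maxiter:

```python
# Plain power-iteration PageRank on an adjacency dict (node -> successors).
# NOT igraph's ARPACK path; it only illustrates the convergence check:
# if the loop exhausts maxiter before the residual drops below tol,
# the returned scores should be treated as suspect.
def pagerank(adj, damping=0.85, tol=1e-10, maxiter=3000, x0=None):
    nodes = sorted(adj)
    n = len(nodes)
    x = dict(x0) if x0 is not None else {v: 1.0 / n for v in nodes}
    for it in range(1, maxiter + 1):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * x[v] / len(adj[v])
                for w in adj[v]:
                    nxt[w] += share
            else:
                # Dangling node: spread its rank uniformly.
                for w in nodes:
                    nxt[w] += damping * x[v] / n
        diff = sum(abs(nxt[v] - x[v]) for v in nodes)
        x = nxt
        if diff < tol:
            return x, it, True      # converged within the cap
    return x, maxiter, False        # hit the cap: result is suspect

adj = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2], 4: [3]}
scores, iters, converged = pagerank(adj)
if not converged:
    print("warning: hit maxiter, the scores may be partial garbage")
```

With igraph itself, the equivalent of the `converged` flag is the
iter-versus-maxiter comparison described in point 1.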
> And another question – is there any optimization for running multiple
> PageRank computations on the same graph, where the graph is growing every
> time? The running takes a bit longer than I expected.
I think that one can reasonably expect the PageRank scores of the “old” and
“new” graph to be close to each other if you are doing only a simple addition
or removal of nodes during the growth of the graph. If you don’t mind altering
the C code of igraph, you can tweak the igraph_pagerank function to accept an
“initial guess” for the PageRank values. ARPACK would then start calculating
the eigenvector from the initial guess instead of starting from scratch, and
this could potentially save some time since you could simply supply the old
PageRank scores as an estimate when you are calculating the PageRank scores of
the new graph. However, this cannot be done from the Python interface alone;
you have to dig deeper and alter the C core of igraph.
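The “initial guess” idea can be sketched with a plain power iteration (again,
not igraph’s ARPACK code path; the toy graphs and the rule for padding the new
node’s score are illustrative assumptions):

```python
# Power-iteration PageRank, as a stand-in for the eigensolver; x0 is the
# optional initial guess that the tweaked igraph_pagerank would accept.
def pagerank(adj, damping=0.85, tol=1e-10, maxiter=3000, x0=None):
    nodes = sorted(adj)
    n = len(nodes)
    x = dict(x0) if x0 is not None else {v: 1.0 / n for v in nodes}
    for it in range(1, maxiter + 1):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * x[v] / len(adj[v])
                for w in adj[v]:
                    nxt[w] += share
            else:
                for w in nodes:
                    nxt[w] += damping * x[v] / n
        diff = sum(abs(nxt[v] - x[v]) for v in nodes)
        x = nxt
        if diff < tol:
            return x, it, True
    return x, maxiter, False

# "Old" graph and its scores.
adj_old = {0: [1], 1: [2], 2: [0, 3], 3: [0]}
old_scores, _, _ = pagerank(adj_old)

# The graph grows: node 4 appears, with edges 2 -> 4 and 4 -> 0.
adj_new = {0: [1], 1: [2], 2: [0, 3, 4], 3: [0], 4: [0]}

# Cold start: uniform initial vector.
new_scores, it_cold, ok_cold = pagerank(adj_new)

# Warm start: reuse the old scores, give the new node an average share
# (an illustrative padding rule), then renormalize to a distribution.
guess = dict(old_scores)
guess[4] = 1.0 / len(adj_new)
total = sum(guess.values())
guess = {v: s / total for v, s in guess.items()}
warm_scores, it_warm, ok_warm = pagerank(adj_new, x0=guess)
print("cold:", it_cold, "warm:", it_warm)
```

When the graph changes only slightly, the warm start typically needs fewer
iterations than the uniform cold start, which is exactly the saving that
passing an initial guess down to the C core would give you.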
—
T.