igraph-help


From: Tamas Nepusz
Subject: Re: [igraph] Error pickling (very) large graphs: OSError: [Errno 22] Invalid argument
Date: Tue, 18 Apr 2017 22:06:52 +0200

Hi,

This could be a bug in the pickle implementation (not in igraph, but in Python itself):

https://stackoverflow.com/questions/31468117/python-3-can-pickle-handle-byte-objects-larger-than-4gb
https://bugs.python.org/issue24658

The workaround is to pickle the object into an in-memory byte string with pickle.dumps(), and then write that byte string to a file in chunks smaller than 2^31 bytes.
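A minimal sketch of that workaround follows. The helper names `dump_in_chunks` and `load_in_chunks` are hypothetical (they are not part of igraph or the standard library), and the sketch assumes the whole pickled byte string fits in memory alongside the object itself. Reading is chunked as well, on the assumption that large single reads may hit the same limit as large single writes.

```python
import pickle

# Largest chunk handed to a single read/write call; stays below 2**31 bytes.
MAX_CHUNK = 2**31 - 1


def dump_in_chunks(obj, path, chunk_size=MAX_CHUNK,
                   protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle `obj` in memory, then write the bytes to `path` piece by piece."""
    data = pickle.dumps(obj, protocol=protocol)
    view = memoryview(data)  # slicing a memoryview does not copy the bytes
    with open(path, "wb") as fp:
        for start in range(0, len(view), chunk_size):
            fp.write(view[start:start + chunk_size])


def load_in_chunks(path, chunk_size=MAX_CHUNK):
    """Read `path` piece by piece and unpickle the reassembled bytes."""
    buf = bytearray()
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                break
            buf += chunk
    # bytes(buf) makes one extra copy; kept for simplicity in this sketch.
    return pickle.loads(bytes(buf))
```

With an igraph graph `g`, this would be called as `dump_in_chunks(g, "graph.pkl")` and later `g = load_in_chunks("graph.pkl")`. The peak memory usage during pickling is not reduced by this approach; it only avoids the oversized write.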

However, note that pickle is not a terribly efficient format. Since it must support serializing arbitrary Python objects that may reference each other and form cycles in any conceivable configuration, it has to do a lot of extra bookkeeping so that object cycles and objects embedded within themselves do not trip up the implementation. That's why the memory usage rockets up to 35 GB during pickling.

If you only have a name and one additional attribute for each vertex, you could potentially gain some speed (and cut down on the memory usage) by devising your own format. For instance, you could get the edge list and the two vertex attributes, stuff them into a Python dict, and then save the dict in JSON format:

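import json

def graph_as_json(graph):
    # Collect the two vertex attributes and the edge list in a plain dict
    return {
        "vertices": {
            "name": graph.vs["name"],
            "pt": graph.vs["pt"]
        },
        # List of (source, target) vertex index pairs
        "edges": graph.get_edgelist()
    }

# `graph` is the igraph Graph object to be saved
with open("output.json", "w") as fp:
    json.dump(graph_as_json(graph), fp)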

You could also use gzip.open() instead of open() to compress the saved data on-the-fly. You'll also need a json_as_graph() function to perform the conversion in the opposite direction.
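As a rough sketch of both suggestions: the helpers `save_json_gz` and `load_json_gz` below are hypothetical names for the gzip variant, and `json_as_graph` is one possible shape for the reverse conversion. It assumes the dict layout shown above and an undirected graph (as in the `UN--` summary in the original question), since the dict does not record directedness.

```python
import gzip
import json


def save_json_gz(data, path):
    """Write a JSON-serializable dict to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fp:
        json.dump(data, fp)


def load_json_gz(path):
    """Read back a dict written by save_json_gz()."""
    with gzip.open(path, "rt", encoding="utf-8") as fp:
        return json.load(fp)


def json_as_graph(data, directed=False):
    """Rebuild a graph from a dict with the "vertices"/"edges" layout.

    Assumption: the graph is undirected unless directed=True is passed,
    because the dict itself does not store that information.
    """
    # Imported here so the gzip helpers above work without igraph installed.
    from igraph import Graph

    names = data["vertices"]["name"]
    # JSON turns the (source, target) tuples into lists; convert them back.
    edges = [tuple(edge) for edge in data["edges"]]
    graph = Graph(n=len(names), edges=edges, directed=directed)
    graph.vs["name"] = names
    graph.vs["pt"] = data["vertices"]["pt"]
    return graph
```

Saving would then look like `save_json_gz(graph_as_json(graph), "output.json.gz")`, and loading like `graph = json_as_graph(load_json_gz("output.json.gz"))`.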


T.

On Tue, Apr 18, 2017 at 9:25 PM, Nick Eubank <address@hidden> wrote:
Hello all,

I'm trying to pickle a very large graph (23 million vertices, 152 million edges, two vertex attributes), but I keep getting an `OSError: [Errno 22] Invalid argument` error. However, I think the error is misleading, because if I subsample the graph and save it with the exact same code, I have no problems. Here's the traceback:


g.summary()
Out[8]: 'IGRAPH UN-- 23331862 152099394 -- \n+ attr: name (v), pt (v)'

g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms, voz))
Traceback (most recent call last):

 File "<ipython-input-9-6b5409a79251>", line 1, in <module>
   g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms, voz))

 File "/Users/Nick/anaconda/lib/python3.5/site-packages/igraph/__init__.py", line 1778, in write_pickle
   result=pickle.dump(self, fname, version)

OSError: [Errno 22] Invalid argument


g=g.vs[range(3331862)].subgraph()

g.write_pickle(fname='graphs/with_inferred/vz_inferred3_sms{}_voz{}.pkl'.format(sms, voz))

    [success]

The graph takes up about 10 GB in memory, and the pickle command expands Python's memory footprint to about 35 GB before the exception gets thrown, but I'm on a machine with 80 GB of RAM, so that's not the constraint.

Any suggestions as to what might be going on? Is there a workaround for saving?

Thanks!

Nick

_______________________________________________
igraph-help mailing list
address@hidden
https://lists.nongnu.org/mailman/listinfo/igraph-help


