igraph-help

Re: [igraph] Graphml Import / Export, keeping boolean values and using id as index


From: Tamás Nepusz
Subject: Re: [igraph] Graphml Import / Export, keeping boolean values and using id as index
Date: Sun, 28 Apr 2013 21:02:20 +0200

Hi,

> Using igraph's Python interface, I have imported a GraphML file that I created
> with TinkerGraph. The nodes in the GraphML file have boolean values, but after
> the import they no longer have them:
Can you post your GraphML file or any other GraphML file that reproduces the 
issue? I tried the following on my machine and it worked:

In [1]: g=Graph(1)
In [2]: g.vs[0]["test"] = True
In [3]: g.write_graphml("test.graphml")
In [4]: g2=load("test.graphml")
In [5]: g2.vs[0]["test"]
Out[5]:  True
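In GraphML every value is stored as text, and its type is recorded once, in the attr.type attribute of the matching <key> declaration. Whether a value comes back as a boolean therefore depends on that declaration, not on the literal text. The following is a minimal standard-library sketch of that conversion step, not igraph's own reader; the helper name convert_graphml_value is hypothetical, and it handles only the six attr.type values defined by GraphML.

```python
def convert_graphml_value(text, attr_type):
    """Convert the text of a <data> element according to its key's attr.type.

    Hypothetical helper for illustration; not part of igraph.
    """
    if attr_type == "string":
        return text  # strings are kept verbatim, whitespace included
    value = text.strip()
    if attr_type == "boolean":
        # XML Schema booleans are spelled true/false or 1/0.
        if value in ("true", "1"):
            return True
        if value in ("false", "0"):
            return False
        raise ValueError(f"not a GraphML boolean: {text!r}")
    if attr_type in ("int", "long"):
        return int(value)
    if attr_type in ("float", "double"):
        return float(value)
    raise ValueError(f"unknown GraphML attr.type: {attr_type!r}")


print(convert_graphml_value("true", "boolean"))         # True
print(convert_graphml_value("4.94066e-324", "double"))  # 5e-324
```

The second call shows that a key declared as double yields a float, whatever the value was meant to represent, which is why the <key> declarations are the first thing to check in a file that loses its booleans.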

> How can I import GraphML files while keeping boolean values? When I created the
> GraphML file in TinkerGraph, I used the id attribute as index.
> How can igraph use the id attribute of the node as index after the import?
> As you can see, the index is 2317 in igraph, and not 363.
igraph vertex indices have nothing to do with the node IDs in the GraphML
file. GraphML node IDs are not guaranteed to be integers, so in the vast
majority of cases igraph would have to invent its own indices anyway: the C
core of igraph always uses the integers from 0 to |V|-1 as vertex indices,
where |V| is the number of vertices.
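The standard-library sketch below (not igraph's actual GraphML reader) illustrates why the original IDs cannot simply become indices: node IDs are arbitrary strings such as "n2317" or "alice", so a reader has to build its own mapping onto 0..|V|-1 and then express edges in terms of it. The function names are hypothetical, and assigning indices in document order is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET

# Default namespace of GraphML documents; ElementTree reports tags as "{ns}tag".
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def index_nodes(graphml_text):
    """Map each GraphML node ID (an arbitrary string) to an index 0..|V|-1.

    Indices follow document order, an assumption made for this sketch.
    """
    root = ET.fromstring(graphml_text)
    index = {}
    for node in root.iter(GRAPHML_NS + "node"):
        index[node.get("id")] = len(index)
    return index


def edges_as_indices(graphml_text, index):
    """Translate each edge's source/target IDs into integer vertex indices."""
    root = ET.fromstring(graphml_text)
    return [
        (index[edge.get("source")], index[edge.get("target")])
        for edge in root.iter(GRAPHML_NS + "edge")
    ]


doc = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="n2317"/>
    <node id="alice"/>
    <node id="n0"/>
    <edge source="n2317" target="alice"/>
    <edge source="alice" target="n0"/>
  </graph>
</graphml>"""

ids = index_nodes(doc)
print(ids)                         # {'n2317': 0, 'alice': 1, 'n0': 2}
print(edges_as_indices(doc, ids))  # [(0, 1), (1, 2)]
```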

> <node id="n2317">
>       <data key="v_propernoun">4.94066e-324</data>
>       <data key="v_noun">4.94066e-324</data>
[...snip...]
> </node>
>     
> Again, the question: how can I use the original id attribute (363) instead of
> n2317?
You can't (for the reasons mentioned above), and I don't really see why you
would want to keep the same ID. The IDs are just internal references within the
GraphML file with no semantic meaning. If your IDs *do* have some semantic
meaning, just duplicate them as a vertex attribute and use the attribute
instead.
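A minimal pure-Python sketch of that approach, assuming vertices are modelled as a list of attribute dicts whose positions are the vertex indices; the attribute name orig_id and the helper find_vertex are hypothetical. In python-igraph itself, VertexSeq.find accepts keyword conditions in the same spirit (for example g.vs.find(orig_id=363) on a graph that has such an attribute).

```python
# Hypothetical post-import state: the list position is the vertex index
# (0..|V|-1), and "orig_id" duplicates the semantically meaningful ID.
vertices = [
    {"orig_id": 363, "noun": True},
    {"orig_id": 12, "noun": False},
    {"orig_id": 7, "noun": True},
]


def find_vertex(vertices, **conditions):
    """Return the index of the first vertex whose attributes match all conditions."""
    for index, attrs in enumerate(vertices):
        if all(attrs.get(name) == value for name, value in conditions.items()):
            return index
    raise ValueError(f"no vertex matches {conditions!r}")


# For repeated lookups, a dict from attribute value to index avoids a linear scan.
index_by_orig_id = {attrs["orig_id"]: i for i, attrs in enumerate(vertices)}

print(find_vertex(vertices, orig_id=363))  # 0
print(index_by_orig_id[7])                 # 2
```

The linear scan mirrors a one-off query; the dictionary is the better choice when many lookups by the original ID are needed.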

> Is there a way to avoid putting v_ before the attribute names and n before the
> IDs?
No, there isn't, for different reasons. The v_ prefix before the attribute
names is required because, in theory, a graph can have both a vertex attribute
and an edge attribute with the same name. To avoid such conflicts, igraph
prefixes the identifiers of vertex attributes with v_ and the identifiers of
edge attributes with e_. Note, however, that this does *not* mean that the
attribute itself is renamed, since "key" is also just an internal reference
within the GraphML file. The *name* of the attribute corresponding to a
specific key is given by the attr.name attribute of the corresponding <key>
tag in the GraphML file. E.g., in the example I listed above, the "test"
attribute is declared like this in the GraphML file:

<key id="v_test" for="node" attr.name="test" attr.type="boolean"/>

You can see that its internal ID within the GraphML file is "v_test", which
avoids conflicts with any edge attribute of the same name, but the _name_ given
in the attr.name attribute of the <key> tag is "test", and not "v_test".
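A standard-library sketch (not igraph's reader) of how a consumer resolves a <data> element to an attribute name: it looks the key up among the <key> declarations and uses attr.name, so the v_ and e_ prefixes never reach the attribute names. The function name node_attributes is hypothetical; values are left as raw text, and only keys declared for nodes (or for "all", the GraphML default when for is omitted) are considered.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def node_attributes(graphml_text):
    """Return {node id: {attr.name: raw text}}, resolving <data key=...> via <key>."""
    root = ET.fromstring(graphml_text)
    # Internal key id (e.g. "v_test") -> declared attribute name (e.g. "test").
    names = {
        key.get("id"): key.get("attr.name")
        for key in root.iter(GRAPHML_NS + "key")
        if key.get("for", "all") in ("node", "all")
    }
    result = {}
    for node in root.iter(GRAPHML_NS + "node"):
        result[node.get("id")] = {
            names[data.get("key")]: data.text
            for data in node.findall(GRAPHML_NS + "data")
        }
    return result


# A vertex key and an edge key share the attr.name "test" but have distinct ids.
doc = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="v_test" for="node" attr.name="test" attr.type="boolean"/>
  <key id="e_test" for="edge" attr.name="test" attr.type="double"/>
  <graph edgedefault="undirected">
    <node id="n0"><data key="v_test">true</data></node>
    <node id="n1"><data key="v_test">false</data></node>
    <edge source="n0" target="n1"><data key="e_test">0.5</data></edge>
  </graph>
</graphml>"""

print(node_attributes(doc))  # {'n0': {'test': 'true'}, 'n1': {'test': 'false'}}
```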

As for the "n" prefix before the node IDs, it is also required because IDs that
start with digits are not valid IDs in XML files (although, of course, many
parsers accept them). This is not immediately obvious, but it follows from the
W3C XML recommendation:

http://www.w3.org/TR/REC-xml/

See, in particular, Section 3.3.1
(http://www.w3.org/TR/REC-xml/#sec-attribute-types), which says that "Values of
type ID must match the Name production", and Section 2.3
(http://www.w3.org/TR/REC-xml/#NT-Name), which says that "The first character
of a Name must be a NameStartChar". NameStartChar itself is defined by the
following production:

NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6]
                | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D]
                | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF]
                | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

So, in a nutshell, igraph uses the "n" prefix for node IDs because XML IDs
cannot start with digits.
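The first-character rule can be checked mechanically. The sketch below, using only Python's re module, encodes the NameStartChar ranges quoted above and shows that "363" fails while "n363" passes. It checks only the first character (the remaining NameChar part of the Name production is ignored), and the helper names are hypothetical, not anything igraph exposes.

```python
import re

# Character ranges of the XML 1.0 NameStartChar production.
NAME_START_CHAR = re.compile(
    "[:A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF"
    "\uFDF0-\uFFFD\U00010000-\U000EFFFF]"
)


def starts_with_name_start_char(value):
    """True if value's first character may begin an XML Name (and hence an ID)."""
    return bool(value) and NAME_START_CHAR.match(value) is not None


def prefixed_node_id(index):
    """Build a node ID that starts with a letter, in the spirit of the "n" prefix."""
    return f"n{index}"


print(starts_with_name_start_char("363"))                  # False
print(starts_with_name_start_char(prefixed_node_id(363)))  # True
```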

All the best,
Tamas

