Integer and floating point types in DOT files

Hello,

I am currently working on the DOT format exporter in igraph, and I am trying to understand why a certain hack is present in the codebase.

igraph only supports generic numeric attributes and does not distinguish between integer and floating-point types. The DOT writer currently contains code that attempts to write integral values as integers, i.e. 123 instead of 1.23e2.

According to the DOT format spec, at the highest level all values in the DOT language are just strings. It is not clear at what point they are interpreted as specific types such as int or double.

Question 1: If an attribute that is specified as needing to be int is given in a floating point format, is that a problem for Graphviz? Can you please show an example of when and how it becomes a problem?

Question 2: Is it fine to unconditionally quote all floating point numerical values when writing DOT files? Graphviz cannot parse values such as 1.23e4 without quotes, so unconditional quoting would be a simple way to get around this.

Thanks in advance for any advice on this topic.

  1. As you say, at the foundational level there are only strings. Layout programs such as dot call functions like late_int and late_double (from, say, common_init_node) to do the conversions and also to handle the case where an attribute was never defined in the graph file that was read. Reading the code, late_int calls strtol(), so it will take the integer portion of a floating-point number. There won’t be any rounding, if that was expected.

  2. Definitely, it is fine to unconditionally quote all attributes including numerical values.
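To illustrate answer 2, here is a minimal sketch of a writer that always quotes floating-point values. The helper name format_quoted_double is hypothetical and not part of igraph or Graphviz. It relies on the fact that the unquoted numeral syntax of the DOT grammar has no exponent form, while a quoted string can hold anything %.17g produces, such as 1e+20.

```c
#include <stdio.h>

/* Hypothetical helper (not an igraph or Graphviz API): write
 * name="value" for a double, always quoting the value. Quoting makes
 * exponent forms such as 1e+20 legal DOT IDs. A formatted number never
 * contains '"', so no escaping is needed. Returns snprintf's result. */
int format_quoted_double(char *buf, size_t size, const char *name,
                         double value)
{
    /* %.17g is enough digits to round-trip any finite IEEE double. */
    return snprintf(buf, size, "%s=\"%.17g\"", name, value);
}
```

With %.17g every finite double round-trips exactly; a lower precision would give tidier output (0.1 instead of 0.10000000000000001) at the cost of exactness.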

Thanks for the links to the code; they make it quite clear that 1e3 would not be interpreted as the integer 1000. I’ll make sure that igraph is well-behaved when writing integral values in DOT files.
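The difference between an integer reader and a floating-point reader can be seen with the C library alone. The sketch below does not call Graphviz; it only contrasts strtol(), which the reply above says late_int builds on, with strtod(), used here purely for comparison. Both function names are hypothetical.

```c
#include <stdlib.h>

/* Integer reading, strtol()-style: only the leading run of decimal
 * digits counts, so "1e3" gives 1 and "42.7" gives 42 (truncated,
 * not rounded). */
long int_prefix(const char *s)
{
    return strtol(s, NULL, 10);
}

/* Floating-point reading, strtod()-style: exponents are understood,
 * so "1e3" gives 1000.0. */
double as_double(const char *s)
{
    return strtod(s, NULL);
}
```

int_prefix("1e3") yields 1 and int_prefix("42.7") yields 42, whereas as_double("1e3") yields 1000.0.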

I was not around at the time it was written, but I presume parsing was more expensive or humans were expected to read the output. Nowadays I doubt printing 42 instead of 42.000 makes much of a difference.

According to my reading of the source code, if one uses 42.0 in a place where an int is expected, parsing should fail, and the default value should be substituted automatically.

It looks like late_int() calls strtol(), which would parse 42 and stop at the . character. late_int() does verify that the whole string was parsed by strtol(), and if not, it seems to return a default value:

    if (p == endp || rv > INT_MAX)
        return defaultValue; /* invalid int format */

I’m actually having some trouble verifying this empirically, as I am not yet fluent with Graphviz. Could someone help with this?
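One way to test this reading without building Graphviz is to compile the quoted condition on its own. The sketch below is a standalone reproduction, not a copy of late_int(): the name parse_int_attr is hypothetical, the attribute lookup is replaced by a plain string argument, and an INT_MIN guard is added so the final cast is safe. Read literally, p == endp is true only when strtol() consumed no characters at all, so 42.0 would not be rejected: strtol() stops at the '.', and the function returns 42. Only strings with no leading number, such as abc, fall back to the default. Whether the full Graphviz code path behaves the same depends on code outside the quoted lines.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical standalone reproduction of the quoted check. */
int parse_int_attr(const char *p, int defaultValue)
{
    char *endp;
    long rv;

    if (p == NULL || p[0] == '\0')
        return defaultValue;
    rv = strtol(p, &endp, 10);
    /* p == endp only if no digits were consumed; trailing text such
     * as ".0" after the digits is NOT rejected by this test. */
    if (p == endp || rv > INT_MAX)
        return defaultValue; /* invalid int format */
    if (rv < INT_MIN) /* added guard so the cast below is defined */
        return defaultValue;
    return (int)rv;
}
```

Here parse_int_attr("42.0", -1) returns 42 and parse_int_attr("1e7", -1) returns 1, while parse_int_attr("abc", -1) returns the default -1.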

(That said, we now do make sure to always print integral values as 10000000 and not 1e7 or 10000000.00 in igraph.)
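A minimal sketch of that rule, assuming values arrive as C doubles. The name format_number is hypothetical, and igraph’s actual implementation may differ: integral values within the range of long long are printed with %lld, and everything else with %.17g.

```c
#include <stdio.h>

/* Hypothetical helper: write integral values as plain integers
 * (10000000, not 1e7 or 10000000.00) and everything else with %.17g. */
int format_number(char *buf, size_t size, double value)
{
    /* The range check keeps the cast to long long well defined;
     * NaN and infinities fail it and take the %.17g path. */
    if (value >= -9.0e18 && value <= 9.0e18 &&
        value == (double)(long long)value)
        return snprintf(buf, size, "%lld", (long long)value);
    return snprintf(buf, size, "%.17g", value);
}
```

format_number(buf, sizeof buf, 1e7) writes 10000000 and 2.5 is written as 2.5. Integral values beyond the range check, such as 1e19, still come out with an exponent and would need quoting.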

Sorry, I misread the preceding thread and thought we were talking about generic output backends such as SVG, not specifically the DOT backend.

I would assume none of it is well specified. That is, the DOT format spec does not precisely match what the code does, and empirically, what the code does is what matters. We’ve had a variant of this problem before, when people implemented third-party DOT parsers based on the spec and then discovered that they disagree with Graphviz’ parser.

I would suggest you take one of two approaches:

  1. Mimic exactly what Graphviz’ DOT exporter is doing.
  2. Print attributes differently based on type, as determined from the attribute documentation.

I would guess (1) is simpler, as it doesn’t require you to update your exporter whenever new Graphviz attributes are added, though admittedly this is very rare. Depending on your license and/or implementation language, you may be able to copy-paste the exporter code with minor edits.
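Approach (2) could look roughly like the following sketch. The table is a tiny illustrative excerpt (peripheries and searchsize are documented as int, fontsize and penwidth as double); a real exporter would need the complete list. Rounding int-typed values with %.0f is a design choice made here, so the exporter decides the value instead of relying on the truncation described earlier in the thread. Attributes missing from the table fall back to the quoted %.17g form. All names in the code are hypothetical.

```c
#include <stdio.h>
#include <string.h>

enum attr_type { ATTR_INT, ATTR_DOUBLE };

/* Illustrative excerpt of documented attribute types; not exhaustive. */
static const struct {
    const char *name;
    enum attr_type type;
} known_attrs[] = {
    {"peripheries", ATTR_INT},
    {"searchsize", ATTR_INT},
    {"fontsize", ATTR_DOUBLE},
    {"penwidth", ATTR_DOUBLE},
};

/* Hypothetical helper: write name=value according to its type. */
int format_attr(char *buf, size_t size, const char *name, double value)
{
    size_t count = sizeof known_attrs / sizeof known_attrs[0];

    for (size_t i = 0; i < count; i++) {
        if (strcmp(known_attrs[i].name, name) == 0 &&
            known_attrs[i].type == ATTR_INT) {
            /* Plain integer, no fraction, no exponent; %.0f rounds. */
            return snprintf(buf, size, "%s=%.0f", name, value);
        }
    }
    /* Double-typed and unknown attributes: quoted, full precision. */
    return snprintf(buf, size, "%s=\"%.17g\"", name, value);
}
```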