Definitive Python bindings for performing graph layout

tkim · March 17, 2024, 5:48pm

I’m looking for a minimal installation of graphviz and some associated Python bindings that will allow me to make use of the various graphviz layout engines. Rather than generating a rendering directly, I’d like to take the generated x,y attributes and pass them on to a separate rendering library.

I’ve installed the Python graphviz module and can use its render() method to create an image, but what I really want to do is extract each node’s x,y position from some kind of internal representation. Is there a neat and tidy method out there for doing that?

deeplook · March 18, 2024, 10:44am

I’ve tried this approach: generate my graph with the Python graphviz package, and render it to “plain-ext” format like this:

```python
mygraph = make_my_graph()
plain_source = mygraph.pipe(format="plain-ext").decode("utf-8")
```

Then parse plain_source with some pretty simple code of your choice, like this:

```python
import shlex
from typing import Dict, List, Tuple


def parse_graphviz_plain_ext(plain_ext_str: str) -> Tuple[Dict[str, dict], List[dict]]:
    """Parse Graphviz plain/plain-ext output into nodes and edges (units: inches)."""
    nodes: Dict[str, dict] = {}
    edges: List[dict] = []

    for line in plain_ext_str.splitlines():
        # shlex keeps quoted names/labels that contain spaces in one token
        # (HTML-like labels may still be split; only fields before the label are reliable)
        parts = shlex.split(line)
        if not parts:
            continue
        kind = parts[0]  # avoid shadowing the builtin `type`

        if kind == 'node':
            # node name x y width height label style shape color fillcolor
            node_id = parts[1]
            nodes[node_id] = {
                'x': float(parts[2]), 'y': float(parts[3]),
                'width': float(parts[4]), 'height': float(parts[5]),
                'label': parts[6],
            }

        elif kind == 'edge':
            # edge tail head n x1 y1 ... xn yn [label xl yl] style color
            src_id, dst_id = parts[1], parts[2]
            n = int(parts[3])
            edge_points = [
                (float(parts[4 + 2 * i]), float(parts[5 + 2 * i])) for i in range(n)
            ]
            rest = parts[4 + 2 * n:]
            # A labelled edge has 5 trailing tokens; an unlabelled one only style and color
            label = rest[0] if len(rest) >= 5 else ""
            edges.append({
                'src': src_id, 'dst': dst_id, 'points': edge_points, 'label': label
            })

    return nodes, edges
```

This works in principle, but in practice I’m running into various issues when it comes to precisely specifying the output size with a combination of graph attributes like size, dpi, ratio, etc.
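Rather than tuning size, dpi and ratio until the output matches, one option is to leave the layout alone and rescale the coordinates on the Python side. A minimal sketch, assuming plain or plain-ext input: according to the Graphviz output docs, the first line is `graph scale width height` (drawing size in inches, origin at the lower left) and node coordinates are given unscaled. `fit_nodes_to_canvas` and its parameters are hypothetical; it preserves the aspect ratio and flips y, since browser/SVG canvases usually put the origin at the top left.

```python
import shlex
from typing import Dict, Tuple


def fit_nodes_to_canvas(
    plain_str: str, canvas_w: float, canvas_h: float, margin: float = 10.0
) -> Dict[str, Tuple[float, float]]:
    """Map node centres from Graphviz plain output onto a top-left-origin canvas.

    Hypothetical helper: one uniform scale factor, aspect ratio kept, y flipped.
    """
    graph_w = graph_h = None
    centres: Dict[str, Tuple[float, float]] = {}
    for line in plain_str.splitlines():
        parts = shlex.split(line)
        if not parts:
            continue
        if parts[0] == "graph":
            # graph scale width height (drawing size in inches)
            graph_w, graph_h = float(parts[2]), float(parts[3])
        elif parts[0] == "node":
            # node name x y ... (centre, in inches)
            centres[parts[1]] = (float(parts[2]), float(parts[3]))
    if graph_w is None or graph_h is None:
        raise ValueError("no 'graph' line found in plain output")

    # Same factor on both axes so the computed layout is not distorted
    usable_w, usable_h = canvas_w - 2 * margin, canvas_h - 2 * margin
    k = min(usable_w / graph_w, usable_h / graph_h)

    # Flip y: Graphviz measures up from the bottom, the canvas down from the top
    return {
        name: (margin + x * k, margin + (graph_h - y) * k)
        for name, (x, y) in centres.items()
    }
```

Because a single factor is applied to both axes, the result is the layout Graphviz computed, simply fitted to whatever pixel box the renderer uses.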

mark · March 18, 2024, 11:11am

There is a JSON output format (-Tjson). That might be the easiest to parse.
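For reference, here is a minimal sketch of reading node centres and edge endpoints from `dot -Tjson` output. It assumes the structure described for Graphviz's JSON output: subgraphs and nodes share the `objects` array, laid-out nodes carry `pos` as an "x,y" string in points (with `width`/`height` in inches), and edges identify their endpoints by the integer `_gvid` of the tail and head nodes. `parse_graphviz_json` is a hypothetical name.

```python
import json
from typing import Dict, List, Tuple


def parse_graphviz_json(json_str: str) -> Tuple[Dict[str, dict], List[dict]]:
    """Extract node centres/sizes and edge endpoints from `dot -Tjson` output."""
    data = json.loads(json_str)
    nodes: Dict[str, dict] = {}
    name_by_gvid: Dict[int, str] = {}

    for obj in data.get("objects", []):
        # Cluster subgraphs are also listed in "objects" but carry "bb", not "pos"
        if "pos" not in obj:
            continue
        x, y = (float(v) for v in obj["pos"].split(",")[:2])
        nodes[obj["name"]] = {
            "x": x, "y": y,                          # centre, in points
            "width": float(obj.get("width", 0)),     # inches
            "height": float(obj.get("height", 0)),   # inches
        }
        name_by_gvid[obj["_gvid"]] = obj["name"]

    # Edges refer to nodes by _gvid; translate back to node names
    edges = [
        {"src": name_by_gvid.get(e["tail"]), "dst": name_by_gvid.get(e["head"]),
         "pos": e.get("pos", "")}
        for e in data.get("edges", [])
    ]
    return nodes, edges
```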

deeplook · March 18, 2024, 1:08pm

Nice, didn’t know that. I wouldn’t say the JSON is easier to make sense of, but the interesting part to me is that the coordinates seem to be very different (likely in scale) between PLAIN and JSON output.
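The difference is most likely one of units: per the Graphviz output documentation, plain and plain-ext give coordinates in inches, while `pos` in the dot and JSON outputs is in points (72 per inch); node `width`/`height` are in inches in all of them. A tiny hypothetical helper to put plain coordinates on the JSON scale, assuming no `size`-driven scaling is involved:

```python
from typing import Tuple

POINTS_PER_INCH = 72.0  # PostScript points, the unit of Graphviz `pos`


def plain_to_points(x_in: float, y_in: float) -> Tuple[float, float]:
    """Convert a plain/plain-ext coordinate (inches) to points."""
    return x_in * POINTS_PER_INCH, y_in * POINTS_PER_INCH


# A plain-ext node at (0.375, 1.25) inches corresponds to pos="27,90"
print(plain_to_points(0.375, 1.25))  # (27.0, 90.0)
```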

steveroush · March 18, 2024, 2:34pm

Try the “dot” output format (see the DOT output page in the Graphviz docs).
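`dot -Tdot` returns the input graph with layout attributes added, including `pos="x,y"` (in points) on every node. Below is a rough, regex-based sketch for pulling those out, assuming node statements begin on their own line as they do in Graphviz's own output. It is not a full DOT parser: HTML-like labels containing `]` or unusual IDs would defeat it, and a real parser (pydot, for example) would be sturdier. `node_positions_from_dot` is a hypothetical name.

```python
import re
from typing import Dict, Tuple

# A node statement: indent, an ID (quoted or bare), then an attribute list.
# Quoted strings inside the brackets may contain ']' without ending the list.
_NODE_STMT = re.compile(
    r'^\s*("(?:[^"\\]|\\.)*"|[\w.]+)\s*\[((?:"(?:[^"\\]|\\.)*"|[^\]"])*)\]',
    re.MULTILINE,
)
_POS = re.compile(r'\bpos="(-?[\d.]+),(-?[\d.]+)"')


def node_positions_from_dot(dot_str: str) -> Dict[str, Tuple[float, float]]:
    """Return {node name: (x, y)} in points from laid-out DOT (e.g. `dot -Tdot`)."""
    positions: Dict[str, Tuple[float, float]] = {}
    for match in _NODE_STMT.finditer(dot_str):
        name, attrs = match.group(1), match.group(2)
        if name in ("graph", "node", "edge"):
            continue  # default-attribute statements, not nodes
        pos = _POS.search(attrs)
        if pos:
            positions[name.strip('"')] = (float(pos.group(1)), float(pos.group(2)))
    return positions
```

Edge statements (`a -> b [...]`) are skipped because the ID is followed by the edge operator rather than `[`.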

tkim · March 18, 2024, 9:52pm

Thanks for all the useful replies!

I’m using <table> quite extensively (my primary use-case is to generate entity-relationship diagrams) and am now getting my positions out in a parsable format. I haven’t yet decided between plain-ext, json and dot; for tabular content, there’s a fair amount of variation between them.

In the JSON output there are lots of drawing op codes that seem to paint the individual cells within tables, which I hadn’t expected, though I’m not sure that matters.

Once again, many thanks!

(I’d post my output, but because I’m using HTML-style shapes, I don’t think the forum would display it correctly.)
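The op codes mentioned above are xdot drawing operations, which the JSON output attaches to objects under keys such as `_draw_` (shape) and `_ldraw_` (label). Here is a hedged sketch that collects the text ops per node. It assumes text ops look like `{"op": "T", "pt": [x, y], "align": ..., "width": ..., "text": ...}` as in the xdot JSON description, and exactly how HTML-table cells are split across these lists may vary between versions. `text_ops_by_node` is a hypothetical name.

```python
import json
from typing import Dict, List, Tuple


def text_ops_by_node(json_str: str) -> Dict[str, List[Tuple[str, float, float]]]:
    """Collect (text, x, y) for every xdot text op ("T") attached to each node."""
    data = json.loads(json_str)
    result: Dict[str, List[Tuple[str, float, float]]] = {}
    for obj in data.get("objects", []):
        if "pos" not in obj:
            continue  # skip subgraphs, which have no node position
        texts: List[Tuple[str, float, float]] = []
        for key in ("_draw_", "_ldraw_"):
            for op in obj.get(key, []):
                if op.get("op") == "T":
                    x, y = op["pt"]  # text anchor in points; see op["align"]
                    texts.append((op["text"], float(x), float(y)))
        result[obj["name"]] = texts
    return result
```

Polygon ops (`"p"`/`"P"` with a `points` list) in the same lists may give cell outlines as well, but that is worth checking against real output first.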

steveroush · March 18, 2024, 11:01pm

This is a devilish detail question. I suggest you produce an example file or two manually & run them through dot -Tdot, dot -Tjson, … to see what your options are.

If you just want the pos (X,Y of the node center, in points) of each node, use dot format. Trivial to parse.
If you want edges, pretty much the same answer, but somewhat more challenging to parse (see splineType in the Graphviz docs).
If you want positions of cells in an HTML table, it is much more challenging or nigh impossible; xdot or json might allow success, but maybe not. (I’d have to do some research.)
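On the edge point: an edge's `pos` follows the splineType grammar, i.e. one or more `;`-separated splines, each with an optional `e,x,y` endpoint and `s,x,y` start point (the arrowhead tips) followed by 1 + 3n B-spline control points. A minimal parser for that grammar, with hypothetical names `Spline` and `parse_spline_pos`; it ignores a third (z) coordinate if one is present:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Spline:
    start: Optional[Point] = None  # "s,x,y": tip of the arrowhead at the tail end
    end: Optional[Point] = None    # "e,x,y": tip of the arrowhead at the head end
    controls: List[Point] = field(default_factory=list)  # 1 + 3n B-spline points


def _pt(text: str) -> Point:
    x, y = text.split(",")[:2]  # drop a z coordinate if present
    return float(x), float(y)


def parse_spline_pos(pos: str) -> List[Spline]:
    """Parse an edge `pos` attribute written in Graphviz's splineType format."""
    splines = []
    for chunk in pos.split(";"):
        spline = Spline()
        for token in chunk.split():
            if token.startswith("e,"):
                spline.end = _pt(token[2:])
            elif token.startswith("s,"):
                spline.start = _pt(token[2:])
            else:
                spline.controls.append(_pt(token))
        n = len(spline.controls)
        if n < 4 or (n - 1) % 3:
            raise ValueError(f"expected 1 + 3n control points, got {n}")
        splines.append(spline)
    return splines
```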

So:

When you say extract each node’s x,y positions, just what do you want?
Even if you can get the position(s) you want, how do you plan on tying them back to a Graphviz-generated HTML table that you are seemingly not planning on using directly? i.e., what good is a position if you do not know the size, shape, text and font info, etc.?

tkim · March 18, 2024, 11:54pm

Hi Steve, yes, you’re bang on the money with your insight - for now, just getting the vanilla x,y position is enough to get me through to the next stage.

My journey to get this far has taken a few different turns. I’m putting together an rdf graph-based toolkit that ingests data describing data models of varying levels, data transformation processes, and data equivalence mappings to support a range of use-cases.

One such use-case is the simple creation of entity-relationship models from data dictionaries. I’d written a wrapper around a Python module called rdf2dot that did roughly what I wanted, but in parallel I found a really useful JavaScript/d3-based graphing library called gravis that generates really nice interactive force-directed graph layouts.

These work really well for unstructured exploration and visualisation of raw and semi-curated rdf nodes and edges, with a fairly rich set of annotations that can be applied for styling and labelling.

But there’s very much a need for more controlled, layered or hierarchical layouts. I was hoping to leverage the graphviz layout engines for determining node placement, then let the gravis library render my scenes while retaining some of the same styling and look-and-feel, so as to present a single user interface over the web (Flask server backend).

One feature of gravis I was going to misuse was its image-painting feature, which I was going to populate with rendered versions of the HTML-style tables that graphviz generates. I’ve switched between writing code to generate SVG objects based on tables and generating foreignObjects populated with HTML tables. I managed to get half-decent foreignObjects, but styling is not stable for foreignObjects in SVG, at least not when parsed (mangled!) through the disparate set of tooling at my disposal.

I’ve not arrived at a solution for that yet. One kludge could be to get dot to render the HTML for each table separately to, say, a .png, which I can attach to the interactive gravis graph definition, whilst also getting a good measure of its width and height, which should match those in the complete rendered layout.
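For the pre-rendered-image kludge, what ties each image back to the layout is the node's box. Here is a sketch, assuming values as Graphviz reports them (node `pos` is the centre in points, `width`/`height` are in inches, and the graph's `bb` starts at 0,0 as it usually does with dot) and a top-left-origin canvas such as SVG or HTML. `Box` and `image_box` are hypothetical names.

```python
from typing import NamedTuple

POINTS_PER_INCH = 72.0


class Box(NamedTuple):
    left: float
    top: float
    width: float
    height: float


def image_box(pos: str, width_in: float, height_in: float, graph_height_pt: float) -> Box:
    """Top-left box (points) for drawing a pre-rendered node image.

    pos: node centre "x,y" in points, origin bottom-left (Graphviz convention).
    graph_height_pt: height of the graph's bounding box ("bb"), assumed to start at 0.
    """
    cx, cy = (float(v) for v in pos.split(",")[:2])
    w, h = width_in * POINTS_PER_INCH, height_in * POINTS_PER_INCH
    top = graph_height_pt - (cy + h / 2)  # flip y: distance of the top edge from the top
    return Box(left=cx - w / 2, top=top, width=w, height=h)


# Node centred at (27, 90) in a 108 pt tall graph, 0.75 x 0.5 inches
print(image_box("27,90", 0.75, 0.5, 108.0))
# Box(left=0.0, top=0.0, width=54.0, height=36.0)
```

Comparing this box with the pixel size of the separately rendered .png (scaled by whatever dpi it was rendered at) is one way to check that the two agree.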

Hope that’s not too tedious an explanation. The short answer is, I want to mix the best of the d3-style visualisation features (interactivity, click+drag, etc.) with those from graphviz (hierarchical layouts and tidy, stable HTML-style table renderings).

There’s probably more besides, but this outlines the general course I’m hoping to navigate in terms of my exploration/investigation.

Some images of test-visualisations I’ve put together to give an idea of what I’m trying to do:

1. A dot (rdf2dot)-generated visualisation of a git-style diff operation performed against two versions of the same data model:

2. A gravis-rendered visualisation of the raw RDF underpinning the same data model, with an annotated selection:


What I want to do is mix aspects of the rendering in 1 with some of the user-interface benefits of the rendering in 2 (which you can’t see in this static image, but which include animated click-and-drag, pan and zoom, and individual node focus that updates the information panel below the visualisation).