Filtering a dot file to only nodes and edges of a subgraph?

I have a dot file with ~400+ nodes and ~1200 edges. Some of the nodes are grouped into clusters with edges within and spanning clusters. Because of the nature of the data, the generated graph is too complex to be meaningful as is.

I’m looking for a repeatable mechanism to filter the data set to the nodes in a cluster, edges to / from these nodes, and the nodes that are connected thru these edges to the cluster. I can probably do this with hours of manual editing of the original file or by coding myself a small tool but I’m curious whether this is an already solved problem.

Any pointers and examples would be greatly appreciated.

I apologize if the question is inappropriate for this forum as it is about filtering a dot file rather than graphviz itself.

  • Totally appropriate
  • Specifically already solved? I don’t think so. I think this is a new request.
  • Are there any nested clusters?
  • Can you share your starting input file, either complete or a subset?
  • My understanding of the algorithm:
    • for each cluster, do
      • create a new Root graph (and a new file)
      • include in this new graph
        • the cluster structure
        • every node within the cluster
        • every edge that originates and/or terminates at one of the above nodes (i.e. this will also add nodes from outside the cluster)

IIRC we wanted to make this convenient in gvpr but I can’t remember the details now. The -i option emits a node-induced subgraph, which seems relevant.

Also ccomps -C is relevant.

Emden Gansner might be able to help answer this question.

Here is the result of a barely tested solution that uses layers (FAQ | Graphviz). If this looks useful, I’ll provide the code that adds layers attributes to any (?) graph, based on top-level clusters.
The test graph:

And the 4 generated graphs (one per top-level cluster & one for the Root):

I was not aware of the Layers feature. Seems very applicable and promising. Thank you for the pointer!!!

I’ll play with it a bit as it might be a solution on its own, although it’ll require some editing of the input data which I can do. I’ll update once I’ve tried it.

Two questions:

  1. Is there a way to force a redraw or repositioning of the nodes once the layer has been applied? I realize it’s probably not in the intent of layers but given a very large and densely populated original graph, a layer ends up as a very large canvas sparsely populated. A redraw would collapse the new graph into something much tighter.

  2. I’m unclear what dictates whether a cluster appears or not. Their rectangles seem to appear if their nodes are not labeled, even if the layer filters out those nodes, but don’t appear if the nodes are labeled yet also filtered out from the current layer.a

#2 seems like something I could workaround by labeled every node appropriately.

#1 more problematic.

After many attempts to output either a better laid out single layer or a .dot file with only one layer and no positioning (which I would then reprocess thru dot), I concluded that the layer filtering logic must be implemented after the graph layout engine and likely right before the handoff to the SVG/PS renderers.

Since the layout issue was nagging at me, I eventually dove into GVPR. I’d never used it or anything similar and couldn’t find many examples to help get started but I did find one on another site that I then hacked and cleaned up into a solution for my problem.

I’m posting it here in case it can be useful to others in the future. You would save it into a program file called FindClusterAndNeighborNodes.gvpr and invoke it as:

gvpr -f FindClusterAndNeighborNodes.gvpr -o outputfile.dot -a <name_of_target_cluster> inputfile.dot

This program will:

  1. look for a subgraph with your parameter name
  2. find all subgraphs
  3. copy all nodes in your subgraph and the subgraph itself
  4. for every edge that starts or ends in your subgraph, copy the other node, the nodes parent subgraph if there is one, and create the edge.

There may be simpler ways to do this in gvpr.

BEG_G
{
    // temp variables
    graph_t graphTemp;
    edge_t edgeTemp;
    node_t nodeTemp;
    int i;

    // variables with purpose
    graph_t origCluster; // target cluster in original graph
    graph_t targetCluster; // target cluster once created in target graph
    graph_t subgraphs[int];

    // Initialization
    targetCluster = NULL;
    origCluster = NULL;

    graphTemp = fstsubg($G);
    while (graphTemp != NULL)
    {
        if (graphTemp.name == ARGV[0])
        {
            // We'll refer to this in the rest of the program
            origCluster = graphTemp;
            break;
        }
        graphTemp = nxtsubg(graphTemp);
    }

    graphTemp = fstsubg($);
    i = 0;
    while (graphTemp != NULL)
    {
        if (substr(graphTemp.name, 0, 1) != "%")
        {
            subgraphs[i++] = graphTemp;
        }
        graphTemp = nxtsubg(graphTemp);
    }    
}
N[(origCluster != NULL) && isSubnode(origCluster,$)]
{
    if (targetCluster == NULL)
    {
        // Create a new subgraph for the target cluster 
        // into the target graph 
        targetCluster = subg($T, origCluster.name);
        copyA(origCluster, targetCluster);
    }

    // Copy the node to the target graph and in the same target cluster
    nodeTemp = node($T, $.name);
    copyA($, nodeTemp);
    subnode(targetCluster, nodeTemp);
}
E[isSubnode(origCluster, $.tail) != 0]
{
    node_t tailNode = node($T, $.tail.name);

    // Copy the head node to the target graph and 
    // in the same target cluster
    nodeTemp = node($T, $.head.name);
    copyA($.head, nodeTemp);

    for (i = 0; i < #subgraphs; ++i)
    {
        if (isSubnode(subgraphs[i], $.head))
        {
            graphTemp = subg($T, subgraphs[i].name);
            copyA(subgraphs[i], graphTemp);

            subnode(graphTemp, nodeTemp);
            break;
        }
    }

    edgeTemp = edge_sg($T, tailNode, nodeTemp, $.name);
    copyA($, edgeTemp);
}
E[isSubnode(origCluster, $.head) != 0]
{
    node_t headNode = node($T, $.head.name);

    // Copy the tail node to the target graph and 
    // in the same target cluster
    nodeTemp = node($T, $.tail.name);
    copyA($.tail, nodeTemp);

    for (i = 0; i < #subgraphs; ++i)
    {
        if (isSubnode(subgraphs[i], $.tail))
        {
            graphTemp = subg($T, subgraphs[i].name);
            copyA(subgraphs[i], graphTemp);

            subnode(graphTemp, nodeTemp);
            break;
        }
    }

    edgeTemp = edge_sg($T, nodeTemp, headNode, $.name);
    copyA($, edgeTemp);
}