Print only specified node and it's dependencies from large graph

I have a tool that generates DOT-files representing package dependencies for a large project. Brainlessly generating a pdf on that graph gives a pdf with a size of 26 x 0.5 meters o_O

As a first step I would like to generate graphs showing only a specified node and all of it’s dependencies (sub-set of the main graph).

How can I achieve that?
From what I’ve found so far, gvpr seems to be a good bet, although I’m not sure where to start with gvpr (I probably just need the “keyword” to find the obvious?)

How are you generating the original dot file? Might be easiest to filter there. There is some prior art here for software dependency graph querying, for example
https://docs.bazel.build/versions/main/query-how-to.html.

The gvpr man page includes an example that “Prints the depth-first traversal of the graph, starting with the node whose name is ARGV[0], as an indented list.” You could modify this to put the results into a node-induced subgraph and print the subgraph (so the output would include any attributes on the input objects) or if the graph doesn’t have attributes, just print the result in graphviz format.

Here is slightly-modified and lightly-tested version of the “cliptree” gvpr program that comes with the Graphviz package source. It is something of a mystery to me, but it seems to work. Save this file as subGraph.gvpr.

/* Construct subgraph reachable from node ARGV[0] by forward edges */
BEG_G {
  node_t r = node($,ARGV[0]);

  $tvroot = r;
  $tvtype = TV_fwd;
  //print ("//  Staring node: ", r.name);
}
N{
  print ("// subgraph node: ", $.name);
  $tvroot=NULL;
  subnode($T,$);
}
E{
  print ("// subgraph edge: ", $.name);
  subedge($T,$);
}

It needs a dot-language source file as input. Change your generating program to produce dot output format instead of pdf.
Here is the (Linux) command line:
gvpr -f subGraph.gvpr myfile.dot > subgraph.dot

Other thoughts:

  • Wow, surprising that pdf will support such a large file
  • How big (nodes and edges) is your graph? (gc myfile.dot will give node & edge counts)
  • There are some tricks to reduce the physical size of a output graph, but maybe not enought for this graph.

Original dot-file is made by 3.-party tool. I do have source, but in my mind this must be a post-processing job. The language is Ada btw.

Thanks, I will try this as soon as I get back to the office.

The output is already dot, the pdf was generated by graphviz.

I don’t have the code right here, but I believe there are about 700 nodes (don’t know amount of edges). I think 4-500 nodes have the same rank.

Yes, making sub-graps and clusters are likely the next step.

It’s reasonable to extract a smaller graph from a larger one.

You can see the need for better interactive tools for exploring large(r) networks. If this were a tree, one would probably use one of those tree walkers in d3.js

It would not be a terribly huge leap for us to do this.

Stephen

I agree, as a strawman, I feel it should be possible to adapt the general bazel query language to general graphviz graphs, so long as there are ways to identify nodes. This is just the query language I have the most experience with.

Or maybe the graph databases have a query language. Or maybe GraphQL.

Can you provide your source Graphviz input?
Also there is a Graphviz program unflatten (https://graphviz.org/pdf/unflatten.1.pdf) that might helpfully rearrange your result.

640 nodes and 4600 edges gave the 26 x 0.5 meters pdf, but found that I hadn’t processed all…
The total added up to 1170 nodes and 11269 edges, generating a pdf at roughly 63.5 x 1.1 meters pdf.

steveroush’s gvpr program makes it possible for me to focus on dependencies for individual applications (the example I just used reduced the graph to 87 nodes and 368 edges). I will now work on collapsing related nodes to simplify the graphs further.

I have put an anonymized version of the graph at https://pastebin.com/43MRvjsq (for a week)
Length of labels are the same, but anonymized.

Thanks for all your help :smile:

35 ranks, lots of edges per rank.
On average 20 edges for every node - makes for a full graph.
dot will run faster if you add this to your commandline -Gnslimit=2 -Gnslimit1=2 -Gmaxiter=5000 -Gmclimit=.4

For anyone coming to this thread from a google search, as I did, here’s my annotated version of @steveroush’s script that might help some understand how it works (that and reading through gvpr’s man page):

// Construct subgraph reachable from node ARGV[0] by forward edges.
// This is from <https://forum.graphviz.org/t/print-only-specified-node-and-its-dependencies-from-large-graph/1011/4>

BEG_G {
  $tvroot = node($, ARGV[0]);
  // Depth-first traversal following forward edges only. Starts at $tvroot.
  $tvtype = TV_fwd;
}

N {
  // Setting the root of the next graph to NULL means "stop after we've
  // processed the current graph." This behavior is part of $tvtype.
  // We set the root to NULL multiples times -- once for every node -- though
  // setting it only once would suffice.
  $tvroot = NULL;
  // Add the current node ($) to the target subgraph ($T).
  subnode($T, $);
}

E {
  // Add the current edge ($) to the target subgraph ($T).
  subedge($T, $);
}

To give credit where credit is due, I just tweaked the cliptree script in the repository. I believe that @erg wrote these scripts as well as gvpr itself.