tl;dr - Is there some tool I should be using instead of gvpr to do what gvpr does?
I am trying to use gvpr to take a very large graph and reduce it to something narrowly focused on a few nodes, but I’m finding gvpr very frustrating.
The problem:
I’m trying to debug a problem with
This part of closure-compiler creates a very large directed graph of all the property names in use in a JavaScript program. The goal is to find property names that never appear on the same object, so they can safely be renamed by the compiler to use the same name. A bug is causing this graph to be wrong such that 2 properties that do appear on the same object get renamed to the same name, so one overwrites the other.
I can modify the code just a bit to make it spit out the whole graph as a dot file.
The graph is far too large to reasonably analyze manually, so I want to use gvpr to take this graph, find the nodes that refer to the problem property names, explore edges in both directions to find all nodes that are on paths that lead to or away from them, and output just those nodes and their connecting edges.
I struggled a lot to figure out exactly how $tvroot and $tvnext work to control traversals.
The man page is great at giving details about individual methods, but falls down on giving me a clear model of how each traversal actually runs.
- What happens if you change $tvtype somewhere other than in a BEG_G clause?
- If I set $tvnext or $tvroot to a node that gets visited before gvpr decides it needs a new starting point, will it just pick an arbitrary, unvisited node to start from instead?
- When is the appropriate time to use $tvnext instead of just setting $tvroot to the next root you want?
My intention was to
- Do a flat traversal to grab the nodes of interest and stick them in a subgraph.
- Do a forward traversal for each of those nodes to find all nodes reachable starting from them - another subgraph.
- Do a backward traversal for each of the nodes of interest to find all nodes that can reach them - another subgraph.
- Clone all those subgraphs into $T & call the method that adds all the edges that connect them from the original graph.
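To be concrete about the result I'm after, here is a minimal sketch of that plan in plain Python rather than gvpr. It assumes the edges have already been pulled out of the dot file as (tail, head) pairs of node names, which is a step not shown here. The names reachable, focus_subgraph, and is_interesting are made up for illustration and are not part of gvpr or Graphviz.

```python
from collections import defaultdict, deque


def reachable(adjacency, starts):
    """Return every node reachable from any node in `starts` (starts included)."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        # .get() avoids creating empty entries for nodes with no outgoing edges.
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def focus_subgraph(edges, is_interesting):
    """Reduce a directed graph to the nodes of interest plus everything
    that can reach them or be reached from them.

    `edges` is assumed to be a list of (tail, head) node-name pairs.
    Nodes that have no edges at all never appear in it, so they are ignored.
    """
    forward = defaultdict(list)   # tail -> heads
    backward = defaultdict(list)  # head -> tails
    nodes = set()
    for tail, head in edges:
        forward[tail].append(head)
        backward[head].append(tail)
        nodes.update((tail, head))

    # Step 1: flat pass to pick out the nodes of interest.
    targets = {n for n in nodes if is_interesting(n)}
    # Steps 2 and 3: one multi-source traversal in each direction.
    keep = reachable(forward, targets) | reachable(backward, targets)
    # Step 4: keep every original edge whose endpoints both survived.
    kept_edges = [(t, h) for t, h in edges if t in keep and h in keep]
    return targets, keep, kept_edges


if __name__ == "__main__":
    demo_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y"), ("e", "c")]
    _, keep, kept = focus_subgraph(demo_edges, lambda name: name == "c")
    print(sorted(keep))  # ['a', 'b', 'c', 'd', 'e']
    print(kept)
```

Seeding the traversal with all the nodes of interest at once means nothing has to track whether a later node of interest was already visited during an earlier node's traversal.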
But I can’t see a really clean way to do this in gvpr.
I had something sort of working, but it required me to check whether the next node of interest had already been visited, or to notice when I reached it in the middle of some other node’s traversal, and then update $tvroot or $tvnext to the next node on my list.
Surely there’s some standard coding pattern to say “traverse DFS forward starting from each node in this list / subgraph”?
Still, I was sort of getting there, when gvpr started doing some just plain buggy things.
- I’m setting one of the nodes of interest to be colored red with $.color = colorString; but it’s actually getting color="" in the output dot file.
I’ve even put stderr prints in the script to verify that I really set it to “red” just before printing out.
In the script the same line also sets other nodes to be “green”, and they look just fine.
- The label strings in the output dot file have started including characters that aren’t there in the input graph. Garbage characters.
Is gvpr just not capable of handling my really long strings?
This started happening when I changed my strategy a bit to use subgraphs and clone() at the end instead of trying to just add all my nodes directly to $T as I found them.
Maybe the clone() is the problem?
So, as my tl;dr says above, am I just using the wrong tool here?
Is there some newer, better, and better-supported way to do what I’m trying to do?
Thanks,
Bradford