Optimizing dot files by removing redundant information

I’m using dot graphs for communicating and documenting complex programming logic to non-programmers and my future self.

After a longish search I finally found a usable interactive dot editor, namely (don’t laugh, this really is the best :heart:) dotty.

The output of dotty is quite chatty, but today I found that the canon output format (dot -Tcanon) strips away any redundant information. :sparkles:

But dotty has the habit of creating generic node names (n0, n1, …) instead of using the label as the node name.

So even after cleaning up with canon, I end up with something like

digraph g {
...
        n0      [label=A];
        n1      [label=B];
        n0 -> n1;
}

when I would actually prefer

digraph g {

A;
B;
A -> B;
}

Is there a tool around which can do that?

Shameless plug: the Graphviz Visual Editor. It currently cannot do what you ask, since it also inserts nodes named nX, where X is just a serial number. But if you have a suggestion for how an interface should work that would let you do what you want, I might implement it.

I can think of two ways to achieve this:

  1. Let the user select different schemes for automatically naming inserted nodes, where one could be A, B, …, Z and continue with AA, BB, … or A1, B1, C1, …
  2. Let the user enter the node name before inserting it.
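
Option 1 boils down to a function from a node's serial number to a name. A minimal Python sketch of the two schemes listed above, assuming "AA, BB, …" means repeating the letter once more per extra round through the alphabet and "A1, B1, C1, …" means appending the round number. The function names are hypothetical, and neither function checks for collisions with names the user typed by hand.

```python
import string
from itertools import count, islice


def doubled_letters(i: int) -> str:
    """Hypothetical scheme: A, B, ..., Z, AA, BB, ..., ZZ, AAA, ..."""
    letter = string.ascii_uppercase[i % 26]
    return letter * (i // 26 + 1)


def letter_number(i: int) -> str:
    """Hypothetical scheme: A, ..., Z, A1, B1, ..., Z1, A2, ..."""
    letter = string.ascii_uppercase[i % 26]
    rnd = i // 26
    return letter if rnd == 0 else f"{letter}{rnd}"


def names(scheme):
    """Yield node names for serial numbers 0, 1, 2, ... using `scheme`."""
    for i in count():
        yield scheme(i)


if __name__ == "__main__":
    print(list(islice(names(doubled_letters), 24, 29)))  # ['Y', 'Z', 'AA', 'BB', 'CC']
    print(list(islice(names(letter_number), 24, 29)))    # ['Y', 'Z', 'A1', 'B1', 'C1']
```

The editor would keep a counter per graph and ask the chosen scheme for the next name whenever a node is inserted.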

Thanks for the link. This looks nice, and from playing around with it, it works pretty well, too.

My idea regarding node labelling would be to replace the node name, e.g. n1, with its label once a label different from \N is set for that node.
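
For the simple single-graph files that canon produces, this rule can be applied with a line-oriented rewrite instead of a full DOT parser. The Python sketch below assumes one statement per line, node statements whose only attribute is label (as in the n0/n1 example above), and unquoted node names; rename_nodes is a hypothetical helper. Nodes whose label is \N or whose label was already taken keep their original names.

```python
import re

# Assumption: canon-style input, one statement per line, e.g.  n0  [label=A];
# Only node statements whose single attribute is `label` are renamed.
NODE_RE = re.compile(
    r'^\s*(\w+)\s*\[\s*label\s*=\s*("(?:[^"\\]|\\.)*"|\w+)\s*\]\s*;?\s*$'
)
# Edge statements such as  n0 -> n1;  or  n0 -> n1 [label="Yes"];
EDGE_RE = re.compile(r"^\s*(\w+)\s*->\s*(\w+)(.*)$")
# Attribute statements like  node [label="\N"];  must be left alone.
KEYWORDS = {"graph", "node", "edge"}


def rename_nodes(dot: str) -> str:
    """Replace generated node names (n0, n1, ...) with their labels."""
    lines = dot.splitlines()

    # Pass 1: collect name -> label for labels other than \N.
    mapping = {}
    used = set()
    for line in lines:
        m = NODE_RE.match(line)
        if not m or m.group(1) in KEYWORDS:
            continue
        name, label = m.groups()
        if label == '"\\N"' or label in used:
            continue  # \N means "use the name"; duplicates keep their name
        mapping[name] = label
        used.add(label)

    # Pass 2: rewrite node and edge statements, copy everything else.
    out = []
    for line in lines:
        node = NODE_RE.match(line)
        edge = EDGE_RE.match(line)
        if node and node.group(1) in mapping:
            # The label becomes the name, so the label attribute is dropped.
            out.append(f"\t{mapping[node.group(1)]};")
        elif edge:
            tail, head, rest = edge.groups()
            out.append(f"\t{mapping.get(tail, tail)} -> {mapping.get(head, head)}{rest}")
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    src = "digraph g {\n\tn0\t[label=A];\n\tn1\t[label=B];\n\tn0 -> n1;\n}"
    print(rename_nodes(src))
```

For the n0/n1 example this prints the A/B graph asked for above. An unquoted label such as A is already a valid ID, and a quoted label is reused together with its quotes, so the result stays valid DOT. The sketch does not check whether a label collides with an existing node name.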

  1. Not all labels are valid node IDs. E.g. spaces are not allowed.
  2. The only way to add a label today is to write it in the DOT source. Are you suggesting that the editor should detect that and change the node ID accordingly?

Not all labels are valid node IDs. E.g. spaces are not allowed.

You can enclose them in double quotes:

digraph g {

"Hi there!";
"Good morning";
"Hi there!" -> "Good morning";
}
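
Quoting only when necessary is easy to automate. A minimal Python sketch (dot_id is a hypothetical helper) that leaves a label bare if it is already a valid unquoted DOT ID, meaning an identifier or a numeral that is not a keyword, and otherwise wraps it in double quotes:

```python
import re

# Unquoted DOT IDs: letters, digits and underscores not starting with a digit,
# or a numeral. (DOT also allows bytes \200-\377 in identifiers; this sketch
# conservatively quotes any non-ASCII label instead.)
_PLAIN_ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
_NUMERAL = re.compile(r"-?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)")
# Keywords cannot be used as unquoted IDs (matched case-insensitively).
_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def dot_id(label: str) -> str:
    """Return `label` as a valid DOT ID, quoting it only when necessary."""
    if label.lower() not in _KEYWORDS and (
        _PLAIN_ID.fullmatch(label) or _NUMERAL.fullmatch(label)
    ):
        return label
    # Inside a quoted string, \" is the only escape sequence DOT recognizes.
    # Limitation: a label ending in a backslash would escape the closing quote.
    return '"' + label.replace('"', '\\"') + '"'
```

For example, dot_id("A") returns A unchanged, while dot_id("Hi there!") returns "Hi there!" including the quotes, which matches the hand-written graph above.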

The only way to add a label today is to write it in the DOT source. Are you suggesting that the editor should detect that and change the node ID accordingly?

That could be one way.


Do your graphs contain subgraphs, i.e. more than one left brace { and one right brace }?

@steveroush No, just one graph. dotty had problems with subgraphs.

If there is no ready-made tool, I shouldn’t need to reinvent the wheel by writing my own parser for dot files, should I? Are there any good APIs/packages (C/C++/Python/Perl) I could leverage?

This GVPR program is only lightly tested. It is not a general-purpose solution (it drops all subgraphs), but it seems to do what you want.
Command line:
gvpr -f simpleRename.gvpr myfile.gv
Here is simpleRename.gvpr:

/*
  Generate copy of input graph, replacing names with labels
  NOTE: DOES NOT COPY SUBGRAPHS!
  NOTE: ugly if you use record OR html nodes
*/
BEGIN {
int id = 0;
int cnt[];
string names[string];

string mapn (string inname, string lbl){
   string s;

   s = names[inname];
   if (lbl == "") {
     print("// Error:: node ", inname, " has no label");
     s = inname;
   } else if (cnt[lbl] > 0) {
     print("// Error:: duplicate label >", lbl, "<");
     s = inname;
   } else {
     if (s == "") {
       s = lbl;
       cnt[lbl] = 1;
     }
   }
   names[inname] = s;
   //print("// mapn returning: ", s);
   return s;
}

string getmapn (string inname){
   string s;

   s = names[inname];
   if (s == "") {
     print("// Error:: Edge had mapping problem with: ", inname);
   }
   names[inname] = s;
   return s;
}

}

BEG_G {
  graph_t g;
  node_t aNode;
  
  g=copy (NULL, $G);
  for (aNode=fstnode($G);aNode;aNode = nxtnode(aNode)){
    node (g, mapn(aNode.name, aNode.label));
  }
}
E { edge (node (g, getmapn($.tail.name)), node (g, getmapn($.head.name)), ""); }
END_G {
  write (g);
}

Thanks @steveroush, that is a great start. I knew my dormant AWK skills were good for something.

I’m attaching a refined version of your script. It also handles nodes whose label is \N and keeps edge labels.

/* vim: set ft=awk:
 *
 * Generate copy of input graph, replacing names with labels
 * NOTE: DOES NOT COPY SUBGRAPHS!
 * NOTE: ugly if you use record OR html nodes
*/

BEGIN {
  int id = 0;
  int cnt[];
  string names[string];

  string mapn (string inname, string lbl)
  {
     string s;

     s = names[inname];

     /* print("// mapn returning: >", s, "< with inname >", inname, "< and lbl >", lbl, "<"); */

     if(!strcmp(lbl,""))
     {
       print("// Error:: node ", inname, " has no label");
       s = inname;
     }
     else
     {
       if(!strcmp(lbl,"\\N"))
       {
         lbl = inname;
       }
     }

     if(cnt[lbl] > 0)
     {
       print("// Error:: duplicate label >", lbl, "<");
       s = inname;
     }
     else
     {
       if(!strcmp(s, ""))
       {
         s = lbl;
         cnt[lbl] = 1;
       }
     }

     names[inname] = s;

     return s;
  }

  string getmapn (string inname)
  {
     string s;

     s = names[inname];
     if (s == "") {
       print("// Error:: Edge had mapping problem with: ", inname);
     }

     return s;
  }
}

BEG_G {
  graph_t g;
  node_t aNode;

  g = copy (NULL, $G);
  for (aNode = fstnode($G);aNode;aNode = nxtnode(aNode)){
    node (g, mapn(aNode.name, aNode.label));
  }
}

E {
  edge_t e = edge (node (g, getmapn($.tail.name)), node (g, getmapn($.head.name)), "");
  e.label = $.label;
}

END_G {
  write (g);
}

My test input:

digraph g {
        n0      [label=A];
        n1      [label=B];
        C       [label="\N"];
        n0 -> n1 [label = "Yes"];
        n1 -> C [label = "No"];
}