Optimizing dot files by removing redundant information

t-b · April 11, 2021, 6:40pm

I’m using dot graphs for communicating and documenting complex programming logic to non-programmers and my future self.

After a longish search I finally found a usable interactive dot editor, namely (don’t laugh, this really is the best) dotty.

The output of dotty is quite chatty, but today I found that the canon output format strips away any redundant information.

But dotty has the habit of creating random node names instead of using the label as the node name.

So even after optimization using canon I end up with something like

digraph g {
...
        n0      [label=A];
        n1      [label=B];
        n0 -> n1;
}

where I actually would prefer

digraph g {

A;
B;
A -> B;
}

Is there a tool around which can do that?

magjac · April 11, 2021, 7:24pm

Shameless plug: the Graphviz Visual Editor. It currently cannot do what you ask, since it also inserts nodes named nX where X is just a serial number. But if you have a suggestion for how an interface should work that would allow you to do what you want, I might implement it.

I can think of two ways to achieve this:

Let the user select different schemes for automatically naming inserted nodes, where one could be A, B, …, Z and continue with AA, BB, … or A1, B1, C1, …
Let the user enter the node name before inserting it.
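
To make the two automatic naming schemes concrete, here is a minimal Python sketch. The function names are invented for this illustration and are not part of the Graphviz Visual Editor; a real editor would also have to skip names that already exist in the graph.

```python
from itertools import count, islice
from string import ascii_uppercase


def repeated_letter_names():
    """Yield A, B, ..., Z, AA, BB, ..., ZZ, AAA, ... (hypothetical scheme 1)."""
    for repeat in count(1):
        for letter in ascii_uppercase:
            yield letter * repeat


def numbered_letter_names():
    """Yield A, B, ..., Z, A1, B1, ..., Z1, A2, ... (hypothetical scheme 2)."""
    for round_no in count(0):
        suffix = "" if round_no == 0 else str(round_no)
        for letter in ascii_uppercase:
            yield letter + suffix


# Show the point where each scheme wraps around after Z.
print(list(islice(repeated_letter_names(), 24, 29)))  # ['Y', 'Z', 'AA', 'BB', 'CC']
print(list(islice(numbered_letter_names(), 24, 29)))  # ['Y', 'Z', 'A1', 'B1', 'C1']
```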

t-b · April 11, 2021, 8:42pm

Thanks for the link. This looks nice, and it worked pretty well when I played around with it.

My idea regarding node labelling would be to replace the node name, e.g. n1, with its label once a label different from \N is set for that node.

magjac · April 11, 2021, 11:02pm

Not all labels are valid node IDs. E.g. spaces are not allowed.
The only way to add a label today is to write it in the DOT source. Are you suggesting that the editor should detect that and change the node ID accordingly?

t-b · April 12, 2021, 12:17pm

magjac: "Not all labels are valid node IDs. E.g. spaces are not allowed."

You can enclose it in double quotes:

digraph g {

"Hi there!";
"Good morning";
"Hi there!" -> "Good morning";
}
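
A tool that turns labels into node IDs could apply this quoting automatically. The following Python sketch shows one possible way; `dot_id` is a hypothetical helper, not an existing API. It is deliberately conservative: anything that is not a plain ASCII identifier or a numeral gets quoted, and labels containing backslashes are not handled specially.

```python
import re

# Unquoted DOT IDs: letters, digits and underscores, not starting with a digit.
_PLAIN_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Numerals such as 7, -1.5 or .25 are also valid unquoted IDs.
_NUMERAL = re.compile(r"-?(\.[0-9]+|[0-9]+(\.[0-9]*)?)")
# DOT keywords are case-insensitive and must be quoted to be used as IDs.
_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def dot_id(label: str) -> str:
    """Return label as a DOT ID, adding double quotes only where needed."""
    is_plain = _PLAIN_ID.fullmatch(label) or _NUMERAL.fullmatch(label)
    if is_plain and label.lower() not in _KEYWORDS:
        return label
    # Inside a double-quoted DOT string, an embedded quote is written as \".
    return '"' + label.replace('"', '\\"') + '"'


print(dot_id("A"))             # A
print(dot_id("Hi there!"))     # "Hi there!"
print(dot_id("Good morning"))  # "Good morning"
print(dot_id("node"))          # "node"
```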

magjac: "The only way to add a label today is to write it in the DOT source. Are you suggesting that the editor should detect that and change the node ID accordingly?"

That could be one way.

steveroush · April 12, 2021, 2:33pm

Do your graphs contain subgraphs? Do they contain more than one left brace and one right brace ({ })?

t-b · April 14, 2021, 1:26pm

@steveroush No, just one graph. dotty had problems with subgraphs.

t-b · April 14, 2021, 1:31pm

If there is no ready-made tool, I guess I shouldn’t reinvent the wheel and start writing a parser for dot files, right? Are there any good APIs/packages (C/C++/Python/Perl) I could leverage?
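
For the very flat files shown earlier in this thread, a line-based approach might be enough before reaching for a full parser. The Python sketch below is only that: it assumes one statement per line, no subgraphs, and no attributes other than `label`, and it is not a DOT parser. The names `parse_flat_dot` and `unquote` are invented for this example.

```python
import re

# A token is either a double-quoted string or a bare word.
ID = r'("(?:[^"\\]|\\.)*"|[A-Za-z_0-9]+)'
NODE_RE = re.compile(rf'^\s*{ID}\s*\[\s*label\s*=\s*{ID}\s*\]\s*;?\s*$')
EDGE_RE = re.compile(
    rf'^\s*{ID}\s*->\s*{ID}\s*(?:\[\s*label\s*=\s*{ID}\s*\])?\s*;?\s*$'
)
# Lines such as `node [label="\N"];` set defaults and are not real nodes.
DEFAULTS = {"node", "edge", "graph"}


def unquote(token):
    """Strip surrounding double quotes and undo the \\" escape."""
    if token.startswith('"'):
        return token[1:-1].replace('\\"', '"')
    return token


def parse_flat_dot(text):
    """Return ({node name: label}, [(tail, head, edge label or None)])."""
    nodes, edges = {}, []
    for line in text.splitlines():
        m = NODE_RE.match(line)
        if m and m.group(1) not in DEFAULTS:
            nodes[unquote(m.group(1))] = unquote(m.group(2))
            continue
        m = EDGE_RE.match(line)
        if m:
            label = unquote(m.group(3)) if m.group(3) else None
            edges.append((unquote(m.group(1)), unquote(m.group(2)), label))
    return nodes, edges


sample = """digraph g {
        n0      [label=A];
        n1      [label=B];
        n0 -> n1;
}"""
nodes, edges = parse_flat_dot(sample)
print(nodes)  # {'n0': 'A', 'n1': 'B'}
print(edges)  # [('n0', 'n1', None)]
```

Anything beyond this shape (subgraphs, multi-line statements, HTML labels) would need a proper parser.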

steveroush · April 14, 2021, 3:53pm

This GVPR program is lightly tested. It is not a general-purpose solution (it drops all subgraphs), but seemingly does what you want.
Command line:
gvpr -f simpleRename.gvpr myfile.gv
Here is simpleRename.gvpr:

/*
  Generate copy of input graph, replacing names with labels
  NOTE: DOES NOT COPY SUBGRAPHS!
  NOTE: ugly if you use record OR html nodes
*/
BEGIN {
int id = 0;
int cnt[];
string names[string];

string mapn (string inname, string lbl){
   string s;

   s = names[inname];
   if (lbl==""){
     print("// Error:: node ", inname, " has no label");
     s=inname;
   }
   if (cnt[lbl]>0){
     print("// Error:: duplicate label >", lbl, "<");
     s=inname;
   }else{
     if (s == "") {
       s = lbl;
       cnt[lbl]=1;
     }
   }
   names[inname] = s;
   //print("// mapn returning: ", s);
   return s;
}

string getmapn (string inname){
   string s;

   s = names[inname];
   if (s == "") {
     print("// Error:: Edge had mapping problem with: ", inname);
   }
   names[inname] = s;
   return s;
}

}

BEG_G {
  graph_t g;
  node_t aNode;
  
  g=copy (NULL, $G);
  for (aNode=fstnode($G);aNode;aNode = nxtnode(aNode)){
    node (g, mapn(aNode.name, aNode.label));
  }
}
E { edge (node (g, getmapn($.tail.name)), node (g, getmapn($.head.name)), ""); }
END_G {
  write (g);
}

t-b · June 7, 2021, 1:46pm

Thanks @steveroush, that is a great start. I knew my dormant AWK skills were there for something.

I’m attaching a refined version of your script. It handles \N as labels for nodes and also keeps edge labels.

/* vim: set ft=awk:
 *
 * Generate copy of input graph, replacing names with labels
 * NOTE: DOES NOT COPY SUBGRAPHS!
 * NOTE: ugly if you use record OR html nodes
*/

BEGIN {
  int id = 0;
  int cnt[];
  string names[string];

  string mapn (string inname, string lbl)
  {
     string s;

     s = names[inname];

     /* print("// mapn returning: >", s, "< with inname >", inname, "< and lbl >", lbl, "<"); */

     if(!strcmp(lbl,""))
     {
       print("// Error:: node ", inname, " has no label");
       s = inname;
     }
     else
     {
       if(!strcmp(lbl,"\\N"))
       {
         lbl = inname;
       }
     }

     if(cnt[lbl] > 0)
     {
       print("// Error:: duplicate label >", lbl, "<");
       s = inname;
     }
     else
     {
       if(!strcmp(s, ""))
       {
         s = lbl;
         cnt[lbl] = 1;
       }
     }

     names[inname] = s;

     return s;
  }

  string getmapn (string inname)
  {
     string s;

     s = names[inname];
     if (s == "") {
       print("// Error:: Edge had mapping problem with: ", inname);
     }

     return s;
  }
}

BEG_G {
  graph_t g;
  node_t aNode;

  g = copy (NULL, $G);
  for (aNode = fstnode($G);aNode;aNode = nxtnode(aNode)){
    node (g, mapn(aNode.name, aNode.label));
  }
}

E {
  edge_t e = edge (node (g, getmapn($.tail.name)), node (g, getmapn($.head.name)), "");
  e.label = $.label;
}

END_G {
  write (g);
}

My test input:

digraph g {
        n0      [label=A];
        n1      [label=B];
        C       [label="\N"];
        n0 -> n1 [label = "Yes"];
        n1 -> C [label = "No"];
}
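
To illustrate the renaming idea on this test input outside of gvpr, here is a small Python sketch. It follows the same general approach (use the label as the new name, treat \N as "use the node name", fall back to the original name on duplicates) but is not a translation of the script above and may differ in edge cases. The function names and the always-quote output style are choices made for this example.

```python
def quoted(text):
    """Always double-quote, which is valid for any DOT ID without backslashes."""
    return '"' + text.replace('"', '\\"') + '"'


def rename_by_label(nodes, edges):
    """nodes: {name: label}; edges: [(tail, head, edge label or None)]."""
    names = {}    # original node name -> new node name
    used = set()  # labels already taken as node names
    for name, label in nodes.items():
        if label in ("", "\\N"):
            label = name          # no label, or \N meaning "use the node name"
        if label in used:
            names[name] = name    # duplicate label: keep the original name
        else:
            names[name] = label
            used.add(label)

    lines = ["digraph g {"]
    for name in nodes:
        lines.append(f"    {quoted(names[name])};")
    for tail, head, label in edges:
        attr = f" [label={quoted(label)}]" if label else ""
        # Nodes that only appear in edges keep their original name.
        new_tail = quoted(names.get(tail, tail))
        new_head = quoted(names.get(head, head))
        lines.append(f"    {new_tail} -> {new_head}{attr};")
    lines.append("}")
    return "\n".join(lines)


# The test input from this post, already split into nodes and edges.
nodes = {"n0": "A", "n1": "B", "C": "\\N"}
edges = [("n0", "n1", "Yes"), ("n1", "C", "No")]
print(rename_by_label(nodes, edges))
```

With this input the sketch prints nodes "A", "B" and "C" and the edges "A" -> "B" and "B" -> "C", with the edge labels kept.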
