Removing Unnecessary Whitespace from Dot Graph

Hi there,

I am attempting to graph the following:

digraph ERD {
graph [ rankdir = "LR" ];
ranksep=1;
"DEPARTMENT" [ label="<DEPARTMENT> DEPARTMENT|<PK_DEPARTMENT>deptcode \l | <F_DEPARTMENT> self* \ldeptcode \ldeptname \l " shape = "record", style = "rounded" ];
"COURSE" [ label="<COURSE> COURSE|<PK_COURSE>cnum \l | <F_COURSE> self* \lcnum \lcname \ldepartment* \l " shape = "record", style = "rounded" ];
"PROFESSOR" [ label="<PROFESSOR> PROFESSOR|<PK_PROFESSOR>pnum \l | <F_PROFESSOR> self* \lpnum \lpname \loffice \ldepartment \l " shape = "record", style = "rounded" ];
"CLASS" [ label="<CLASS> CLASS|<PK_CLASS>term \l | <F_CLASS> self* \lcourse* \lterm \lsection \lprofessor* \l " shape = "record", style = "rounded" ];
"ENROLLMENT" [ label="<ENROLLMENT> ENROLLMENT|<PK_ENROLLMENT> | <F_ENROLLMENT> self* \lstudent* \lclass* \l " shape = "record", style = "rounded" ];
"SCHEDULE" [ label="<SCHEDULE> SCHEDULE|<PK_SCHEDULE>time \l | <F_SCHEDULE> self* \lclass* \lday \ltime \lroom \l " shape = "record", style = "rounded" ];
"MARK" [ label="<MARK> MARK|<PK_MARK>grade \l | <F_MARK> self* \lenrollment* \lgrade \l " shape = "record", style = "rounded" ];
"STUDENT" [ label="<STUDENT> STUDENT|<PK_STUDENT>snum \l | <F_STUDENT> self* \lsnum \lsname \lyear \l " shape = "record", style = "rounded" ];

"COURSE":"F_COURSE"->"DEPARTMENT":"PK_DEPARTMENT" [arrowhead = normal] [label="generic label"];
"PROFESSOR":"F_PROFESSOR"->"DEPARTMENT":"PK_DEPARTMENT" [arrowhead = normal] [label="generic label"];
"CLASS":"F_CLASS"->"COURSE":"PK_COURSE" [arrowhead = normal] [label="generic label"];
"ENROLLMENT":"F_ENROLLMENT"->"CLASS":"PK_CLASS" [arrowhead = normal] [label="generic label"];
"SCHEDULE":"F_SCHEDULE"->"CLASS":"PK_CLASS" [arrowhead = normal] [label="generic label"];
"MARK":"F_MARK"->"ENROLLMENT":"PK_ENROLLMENT" [arrowhead = normal] [label="generic label"];
}

However, the dot engine places the nodes such that there is an excessive amount of whitespace in the graph. It ends up looking like this:

For example, the PROFESSOR table could have easily been placed above the COURSE table to save on space. A similar optimization could have been made with the SCHEDULE table. Additionally, the DEPARTMENT table could have been placed on the left of COURSE and PROFESSOR, rather than the right.

While reading about this online, I came across the suggestion of changing rankdir. However, I need it to be LR, because otherwise the nodes themselves are oriented sideways.

Is there any way to allow the arrows to go in both directions (not just left to right, or vice versa) to save on space? Or are there other attributes I could specify to minimize the amount of unnecessary whitespace in my graph? I don’t want to make the nodes, font, or arrow lengths smaller, though.

Thank you!

Here is a more compact graph (see below).
There are ways to avoid using rankdir, but it seems appropriate here.
Yes, there are ways to have arrows going forward and/or backward (see the dir attribute in the Graphviz docs), but it probably would not make for a more compact graph.
I used rank=same (see the rank attribute) and invisible edges (style=invis, see the style attribute) to snug things up:

digraph ERD {
graph [ rankdir = "LR" ];
ranksep=1;
"DEPARTMENT" [ label="<DEPARTMENT> DEPARTMENT|<PK_DEPARTMENT>deptcode \l | <F_DEPARTMENT> self* \ldeptcode \ldeptname \l " shape = "record", style = "rounded" ];
"COURSE" [ label="<COURSE> COURSE|<PK_COURSE>cnum \l | <F_COURSE> self* \lcnum \lcname \ldepartment* \l " shape = "record", style = "rounded" ];
"PROFESSOR" [ label="<PROFESSOR> PROFESSOR|<PK_PROFESSOR>pnum \l | <F_PROFESSOR> self* \lpnum \lpname \loffice \ldepartment \l " shape = "record", style = "rounded" ];
"CLASS" [ label="<CLASS> CLASS|<PK_CLASS>term \l | <F_CLASS> self* \lcourse* \lterm \lsection \lprofessor* \l " shape = "record", style = "rounded" ];
"ENROLLMENT" [ label="<ENROLLMENT> ENROLLMENT|<PK_ENROLLMENT> | <F_ENROLLMENT> self* \lstudent* \lclass* \l " shape = "record", style = "rounded" ];
"SCHEDULE" [ label="<SCHEDULE> SCHEDULE|<PK_SCHEDULE>time \l | <F_SCHEDULE> self* \lclass* \lday \ltime \lroom \l " shape = "record", style = "rounded" ];
"MARK" [ label="<MARK> MARK|<PK_MARK>grade \l | <F_MARK> self* \lenrollment* \lgrade \l " shape = "record", style = "rounded" ];
"STUDENT" [ label="<STUDENT> STUDENT|<PK_STUDENT>snum \l | <F_STUDENT> self* \lsnum \lsname \lyear \l " shape = "record", style = "rounded" ];

"COURSE":"F_COURSE"->"DEPARTMENT":"PK_DEPARTMENT" [arrowhead = normal] [label="generic label"];
"PROFESSOR":"F_PROFESSOR"->"DEPARTMENT":"PK_DEPARTMENT" [arrowhead = normal] [label="generic label"];
"CLASS":"F_CLASS"->"COURSE":"PK_COURSE" [arrowhead = normal] [label="generic label"];
"ENROLLMENT":"F_ENROLLMENT"->"CLASS":"PK_CLASS" [arrowhead = normal] [label="generic label"];
"SCHEDULE":"F_SCHEDULE"->"CLASS":"PK_CLASS" [arrowhead = normal] [label="generic label"];
"MARK":"F_MARK"->"ENROLLMENT":"PK_ENROLLMENT" [arrowhead = normal] [label="generic label"];

// make sure nodes line up vertically (not fully necessary, but it works)
{rank=same STUDENT MARK}

{rank=same ENROLLMENT SCHEDULE}
{rank=same COURSE PROFESSOR}

// use invisible edges to rearrange nodes
  ENROLLMENT -> PROFESSOR [style=invis]
  STUDENT -> SCHEDULE  [style=invis]  
}

Giving:

Thank you for the suggestions! These are very helpful.

Since I am actually generating the code automatically, I wouldn’t really have a way to see the graph beforehand to decide which tables to give the same rank and which tables should have an invisible edge connecting them.

Is there a logic to how you decided which tables should be grouped together? Perhaps by looking at the foreign key relationships between them?

I’m not sure, but I think we invented ratio=compress to solve this problem.

See the ratio attribute page in the Graphviz documentation.

Amazingly, we don’t have any pictures to explain this - just a lot of words.

I changed my code to this:

digraph ERD {
graph [ rankdir = "LR", ratio = compress ];
ranksep=1;

I didn’t notice any changes, and after reading through the link I realized that I need to set a specific size for the graph. Unfortunately, I don’t think I can do that either, because graphs with many nodes would shrink down to undesirable sizes. Or am I misunderstanding this attribute?

Is there anything else that can be set in a relatively automatic way?

Thank you!

Based on a sample of 1, I may be close to an automated solution.
First, is this graph OK? It is your starting input with the only changes being size and ratio attributes.


The problem is determining the correct value for size. It is fairly easy to set the starting value, but then it gets a bit messy. I’m still working on it.
Questions:

  • roughly how many graphs?
  • what OS (or OSs)?
  • does the solution just have to work for you or will it be shared with many users?

That graph looks great, but I think that is pretty much the limit on width. In other words, I think additional nodes should extend the height, not the width.

Here is the example I am attempting to replicate (not made with Graphviz):

In response to your questions:

  • I’m not sure what you mean by “how many graphs.” The number of nodes (representing tables) in each graph would vary, but would likely not be more than 15.
  • I am currently using Windows, and it would also have to be compatible with Linux. Would this be an issue? I’m not familiar with the differences in Graphviz between OSs. Ideally, the solution would work for everyone.

Thank you so much for your help!

  • “How many graphs?” - Assuming 1 graph per database, how many databases are you wanting to model/graph? 5 databases, 50, 500?
  • if you set ranksep to .75 or even .5, the resulting graph will tighten up in the X dimension.
  • if you are wanting to replicate the node placement of the graph “not made with Graphviz”, in an automated way - I think that would be very difficult with Graphviz. (see below for the “naive” dot version)
  • if you just want to approximate the style of “not made with Graphviz”, I think you could get fairly close, but close is in the eye of the beholder
  • setting rankdir=LR is going to tend to generate a horizontal graph. In many ways, dot is not the best engine for a network like yours because your network lacks “natural” ranks (my opinion). I tried the other Graphviz engines, but saw nothing that grabbed me.
  • except for the occasional bug or OS oddity, Graphviz produces the same results on all OSs
  • I asked about OS because my ratio=compress solution requires more than one execution of dot per graph (2 to 6 executions) and a tiny bit of scripting to make it all happen. (The script/program computes the size values)
  • Try using “html-like labels” instead of “record labels” for your nodes. That should allow better placement of edges and styling of concatenated keys
  • finally, I think there are other ERD diagramming tools. You might consider them also
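For anyone wondering what the html-like label suggestion could look like when the DOT file is generated from C, here is a rough sketch. The function name write_html_table_node is hypothetical, the table attributes (shape=plain, STYLE="ROUNDED", HR rules) are just one possible styling, and table and column names are assumed to contain no characters that need HTML escaping (&, <, >). The ports reuse the record-label names (TABLE, PK_TABLE, F_TABLE), so edges such as "COURSE":"PK_COURSE" should keep working unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the column names of one section, one per line (<BR/> between them). */
static void write_columns(FILE *fp, const char *const cols[], int n)
{
    for (int i = 0; i < n; i++)
        fprintf(fp, "%s%s", i > 0 ? "<BR/>" : "", cols[i]);
}

/* Hypothetical helper: writes one ERD table as a node with an HTML-like label.
   Ports keep the record-label names (TABLE, PK_TABLE, F_TABLE).
   Assumes table and column names need no HTML escaping (&, <, >). */
void write_html_table_node(FILE *fp, const char *table,
                           const char *const pk[], int npk,
                           const char *const other[], int nother)
{
    assert(fp != NULL && table != NULL);
    fprintf(fp, "\"%s\" [ shape=plain, label=<\n", table);
    fputs("<TABLE BORDER=\"1\" CELLBORDER=\"0\" CELLSPACING=\"0\""
          " CELLPADDING=\"4\" STYLE=\"ROUNDED\">\n", fp);
    /* title row, then a horizontal rule */
    fprintf(fp, "<TR><TD PORT=\"%s\"><B>%s</B></TD></TR>\n<HR/>\n", table, table);
    /* primary-key columns, left aligned (BALIGN aligns the <BR/> lines) */
    fprintf(fp, "<TR><TD PORT=\"PK_%s\" ALIGN=\"LEFT\" BALIGN=\"LEFT\">", table);
    write_columns(fp, pk, npk);
    fputs("</TD></TR>\n<HR/>\n", fp);
    /* remaining columns */
    fprintf(fp, "<TR><TD PORT=\"F_%s\" ALIGN=\"LEFT\" BALIGN=\"LEFT\">", table);
    write_columns(fp, other, nother);
    fputs("</TD></TR>\n</TABLE>> ];\n", fp);
}
```

For example, calling it with pk = {"deptcode", "cnum"} and other = {"cname"} for "COURSE" would stand in for the record-shaped COURSE node; the styling will likely need tweaking to taste.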

I see! It’s going to be more like 5 graphs. This is for a small research project so performance is really not of any concern. I’d rather the graph be somewhat close to the original image.

I’m curious what your “fairly close” option is. Of course, I don’t need the graph to be identical; I just wanted to get rid of some of the width of the graph, and thought it was interesting that the engine did not do more to optimize the use of negative space.

Would you recommend using a script to compute the size for ratio=compress? I’m still not sure what this entails. Or would you recommend html-like labels? I’m not familiar with this format; would you be able to show me what the translation of my code would be, if you think it’s a better approach?

Thank you!

  • None of the Graphviz engines have “space management” as the primary goal. Very roughly speaking, dot uses “rank” as the primary driver (see https://graphviz.org/pdf/dotguide.pdf) and neato (kind of) uses total edge length as the primary driver (see https://graphviz.org/pdf/neatoguide.pdf). [There will not be a test on the two documents, but they are worth a quick once-over.]
  • html labels and ratio address two very different things; it is not an either/or situation.
  • here is a video of the same graph made using varying size attributes: output.mp4 (Google Drive) [hoping that the video shows the visual changes better than lots of words]
  • here is a shell script & a GVPR program (part of the Graphviz package) that will give you sizes to try:
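f=myfile.gv
T=png
F=$(basename "$f" .gv)
# lay the graph out once; squish.gvpr prints one candidate size ("W,H") per line
dot -Tdot "$f" |
  gvpr -f squish.gvpr |
  while read -r size; do
    case $size in ''|*[!0-9.,]*) continue ;; esac   # skip any line that is not a size
    dot -T"$T" -G"size=$size" -G"ratio=compress" "$f" > "$F.$size.$T"
  done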

The script runs dot once to determine the “native” size of the graph, pipes that layout through squish.gvpr, which lists progressively smaller sizes (in the non-rank dimension) to “squish” the graph, and then runs dot once per size to create these squished graphs. Here is squish.gvpr:

BEG_G{
  string z[int];
  float  dx, dy, sizex, sizey, f, incr;
  incr=.5;
  split($G.bb,z,",");
  // debug output is disabled: every line printed to stdout is read as a size by the shell loop
  // print($G.bb," ",z[0]," ",z[1]," ",z[2]," ",z[3]);
  dx=(float)z[2]-(float)z[0];  // compute delta X
  dy=(float)z[3]-(float)z[1];  // compute delta Y
  sizex=dx/72.;  // convert points to inches
  sizey=dy/72.;
  if (hasAttr($G, "rankdir")){
    if ($G.rankdir=="LR"||$G.rankdir=="RL"){
      // LR/RL: ranks run left to right, so shrink the height
      for (f=sizey;f>=2.;f-=incr)
        print (sizex,",",f);
      exit(0);
    }
  }
  // otherwise shrink the width
  for (f=sizex;f>=2.;f-=incr)
    print (f,",", sizey);
}
  • If “squishing” your five graphs produces a “good enough” result, let’s put off html nodes until another day

Thank you so much for this. I am trying to integrate it with my code. Currently, I have the following written in C:

    // End and close the file
    fputs("}\n", fp);
    fclose(fp);

    // Translate the file
    const char *export_graph = "dot diagram.dot -T svg -o diagram.svg";
    system(export_graph); 

How would I be able to use the first section of code you provided in your previous message? I need to execute all command line commands from this C file…

I also pasted the second section of code into a text file with that name (squish.gvpr). Is that fine? Will it be able to compile, or do I need to include it in my Makefile to link it? I’m a bit of a beginner when it comes to scripts.

This is my Makefile so far:

.PHONY: all clean
CFLAGS = -I"C:/Program Files/Graphviz/include/graphviz"
LDIR = -L"C:\Program Files\Graphviz\lib"
LDFLAGS = -lgvc -lcgraph

binaries=sqlpsql
all: $(binaries)

sqlpsql: SQLP.c SQLPGrammar.y SQLPScanner.l SQLPtoSQL-main.c Preprocess.c Rules.c 
	
	bison -v -d SQLPGrammar.y
	flex --nounput -D SQLPGrammar SQLPScanner.l
	gcc -Wall -o sqlpsql SQLP.c SQLPtoSQL-main.c Preprocess.c Rules.c $(CFLAGS) $(LDIR) $(LDFLAGS)
	rm -f lex.yy.c SQLPGrammar.tab.c SQLPGrammar.tab.h

clean:
	-rm -f *.o *.output *.dot *.png $(binaries)

C coders: Help!
[I have not written a serious C program in over 25 years - I am way past rusty. Additionally, I seem to have misplaced my copy of Marc Rochkind’s “Advanced UNIX Programming” (https://www.amazon.com/Advanced-UNIX-Programming-Marc-Rochkind/dp/0131411543) (mine was the first edition), so I am really in deep water]

As I understand it you want to rewrite this:

  gvpr -f squish.gvpr|
  while read size;do 
    dot -T$T -G"size=$size" -G"ratio=compress" $f >$F.$size.$T; 
  done

as a C program. Very doable; I just don’t remember how to set up a pipeline in a C program. I’ll thrash about. Maybe one of the smarter folks will chime in.

p.s. I think your system call is missing the input file name
p.p.s. the squish.gvpr program file can be incorporated into your C program as a text string as part of the command line
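A minimal sketch of the shell loop’s body done from C with system(): build one dot command per candidate size. build_squish_command is a hypothetical helper, not part of Graphviz; it only uses the -T, -G and -o flags already shown in this thread, and its quoting assumes file names without double quotes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds the command the shell loop runs for one size:
     dot -T<fmt> -Gsize=<w>,<h> -Gratio=compress -o "<out>" "<in>"
   Returns 0 on success, or -1 if a file name contains a double quote
   or the buffer is too small. */
int build_squish_command(char *buf, size_t bufsz, const char *fmt,
                         double w, double h, const char *in, const char *out)
{
    int n;

    assert(buf != NULL && fmt != NULL && in != NULL && out != NULL);
    if (strchr(in, '"') != NULL || strchr(out, '"') != NULL)
        return -1;                       /* would break the shell quoting */
    n = snprintf(buf, bufsz,
                 "dot -T%s -Gsize=%.2f,%.2f -Gratio=compress -o \"%s\" \"%s\"",
                 fmt, w, h, out, in);
    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

In the generator this might be used as `if (build_squish_command(cmd, sizeof cmd, "svg", 10.0, 4.5, "diagram.dot", "diagram.svg") == 0) system(cmd);` (system() comes from <stdlib.h>), once per size you want to try.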

Constructing that in C is probably going to involve popen and be a little complicated. Alternatively, you could put this in a shell script and have your system call run Bash on that script.
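As a rough idea of what the popen route involves, here is a sketch that runs a shell command (for example the `dot -Tdot ... | gvpr -f squish.gvpr` pipeline) and collects its output lines. read_command_lines and SIZE_LINE_LEN are hypothetical names; popen/pclose are POSIX, and the `_popen`/`_pclose` mapping for Windows is an assumption about MinGW/MSVC builds that I have not tested.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen/pclose under strict -std=c99/c11 */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32                    /* Windows C runtimes use underscore names */
#define popen  _popen
#define pclose _pclose
#endif

#define SIZE_LINE_LEN 128        /* assumption: output lines are short, e.g. "10,4.5" */

/* Runs cmd through the shell and stores up to max_lines lines of its stdout
   (trailing newline removed; longer lines get split). Returns the number of
   lines stored, or -1 if the command could not start or exited non-zero. */
int read_command_lines(const char *cmd, char lines[][SIZE_LINE_LEN], int max_lines)
{
    FILE *p;
    int n = 0;
    char buf[SIZE_LINE_LEN];

    assert(cmd != NULL && lines != NULL && max_lines >= 0);
    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    while (n < max_lines && fgets(buf, sizeof buf, p) != NULL) {
        buf[strcspn(buf, "\r\n")] = '\0';   /* strip LF (and CR on Windows) */
        strcpy(lines[n++], buf);
    }
    while (fgets(buf, sizeof buf, p) != NULL)
        ;                                   /* drain the rest so the child can exit */
    return pclose(p) == 0 ? n : -1;
}
```

With the squish pipeline, each stored line would be a "W,H" size to pass to dot as -Gsize; note that squish.gvpr as posted also prints a bb debug line and, for LR graphs, a "// LR" line, which would need to be skipped.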

For the second option, how would I do that? Would it involve defining the script as some kind of variable? (I am using PowerShell to run my program.)
Would you mind providing instructions for how I can execute that?

While this is possible, I think you probably don’t want to do this. You probably want to replicate Steve’s GVPR program in C so you can run the logic inside your own program instead of having to call GVPR and parse and interpret its output.
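A sketch of what porting the squish.gvpr logic to C might look like, assuming you already have the graph’s bb string as it appears in `dot -Tdot` output (comma-separated points, e.g. "0,0,720,360"). GraphSize and squish_sizes are hypothetical names; the 72 points per inch conversion, the 0.5 inch step and the 2 inch floor are taken from the GVPR program.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct { double w, h; } GraphSize;   /* inches, as used by size=W,H */

/* C port of the squish.gvpr logic (a sketch):
   bb      - "llx,lly,urx,ury" in points, from the graph's bb attribute
   rankdir - the graph's rankdir, or NULL if it is not set
   Fills out[] with progressively smaller sizes, shrinking only the non-rank
   dimension by 0.5 inch down to 2 inches. Returns the count, or -1 on a bad bb. */
int squish_sizes(const char *bb, const char *rankdir, GraphSize *out, int max)
{
    double llx, lly, urx, ury, sx, sy, f;
    const double incr = 0.5, min_size = 2.0;
    int n = 0, lr;

    assert(bb != NULL && out != NULL && max >= 0);
    if (sscanf(bb, "%lf,%lf,%lf,%lf", &llx, &lly, &urx, &ury) != 4)
        return -1;
    sx = (urx - llx) / 72.0;                 /* points -> inches */
    sy = (ury - lly) / 72.0;
    lr = rankdir != NULL &&
         (strcmp(rankdir, "LR") == 0 || strcmp(rankdir, "RL") == 0);

    /* LR/RL: ranks run left to right, so squish the height; otherwise the width */
    for (f = lr ? sy : sx; f >= min_size && n < max; f -= incr) {
        out[n].w = lr ? sx : f;
        out[n].h = lr ? f : sy;
        n++;
    }
    return n;
}
```

For a 720 x 360 point layout with rankdir=LR this yields sizes 10,5 down to 10,2 (seven candidates), the same list the GVPR program would print.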

Alternatively you could have some wrapper script/orchestrator that calls your sqlpsql binary and then takes its output and runs the GVPR squishing. Writing this outer script in e.g. Python or Ruby is likely to be much easier than trying to manage interaction between subprocesses in C.

Hi! Sorry for the delayed response. I’ve been trying to get the graph squished on my own, and I’ve succeeded to a degree that I think will be fine!

An example looks like this:

Here is the dot code for it:

digraph ERD {
graph [ rankdir = "LR", ratio = compress, size=8];
ranksep=1;
"DEPARTMENT" [ label="<DEPARTMENT> DEPARTMENT|<PK_DEPARTMENT>deptcode \l | <F_DEPARTMENT> deptname \l " shape = "record", style = "rounded" ];
"COURSE" [ label="<COURSE> COURSE|<PK_COURSE>deptcode \lcnum \l | <F_COURSE> cname \l " shape = "record", style = "rounded" ];
"PROFESSOR" [ label="<PROFESSOR> PROFESSOR|<PK_PROFESSOR>pnum \l | <F_PROFESSOR> pname \loffice \ldeptcode \l " shape = "record", style = "rounded" ];
"CLASS" [ label="<CLASS> CLASS|<PK_CLASS>deptcode \lcnum \lterm \lsection \l | <F_CLASS> pnum \l " shape = "record", style = "rounded" ];
"ENROLLMENT" [ label="<ENROLLMENT> ENROLLMENT|<PK_ENROLLMENT>snum \ldeptcode \lcnum \lterm \lsection \l | <F_ENROLLMENT>  " shape = "record", style = "rounded" ];
"SCHEDULE" [ label="<SCHEDULE> SCHEDULE|<PK_SCHEDULE>deptcode \lcnum \lterm \lsection \lday \ltime \l | <F_SCHEDULE> room \l " shape = "record", style = "rounded" ];
"MARK" [ label="<MARK> MARK|<PK_MARK>snum \ldeptcode \lcnum \lterm \lsection \l | <F_MARK> grade \l " shape = "record", style = "rounded" ];
"STUDENT" [ label="<STUDENT> STUDENT|<PK_STUDENT>snum \l | <F_STUDENT> sname \lyear \l " shape = "record", style = "rounded" ];
"COURSE":"PK_COURSE"->"DEPARTMENT":"PK_DEPARTMENT" [arrowhead = normal] [label="deptcode"];
"PROFESSOR":"F_PROFESSOR"->"DEPARTMENT":"PK_DEPARTMENT" [arrowhead = normal] [label="deptcode"];
"CLASS":"F_CLASS"->"PROFESSOR":"PK_PROFESSOR" [arrowhead = normal] [label="pnum"];
"CLASS":"PK_CLASS"->"COURSE":"PK_COURSE" [arrowhead = normal] [label="deptcode, cnum"];
"ENROLLMENT":"PK_ENROLLMENT"->"STUDENT":"PK_STUDENT" [arrowhead = normal] [label="snum"];
"ENROLLMENT":"PK_ENROLLMENT"->"CLASS":"PK_CLASS" [arrowhead = normal] [label="deptcode, cnum, term, section"];
"SCHEDULE":"PK_SCHEDULE"->"CLASS":"PK_CLASS" [arrowhead = normal] [label="deptcode, cnum, term, section"];
"MARK":"PK_MARK"->"ENROLLMENT":"PK_ENROLLMENT" [arrowhead = normal] [label="snum, deptcode, cnum, term, section"];
}

However, there are a few final (minor) things I am noticing, and I was hoping you might have some suggestions for them.

First, the right side of the graph is cut off: the rightmost node is missing its right edge. Is there a reason for this that is apparent from my code, or is it just an unfixable glitch?

Second, is there a way to clean up the arrow labels a bit? Perhaps by making them follow the direction of the arrow, rather than restricting them to be horizontal? The two edges coming out of the ENROLLMENT table are a bit hard to interpret…

Thank you so much for your help!

Problem #1 - image clipping/truncation

Problem #2 - funky edge label placement

Hi!

Problem #1:
Using -Tsvg:cairo is a very promising solution, but for some reason one of my other graphs gets rendered like so:

It did fix the other graph’s cut-off issue, however… Is there any reason for this graph turning out this way?

Problem #2:
I do think that adding a newline to really long arrow labels will be a good workaround for my problem. However, I am having a bit of trouble doing so (embarrassing). When I try to write “\n” to the file, the generated .gv file simply contains an actual newline, pushing the text to the next line, which does nothing for the rendered graph. How can I get my C code to print a literal \n into the .gv file?

Thanks!

Problem #1:
Hmm, Cairo seems to be struggling with fonts and character placement. Rather than chase this down, first try a different workaround: embed all the nodes and edges in an invisible (peripheries=0) cluster. This does not stop the chop/truncation, but the cluster enlarges the graph a bit, so the truncation is no longer visible.

digraph ERD {
graph [ rankdir = "LR", ratio = compress, size=8];
ranksep=1;
  subgraph clusterNoChop {  // enlarge the graph a bit
  graph [peripheries=0] 

"DEPARTMENT" [ label="<DEPARTMENT> DEPARTMENT|<PK_DEPARTMENT>deptcode \l | <F_DEPARTMENT> deptname \l " shape = "record", style = "rounded" ];
"COURSE" [ label="<COURSE> COURSE|<PK_COURSE>deptcode \lcnum \l | <F_COURSE> cname \l " shape = "record", style = "rounded" ];
"PROFESSOR" [ label="<PROFESSOR> PROFESSOR|<PK_PROFESSOR>pnum \l | <F_PROFESSOR> pname \loffice \ldeptcode \l " shape = "record", style = "rounded" ];
"CLASS" [ label="<CLASS> CLASS|<PK_CLASS>deptcode \lcnum \lterm \lsection \l | <F_CLASS> pnum \l " shape = "record", style = "rounded" ];
"ENROLLMENT" [ label="<ENROLLMENT> ENROLLMENT|<PK_ENROLLMENT>snum \ldeptcode \lcnum \lterm \lsection \l | <F_ENROLLMENT>  " shape = "record", style = "rounded" ];
"SCHEDULE" [ label="<SCHEDULE> SCHEDULE|<PK_SCHEDULE>deptcode \lcnum \lterm \lsection \lday \ltime \l | <F_SCHEDULE> room \l " shape = "record", style = "rounded" ];
"MARK" [ label="<MARK> MARK|<PK_MARK>snum \ldeptcode \lcnum \lterm \lsection \l | <F_MARK> grade \l " shape = "record", style = "rounded" ];
"STUDENT" [ label="<STUDENT> STUDENT|<PK_STUDENT>snum \l | <F_STUDENT> sname \lyear \l " shape = "record", style = "rounded" ];
"COURSE":"PK_COURSE"->"DEPARTMENT":"PK_DEPARTMENT" [arrowhead = normal] [label="deptcode"];

//graph[splines=curved]  // maybe
//graph[splines=true]  // ugly
//graph[splines=polyline] // huh, same as true
//graph[splines=ortho]  // nope
//graph[splines=false]
"PROFESSOR":"F_PROFESSOR"->"DEPARTMENT":"PK_DEPARTMENT" [arrowhead = normal] [label="deptcode"];
"CLASS":"F_CLASS"->"PROFESSOR":"PK_PROFESSOR" [arrowhead = normal] [label="pnum"];
"CLASS":"PK_CLASS"->"COURSE":"PK_COURSE" [arrowhead = normal] [label="deptcode, cnum"];
"ENROLLMENT":"PK_ENROLLMENT"->"STUDENT":"PK_STUDENT" [arrowhead = normal] [label="snum"];

// headlabel & taillabel make a bigger mess
// prepending a \n helps
"ENROLLMENT":"PK_ENROLLMENT":se->"CLASS":"PK_CLASS" [arrowhead = normal] [label="deptcode, cnum, term, section" ];
"SCHEDULE":"PK_SCHEDULE"->"CLASS":"PK_CLASS" [arrowhead = normal] [label="deptcode, cnum, term, section"];
"MARK":"PK_MARK"->"ENROLLMENT":"PK_ENROLLMENT" [arrowhead = normal] [label="snum, deptcode, cnum, term, section"];
}
}
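Since the DOT file is written from C, a sketch of how the generator’s start and end could emit this invisible-cluster wrapper. write_erd_header and write_erd_footer are hypothetical names; the attribute text is copied from the graph above, and the main point is that the file now needs two closing braces instead of the single fputs("}\n", fp).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: opens the graph with the attributes used in this
   thread, then opens the invisible cluster; write all nodes and edges after it. */
void write_erd_header(FILE *fp, double size_inches)
{
    assert(fp != NULL);
    fputs("digraph ERD {\n", fp);
    fprintf(fp, "graph [ rankdir = \"LR\", ratio = compress, size=%g ];\n",
            size_inches);
    fputs("ranksep=1;\n", fp);
    fputs("subgraph clusterNoChop {  // enlarge the graph a bit\n", fp);
    fputs("graph [peripheries=0]\n", fp);
}

/* Because of the cluster, two closing braces are needed instead of one. */
void write_erd_footer(FILE *fp)
{
    assert(fp != NULL);
    fputs("}\n", fp);   /* closes subgraph clusterNoChop */
    fputs("}\n", fp);   /* closes digraph ERD */
}
```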

Problem #2:
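You need two \ characters together: in a C string or character literal, \\ stands for a single backslash, so “\\n” writes the two characters \ and n instead of a newline. Three examples below: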

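  #include <stdio.h>

  int main(void) {
      char s[50] = "0: \\n";   /* "\\" is one backslash, so s holds: 0: \n */
      printf("%s\n", s);       /* prints: 0: \n */
      s[0] = '\\';             /* same idea, built one character at a time */
      s[1] = 'n';
      s[2] = '\0';
      printf("1: %s\n", s);    /* prints: 1: \n */
      printf("2: \\n  \n");    /* or put \\n directly in the format string */
      return 0;
  }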
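Building on that, a sketch of a helper that prepares a long edge label before it is written with fprintf, turning each ", " separator into the two characters \n (a line break in the DOT label) and escaping any double quote as \". dot_label_with_breaks is a hypothetical name, and breaking at every comma is just one possible policy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies label `in` into `out` for use inside a
   double-quoted DOT string. ", " becomes the two characters \n (written
   "\\n" in C source) and '"' becomes \". Returns 0, or -1 if out is too small. */
int dot_label_with_breaks(const char *in, char *out, size_t outsz)
{
    size_t j = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    for (; *in != '\0'; in++) {
        const char *rep = NULL;
        if (in[0] == ',' && in[1] == ' ') {
            rep = "\\n";      /* backslash + n, not a real newline */
            in++;             /* also consume the space after the comma */
        } else if (*in == '"') {
            rep = "\\\"";     /* backslash + double quote */
        }
        if (rep != NULL) {
            size_t k = strlen(rep);
            if (j + k >= outsz)
                return -1;
            memcpy(out + j, rep, k);
            j += k;
        } else {
            if (j + 1 >= outsz)
                return -1;
            out[j++] = *in;
        }
    }
    out[j] = '\0';
    return 0;
}
```

The result would then go into the edge, e.g. `fprintf(fp, "... [label=\"%s\"];\n", buf);`, so "deptcode, cnum, term, section" is drawn on four lines.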

Wow, perfect thank you so much! Both of those worked incredibly.

I really appreciate all your help with this 🙂