Dot is no longer my friend. What can I do? (10h SVG Render Process)

Long story short: Rendering universe.gv (721.8 KB) to SVG on my machine takes 10 hours (dot -v5 -O -Tsvg universe.gv). I would like to improve that, if possible.

I recently started a small project to help me build timelines of events and merge them into a single, meaningful graph. I chose to render my data as GraphViz instructions, as I’ve had many great experiences with the tool stack in the past. The visualization of that graph is an important piece of the work: it allows me to “scroll through” the events and place myself in the mindset of that time, just by visually parsing the context of the rendered image at a given location. For reference, a “low-fidelity” rendering (ortho splines, network simplex drastically limited) is generated in CI at https://oliversalzburg.github.io/timeline/.

When I looked into the subject, I came across:

These posts already contained some very insightful information. I understand that DOT is not really suitable for graphs with the number of nodes I currently have. However, I believe the properties of this layout engine’s output are the best fit for the graph layout I envision for my project. When I tried switching layout engines, to see if that would bring quick relief, I didn’t get any results within a few minutes and ended up aborting the rendering process. I’m happy to return to evaluating more alternatives now, if guided that way.

As mentioned in the intro, rendering the .gv currently takes about 10 hours to complete on my machine (Intel Xeon E5-2687W v4 @ 3.00GHz). The majority of the time is consumed in the “network simplex” phase. I do not want to reduce the quality of the final output just to make the process faster; I still want to end up with the most beautiful splines possible and the fewest crossings. As I also generate the GV code myself, I am free to approach the problem at any layer of the solution.

Feedback and guidance would be appreciated :slight_smile:

Fun prompt. First up, my actual advice: maybe shard your graph by year and then re-concatenate it?

But then let’s take this at face value. Why is it slow?

I ran this on my MacBook Pro (M4) with dot installed from Homebrew, profiling with samply (https://github.com/mstange/samply), cutting it off after 60 seconds:

$ dot --version
dot - graphviz version 13.1.0 (20250701.0955)
$ samply record --duration 60 dot ~/Downloads/universe.gv -Tsvg

I get this profile (viewable in the Firefox Profiler). The width of the bars corresponds to time.

It’s mostly dfs_enter_inedge, i.e. doing a depth-first search. When I try to remove recursion (right-click on a frame and click “collapse direct recursion”), my browser hangs, lol. There’s so much recursion here that the profiler UI is struggling. At any rate, here’s dfs_enter_inedge: lib/common/ns.c · main · graphviz / graphviz · GitLab

Interesting! I will look into that, and see if I can derive new actions. Thanks :slight_smile:

Regarding the years: I actually have a feature to “cluster years” (it places all events of the same year into a cluster), which I had much fun with initially. However, this led to issues that would cause dot to abort the rendering process due to internal errors, so I dropped that idea a while ago. I understand that you’re suggesting something different, though, and I will need to think about whether I can utilize that idea somehow. Thanks.

You are misusing the minlen attribute (https://graphviz.org/docs/attrs/minlen/). minlen should be an integer that expresses the number of ranks (not inches or pixels) between tail & head for a specific edge.
The result is that dot assigns 20703 ranks to the graph. This causes network simplex to run for as long as it does.
By removing all the minlen references, we get 1748 ranks.
And dot will complete in <3 minutes. Yea!
If you are not satisfied with the result without minlen, we can probably help.
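
For illustration, a minimal made-up example (node names are hypothetical): minlen is a rank count, so minlen=3 simply asks dot to keep at least three ranks between tail and head:

digraph {
  // default minlen=1: b ends up one rank below a
  a -> b
  // minlen=3: c ends up at least three ranks below a (ranks, not inches or pixels)
  a -> c [minlen=3]
}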

p.s. Cool graph
p.p.s. I was in college in 1969.

You’re definitely right: I apparently misunderstood how to use minlen. I will have to tinker a bit to readjust my understanding of things.

My understanding of minlen was: if it is >1, then the node pointed to will be placed on a new rank. So my application calculates the time distance between two events, then calculates how that distance relates to the “base unit” of the graph (day, week, or month). If the base unit is “day” and two events are more than 24h apart, minlen should reflect that by being >1, so that the event on the new day is also on a new rank.

Similarly, when the base unit is “week”, all events within the time range of a week will share the same rank. This is an important property for a visual result that reflects time distances, and it ensures that identical timestamps shared between individual data sets end up in the correct location in the output.

So, in my mind, the 20703 ranks were “correct”, because that is likely the number of discrete days in the graph (or at least closely related to that metric), which should also occupy distinct ranks.
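
To illustrate what my generator emits (hypothetical node names; base unit “day”), two events three days apart get minlen=3 and therefore sit three ranks apart:

digraph {
  // my generator: time distance (3 days) / base unit (1 day) => minlen=3
  "2024-01-01 Event A" -> "2024-01-04 Event B" [minlen=3]
}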

That being said, I shall see if I can achieve a satisfying result, given this new information, and then I’ll return. Thanks a lot!

(I hope my understanding of the algorithm is correct)
The “problem” is that dot creates (many) virtual nodes along the long edges, and that causes the long run time. (see https://www.graphviz.org/documentation/TSE93.pdf)
However, maybe we can get the best of both worlds.
Let me think about a post-processor that would “stretch” the 1748 rank version to recreate the desired distances.

I also had this feeling that my entire approach causes too much work because I don’t prepare the data well enough. Ultimately, I know the time of the events, and I know the exact distance I want between any two events. This is pretty solid information that is not directly translated into useful constraints for dot to use (I believe). As long as the output was generated quickly and the result looked right, I didn’t care that my approach might not be ideal. But I was wondering if it could be beneficial to try to use neato guided by pre-calculated pos attributes. Just another thing for me to evaluate, though. :slight_smile:

Do you want the events on a rank to appear in chronological order? (i.e. if a rank==1 week, should friday events always follow thursday events?)
If so, then

  • add an explicit timestamp attribute (e.g. … ts=xyz …)
  • run dot -Tdot -Gphase=1 universe.gv -o universe.dot.
    This will give all the node heights & widths.
  • then write a gvpr (https://www.graphviz.org/pdf/gvpr.1.pdf) or python or whatever program to do the pos calculations (in points)
  • run that through neato -n myfile and neato will lay out the splines (see the sketch after this list).
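
Here is an untested sketch of one variant of this idea: start from the full no-minlen layout (the 1748-rank version) instead of the phase=1 output, keep dot’s x coordinates, and recompute y from ts. The file names, the stretch.gvpr program, and the 40-points-per-day scale are arbitrary illustrations, not tested code:

// stretch.gvpr: keep dot's x, recompute y from the ts attribute
// (assumes ts holds UNIX seconds; the sign/direction may need
// flipping, since dot lays ranks out top-to-bottom)
N {
  double x, y, ts;
  sscanf($.pos, "%lf,%lf", &x, &y);  // pos as computed by dot, in points
  sscanf($.ts, "%lf", &ts);          // the explicit timestamp attribute
  $.pos = sprintf("%.2f,%.2f", x, ts / 86400.0 * 40.0);
}

dot -Tdot universe.gv -o layout.dot
gvpr -c -f stretch.gvpr layout.dot -o positioned.dot
neato -n -Tsvg positioned.dot -o universe.svg

neato -n then takes the pos values as given (in points) and only routes the splines.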

Thanks again for this very helpful guidance. I feel like I can now see my fundamental misunderstandings in terms of ranks and positioning. A lot of my naive observations seem to make sense now, which gets a lot of thoughts rolling.

I started to correct my minlen calculation to integer values. This also allowed me to finally fix carrying fractional time parts over into the next rank. Now the minlen also relates more clearly to the “base unit” of the graph, which is neat. However, this had no noticeable impact on rendering performance, not that I expected one.

The real suggestion was to drop minlen anyway, so I also did that. The result is surprisingly pleasing, given how much importance I placed on this detail of my graph. This is giving me food for thought. At the very least, this will be my new default preview version to use while working on the data. Awesome!

Given this output, I can also better understand the suggestion to work from this version and just adjust the positions according to the time scale in a later step. This seems much more feasible than getting used to renderings taking on the order of days.

For clarity: the graph and the data sets on GitHub exist only for public discussion of the project. There is a real data set, with private information that details my own entire life and that of people related to me, which is merged with the public data to create the real image. In that version, the edges of the individual timelines also merge into single nodes if timelines have shared entries that would otherwise result in duplicate nodes. The 10h benchmark is for the public data set; the private data set hasn’t been rendered in weeks. I assume it heavily outweighs the 20K ranks.

It was actually very hard for me to reproduce my problems with the public data set. For the longest time, it just would not work, until I decided to add the start of the UNIX epoch as a world event the other day. I didn’t fully understand it at the time, but I somehow realized that my private data set problems also started when I added the birth dates of parents. This had already confused me, but I assumed it was somehow a result of the timestamps being negative and me not handling time travel properly in my own code.

I now (hopefully) understand that the answer is simply minlen forcing thousands of empty ranks at the start of the graph. That also explains why another one of my performance boosts was the logarithmic scale for event spacing, as that simply results in fewer empty ranks.

I hope I can get to the remaining suggestion soon, but your help might have already enabled me to pile on more data again :smiley:

@steveroush If I may ask, was the singer “Oliver” a big deal in ’69? I came across him the other day while “time traveling” and looking for who might have inspired my own name. My senses tell me you mentioned that specific year because you have the required information to solve this side quest :smiley:

If you compute the node pos values (remember: in points) and feed the result to neato -n, the run time will be just a few seconds. But note that neato uses a somewhat different algorithm for laying out the splines. Probably OK, but give it a try.

Though I remember “Good Morning Starshine”, my 1969 comment referred to the 1st node - “Monty Python’s Flying Circus Season 1 … 05.10.1969”

The dfs_* functions are the bottleneck in the vast majority of the performance problems reported to us about Graphviz. I’m not aware of any remaining low-hanging performance improvement opportunities there. Someone will probably have to consult the literature and come up with an algorithmic network simplex improvement.

Nevertheless, I’ve filed this as another perf test case for us to investigate: 10h SVG Render Process (#2702) · Issues · graphviz / graphviz · GitLab

Here is another case where (misused) minlen caused grief.
Dot is challenged by graphs with many ranks (thousands, tens of thousands).
Sometimes they are “real” ranks (many nodes & edges), sometimes “kind of real” ranks (virtual nodes added for edge labels), and sometimes “possibly avoidable?” ranks (virtual nodes & edges added just to pad out the minlen value), as in the two graphs mentioned above.
It would seem possible to remove “unnecessary” virtual nodes & edges for phase 3 (the 2nd network simplex) and then restore them afterwards.

Here is a tiny example of minlen gone bad (core dump):

/***
goal: a graph that would cause phase=3 performance problems
***/
digraph tall{
  edge[minlen=50000]
  a -> {b  c d e f}
  edge[minlen=50500]  
  b -> {g h i}
}