I’ve found some mention of a couple of large test case files, bevy.gv and ftzz.gv, but I cannot find these files anywhere. Where might they be? I have my own test case that I stumbled upon, but it’s apparently way too big to allow convenient profiling, and perhaps one of those can be cut down more easily.
Incidentally, has anyone tried to parallelize dot? Some initial work seemed to show promise, but after finding only a small speedup I took a breather to organize testing first. These days computers have a lot of cores, and running single-threaded seems so wasteful. Are there still people working on the project who understand the basic algorithms being used and who might be willing to discuss/review alternate implementation strategies if I try to do this?
You probably want to look at tests/compare_performance.py in the Graphviz repo. ftzz.gv and bevy.gv are checked in as tests/2475_1.dot and tests/2475_2.dot.
I worked on some of the basic algorithms in dotgen, and would be happy to discuss parallelization.
There are four major phases to the layout, as described in the 1993 IEEE TSE paper. There have been some research publications in the past 5-10 years aimed at parallelizing some of the algorithms (specifically, network simplex) that would be worth finding and reading.
Searching for “parallel network simplex” turns up Kara and Özturan 2021, whose abstract mentions “Our experiments show speedups up to four are possible, though they are typically lower.” Well, ok. Coarse-grained parallel crossing reduction might run the heuristic with different randomizations in parallel and keep the best result; parallel spline routing might involve finding independent sets of routes and giving them to different threads.
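To make the randomization idea concrete, below is a minimal sketch of a parallel “portfolio”: each thread runs the same randomized heuristic from a different seed, and the main thread keeps the best result. The objective here is a toy stand-in (counting inversions in a permutation), not Graphviz’s actual crossing count or mincross code; only the parallel shape is the point.

```c
/*
 * Sketch only: a coarse-grained "portfolio" for crossing reduction.
 * Each worker runs the same randomized heuristic with a different
 * seed; the main thread keeps the best result. The objective below
 * (inversions in a permutation) is a toy stand-in, not mincross.
 * Build: cc -pthread portfolio.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8        /* nodes on one rank (toy size) */
#define WORKERS 4

typedef struct {
    unsigned seed;   /* per-thread RNG state */
    int order[N];    /* best ordering this worker found */
    long score;      /* its objective value (lower is better) */
} job_t;

/* Toy objective: count inversions as a stand-in for edge crossings. */
static long score_order(const int *order) {
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            if (order[i] > order[j])
                s++;
    return s;
}

static void *worker(void *arg) {
    job_t *job = arg;
    int cur[N];
    for (int i = 0; i < N; i++)
        cur[i] = i;
    job->score = -1;
    /* A few random restarts per thread; remember the best one seen. */
    for (int trial = 0; trial < 1000; trial++) {
        for (int i = N - 1; i > 0; i--) { /* Fisher-Yates shuffle */
            int j = rand_r(&job->seed) % (i + 1);
            int tmp = cur[i]; cur[i] = cur[j]; cur[j] = tmp;
        }
        long s = score_order(cur);
        if (job->score < 0 || s < job->score) {
            job->score = s;
            for (int i = 0; i < N; i++)
                job->order[i] = cur[i];
        }
    }
    return NULL;
}

int main(void) {
    pthread_t tid[WORKERS];
    job_t jobs[WORKERS];
    for (int w = 0; w < WORKERS; w++) {
        jobs[w].seed = 0x9e3779b9u * (unsigned)(w + 1); /* distinct seeds */
        pthread_create(&tid[w], NULL, worker, &jobs[w]);
    }
    int best = 0;
    for (int w = 0; w < WORKERS; w++) {
        pthread_join(tid[w], NULL);
        if (jobs[w].score < jobs[best].score)
            best = w;
    }
    printf("best: worker %d, score %ld\n", best, jobs[best].score);
    return 0;
}
```

The appeal of this style is that the workers share nothing mutable, so it sidesteps thread-safety questions inside the heuristic itself.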
The somewhat bad news is that running time is balanced among network simplex, crossing reduction, and spline routing, so running much faster means attacking all 3: by Amdahl’s law, a 4x speedup on a phase taking a third of the runtime yields only about a 1.33x overall improvement.
Have you used the tests in compare_performance.py as regression tests, or are they too slow for that? We seem to be just ignoring the outputs there, and I’m seeing some errors.
When I run compare_performance.py, I notice that #1172 has errors with the released version on Ubuntu 24.04 (2.42.2-9ubuntu0.1, which calls itself 2.43 if asked):
    lib/pathplan/shortest.c:93: cannot realloc pnls
    Error: in routesplines, Pshortestpath failed
    ... [repeats many times]
    Error: lost 4970 455 edge
    Error: lost 4971 416 edge
    ... [many more similar errors]
    warning: command returned 1
These errors seem to have disappeared when running from recent GitLab sources.
I also see similar errors in #2064 with the released version.
Also #2475, which took almost 2 hours to run on my (not completely idle) laptop.
Ok, too slow for CI, but an occasional regression run might be a good idea.
I have ~60 input files with >300 nodes & >300 edges (semi-arbitrary cut-off). Many came from the Issues web pages and this forum.
Below is a graph showing the % of time spent in each of the 5 phases (including the drawing step) for some of the smaller files. (I’ll keep timing the larger files, but don’t expect them all to complete.) Absolute timings are available if anyone wants them.
How commonly used are the various layout filters (dot, neato, circo)? I notice you are running mostly dot, but also those other two, in compare_performance.py. Are your runs using different tools?
I’ve managed to do sampling profiles of a few of the faster cases on my AMD Linux laptop. Indeed, the hot spots correspond to cache misses, so memory footprint is key. Neato has very different hotspots from dot, though I have skipped the circo test for now. My own test case was still bogged down in mincross when I gave up profiling.
Btw, do separable subgraphs/components get processed often enough that we might do those in parallel? It’s not immediately clear that the code is thread-safe when processing multiple subgraphs at once. If this is a common use case, and it’s not already done, it’s probably worth duplicating each component’s graph to get better memory locality.
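To illustrate what I mean, here is a minimal sketch under toy assumptions: each “component” is just an array of node positions, each thread gets its own compact private copy, and results are merged back into disjoint ranges. None of this is Graphviz’s actual data model; it only shows the copy-out/compute/merge shape.

```c
/*
 * Sketch only: lay out disconnected components in parallel, with a
 * compact private copy per thread for locality and thread safety.
 * The "graph" is a toy: each component is an array of positions.
 * Build: cc -pthread components.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { double x, y; } point_t;

typedef struct {
    const point_t *src; /* this component's nodes in the shared graph */
    point_t *out;       /* where merged results go (disjoint per task) */
    size_t n;
} comp_task_t;

static void *layout_component(void *arg) {
    comp_task_t *t = arg;
    /* Copy in: a private, contiguous buffer gives better cache
     * behavior than chasing pointers in a shared structure, and
     * removes any need for locking during the computation. */
    point_t *priv = malloc(t->n * sizeof *priv);
    for (size_t i = 0; i < t->n; i++)
        priv[i] = t->src[i];

    /* Toy "layout": spread nodes on a line. A real engine would run
     * ranking/mincross/splines here, on its private copy only. */
    for (size_t i = 0; i < t->n; i++) {
        priv[i].x = (double)i;
        priv[i].y = 0.0;
    }

    /* Copy out: each task writes a disjoint range, so no
     * synchronization is needed beyond pthread_join. */
    for (size_t i = 0; i < t->n; i++)
        t->out[i] = priv[i];
    free(priv);
    return NULL;
}

int main(void) {
    /* Two disconnected "components" living in one shared array. */
    point_t nodes[7] = {0};
    comp_task_t tasks[2] = {
        { nodes,     nodes,     4 },  /* component 0: nodes[0..3] */
        { nodes + 4, nodes + 4, 3 },  /* component 1: nodes[4..6] */
    };
    pthread_t tid[2];
    for (int c = 0; c < 2; c++)
        pthread_create(&tid[c], NULL, layout_component, &tasks[c]);
    for (int c = 0; c < 2; c++)
        pthread_join(tid[c], NULL);
    printf("node 5 placed at (%g, %g)\n", nodes[5].x, nodes[5].y);
    return 0;
}
```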
Lots of clever code here. I’ll have to dig into the data structures to figure out whether they can be compacted and/or which operations are thread-safe.
I’m suddenly remembering that parallelizing dusty-deck Fortran was made harder by bugs, probably introduced while hand-tuning for old vector machines. At another previous job, we had some very clever hand-coded scheduling code that was eventually surpassed by a simpler approach using a commercial ILP solver, an approach that more than one person could understand (after the one person who understood the original left, that code was unmaintainable).
What do you mean by “edge placement”, apart from the 2nd network simplex phase and spline routing? Is this the “top half” of the spline router, i.e., finding the feasible regions for the splines to be drawn?
The legend definitely needs work.
- edge placement is found by grepping stderr (dot -v …) for “[Graphviz] routespl.c:”
- splines is found by grepping stderr (dot -v …) for “[Graphviz] emit.c:”
Very good question, no simple answer. But the bulk of the performance issues seem to be about dot.
I do not use compare_performance.py. Not an editorial, just a very old dog (me) not wishing to learn a new trick.
> Do separable subgraphs/components get processed frequently?

- All(?) the engines recognize the pack & packmode attributes concerning the layout of disconnected components.
- dot and fdp handle clusters separately.
You can run every input through every layout engine. Some attributes are not used by every engine, but they will be accepted (and maybe ignored). Here is a bash script that produces commands for many engines and interesting options
I have a relatively reviewable fix to ns.c which copies data into a private graph and computes with more locality. In a preliminary comparison on test case 2343.dot, I see a 71% reduction in total run time, due to a 5x improvement in the 2nd ranking pass (on a recent AMD laptop processor). “#2465 1” was -47% in one run, and #2621 was -56%. Other compare_performance.py examples are either very fast or didn’t show much improvement.
Test case #1864 (which takes about 22m23s) and other, shorter tests use neato or circo and are apparently unaffected by this change.
The approach is kind of ugly and a little inefficient (copying data in and out), but less risky, as it barely touches the original algorithm code. I’m still cleaning up a merge request, so we can discuss whether this can be made better when I post that.
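To show the shape of the pattern before the merge request lands (this is not the actual ns.c change; the linked list, the ranks, and the “computation” are all stand-ins), here is a toy copy-in/compute/copy-out example:

```c
/*
 * Sketch only: the copy-in / compute / copy-out pattern on a toy
 * structure. The win comes from the hot loop walking one contiguous
 * array instead of chasing pointers scattered across the heap; the
 * copies at either end are linear passes and usually cheap.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct node {
    int rank;
    struct node *next;  /* pointer-rich "original" representation */
} node_t;

void rerank_with_locality(node_t *head, size_t n) {
    /* Copy in: flatten the list into one contiguous array, keeping
     * back-pointers so results can be written home afterwards. */
    int *ranks = malloc(n * sizeof *ranks);
    node_t **back = malloc(n * sizeof *back);
    size_t i = 0;
    for (node_t *p = head; p; p = p->next, i++) {
        ranks[i] = p->rank;
        back[i] = p;
    }

    /* Compute: the hot loop touches only the compact array. A real
     * ranking pass (network simplex iterations) would go here. */
    for (size_t k = 0; k < n; k++)
        ranks[k] += 1;  /* stand-in for the real computation */

    /* Copy out: write results back into the original structure. */
    for (size_t k = 0; k < n; k++)
        back[k]->rank = ranks[k];
    free(ranks);
    free(back);
}

int main(void) {
    node_t c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
    rerank_with_locality(&a, 3);
    printf("%d %d %d\n", a.rank, b.rank, c.rank);  /* prints: 2 3 4 */
    return 0;
}
```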
Meanwhile, I’m looking for more benchmarks, both to validate the change and motivate reviewer effort, and to find other hotspots. Since the really big cases get bogged down in other spots, I suspect I need to look at mincross and maybe a few other passes. I’ll start by profiling #1864 for the next round.
@steveroush Do you have numeric data for that graph and/or a collection of the input files and command-line options used to run them?
I only see two graphs reachable from that .md file: 2854_0.gv and 4elt.gv.
They’re pretty big (28170 lines and 9048 lines) but don’t seem to contain 60 examples.