Large graph builders - should Graphviz provide more help?

And if so - what?

Most questions posted to the Forum, to Stack Overflow, and to the Issues system are about small-ish graphs. Does that mean that large graph builders have everything under control?
My intuition says “no”, but my intuition is often confused.

So, if you build large graphs (say, roughly 1500 edges or more), please share your opinions: how could Graphviz be more supportive?

P.S. I know that faster is always better. What else would help, especially in the areas of documentation, tips, and tools?

One quick win would be some sort of “progress” callback at the API level.

FWIW, my main use case involves graphs that are too big, so I present a tree view plus a breadcrumbs view of the data, letting users pick the “top” cluster to render from.
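A minimal sketch of that drill-down pattern, assuming a hypothetical data model where clusters form a tree (`parent`/`children` maps), each cluster owns a list of node names, and edges are (tail, head) pairs. `breadcrumbs`, `nodes_under`, and `dot_for_cluster` are illustrative helpers, not part of any Graphviz API; the idea is simply to hand Graphviz only the subtree the user picked.

```python
# Hypothetical data model: clusters form a tree (parent/children maps),
# each cluster owns a list of node names, and edges are (tail, head) pairs.

def breadcrumbs(parent, cluster):
    """Return the path of cluster names from the root down to `cluster`."""
    path = []
    while cluster is not None:
        path.append(cluster)
        cluster = parent.get(cluster)
    return path[::-1]


def nodes_under(children, members, cluster):
    """Collect every node owned by `cluster` or any of its descendants."""
    found, stack = set(), [cluster]
    while stack:
        c = stack.pop()
        found.update(members.get(c, ()))
        stack.extend(children.get(c, ()))
    return found


def dot_for_cluster(children, members, edges, cluster):
    """Emit DOT text for the chosen cluster's subtree only.

    Names are quoted but not escaped; names containing '"' would
    need escaping in real use.
    """
    keep = nodes_under(children, members, cluster)
    lines = ["digraph G {"]
    lines += [f'  "{n}";' for n in sorted(keep)]
    for tail, head in edges:
        # Drop edges with an endpoint outside the selected subtree.
        if tail in keep and head in keep:
            lines.append(f'  "{tail}" -> "{head}";')
    lines.append("}")
    return "\n".join(lines)


parent = {"root": None, "net": "root", "db": "root"}
children = {"root": ["net", "db"]}
members = {"root": ["main"], "net": ["sock", "http"], "db": ["sql"]}
edges = [("main", "http"), ("http", "sock"), ("sql", "sock")]

print(" > ".join(breadcrumbs(parent, "net")))
print(dot_for_cluster(children, members, edges, "net"))
```

Dropping edges that cross the subtree boundary keeps the sketch short; a real viewer might instead draw them as stubs pointing at the breadcrumb trail.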

We might provide more tutorial guidance about what to expect. In terms of layout, if a graph is directed but is not a tree, layered graph layout (dot) isn’t usually practical for more than a few dozen nodes. You can achieve a similar effect using neato -Gmode=hier (as explained in this report) for up to maybe a few hundred nodes. For undirected graphs, sfdp was engineered to handle thousands of nodes.
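Those rules of thumb can be turned into a small helper that picks a command line from a graph's size and shape. This is a sketch under stated assumptions: the cutoffs of 50 and 500 nodes are my own stand-ins for "a few dozen" and "a few hundred", `choose_command` and `is_forest` are hypothetical helpers, and sending large directed non-trees to sfdp is an assumption rather than advice from this thread. The flags themselves (`-Tsvg`, `-o`, `-Gmode=hier`) are ordinary Graphviz command-line options.

```python
from collections import defaultdict, deque

# Assumed cutoffs standing in for "a few dozen" and "a few hundred" nodes.
DOT_LIMIT = 50
HIER_LIMIT = 500


def is_forest(nodes, edges):
    """True if the directed graph is a forest of rooted trees."""
    nodes = set(nodes) | {v for edge in edges for v in edge}
    indeg = defaultdict(int)
    succ = defaultdict(list)
    for tail, head in edges:
        indeg[head] += 1
        succ[tail].append(head)
    if any(indeg[n] > 1 for n in nodes):
        return False
    # With in-degree <= 1 everywhere, the graph is a forest exactly when
    # every node is reachable from a root (in-degree 0); nodes on a
    # cycle have no root above them and are never reached.
    seen = set()
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:
        n = queue.popleft()
        seen.add(n)
        queue.extend(succ[n])
    return seen == nodes


def choose_command(nodes, edges, directed, src="graph.gv", out="graph.svg"):
    """Return a Graphviz command line following the rules of thumb above."""
    n = len(set(nodes))
    if not directed:
        engine = ["sfdp"]
    elif n <= DOT_LIMIT or is_forest(nodes, edges):
        engine = ["dot"]
    elif n <= HIER_LIMIT:
        engine = ["neato", "-Gmode=hier"]
    else:
        engine = ["sfdp"]  # assumption: no layered layout at this size
    return engine + ["-Tsvg", src, "-o", out]


ring = [(i, (i + 1) % 200) for i in range(200)]
print(choose_command(range(200), ring, directed=True))
```

To actually render, pass the returned list to `subprocess.run(cmd, check=True)`; the sketch only builds the command so it can be inspected or logged first.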

For viewing large graphs, I’m not sure of the state of the art today, but it used to be the case that web clients run out of gas at around 100K DOM objects. A node or edge typically has a couple of DOM objects. (I forget whether piecewise cubic Bézier splines are represented by one DOM object or possibly one per segment.) We like d3-graphviz (which is presently pinned to the top of this forum), though it does rely on DOM rendering; it’s probably good for several tens of thousands of nodes and edges. (Someone should let us know.)
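One way to check a specific drawing against that DOM budget is to count the elements in the SVG that Graphviz writes, along with the `<g class="node">` and `<g class="edge">` groups it wraps each node and edge in. A minimal sketch using only the standard library; `svg_dom_stats` is a hypothetical helper, the 100K figure is the ballpark from the post rather than a measured browser limit, and the sample below is hand-written to mimic the shape of Graphviz output, not real output.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
# Ballpark quoted in the thread; not a hard browser limit.
DOM_BUDGET = 100_000


def svg_dom_stats(svg_bytes):
    """Count all SVG elements plus Graphviz's per-node/per-edge groups."""
    root = ET.fromstring(svg_bytes)
    total = sum(1 for _ in root.iter())
    groups = {"node": 0, "edge": 0}
    for g in root.iter(SVG_NS + "g"):
        cls = g.get("class")
        if cls in groups:
            groups[cls] += 1
    return total, groups


sample = b"""<svg xmlns="http://www.w3.org/2000/svg"><g class="graph">
<g class="node"><title>a</title><ellipse/><text>a</text></g>
<g class="node"><title>b</title><ellipse/><text>b</text></g>
<g class="edge"><title>a-&gt;b</title><path/><polygon/></g>
</g></svg>"""
total, groups = svg_dom_stats(sample)
per_item = total / (groups["node"] + groups["edge"])
print(total, groups, f"~{per_item:.1f} elements per node/edge")
print("within budget" if total <= DOM_BUDGET else "over budget")
```

For a real file, read it as bytes (`open("graph.svg", "rb").read()`) before passing it in. Running this on a representative drawing also answers the spline question empirically: count the `<path>` elements inside one edge group.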

Cytoscape may be more scalable.

When graphs get really large, for undirected graphs (spring models) there are techniques like t-SNE and UMAP, with implementations in Python, that may be more satisfactory if you don’t need to see the edges. Also, the commercial Graphistry software uses GPUs for layout and rendering and scales further, up to the memory limit of the GPU. I don’t think it has the concrete rendering features of Graphviz (like all the node shapes and text layout options), but when I looked at it a few years ago it seemed good.

The vast majority of expensive graphs I’ve profiled are bottlenecked in dfs_range. I wonder if simply plumbing through a progress callback for that single function would be enough.
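To make the idea concrete, here is a Python illustration of the kind of work dfs_range does, with a progress hook added. It assumes that dfs_range assigns each spanning-tree node a postorder [low, lim] interval so that network simplex can answer "is v in u's subtree?" quickly; this is a sketch under that assumption, not Graphviz's C implementation. The `on_progress` and `every` parameters are hypothetical, and the traversal is iterative so deep trees do not hit Python's recursion limit.

```python
_END = object()  # sentinel marking an exhausted child iterator


def dfs_range(children, root, on_progress=None, every=1000):
    """Give each spanning-tree node a [low, lim] postorder interval.

    Node v lies in u's subtree iff low[u] <= lim[v] <= lim[u].
    `on_progress(done)` is a hypothetical callback fired after every
    `every` finished nodes.
    """
    low, lim = {root: 0}, {}
    counter = done = 0
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        node, it = stack[-1]
        child = next(it, _END)
        if child is not _END:
            # Entering a child: its interval starts at the current counter.
            low[child] = counter
            stack.append((child, iter(children.get(child, ()))))
            continue
        # All children finished: close this node's interval.
        stack.pop()
        lim[node] = counter
        counter += 1
        done += 1
        if on_progress is not None and done % every == 0:
            on_progress(done)
    return low, lim


def in_subtree(low, lim, u, v):
    """True if v lies in the subtree rooted at u."""
    return low[u] <= lim[v] <= lim[u]


tree = {"r": ["a", "b"], "a": ["c"]}
low, lim = dfs_range(tree, "r", on_progress=lambda n: print("finished", n), every=2)
print(low, lim)
print(in_subtree(low, lim, "a", "c"), in_subtree(low, lim, "a", "b"))
```

Throttling with `every` matters because a callback on every node of a large tree would itself add measurable overhead to the hottest function in the profile.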

When I profiled, long ago, I thought the cost of a large layout was fairly well balanced between phases 2-4 (mincross, X coordinate solving, spline routing); the cost of phase 1 (ranking, i.e. Y coordinate solving) is negligible. dfs_range is only involved in phase 3, so a progress bar would probably need to account for this.
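Accounting for this could be as simple as weighting each phase's share of the bar. A minimal sketch, assuming the weights from the profiling remark above (ranking about zero, the other three phases a third each); the phase labels, `LayoutProgress`, and its `update` method are hypothetical names, not an existing Graphviz API.

```python
# Assumed weights from the profiling remark above: ranking is
# negligible, and the other three phases split the cost evenly.
PHASES = ["rank", "mincross", "position", "splines"]
WEIGHTS = {"rank": 0.0, "mincross": 1 / 3, "position": 1 / 3, "splines": 1 / 3}


class LayoutProgress:
    """Fold per-phase fractions into one overall 0..1 progress value."""

    def __init__(self, report, phases=PHASES, weights=WEIGHTS):
        self.report = report
        self.phases = phases
        self.weights = weights

    def update(self, phase, fraction):
        idx = self.phases.index(phase)
        # Every earlier phase counts as complete.
        before = sum(self.weights[p] for p in self.phases[:idx])
        fraction = min(max(fraction, 0.0), 1.0)
        self.report(before + self.weights[phase] * fraction)


seen = []
progress = LayoutProgress(seen.append)
progress.update("rank", 1.0)
progress.update("mincross", 0.5)
progress.update("position", 0.25)
progress.update("splines", 1.0)
print([round(x, 3) for x in seen])
```

Keeping the weights in a table makes them easy to re-tune if new profiling shows a different balance; the harder open question is how to turn repeated dfs_range calls into a meaningful within-phase fraction.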