Full Tree of Life

I used “-Ksfdp -GK=0.2 -Goverlap=true” to successfully lay out all known life (4.5M taxa) in just 40 minutes, but I am looking for improvements.

If you have any suggestions on how to make small branches more readable and the whole tree more evenly distributed, please share. Reducing the K factor increases branch overlap.
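For anyone who wants to reproduce or vary this run, here is a minimal Python sketch that only assembles the command line from the flags quoted above. The file names, the SVG output format, and the K values in the sweep are assumptions for illustration, not what was actually used.

```python
import shlex


def sfdp_command(input_path, output_path, k=0.2, overlap="true", fmt="svg"):
    """Return an argument list for `dot -Ksfdp` with the flags from the post.

    input_path, output_path and fmt are placeholders (assumptions).
    """
    return [
        "dot",
        "-Ksfdp",
        f"-GK={k}",
        f"-Goverlap={overlap}",
        f"-T{fmt}",
        input_path,
        "-o",
        output_path,
    ]


if __name__ == "__main__":
    # Sweep a few K values around 0.2 (the values are arbitrary examples).
    for k in (0.1, 0.2, 0.4):
        cmd = sfdp_command("tree.gv", f"tree_K{k}.svg", k=k)
        # Print instead of running: a 4.5M-node layout takes a long time.
        print(shlex.join(cmd))
```

Printing the commands rather than executing them makes it easy to queue the runs separately and compare the outputs side by side.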


Wow! Your application is amazing :grin:

Thank you!

Indeed, most impressive. Most impressive.
I have no direct experience with graphs with millions of nodes, and no useful experience with sfdp. That said, here are some of the attributes I would experiment with:

  • beautify
  • overlap (a guess: overlap=prism or overlap=prism3 )
  • overlap_scaling
  • overlap_shrink
  • sep
  • esep
  • smoothing
  • levels (maybe)
  • voro_margin (maybe)
  • quadtree (maybe)
  • repulsiveforce (maybe)
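One way to experiment with the attributes above is to change one at a time against a fixed baseline, so that any difference in the output can be attributed to a single setting. The sketch below only generates the -G flags. The baseline comes from the original post, while the trial values are guesses to start from, not recommendations.

```python
def g_flags(attrs):
    """Turn a dict of graph attributes into -Gname=value flags."""
    return [f"-G{name}={value}" for name, value in attrs.items()]


def one_at_a_time(base, candidates):
    """Yield (label, flags) pairs, changing one attribute per run."""
    for name, values in candidates.items():
        for value in values:
            attrs = dict(base)
            attrs[name] = value
            yield f"{name}={value}", g_flags(attrs)


# Baseline taken from the original post.
BASE = {"K": 0.2, "overlap": "true"}

# Trial values are illustrative assumptions only.
CANDIDATES = {
    "overlap": ["prism", "prism3"],
    "beautify": ["true"],
    "smoothing": ["spring", "triangle"],
    "quadtree": ["normal", "fast"],
    "repulsiveforce": [1.0, 2.0],
}

if __name__ == "__main__":
    for label, flags in one_at_a_time(BASE, CANDIDATES):
        print(label, "->", " ".join(["dot", "-Ksfdp", *flags]))
```

On a graph this size each run is expensive, so it may be worth testing the sweep on a single large component first.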

For completeness, you could use ccomps to break the graph into its connected components, lay out each component with the engine of your choice (sfdp?), and then use gvpack to combine them back into a single graph. (Not sure I like the idea, but you never know until you see the result.)
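A rough sketch of that ccomps / layout / gvpack idea, written as a Python helper that builds the shell pipeline as a string. It follows the pattern shown in the gvpack man page (with sfdp swapped in as the layout engine). The file names, the K value, and the SVG output are assumptions.

```python
import shlex


def component_pipeline(input_path, output_path, engine="sfdp", k=0.2, fmt="svg"):
    """Build a shell pipeline: split, lay out each component, repack, render."""
    stages = [
        # Emit each connected component as its own graph.
        ["ccomps", "-x", input_path],
        # Lay out every graph in the stream (default output keeps positions).
        [engine, f"-GK={k}"],
        # Pack the laid-out graphs into a single graph.
        ["gvpack"],
        # Render using the positions already computed, without a new layout.
        ["neato", "-n2", "-s", f"-T{fmt}", "-o", output_path],
    ]
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    print(component_pipeline("tree.gv", "packed.svg"))
```

The final neato -n2 step only draws the packed result, so the per-component layouts produced by the chosen engine are preserved.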

You are at the scale that was explored by the author of sfdp, Yifan Hu.

See his gallery, Visualizing large graphs: graph visualization of matrices from the SuiteSparse Collection

Many examples are under 100K nodes, but his website says “the largest graphs have tens of millions of nodes.” Yifan doesn’t hang out here, so you should contact him through his professional website: Professor Yifan Hu

He’s really positive, and knowledgeable about this topic.

I absolutely love it! I have had this poster framed on my wall for many years now, and I’m still in love with it. Your graph satisfies on a slightly different level :slight_smile:

Thank you. Right now I feel like only -GK and -Grepulsiveforce affect the layout in a somewhat desirable way. In Gephi, the Yifan Hu layout accepts more parameters, so I will try it too. It seems like Graphviz’s sfdp finds an optimal position for all nodes before applying the force-directed algorithm, which leads to much better (and more consistent) results, while in Gephi the initial positions are random. I am already working with connected components one by one, but I’m using my own wrapper around Gephi Toolkit to deal with them.

I’ve visited this gallery multiple times, but those pictures don’t give me any useful information. All the datasets and PDFs return “File not found”, there’s no description of what those graphs represent, and the command used for the layout is not provided either. I think it would be cool to have a similar gallery for Graphviz, with examples of point-only graphs and how different layout parameters affect the output.

Thank you. I think I have seen this chart and its interactive version. I like how all the branches point outward from the circle; it gives a more tree-like look. Also, I wonder how all those micro-branches were drawn.

I have tried the twopi layout, but even for the NCBI database (2.6M nodes across 38 depth levels, with the highest count of 389K at depth 10) the circles become too big. It would be cool if it were possible to tweak the distance from the center for nodes at the same depth, e.g. nodes at depth 5 placed at radius = 0.95x, 1x, 1.05x.

Or maybe another way to make all branches point outwards.
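To make the staggered-radius idea concrete, here is a small Python sketch that computes positions by hand. It is not something twopi offers, as far as this thread establishes. The function names, the even angular spacing, and the ring spacing are all assumptions, and real twopi assigns angles per subtree rather than evenly.

```python
import math


def staggered_radius(depth, index, ring_spacing=1.0, factors=(0.95, 1.0, 1.05)):
    """Radius for the index-th node at a depth, cycling through the factors."""
    return depth * ring_spacing * factors[index % len(factors)]


def polar_positions(nodes_by_depth, ring_spacing=1.0):
    """Map node name -> (x, y), spreading each depth evenly around the circle.

    nodes_by_depth: dict of depth -> list of node names (depth 0 is the root).
    Even angular spacing is a simplification made for this sketch.
    """
    positions = {}
    for depth, names in nodes_by_depth.items():
        for i, name in enumerate(names):
            angle = 2 * math.pi * i / len(names)
            r = staggered_radius(depth, i, ring_spacing)
            positions[name] = (r * math.cos(angle), r * math.sin(angle))
    return positions


if __name__ == "__main__":
    demo = {0: ["root"], 1: ["a", "b", "c"], 2: ["a1", "a2", "b1", "c1", "c2", "c3"]}
    for name, (x, y) in polar_positions(demo).items():
        # Could be written out as pos attributes and rendered with neato -n.
        print(f'{name} [pos="{x:.3f},{y:.3f}"];')
```

With three factors, neighbouring nodes at the same depth land on three slightly different rings, which trades a perfectly circular look for less crowding on dense levels.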


When a friend of mine, with a background in biology, saw the poster hanging in my apartment, she suggested that it’s not scientifically accurate. She gave me the impression that it was more artistic than scientific, which I could make my peace with. At the time, I hadn’t even questioned whether the tiny branches had any accuracy, even at the time of publication. I always assumed someone drew them semi-randomly by hand. To me, it serves as a reminder of the richness and diversity of life. Alternative visualizations just contribute more to that :slight_smile:

Understood, though how can they know what is “accurate” at a detailed level?