Question on Graphviz's edge data structures

Reading through the code in the cgraph library, I am a bit confused on the implementation for the creation and lookup for edges. It seems that nodes store multiple dictionaries to access edges from each subnode: in_id, out_id, in_seq, and out_seq.

struct Agsubnode_s {		/* the node-per-graph-or-subgraph record */
    Dtlink_t seq_link;		/* must be first */
    Dtlink_t id_link;
    Agnode_t *node;		/* the object */
    Dtlink_t *in_id, *out_id;	/* by node/ID for random access */
    Dtlink_t *in_seq, *out_seq;	/* by node/sequence for serial access */

I gather that the in_ and out_ prefixes represent edges which are incoming and outgoing from the subnode. Whenever preforming dictionary operations, these links will be copied into g->e_seq or g->e_id via dtrestore and dtextract.

My main two questions are:

  1. What are the reasons for the use of Dtlink_t structures?
  2. Why are edges looked up after finding the subnode? Why are they not looked up directly (similar to nodes)?

For the first question I have seen in the cdt documentation that there is some benefit provided by Dtlink_t. Specifically it is stated to be more efficient to walk than Dict_t and on the documentation for dtrestore(...) it is noted that this method may be used to reduce “dictionary overhead”.

Is there more context here someone could provide?

Greetings. I was one of the people that wrote this code.

The Dtlink_t structs embedded in Agsubnode_s allow it to be inserted without container objects in Dict_t dictionaries per subgraph (hence “subnode”). There’s a dictionary for walking nodes in order of creation (subgraph->n_seq) and another for accessing nodes by id (which is mapped from its string name) (subgraph->n_id). In cdt, there’s this idea of taking a dictionary (an ordered dictionary happens to be a splay tree) and kind of pickling it or freezing it via dtextract() and dtrestore() with respect to a particular Dict_t that contains all the methods for interpreting the data type being stored in the dictionary.

In cgraph, there are dictionaries in each node for its set of incident edges, instead of making a global dictionary for all edges of a subgraph. That’s where in_id, out_id, in_seq, out_seq come from. In contrast, in the original graph library (graphviz/lib/graph) and its first revision (graphviz/lib/agraph) I think we had global edge dictionaries. I think we suspected there was too much thrashing of edge dictionaries when making nested walks, so in lib/cgraph we made per-node edge sets (dictionaries). It was a long time ago and we were just trying stuff and hoped it would work well.

Nice to meet you Stephen! I have already seen and found some of your comments in the code helpful, thank you for replying :slight_smile:

I am a bit confused about the statement below. Specifically, what is the “it” which is being inserted?

The Dtlink_t structs embedded in Agsubnode_s allow it to be inserted without container objects in Dict_t dictionaries

Tangentially, I am also curious about the design of the seq / id structures. Initially when reading these I expected only the n_id / e_id to be implemented by Dict_t or similar dictionary. I was surprised to find that n_seq / e_seq were also implemented with a Dict_t instead of only a linked list. Although it does seem that e_seq is primarily used for only insertions / deletions.

Is the implementation of n_seq and e_seq as a Dict_t motivated by a desire to have everything homogeneous? Or is there some benifit to having these sequential data structures also stored as a dictionary?

Yes, mainly, keep things homogeneous. (Actually Dict_t can maintain linked lists or queues too though I don’t think graphviz ever uses that or maybe they don’t support random access in the way we need.)