Work going forward

First, let me express how excellent it is to have @mark and @smattr on board. Your contributions are excellent, and what a pace you’re keeping! I haven’t been able to keep up with reviewing even the changes I might be able to understand, but you seem to be doing great with each other’s help. When I’m head over heels in something that is taking much longer than I expected when I started, I find it hard to focus on anything else. This was the case with the D3 Graphviz Theme Component and with rtest, which have taken most of my time over the last few months.

I’m thinking about what major thing to do next. There’s always a conflict between what would be most fun and what would be most useful (even though it is gratifying to be useful).

Most fun would be to look into some of the interactive stuff that’s been discussed in the “Graphviz web site, new edition” thread. @mark’s ideas are excellent, and I would love it if we could do something together. I also have some ideas on how to use animations to show how e.g. attributes work.

Later on I would also like to create more tests, but right now my mental budget for testing is overdrawn and I have to await the next budget period :stuck_out_tongue_winking_eye:. It sounds like @smattr is also interested in testing.

Right now, however, I feel that we must do something about the Windows side: binary distribution, installation and build instructions, and the CI pipeline. If no one else is going to take a stab at it, I think I will start with that. I’m sure it’s going to be a bumpy road :cold_sweat:.

Let me know if there’s something else pertinent that you think I would be suited to do.

In general, I think it would be fun to cooperate more on stuff going forward. The result is usually much better if you discuss details up-front, rather than just reviewing when the work is done. Please let me know your thoughts and what your plans are.

Thank you for wanting to look into this.

Windows is probably more popular than all other platforms combined for Graphviz.

Sorry, that’s somewhat on me :frowning:. None of my MRs are time-critical, so would it help if I plan to leave each MR open for about a week and, when posting, include a deadline date on which I plan to merge if I hear no objection? Also feel free to drop a comment saying that you haven’t had time to read a diff and that I should delay merging.

I certainly like the seatbelt a test suite provides me!

I have no grand plans for new features. My main interest is in paying down some of the technical debt. In my debugging edits, I’ve frequently made mistakes that the compiler warned about, but the warning was lost in the sea of other warnings. I’d like to get to a place where the build is warning-free before I feel comfortable implementing new functionality. Having said that, I don’t think this should prevent you or others from going ahead with new ideas.

Not at all. I just wanted to clarify that the lack of feedback from me is not due to a lack of appreciation, just a lack of time and knowledge. To perform reviews of any quality, I would need to spend quite some time getting acquainted with the codebase first. That is time that, for me, is better spent elsewhere at the moment, especially given that we have others who can do this much better than I can.

I completely agree. The kind of functionality I’m thinking about would be front-end stuff for the web page/documentation and not affect the C codebase.

Separately, an idea occurred to me: I wanted to probe the group’s mindset on supporting large graphs. Obviously “large” is a bit of a subjective term here. E.g. in graphviz#1731 from today, a user has a graph with 676 nodes and 16785 edges. For some people, this is large. For others, it is small.

I have some skin in the game on this one as I occasionally want to process graphs of ~10K nodes.

Issues like 1731 usually provoke a discussion about whether you could even read this graph if it was rendered. Typically, this is not relevant to me. My rendered graphs are not being printed or inserted into documents, but being interactively explored or programmatically consumed by downstream tools. Obviously there’s a limit at which your graph no longer fits in memory or is computationally intractable to render, but I want Graphviz to run right up to these limits.

How do other people feel? Are large graphs a priority? Should Graphviz be only concerning itself with printable/readable graphs?

My 2c (haven’t thought about this much): The graph abstractions should scale well to whatever memory you have.

However, I wouldn’t expect layout algorithms (placing the charts in 2D space) to scale that well (I’m not up on the algorithms here, I’m just guessing they’re at best linear-time).

I’d be interested in what you mean by being interactively explored or programmatically consumed. Are you talking about UIs that let you pivot on connections to a graph, like a hypertext graph traverser?

The dot language is a neat serialization format for graphs and a good lingua franca for tools that output graphs, but I think Graphviz is mostly focused on, well, graph visualization rather than being (say) a graph database that you can query using (say) SQL. I would love to hear more about the applications you have in mind.

Something much less fancy than you imagine :slight_smile: We were producing some large SVGs that users would open in Google Chrome and then zoom in and out of. When the SVGs became too large for Chrome to handle, we moved to a different specialized viewer (I can’t remember which offhand). When this became unwieldy, we had a script that consumed SVGs and, knowing what the user was interested in, would filter them down to a smaller, manageable SVG.

None of this was exactly optimal, or possibly even a good idea to begin with. But the accessibility of Graphviz and Chrome made the initial work flow attractive. Then, because SVGs are XML, it was straightforward to filter them in a Python script using XPath navigation etc. At the height of this setup, I think the build system even had some regression tests that scanned Graphviz-generated SVGs for particular expected features.
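For anyone curious what that kind of filtering looks like, here is a minimal sketch using Python’s standard `xml.etree.ElementTree`, which supports a limited XPath subset (not necessarily what the original script used). It assumes Graphviz’s usual SVG layout, where each node and edge is a `<g>` with `class="node"` or `class="edge"` and a `<title>` holding the node name or `tail->head`. The function name `filter_svg`, the `keep` set and the `SAMPLE` document are illustrative only, and node ports (`a:p`) are not handled.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
NS = {"svg": SVG_NS}


def filter_svg(svg_text, keep):
    """Hypothetical helper: keep only nodes named in `keep` and edges between them.

    Assumes Graphviz-style SVG: <g class="node"> / <g class="edge"> groups,
    each with a <title> child. Ports (e.g. "a:p") are not handled.
    """
    ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix
    root = ET.fromstring(svg_text)
    for graph in root.findall("svg:g[@class='graph']", NS):
        for child in list(graph):  # copy, since we remove while iterating
            cls = child.get("class")
            title = child.findtext("svg:title", default="", namespaces=NS)
            if cls == "node":
                drop = title not in keep
            elif cls == "edge":
                # Edge titles are "tail->head" (directed) or "tail--head" (undirected)
                sep = "->" if "->" in title else "--"
                tail, _, head = title.partition(sep)
                drop = tail not in keep or head not in keep
            else:
                continue  # leave the graph's own <title> and anything else alone
            if drop:
                graph.remove(child)
    return ET.tostring(root, encoding="unicode")


# A tiny hand-written stand-in for Graphviz output (shapes omitted).
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"><g id="graph0" class="graph"><title>G</title>
<g id="node1" class="node"><title>a</title></g>
<g id="node2" class="node"><title>b</title></g>
<g id="node3" class="node"><title>c</title></g>
<g id="edge1" class="edge"><title>a&#45;&gt;b</title></g>
<g id="edge2" class="edge"><title>b&#45;&gt;c</title></g>
</g></svg>"""

if __name__ == "__main__":
    print(filter_svg(SAMPLE, {"a", "b"}))
```

Filtering the rendered SVG rather than the dot input keeps the original layout, so the smaller picture lines up with the full one; the tradeoff is that the removed nodes’ space is not reclaimed.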

You could argue we should have been filtering the dot files before input to Graphviz or using a different graph processing system entirely, but it was hard to dispute the convenience of the arrangement we arrived at.

If you’re wondering what the actual use case for the above was, we were graphing data flow between so-called micro-operations (μops) in the execution pipeline of an x86 CPU. It was convenient to experiment with unorthodox scheduling constraints or optimizations and quickly see graphical feedback of the effect they would have on a particular workload. This was a research project, not something we used for production, but it was useful nonetheless.

I just realised I didn’t reply to the original question about dev work going forward. I think I’ll probably focus my efforts on documentation and examples. I’d like us to have world-class reference docs and a world-class getting started page.

Have you done any major work on this yet? I’m asking because:

  1. In the short term, I will make updates to adapt to the new Windows distribution. If you already have major changes coming in, I’ll do this as a bare-bones MVP. Otherwise, I might do some more restructuring.
  2. In the medium term, I plan to prototype some interactive stuff that might be useful on a getting started page, and if you have already done something there, I’d like to adapt to it.

All the work I’ve done so far has been committed, so go ahead!

In the background, I’m evaluating the Docsy theme, which codifies a lot of docs best practices. None of that should affect what you’re doing.

I can’t decide what to do next. Please help me.

In your opinion, what would be the single most useful thing for @magjac to work on next?

  • Simplified release version numbering
  • Windows builds using Docker images and updated dependencies to lay the ground for further improvements on Windows
  • Work on any of the self-assigned bugs before doing anything else (please specify which in the comments)
  • Something else (please specify in the comments)

I think both better Windows CI and simplifying the release process would be important.

I’ll draw a slight distinction and say that the release version numbering is just a means to an end: a simpler release process for maintainers. Our users don’t care very much about what release version numbering scheme we use; version numbers are mostly used by us in bug reports. Do you agree?


Yes, I agree, although I haven’t decided yet. The argument for making it simpler for maintainers right now is that, going forward, it will free up time that can be spent on user improvements instead of being wasted on making (complicated) releases.

Absolutely. As a vision, I’d love to one day have an extra pipeline that we can run to say ‘take this build and treat it as public release x.y.z’ and have it all be automated. Wouldn’t that be nice? :-) Perhaps that pipeline sets all the environment variables the build script needs to set the version number and build timestamp, and adds a commit to bump the version number. One can dream.
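To make the version-number half of that idea a bit more concrete, here is a small sketch of a helper such a pipeline might call to decide which version string to hand to the build. Everything here is an assumption: `RELEASE_VERSION` and `BASE_VERSION` are made-up variable names, and the `~dev.YYYYMMDD.HHMM` suffix is just an example format, not necessarily the scheme Graphviz uses. `SOURCE_DATE_EPOCH` is the Reproducible Builds convention for a fixed build timestamp.

```python
import os
from datetime import datetime, timezone


def release_version(env=None):
    """Hypothetical: pick the version string a release pipeline passes to the build.

    RELEASE_VERSION (made-up name): "x.y.z" when making a public release.
    BASE_VERSION (made-up name): the upcoming version, used for dev builds.
    SOURCE_DATE_EPOCH (Reproducible Builds convention): build time in Unix seconds.
    """
    env = os.environ if env is None else env

    release = env.get("RELEASE_VERSION")
    if release:
        parts = release.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected x.y.z, got {release!r}")
        return release  # a public release uses the plain version number

    # Development build: append a UTC timestamp so every build is distinguishable.
    epoch = env.get("SOURCE_DATE_EPOCH")
    when = (datetime.fromtimestamp(int(epoch), tz=timezone.utc)
            if epoch is not None else datetime.now(timezone.utc))
    base = env.get("BASE_VERSION", "0.0.0")
    return f"{base}~dev.{when.strftime('%Y%m%d.%H%M')}"


if __name__ == "__main__":
    print(release_version())
```

Taking the environment as a parameter keeps the logic testable without touching the real process environment.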

For me it’s more than a dream or a vision. I’m already planning to do something like that.

We might even want to go further and make that the only way of releasing. In practice, we don’t do any manual verification before stable releases anyway. If we were stringent about requiring merge requests to update the version number and the changelog accordingly, there would be no difference between a development release and a stable release. True continuous delivery, in other words.

Sounds fantastic. Automating this is important and impactful work, as it’ll save us work on every future release.

The vote was inconclusive, but I’ve started to work on Windows builds using Docker, because it seemed like a better opportunity to learn something new and it will make further Windows-related CI/CD work so much faster and easier.