Work going forward

First let me express how excellent it is to have @mark and @smattr on board. Your contributions are excellent and what a pace you’re keeping! I haven’t been able to keep up with reviewing even the stuff that I might be able to understand, but you seem to be doing great with each other’s help. When I’m deep into something that is taking much longer than I expected when I started, I find it hard to focus on anything else. This was the case with the D3 Graphviz Theme Component and with rtest, which has taken most of my time the last few months.

I’m thinking about what major thing to do next. There’s always a conflict between what would be most fun and what would be most useful (even though it is gratifying to be useful).

Most fun would be to look into some of the interactive stuff that’s been discussed in Graphviz web site, new edition - #12 by magjac. @mark’s ideas are excellent and I would love it if we could do something together. I also have some ideas on how to use animations to show how e.g. attributes work.

Later on I would also like to create more tests, but right now my mental budget for testing is overdrawn and I have to await the next budget period :stuck_out_tongue_winking_eye:. It sounds like @smattr is also interested in testing.

Right now, however, I feel that we must do something about the Windows side: binary distribution, installation and build instructions, and the CI pipeline. If no one else is going to take a stab at it, I think I will start with that. I’m sure it’s going to be a bumpy road :cold_sweat:.

Let me know if there’s something else that you think I would be suited to do that you feel is pertinent.

In general, I think it would be fun to cooperate more on stuff going forward. The result is usually much better if you discuss details up-front, rather than just reviewing when the work is done. Please let me know your thoughts and what your plans are.

Thank you for wanting to look into this.

Windows is probably more popular than all other platforms combined for Graphviz.

Sorry, that’s somewhat on me :frowning: None of my MRs are time-critical, so would it help if I plan to leave each MR live for about a week and when posting I include a deadline date on which I’m planning to merge if I hear no objection? Also feel free to drop a comment that you haven’t had time to read a diff and I should delay merging.

I certainly like the seatbelt a test suite provides me!

I have no grand plans for new features. My main interest is in paying down some of the technical debt. In my debugging edits, I’ve frequently made mistakes that the compiler warned about but the warning was lost in the sea of other warnings. I’d like to get to a place where the build is warning-free before I feel comfortable implementing new functionality. Having said that, I don’t think this should prevent you or others from going ahead with new ideas.

Not at all. I just wanted to clarify that the lack of feedback from me is not due to lack of appreciation, just a lack of time and knowledge. In order to perform reviews with any kind of quality, I would need to spend quite some time getting acquainted with the codebase first. Time that I think for me is better spent elsewhere at the moment. Especially given that we have others who can do this much better than me.

I completely agree. The kind of functionality I’m thinking about would be front-end stuff for the web page/documentation and not affect the C codebase.

Subsequently, an idea did occur to me… I wanted to probe our group mindset on supporting large graphs. Obviously “large” is a bit of a subjective term here. E.g. in graphviz#1731 from today, a user has a graph with 676 nodes and 16785 edges. For some people, this is large. For others, it is small.

I have some skin in the game on this one as I occasionally want to process graphs of ~10K nodes.

Issues like 1731 usually provoke a discussion about whether you could even read this graph if it was rendered. Typically, this is not relevant to me. My rendered graphs are not being printed or inserted into documents, but being interactively explored or programmatically consumed by downstream tools. Obviously there’s a limit at which your graph no longer fits in memory or is computationally intractable to render, but I want Graphviz to run right up to these limits.

How do other people feel? Are large graphs a priority? Should Graphviz be only concerning itself with printable/readable graphs?

My 2c (haven’t thought about this much): The graph abstractions should scale well to whatever memory you have.

However, I wouldn’t expect layout algorithms (placing the charts in 2D space) to scale that well (I’m not up on the algorithms here, I’m just guessing they’re at best linear-time).

I’d be interested in what you mean by being interactively explored or programmatically consumed. Are you talking about UIs that let you pivot on connections to a graph, like a hypertext graph traverser?

The dot language is a neat serialization format for graphs and a good lingua franca for tools that output graphs, but I think Graphviz is mostly focused on, well, graph visualization rather than being (say) a graph database that you can query with (say) SQL. I would love to hear more about the applications you have in mind.

Something much less fancy than you imagine :slight_smile: We were producing some large SVGs that users would open in Google Chrome to then zoom in and out of. When the SVGs became too large for Chrome to handle we moved to a different specialized viewer (I can’t remember what offhand). When this became unwieldy, we had a script that consumed SVGs and, knowing what the user was interested in, would filter them down to a smaller, manageable SVG.

None of this was exactly optimal, or possibly even a good idea to begin with. But the accessibility of Graphviz and Chrome made the initial workflow attractive. Then, because SVGs are XML, it was straightforward to filter them in a Python script using XPath navigation etc. At the height of this setup, I think the build system even had some regression tests that scanned Graphviz-generated SVGs for particular expected features.
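As a rough illustration of that kind of filtering, here is a minimal sketch using only Python's standard library. It is not the script described above. The sample SVG is hand-written to mimic the layout Graphviz emits (one `g` element per node and edge, each with a `title` child), and the name `filter_svg` and the set of nodes to keep are made up for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
NS = {"svg": SVG_NS}

# Hand-written, heavily trimmed stand-in for Graphviz SVG output (assumption).
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
<g id="graph0" class="graph">
<title>G</title>
<g id="node1" class="node"><title>a</title></g>
<g id="node2" class="node"><title>b</title></g>
<g id="node3" class="node"><title>c</title></g>
<g id="edge1" class="edge"><title>a&#45;&gt;b</title></g>
<g id="edge2" class="edge"><title>b&#45;&gt;c</title></g>
</g>
</svg>"""


def filter_svg(svg_text, keep):
    """Drop every node not in `keep` and every edge touching a dropped node."""
    # Serialize with SVG as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    graph = root.find("svg:g[@class='graph']", NS)
    # Iterate over a copy because we remove children while looping.
    for group in list(graph.findall("svg:g", NS)):
        title = group.findtext("svg:title", default="", namespaces=NS)
        if group.get("class") == "node":
            wanted = title in keep
        elif group.get("class") == "edge":
            # Only directed edges ("tail->head") are handled in this sketch.
            tail, _, head = title.partition("->")
            wanted = tail in keep and head in keep
        else:
            wanted = True
        if not wanted:
            graph.remove(group)
    return ET.tostring(root, encoding="unicode")
```

Calling `filter_svg(SAMPLE_SVG, {"a", "b"})` returns an SVG string that still contains nodes `a` and `b` and the edge between them. Undirected edges and ports are deliberately ignored here.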

You could argue we should have been filtering the dot files before input to Graphviz or using a different graph processing system entirely, but it was hard to dispute the convenience of the arrangement we arrived at.

If you’re wondering what the actual use case for the above was, we were graphing data flow between so-called micro-operations (μops) in the execution pipeline of an x86 CPU. It was convenient to experiment with unorthodox scheduling constraints or optimizations and quickly see graphical feedback of the effect they would have on a particular workload. This was a research project, not something we used for production, but it was useful nonetheless.

I just realised I didn’t reply to the original question about dev work going forward. I think I’ll probably focus my efforts on documentation and examples. I’d like to have world-class reference docs and a getting started page.

Have you done any major work on this yet? I’m asking because:

  1. Short-term I will update Download | Graphviz to adapt to the new Windows distribution. If you already have major changes coming in, I’ll do this as a bare-bones MVC. Otherwise I might do some more restructuring.
  2. Medium-term I plan to prototype some interactive stuff that might be useful on a getting started page and if you already have done stuff I’d like to adapt.

All the work I’ve done so far has been committed, so go ahead!

In the background I’m evaluating the docsy theme, which codifies a lot of docs best practices. None of that should affect what you’re doing.

I can’t decide what to do next. Please help me.

In your opinion, what would be the single most useful thing for @magjac to work on next?

  • Simplified release version numbering
  • Windows builds using Docker images and updated dependencies to lay the ground for further improvements on Windows
  • Work on any of the self-assigned bugs before doing anything else (please specify which in the comments)
  • Something else (please specify in the comments)

I think both better Windows CI and simplifying the release process would be important.

I’ll draw a slight distinction and say that the release version numbering is just a means to an end towards a simpler release process for maintainers (our users don’t care very much about what release version numbering scheme we use, version numbers are mostly used by us in bug reports). Do you agree?

Yes I agree, although I haven’t decided yet. The argument for making it simpler for maintainers already now is that going forward it will free up time that can be spent on user improvements instead of wasting it on making (complicated) releases.

Absolutely. As a vision, I’d love to one day have an extra pipeline that we can run to say ‘take this build and treat it as public release number x.y.z’ and have it all be automated. Wouldn’t that be nice :-). Perhaps that pipeline passes environment variables to the build script to set the version number and build timestamp, and adds a commit to bump the version number. One can dream.

For me it’s more than a dream or a vision. I’m already planning to do something like that.

We might even want to go further and have that as the only way of releasing. In practice we don’t do any manual verification before stable releases anyway. If we were strict about requiring merge requests to update the version number and the changelog accordingly, there would be no difference between a development release and a stable release. True continuous delivery, in other words.

Sounds fantastic. Automating this is important, impactful work, as it’ll save us work on every future release.

The vote was inconclusive, but I’ve started to work on Windows builds using Docker because it seemed like a better opportunity to learn something new and it will make further Windows-related CI/CD stuff so much faster and easier.

Well, that didn’t turn out well :grimacing:.

I haven’t been able to spend much time on Graphviz the last couple of months, but I have a small hope that that will change in the semi-near future.

One idea I have is to write a completely new test suite that tests Graphviz from the ground up. I’m thinking of doing this completely in C++: generating SVG in memory, parsing that and checking it for correctness. What do you think about such an idea? Any objections? Suggestions to do it otherwise? Problems with that approach?

I’m thinking of writing this in modern C++, which means the tests will only run on the latest OS versions since we cannot support anything newer than C++11 on all OSs. I’m thinking that these tests should focus on the core parts of Graphviz that are orthogonal to OSs and output formats.

The main benefits would be:

  1. I get to practice my C++ skills, which have gone from zero to…, well, slightly more than zero, during the last 6 months since I started a new assignment together with a very skilled colleague who is eager to help me learn.
  2. We will get much better test coverage than we have today, which will make us more confident about making changes.
  3. We will hopefully get an infrastructure where it’s easy to add new tests even for tricky corner cases.
  4. The tests will run fast since we wouldn’t need to read or write any files.
  5. SVG is easy to parse.

Thoughts?

This sounds great!! An improved test suite would probably be the single most beneficial thing to Graphviz right now.

I have questions about the choice of C++ though. Why? With Python you get batteries included (e.g. capable of parsing XML-based docs like SVG without any third party libs), more readable code and no need to manually manage memory. I don’t want to be too negative, but as someone who’s debugged a number of Graphviz segfaults in recent history, I really do not want to be debugging segfaults in the test suite as well.
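To make the "batteries included" point concrete, here is a minimal sketch of what an SVG check could look like with only the standard library. The helper name `svg_titles` and the hand-written sample are assumptions for illustration. A real test would feed it actual Graphviz output, which (as far as I know) wraps each node and edge in a `g` element carrying a `class` attribute and a `title` child.

```python
import xml.etree.ElementTree as ET

NS = {"svg": "http://www.w3.org/2000/svg"}


def svg_titles(svg_text, css_class):
    """Return the <title> text of every <g> element with the given class."""
    root = ET.fromstring(svg_text)
    groups = root.findall(f".//svg:g[@class='{css_class}']", NS)
    return {g.findtext("svg:title", default="", namespaces=NS) for g in groups}


# Hand-written stand-in for the SVG that `digraph { a -> b }` might produce.
sample = """<svg xmlns="http://www.w3.org/2000/svg">
<g id="graph0" class="graph">
<g id="node1" class="node"><title>a</title></g>
<g id="node2" class="node"><title>b</title></g>
<g id="edge1" class="edge"><title>a&#45;&gt;b</title></g>
</g>
</svg>"""

# The assertions a test case could make about the rendered graph.
assert svg_titles(sample, "node") == {"a", "b"}
assert svg_titles(sample, "edge") == {"a->b"}
```

No memory management and no third-party parser are involved, so a failing check shows up as an ordinary assertion error.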

SVG tests sound great, nicer than diffing images. And making it easier to add new tests would be nice.

I’d love to be able to simply drop an input & output file in a folder and have the tests automatically run, with no need for ‘writing code’. I think that’s something we could achieve: I’ve seen the TypeScript compiler do it, and Graphviz is also a compiler :slight_smile:
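One way the "drop files in a folder" idea could work is sketched below, purely as an assumption about the layout: every `name.dot` input sits next to a `name.svg` holding the expected output. The function name `discover_cases` and the file-naming convention are hypothetical, not something Graphviz has today.

```python
from pathlib import Path


def discover_cases(folder):
    """Pair every *.dot input with the expected *.svg file of the same name."""
    cases = []
    for dot_file in sorted(Path(folder).glob("*.dot")):
        expected = dot_file.with_suffix(".svg")
        # Inputs without an expected output are skipped in this sketch.
        if expected.exists():
            cases.append((dot_file, expected))
    return cases
```

A test runner could loop over `discover_cases("tests/cases")` (a made-up path), render each input and compare the result with the expected file, so adding a test means adding two files and no code.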

I do share Matthew’s slight aversion to C++ unit tests (I have to deal with segfaulting tests at work sometimes – not particularly fun). Perhaps it’s just the framework I use at work, but C++ unit tests have been pretty macro-heavy, like the language doesn’t make it simple to write unit tests (maybe other frameworks are better). If it’s fast and in-memory you’re after, perhaps piping the output from a subprocess (and avoiding the file system) would achieve your goal?
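A minimal sketch of the subprocess idea, assuming the `dot` executable is on `PATH` (with no input file it reads the graph from stdin and writes to stdout). The helper names `run_filter` and `render_svg` are made up for the example.

```python
import subprocess


def run_filter(command, text):
    """Feed `text` to `command` on stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


def render_svg(dot_source):
    """Render dot source to SVG without touching the file system."""
    # Assumption: `dot` is installed and found on PATH.
    return run_filter(["dot", "-Tsvg"], dot_source)
```

With Graphviz installed, `render_svg("digraph { a -> b }")` would return the SVG as a string, and a crash in `dot` surfaces as an exception in the test rather than a segfault in the test runner itself.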