Shall we continue to have tests built into the portable source?

Since this commit, we’ve had tests built into the portable source. Erwin Janssen continued down this path in e.g. this commit, and so did I in e.g. this commit. But now I’m starting to get cold feet: to run rtest in CI from the portable source, we would need to include the data and reference files as well, and they are HUGE:

magjac@t440:/tmp/graphviz$ du -h rtest/* | grep "[0-9]M"
3,8M	rtest/graphs
12M	rtest/linux.x86
11M	rtest/nshare
2,5M	rtest/share
2,4M	rtest/windows

I’m not opposed to it per se, but I don’t want to include them without consensus. I’m also not clear on why tests are included in the portable source in the first place.

The alternative is to run tests from the repo directly.

What do you think?

I understand the problem if you have to ship different gold results for every OS, math lib, gcc version …

OTOH, it would be good to support the traditional:
./configure
make
make test
make install

to the maximum extent possible.

As a suggestion, would it be possible to split the tests into portable and non-portable tests, and then only include the portable tests in the portable_sources?

For example, I’m thinking that a lot of structure and functionality could be tested if we filtered all pos info from test and gold text-based results before comparison?
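To make the filtering idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how rtest works today: the helper names are hypothetical, and the set of attributes treated as position info (`pos`, `bb`, `lp`) is my guess at what would need to be stripped.

```python
import re

# Assumption: these attribute names carry layout coordinates in the
# text-based output. The list is illustrative and may need extending.
POSITION_ATTRS = ("pos", "bb", "lp")

# Matches e.g. pos="27,90" or bb="0,0,54,108", plus one trailing comma.
_POSITION_RE = re.compile(
    r'\b(?:' + "|".join(POSITION_ATTRS) + r')\s*=\s*(?:"[^"]*"|[^\s,;\]]+)\s*,?'
)


def strip_positions(text):
    """Remove position attributes, then collapse runs of whitespace."""
    without_pos = _POSITION_RE.sub("", text)
    return " ".join(without_pos.split())


def same_structure(test_output, gold_output):
    """Compare two text results while ignoring position info."""
    return strip_positions(test_output) == strip_positions(gold_output)
```

With this, two results that differ only in coordinates compare equal, while a changed label or a missing node still shows up as a difference. A regex like this would also strip a matching string inside a label, so a real implementation would probably want to parse the output instead.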

I don’t fully understand what’s being discussed and I don’t know what “the portable source” is, but I’m totally fine with another 30MB in the repository. I have other Git repositories that are exceeding 5GB, so this does not seem “HUGE” to me :wink:

The portable_sources are the contents of the graphviz.tar.gz generated by the first stage of the pipeline.

I feel it would be unfortunate to increase the size of the tar.gz by that much.

The current files are already in the repo. The discussion is about the portable source tar.gz that is distributed.

Ah I see, the tarball for distribution. Is it 30MB compressed or uncompressed?

33 MB compressed:

magjac@t440:~/graphviz-test$ du -sm artifacts/graphviz-2.45.20200531.1701.tar.gz graphviz-2.45.20200531.1701
33	artifacts/graphviz-2.45.20200531.1701.tar.gz
91	graphviz-2.45.20200531.1701

The tests themselves can be improved a lot later on, but at the moment my primary goal is to deploy something off the shelf in order to instantly get better test coverage, so that we detect if we accidentally break something simple.

Short-term I will not include anything new into the tar.gz and make all tests that run in the pipeline use test code directly from the repo and not from the tar.gz. This will also make it much easier to work with the tests since you don’t have to rebuild the tar.gz every time you change something locally (or hack the script to avoid that).

Sorry, not sure if you’re asking if we should not include the test data in CI or in the distro - is the distro what ends up getting packaged for end-user fedora/debian users? Does the distro also end up being sent to CI for testing?

Sorry, I used the wrong terminology in my original post. I meant portable source when I wrote distro and updated it later. The portable source is built on any platform, then for a specific platform this source is built into binary packages (.deb for Ubuntu/Debian, .rpm for Fedora/Red Hat/CentOS). I don’t think that the distro term is relevant here. It means distribution, which is Ubuntu XXX, CentOS XXX, Fedora XXX etc. What ends up in them is beyond our control. E.g. in Ubuntu 18.04, it’s Graphviz 2.40.1 and in Ubuntu 20.04, it’s 2.43.0 (this is something I’d like to come back to later; how do we make sure updated Graphviz versions get picked up in the different distros?).

The portable source is just a tar.gz file and my question was whether we should include all test data in it or not.

CI is not any kind of container, it’s a process. In our CI/CD pipeline we first build the portable source on one platform and then the binary packages for each platform, then install those packages and run tests on them for each platform. In CI we could have had the choice to use test code and data from the portable source or from the git repo directly, but since we now have decided not to include all test data in the portable source, we must at least get that data from the git repo directly. My choice was then to get all test code and data from the git repo. Frankly, I think this is better for reasons I stated above.

For another user of the portable source, it’s still possible to run some of the tests from the portable source, but not those which are not (fully) included.

I think @Ellson can fill in why people want the portable source. I think it’s an old pattern that is still in use in different contexts, but I’m just guessing.

Ah, sorry about the miscommunication, it was due to me only seeing the first unedited post in my email inbox and none of the subsequent threads. I’ll do as you and click through to the forum before replying next time.

The tar.gz is also the content of the graphviz.src.rpm, along with a graphviz.spec file.

The binary rpm packages can be built from the src.rpm. Traditionally this was the format for official releases, and probably still is.

In addition to the tar.gz, the first stage of the pipeline also provides (or is supposed to provide) a single point of version number generation. The version is in the tar.gz filename and in a VERSION file contained in the tar.gz.
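As a rough illustration of the single-version idea, a later pipeline stage could check that the version in the tarball filename agrees with the VERSION file. The sketch below is in Python and makes two assumptions: that the tarball is named `graphviz-<version>.tar.gz`, as in the `du` output earlier in this thread, and that the VERSION file holds just the version string. The function names are hypothetical.

```python
import os
import re

# Assumption: tarballs are named graphviz-<dotted numeric version>.tar.gz.
_TARBALL_RE = re.compile(r"graphviz-(?P<version>\d+(?:\.\d+)*)\.tar\.gz")


def version_from_tarball_name(path):
    """Return the version embedded in the tarball filename, or None."""
    match = _TARBALL_RE.fullmatch(os.path.basename(path))
    return match.group("version") if match else None


def versions_agree(tarball_path, version_file_text):
    """True if the filename version equals the VERSION file contents.

    Assumption: the VERSION file contains only the version string,
    possibly followed by a newline.
    """
    name_version = version_from_tarball_name(tarball_path)
    return name_version is not None and name_version == version_file_text.strip()
```

For example, `version_from_tarball_name("artifacts/graphviz-2.45.20200531.1701.tar.gz")` gives `"2.45.20200531.1701"`, which a platform build could then reuse instead of generating its own version number.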

Currently the Windows and Apple builds miss this, so we make mistakes in all the additional places that need the version. The idea of a single version number is that a binary release on any platform can always be traced back to a particular commit, and that the features of the release on different platforms are comparable with each other.

Did I understand you to say that the Ubuntu builds are also missing this version info?

The first stage of the pipeline also executes any steps that are portable, such as the automake and autoconf steps to generate ./configure and Makefiles, and the generation of PDF man pages.

My opinion on the test data is that not all test data needs to be in the tar.gz. I would like to see some minimal testing available to folks building from the tar.gz or the src.rpm, but I think that the CI test pipeline stage can have much more extensive tests.

No. I just mentioned that they often have ancient versions of Graphviz available for install.