I previously raised this issue in an issue, but I think that isn’t the best forum for discussion, so let’s discuss the rand issue here instead.
To summarize that issue: I asked Gemini to look at matrix_ops.c and suggest improvement, and it noted several issues with the use of rand:
The random numbers are weak
rand is not thread-safe, so if we are multi-threaded we need locks.
In the power_iteration loop, it can lead to an infinite retry if the randomly chosen vector is not random enough and winds up collinear with all previous vectors.
Of these, only #2 seems like a significant issue. Unfortunately, rand_r (which is thread-safe) is not portable. Gemini suggested some xorshift algorithm that seemed overly simple and of unclear provenance.
In further searches I found https://github.com/imneme/pcg-c-basic, which is an Apache-licensed small C implementation of one of a class of interesting generators called PCG. (It’s a C excerpt/distillation of a C++ library which is available in Ubuntu as libpcg-cpp-dev.) Theres more discussion of the project here, https://www.pcg-random.org/, and the “k-dimensional equidistribution” sounds like an interesting property which addresses #3 as well, but that might not be covered by the simple C version.
I doubt this problem is worth resolving today, but it does raise questions which may arise in more important cases:
What is the process to consider adding a new library dependence to graphviz?
Would including file from an Apache-licensed C project in graphviz ok, or is that not allowed by the licensing?
Also related to this issue is the use of rand() % 100 and rand() % RANGE (where #define RANGE 500) in lib/neatogen/matrix_ops.c. These are used when constructing random vectors of doubles to be orthogonal against other vectors. We throw away randomness by doing the % operations, which Gemini noted, but it’s probably not really hurting the algorithm at all.
OTOH, the division instruction required for % is quite unnecessary there, and is relatively slow (probably 9-12 cycles latency, 1/6 cycles throughput on my CPU).
I could only trigger init_vec_orth1 if mode=heir and this mode is experimental
I could not trigger power_iteration, though a casual wander through the source makes me expect it to be triggered when mode=major (stress majorization - the default)
I wonder if there might a be a bug in the stress majorization leg of the code
neato performance does not seem to be a hot issue (unlike dot performance)
I agree that random number generation needs to be “fixed” sometime, but not necessarily soon
no comments on licensing or adding new libraries - outside my knowledge space
As I said previously, it’s not clear to me we have the latitude to change the RNG in use. We have no visibility into how/if downstream users are relying on this.
I would be extremely surprised if you can find a Graphviz workload where this division operation shows up as a hot spot. Sure, division is more expensive than shifts/rotates/etc., but the cost of this is almost certainly overwhelmed by all the other non-trivial computation going on in these algorithms.