Random Number Generation

I previously raised this issue in an issue, but I think that isn’t the best forum for discussion, so let’s discuss the rand issue here instead.

To summarize that issue: I asked Gemini to look at matrix_ops.c and suggest improvement, and it noted several issues with the use of rand:

  1. The random numbers are weak
  2. rand is not thread-safe, so if we are multi-threaded we need locks.
  3. In the power_iteration loop, it can lead to an infinite retry if the randomly chosen vector is not random enough and winds up collinear with all previous vectors.

Of these, only #2 seems like a significant issue. Unfortunately, rand_r (which is thread-safe) is not portable. Gemini suggested some xorshift algorithm that seemed overly simple and of unclear provenance.

In further searches I found https://github.com/imneme/pcg-c-basic, which is an Apache-licensed small C implementation of one of a class of interesting generators called PCG. (It’s a C excerpt/distillation of a C++ library which is available in Ubuntu as libpcg-cpp-dev.) Theres more discussion of the project here, https://www.pcg-random.org/, and the “k-dimensional equidistribution” sounds like an interesting property which addresses #3 as well, but that might not be covered by the simple C version.

I doubt this problem is worth resolving today, but it does raise questions which may arise in more important cases:

  • What is the process to consider adding a new library dependence to graphviz?
  • Would including file from an Apache-licensed C project in graphviz ok, or is that not allowed by the licensing?

-Brian

Also related to this issue is the use of rand() % 100 and rand() % RANGE (where #define RANGE 500) in lib/neatogen/matrix_ops.c. These are used when constructing random vectors of doubles to be orthogonal against other vectors. We throw away randomness by doing the % operations, which Gemini noted, but it’s probably not really hurting the algorithm at all.

OTOH, the division instruction required for % is quite unnecessary there, and is relatively slow (probably 9-12 cycles latency, 1/6 cycles throughput on my CPU).

[outside of my comfort zone, for several reasons]

  • matrix_ops.c only seems to be used by neato
  • I could only trigger init_vec_orth1 if mode=heir and this mode is experimental
  • I could not trigger power_iteration, though a casual wander through the source makes me expect it to be triggered when mode=major (stress majorization - the default)
  • I wonder if there might a be a bug in the stress majorization leg of the code
  • neato performance does not seem to be a hot issue (unlike dot performance)
  • I agree that random number generation needs to be “fixed” sometime, but not necessarily soon
  • no comments on licensing or adding new libraries - outside my knowledge space

I’m partly repeating my comments on Gitlab, but…

I’m willing to be persuaded I’m wrong, but my current position is that this is irrelevant because this is not crypto code.

Yes, as I said, #2558.

As previously mentioned, I agree.

As I said previously, it’s not clear to me we have the latitude to change the RNG in use. We have no visibility into how/if downstream users are relying on this.

I would be extremely surprised if you can find a Graphviz workload where this division operation shows up as a hot spot. Sure, division is more expensive than shifts/rotates/etc., but the cost of this is almost certainly overwhelmed by all the other non-trivial computation going on in these algorithms.

Yes, I wouldn’t do anything with this.