Random Number Generation

I previously raised this issue in an issue, but I think that isn’t the best forum for discussion, so let’s discuss the rand issue here instead.

To summarize that issue: I asked Gemini to look at matrix_ops.c and suggest improvement, and it noted several issues with the use of rand:

  1. The random numbers are weak
  2. rand is not thread-safe, so if we are multi-threaded we need locks.
  3. In the power_iteration loop, it can lead to an infinite retry if the randomly chosen vector is not random enough and winds up collinear with all previous vectors.

Of these, only #2 seems like a significant issue. Unfortunately, rand_r (which is thread-safe) is not portable. Gemini suggested some xorshift algorithm that seemed overly simple and of unclear provenance.

In further searches I found https://github.com/imneme/pcg-c-basic, which is an Apache-licensed small C implementation of one of a class of interesting generators called PCG. (It’s a C excerpt/distillation of a C++ library which is available in Ubuntu as libpcg-cpp-dev.) Theres more discussion of the project here, https://www.pcg-random.org/, and the “k-dimensional equidistribution” sounds like an interesting property which addresses #3 as well, but that might not be covered by the simple C version.

I doubt this problem is worth resolving today, but it does raise questions which may arise in more important cases:

  • What is the process to consider adding a new library dependence to graphviz?
  • Would including file from an Apache-licensed C project in graphviz ok, or is that not allowed by the licensing?

-Brian

Also related to this issue is the use of rand() % 100 and rand() % RANGE (where #define RANGE 500) in lib/neatogen/matrix_ops.c. These are used when constructing random vectors of doubles to be orthogonal against other vectors. We throw away randomness by doing the % operations, which Gemini noted, but it’s probably not really hurting the algorithm at all.

OTOH, the division instruction required for % is quite unnecessary there, and is relatively slow (probably 9-12 cycles latency, 1/6 cycles throughput on my CPU).

[outside of my comfort zone, for several reasons]

  • matrix_ops.c only seems to be used by neato
  • I could only trigger init_vec_orth1 if mode=heir and this mode is experimental
  • I could not trigger power_iteration, though a casual wander through the source makes me expect it to be triggered when mode=major (stress majorization - the default)
  • I wonder if there might a be a bug in the stress majorization leg of the code
  • neato performance does not seem to be a hot issue (unlike dot performance)
  • I agree that random number generation needs to be “fixed” sometime, but not necessarily soon
  • no comments on licensing or adding new libraries - outside my knowledge space