"fatal error C1060: compiler is out of heap space" for lib/edgepaint/lab_gamut.c

Wow, that’s painful. Seems the limits can be even more severe than that too. This actually seems to stem from some clause in C99:

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:¹³

  • 65535 bytes in an object (in a hosted environment only)


¹³ Implementations should avoid imposing fixed translation limits whenever possible.

I guess we could try chunking this data into multiple strings. It’s perverse to be doing this dance just to make the compiler successfully shove a bunch of literal bytes into the output binary. We could try bypassing the compiler and passing this as a blob to the linker, but my experience is that getting this working cross-platform is more trouble than it’s worth.

Maybe we should take a step back and solve a higher order problem: what actually is this data? It looks vaguely regular to me. Could we compute it programmatically instead of shipping a 10MB look-up-table in the binary?

Yes, I was thinking along those lines already in my original post:

Any insights from @erg, @scnorth or @Ellson?

I’ll ask Emden. Possibly this table can be computed on demand.

What about asking Gitlab for a way to increase the memory size available to the compiler?
Surely this can be set in our XML config file.

Thanks.

No, there’s no memory options in the GitLab CI config YAML file and the Windows runners are still in beta with a fixed RAM size of 7.5GB. It might however be a 32-bit issue as both @smattr & @mark mentions above and then increasing the RAM size wouldn’t help anyway.

Related, but solved and not what we are seeing:

I am totally out of my depth here, but wonder if we can just move to 64 bits? I know, repeating something you already said.
We can try to get someone’s attention at GitLab.

Worth noting is also that the Linux shared runners only have 3.75 GB RAM.

Yes, it is data Yifan Hu used for a color palette. We have had problems
with the size in the past. I remember playing with some ways of
generating it at build/install time or dynamically as needed. I got
sidetracked when things fell apart. Let me refresh my memory about it.

1 Like

After some detailed research (i.e. chain-reading Wikipedia), I think this data is the visible portion of the CIELAB color space under a certain configuration. I tried to learn how to compute this programmatically, eventually coming across this SO post that links to this helpful site. With a combination of that and staring at the lab.c code, I came up with the Python below that almost reproduces the data. I think remaining discrepancies are due to precision loss or order-of-operations differences I’ve missed somewhere.

The main takeaways are:

  • yes, we can compute this data programmatically at runtime, however…
  • it seems prohibitively expensive to do so

Having said that, knowing how to compute it we can probably come up with a more compact/efficient representation of it.

def srgb_to_xyz(sR, sB, sG):                                                                                                                                                                                   [0/914566]

  var_R = sR / 255.0
  var_G = sG / 255.0
  var_B = sB / 255.0

  if var_R > 0.04045:
    var_R = ((var_R + 0.055) / 1.055) ** 2.4
  else:
    var_R /= 12.92
  if var_G > 0.04045:
    var_G = ((var_G + 0.055) / 1.055) ** 2.4
  else:
    var_G /= 12.92
  if var_B > 0.04045:
    var_B = ((var_B + 0.055) / 1.055) ** 2.4
  else:
    var_B /= 12.92

  var_R *= 100
  var_G *= 100
  var_B *= 100

  X = var_R * 0.4124 + var_G * 0.3576 + var_B * 0.1805
  Y = var_R * 0.2126 + var_G * 0.7152 + var_B * 0.0722
  Z = var_R * 0.0193 + var_G * 0.1192 + var_B * 0.9505

  return X, Y, Z

# D65 1931
Reference_X = 95.047
Reference_Y = 100.000
Reference_Z = 108.883

def xyz_to_cielab(X, Y, Z):

  var_X = X / Reference_X
  var_Y = Y / Reference_Y
  var_Z = Z / Reference_Z
  # if var_X > 0.008856:
  if var_X > 0.008856451679035631:
    var_X = var_X ** (1 / 3.0)
  else:
    # var_X = 7.787 * var_X + 16 / 116.0
    var_X = (903.2962962962963 * var_X + 16) / 116.0
  # if var_Y > 0.008856:
  if var_Y > 0.008856451679035631:
    var_Y = var_Y ** (1 / 3.0)
  else:
    # var_Y = 7.787 * var_Y + 16 / 116.0
    var_Y = (903.2962962962963 * var_Y + 16) / 116.0
  if var_Z > 0.008856451679035631: # 0.008856:
    var_Z = var_Z ** (1 / 3.0)
  else:
    # var_Z = 7.787 * var_Z + 16 / 116.0
    var_Z = (903.2962962962963 * var_Z + 16) / 116.0

  CIE_L = 116 * var_Y - 16
  CIE_a = 500 * (var_X - var_Y)
  CIE_b = 200 * (var_Y - var_Z)

  return CIE_L, CIE_a, CIE_b

colors = set()

for r in range(256):
  for g in range(256):
    for b in range(256):

      x, y, z = srgb_to_xyz(r, g, b)
      l, a, b = xyz_to_cielab(x, y, z)

      # round all values
      l = int(l)
      a = int(a)
      b = int(b)

      # reject out of range
      if l < 0 or l > 100:
        continue
      if a < -128 or a > 127:
        continue
      if b < -128 or b > 127:
        continue

      colors.add((l, a, b))

for color in sorted(colors):
  print(color)
1 Like

In a stackoverflow article it is claimed that in a somewhat similar situation, declaring a large array as static reduced compile time dramatically. Unfortunately we have two source files that may need to access this table - possibly they need to be merged?

$ grep lab_gamut.h *.c
lab.c:#include "lab_gamut.h"
lab_gamut.c:#include <lab_gamut.h>
$ 

I noticed our interesting ideas about include files. Also this code,

double *lab_gamut_from_file(char *gamut_file, const char *lightness, int *n){
  /* give a list of n points  in the file defining the LAB color gamut. return NULL if the mgamut file is not found.
     lightness is a string of the form 0,70, or NULL.
   */

mgamut? A string of the form 0,70, or NULL? Chuckle.

I think I’ve concluded that this is not a 32/64 bit issue. After an almost endless struggle with using environment variables in PowerShell that are defined in Visual Studio .bat files I finally managed to compile using the HostX64 compiler. I still fails, but it’s only the 32-bit jobs that gives any detailed error message. The pipeline is here: Pipeline · Magnus Jacobsson / graphviz · GitLab. I’d appreciate another pair of eyes to check the logs so I haven’t come to the wrong conclusion. Since we don’t get any detailed error message for the 64 bit builds I guess it would be possible that the 64 bit build encounters a completely different problem, but I doubt it.

~~I also tried using static without any luck: Pipeline · Magnus Jacobsson / graphviz · GitLab

EDIT: This failed for other reasons. I’ll get back with results.

Results from static attempt: Pipeline · Magnus Jacobsson / graphviz · GitLab. No luck.

Impressive digging @smattr :muscle:.

How performance-critical is this functionality? We could require users of this library to call an initialization function on startup that computes this table. As an added benefit, we could construct the table as doubles (rather than chars) and then drop the heap allocation in lab_gamut() and just return a const pointer into the table.

Is this worth exploring? If so, I can write up some patches for this so we can test how it behaves.

1 Like

What’s wrong with making it static (private) data and accessing through a function?

This seems like really a bad problem with the Microsoft Visual Studio C compiler.

I think others have broken up large tables into smaller tables and concatenated them.

The main point is that the algorithm is too slow to run every time. I’m also against penalizing other platforms because of this poor behavior in Visual C.

1 Like

It doesn’t help. See above. Or am I misunderstanding?

I have to call it a day here now, but I just wanted to mention that I’ve managed to compile it using HostX86 (32-bit) (but not using HostX64 (64-bit mode)) here: Pipeline · Magnus Jacobsson / graphviz · GitLab.

In these jobs I’m not using msbuild, but the MSVC compiler (cl) directly.

Exactly what I did to get there, I’m going to backtrack tomorrow, since unfortunately, in the latest series of commits leading up to this working one, I used only HostX64 so it’s going to take some bisecting. I’m not even sure that I’m compiling it in a way that is usable in the appropriate context. I also have to find out how to make msbuild set the cl options to what I’ve manually set them here.

Worth noting is also that this version is using a plain array with 1000 data points per row, not an array of structs with one struct per row, but I don’t know if this is necessary or not. In solitude, that didn’t help anyway.

1 Like

Wow, nice work!! I was beginning to think it would be impossible to get MSVC to work without significant code changes.

I can’t immediately tell what the key change to make it work was. Maybe msbuild is passing some different flags or a different environment to cl?

If we can figure out a way to decrease memory usage when compiling this file, I think we should still apply it to aid anyone building on an underpowered machine.

1 Like

Two things seem necessary to be able to build on the GitLab Windows runner:

  1. Disable code analysis completely (there are flavors)
  2. Rewrite lab_gamut.c to use a plain array instead of an array of structs.

I measured the memory consumption on my local laptop with different settings:

Code analysis Plain array Memory consumption Builds on GitLab
Yes No 3212 MB No
Yes Yes 2813 MB No
No No 969 MB No
No Yes 726 MB Yes

Unless I hear objections I’m going to pursue this.

TODO:

  • Rewrite lab.c (the consumer of lab_gamut.c) to use the plain array and test it.
  • Clean up the lab_gamut project settings to get a MVC.
  • Clean up the GitLab CI configuration and merge with current Linux builds.
  • Add deploy of Windows binaries to https://www2.graphviz.org/Packages/
  • Create MR for review
1 Like

I’m now finished with everything except deployment. To test this I’m going to have to push to a branch (or branches) in the master repo in order to be able to use the necessary tokens to deploy to https://www2.graphviz.org. I will name those branches magjac-<something>. I will also use some kind of magjac directory on https://www2.graphviz.org so I don’t interfere with the normal distribution.

This may lead to that you will get email about failing pipelines. As long as it looks like it’s coming from one of my branches, you can safely ignore them.