Some vmalloc questions

Graphviz contains its own allocator library, vmalloc (lib/vmalloc). This seems to provide a malloc-like interface with some extra tracing and profiling support. I’m trying to follow the motivation for this library. I would have thought Graphviz is not performance sensitive enough to need its own allocation strategies and could just call malloc directly. If there is a need to debug allocation itself, I would think you could do this by -Wl,--wrap malloc and friends or using LD_PRELOAD.

Maybe a level of indirection like this is necessary for Windows compatibility? The comments indicate some of this code originated ~1994, so maybe there simply wasn’t sufficient malloc standardization in the mid-90s? I browsed the commit log for this directory but did not manage to learn much. Can anyone else provide a bit of context here?

Short answer: correct, we do not need vmalloc.

Answer: at the time we wrote graphviz we thought it could grow into a more general purpose graph programming environment, so we were overly impressed with concepts like, let each graph have its own malloc heap (which is a feature of vmalloc), so de-allocations could be very fast. We didn’t ask why this was so important. It wasn’t.

Too much information: in that era others in our lab in AT&T were writing their own POSIX environment, including not only vmalloc (Phong Vo’s malloc), but also sfio (safe/fast I/O to replace old Unix stdio which had certain reliability problems), and IIRC libexpr was written to support a local version of Unix find, and we borrowed it for gvpr (graphviz processor, which is still in graphviz). Also IIRC the compelling application for vmalloc and sfio was Daytona, an internal database project, which is probably still supporting a vast realtime/history call detail/SS7 warehouse for fraud avoidance and CALEA.

2 Likes

Great explanation. Thanks for the detailed reply, Stephen! I’ve read about Daytona elsewhere before, so it’s interesting to learn of the connection to that as well.

Does this mean in the long term we should look to remove vmalloc (and sfio)? From the commit log and the Gitlab issues, vmalloc does not seem to be the cause of many problems. However, it seems like removing it would ease maintenance and the cognitive load on devs. What led me to vmalloc in the first place was reading an issue from a FreeBSD user discovering Graphviz relies on sbrk().

Removing vmalloc may also slightly accelerate Graphviz, as the compiler could more easily spot malloc/free pairs to elide. It would also enable tools like ASan and Valgrind to spot more leaks and use-after-free. I suspect currently any such issues involving vmalloc-ed pointers would be invisible to these tools because they would not recognize that vmalloc is the allocator implementation and its internal pointers should not be counted as references.

What do you think?

1 Like

Yes, I’m in favor of improving our ability to use modern heap checkers.

At one point I spent some time studying valgrind output. It’s tedious because it catches leaks in dependencies like cairopango and fontconfig, and I’m concerned mainly with O(n) leaks not O(1) leaks which are basically noise. Then valgrind stopped working in OSX and it appears there may be strong competition based on compile-time instrumentation.

1 Like

Indeed Address Sanitizer more or less subsumes the memcheck part of Valgrind now. I still use both though, as sometimes Valgrind is more familiar to me.

Sounds like we’re good to remove vmalloc. Thanks for taking the time to walk me through it. My plan for doing this would be to (1) one-by-one switch all clients to use malloc/free, then (2) remove vmalloc itself.

I haven’t come across any allocation call sites in Graphviz that can gracefully tolerate failure, so perhaps calling a malloc wrapper that exits on failure would be preferable. This is essentially lib/common/memory.c, but these functions should probably call exit() when they fail. I think these should also provide a calloc wrapper to avoid multiplication like in N_GNEW that floods my terminal with compiler warnings about possible integer overflow.

Please let me know if any of the above sounds undesirable.

1 Like

BTW I believe that libexpr depends on sfio (I think specifically the ability to push strings back into an open FILE channel) so not as good a candidate to remove.

1 Like

Sounds good Matthew.

What do you mean by switching clients (what’s a client? is it a data structure, function call or a library/binary?).

If you went caller-by-caller I’d guess the main risk in this would be mismatching vmalloc/free or malloc/vmfree (i.e. if we migrated a vmalloc to malloc but not the corresponding vmfree call to free for a piece of data?). I’d be particularly worried if we have anything that’s, say, freeing a void*, it’ll be hard to match that up to the malloc call.

(this is all very high level, I don’t have a lot of experience with malloc, just my 2c)

As a Java programmer, exiting on malloc failure sounds reasonable to me :wink: ;-). Retrofitting on failure tolerance throughout would be prohibitively hard.

Sorry my terminology was probably a little unclear. By “client” I meant going library-by-library, swapping out vmalloc calls.

You’re right that there’s the possibility of mismatching a pair of these and thereby ending up with a broken intermediate commit. I’m not so worried about void* situations, but complex control flow could certainly make it difficult to pair some malloc/frees.

My instinct was to do this in a fine grained way across multiple commits. I usually do this to ease the pain of anyone who has to bisect across changes like this in future. But I can do the changes as a monolithic commit if you think that’s safer?

As a Java programmer, exiting on malloc failure sounds reasonable to me

In general, calling exit() within a C/C++ library is considered bad practice. It’s better to return an error code or throw an exception to give the caller a chance to recover and/or take an alternative action. However, in the real world very few C/C++ programs attempt to handle allocation failures. This is partly because they’re rare (due to e.g. Linux overcommit) and partly because most programs have no reasonable response to an allocation failure.

The malloc/free wrappers in Graphviz are within libcommon, which seems to me not intended for use outside Graphviz and is really more part of the application support logic than a reusable library. So I think the minor sin of calling exit() in this case is fine. It’s probably an improved user experience over the current behavior of returning a null pointer that the caller is not expecting.

2 Likes

I’m strongly in favor of small commits for the same reason and also because they’re so much easier to review for correctness up-front.

For this kind of application a “stop-the-line” approach is much better than trying to recover in ways that might hide the problem for the user. Therefore I’d prefer throwing an exception or exiting over recovery actions that may alter the behavior depending on the amount of memory you have (of course there can be exceptions to this in certain cases, but as a general rule of thumb)

BTW, my experience is that “anyone” often is myself.

1 Like

Agree small commits are great for reviewers.

I’ll just throw out another idea to consider: instead of going library-by-library, swapping out calls to vmalloc to malloc, (running the slight risk of missing migrating some matching calls), I wonder if it would be possible to incrementally remove bits of the vmalloc library until the vmalloc library is just a raw alias, a no-op proxy to the underlying malloc library?

That would still be small commits (inside the vmalloc library) and once vmalloc is just an alias for malloc, it would be totally safe to replace vmalloc with malloc throughout the codebase (no worries about missing matching calls). WDYT?

This might not be possible if the API signatures or behaviours differ by a lot.

Good idea @mark.

Maybe I’m missing something, but I’m assuming that we would do this in a branch and finish it completely before merging it back to master. Would it be such a big deal if there was a mismatch that caused a crash, error or memory leak while there’s WIP as long as it’s working and consistent once we merge it back to master?

1 Like

Yes, this is definitely a possibility. What you describe is similar to the strangler pattern. I think this would need some preceding build system changes to make libcommon a dependency of vmalloc.

That’s doable, but we lose most of the advantages of fine grained commits if the intermediate states are broken. Having said that, incrementally landing everything directly in master isn’t ideal if we suddenly want to cut a release and the current state is “vmalloc half removed.”

What about if I do this in four stages, each one proposed as a merge request to master:

  1. Make libcommon malloc wrappers exit() on failure
  2. Make vmalloc functions call libcommon wrappers
  3. Swap all vmalloc calls out for the libcommon wrappers
  4. Remove vmalloc

Each of these three can themselves be a set of fine grained commits. And I believe Gitlab lets you set an MR dependent on another, so we could even have them all open and reviewed at once if that’s desirable. What do you think?

1 Like

Would it be such a big deal if there was a mismatch that caused a crash, error or memory leak while there’s WIP as long as it’s working and consistent once we merge it back to master?

It’s not a huge deal, but keeping every commit working in isolation opens up the ability to automatically bisect commits to find bugs, that can be a pretty powerful tool when tracking down where a regression was introduced. I don’t know if the commits are mostly green right now though.

What about if I do this in four stages, each one proposed as a merge request to master

This approach sounds good to me. Depending on how big step 3 “Swap all vmalloc calls out for the libcommon wrappers” ends up being (I don’t have a solid idea of how big that PR would be - maybe it would be small?), you might want to do as a series of smaller PRs to make it easier to make progress on the reviews incrementally.

Oh, wait, there are only 14 calls to vmalloc (and 19 to vmfree). I had assuming there would be hundreds, this is a much smaller change than I thought. In that case you probably won’t need to split up the step 3.

$ rg "vmalloc\(" | wc -l
      14

Wow, 14 calls, maybe I should have grepped before coming up with a 5 year plan… :wink: 11 of those are also within vmalloc itself. I started poking around a bit more and I realized it gets a bit more interesting when you start grepping for its cousins, vmstrdup and friends from lib/vmalloc/vmalloc.h. I’m still inclined to do this over several small commits.

There are a couple of first steps that seem like a win even if we don’t go on to replace vmalloc:

  1. make libcommon wrappers exit on allocation failure
  2. introduce a calloc wrapper

I’ll go ahead with these for now. Let me know if you object.

I thought any calls to vmalloc were #ifdefed so graphviz is not dependent on it. At least, that was always the intent. It should be OK to simply remove.

Maybe… Now that I try, I can’t actually seem to get vmalloc to execute at runtime, but I also can’t see how the vmalloc/vstrdup/etc calls are mapped to something else. Any clues?

I think that we may be talking past each other. I’m still pro small commits, I just wanted to question the need to merge back to master until “consummatum est”, i.e. when you have finalized everything. If you during your work in the branch find bugs, memory leaks or whatever in your earlier commits in the branch, you just git commit --fixup them and just before (but not earlier) the merge back to master you git rebase -i --autosquash them to have a clean git history for the future. This gives you many small commits without broken intermediate states.

I completely agree. See my answer to @smattr above.