Changing the public API to use size_t?

In my attempt to debug Segmentation fault when running test example neatopack.c (#1800) · Issues · graphviz / graphviz · GitLab I’ve started to remove warnings in parts of the source code that the backtrace touches.

I’ve now started to touch things related to sizes. There are an enormous amount of warnings like:

warning: conversion to ‘size_t {aka long unsigned int}’ from ‘int’ may change the sign of the result [-Wsign-conversion]

I don’t think we can rule out that no real bugs are hiding behind such warnings and I would like to get rid of them. Just casting int to size_t upon usage would just hide potential bugs and therefore I would like to change also the types of arguments and local variables to size_t where appropriate. In order for this to be doable, I would like to change the public API to use size_t for sizes where it today uses int or long. As an example:

-Agraph_t** ccomps (Agraph_t*, int*, char*);
+Agraph_t** ccomps (Agraph_t*, size_t*, char*);

Obviously, this would be a breaking change of the API and I guess we would need to step the Graphviz major version to 3 if we were to actually do this.

How do you feel about doing such a change?

Making a breaking change is not something we would want to do often, so if we are going to do it I think we should revise the whole API with respect to types so that it can remain stable once the work is finished

If we were to do it, it would be a quite substantial amount of work and it would take some time before it’s finished, but the code would be much more robust afterwards.

Another example that is not as straightforward is aginit where the rec_size argument means that aginit should do a recursive init if it’s negative. To be able to change the type to size_t, we need to also add a new argument for the recursive option:

-void           aginit(Agraph_t * g, int kind, char *rec_name, int rec_size, int move_to_front);
+void           aginit(Agraph_t * g, int kind, char *rec_name, size_t rec_size, int recur, int move_to_front);

I’ve pushed two commits containing more elaborate versions of the above changes to Commits · use-size_t-in-public-api · Magnus Jacobsson / graphviz · GitLab so you can more easily review what I mean. Obviously it doesn’t compile since I haven’t changed the call sites of aginit.

Please let me know your thoughts.

1 Like

I’m in favor of this. Some of the Google Autofuzz issues indicate many of the calculations involving int can overflow. Using an unsigned type for values that can/should never be negative seems like a significant improvement.

1 Like

It occurred to me that the changes I’m making for #1785 are similarly disruptive. Maybe these should also be targeted at a future version 3.

1 Like

Yes I would think we want to do all of this at the same time.

I’m not sure what to think about autofuzz issues that indicate integer overflow. For example, what if the graph specification makes a node 1000 miles (1600km) wide? Then 2^31 points will overflow. Say you guard against that. The failure will move somewhere else. I’m not sure how well we can plug all these holes. Still, it’s good to try.

1 Like

Sure, there’s always going to be some limit driven by the amount of memory or size of your address space. But I don’t think we should be artificially limiting this even more.

1 Like

Seriously what things in graphviz ever need to be more than even 2^24?

1 Like

Admittedly I don’t have a use case for this currently, but if I did need to work on a larger graph and I was stymied purely by Graphviz using int internally I would find that a little frustrating. My graphs are typically generated, so some do get quite large.

To approach this from the other direction, what is the argument for the type int? At present, I don’t see the rationale for using a signed type for a quantity that should only ever be non-negative.

1 Like

That’s not necessary; one could still pass -1, and although it’d wrap around to SIZE_MAX, you don’t need to cast for rec_size == -1 to compare true in this case.

In POSIX the iconv() function returns (size_t) -1 on failure, for example, where the explicit cast isn’t needed when doing a comparison but is so written in the standard only for clarity. mbrlen() is an ISO C example.

Edit: Well, if you used to take any negative number then that would still be a problem, but at least this way you wouldn’t need to introduce a redundant parameter.

Depending on how it’s used by the aginit() function, if it’s not passed to malloc(), for example, you could consider ssize_t which is often used in place of size_t except for specifying special cases such as this one (I’m not sure it’s ISO C though). getline() does this.

1 Like