File reading is quite forgiving - is this a good thing?

Mark-Atwell · January 5, 2022, 6:37pm

I’ve used Graphviz (GV) on & off for quite a few years, but I recently encountered a problem reading a file using GraphStream (GS). Their DOT file-format reader barfed on a few things where GV itself ‘swallowed happily’. I did eventually correct the input-file, but it’s really down to that GS seems to follow the defined format ‘to the letter’, whereas GV (ironically) doesn’t. I’m not sure that the latter is a good thing. Whilst I’d be wary of jumping to strict enforcement, likely breaking a number of previously working input-files (how many ‘broken’ web-pages existed because IE was just too forgiving?!), perhaps a ‘strict’ option could be introduced to ease transition and a diagnostic/warnings system to highlight the problems?

GS has a ‘JJ’ file that it uses to drive the tokenization and parsing. Perhaps it could be leveraged? See: In gs-core org.graphstream.stream.file.dot.DOTParser

steveroush · January 5, 2022, 7:00pm

Could you give some examples where GraphStream had problems?

Mark-Atwell · January 5, 2022, 7:23pm

Sorry, should have included some examples. GS barfed on this, which isn’t strictly correct, but GV was happy enough with:

SomeId [shape=“ellipse” label=“Feature: Some feature or other” color=red];

GS was happy with this (and, strictly, it is correct):

SomeId [shape=ellipse,label=“Feature: Some feature or other”, color=red];

IIRC, GS was complaining about expecting a comma or a semicolon. I removed the apparently-unnecessary double-quotes because GS hadn’t (so far) complained about them, but again, they don’t strictly adhere to the language-spec (I think?!).

steveroush · January 5, 2022, 11:34pm

The language spec (DOT Language | Graphviz) says commas and semi-colons are just fluff.

Semicolons and commas aid readability but are not required. Also, any amount of whitespace may be inserted between terminals.

It also lists 4 alternative ID definitions. Only one is a string bounded by double-quotes.

It seems GraphStream is too restrictive.

smattr · January 6, 2022, 3:37am

I don’t have a lot of context for this (what is GraphStream?), but you are right that the Graphviz parsing logic is forgiving to a fault. It attempts to continue in the face of almost anything and everything. You can see this in the Gitlab issues filed by Google Autofuzz. The parser will continue in the face of things like invalid unicode and then crash rather noisily.

I’ve been meaning to enumerate these cases and turn them into hard exits. This isn’t great behavior in library code, but it’s better than attempting to continue with a garbage input stream.

Note that the grammar as described on the website is also not exactly what the implementation accepts. The implementation is much more loose, in ways that probably should be examined and rethought.

Mark-Atwell · January 6, 2022, 6:04am

This is GraphStream - the demo video is very impressive: GraphStream - GraphStream - A Dynamic Graph Library (graphstream-project.org)

I think that GraphStream has probably gone for the more restrictive input because, at the end of the day, it’s likely easier to handle. To that end, it’s not a particularly important problem. But as smattr indicates, perhaps some tightening-up makes sense? I’m not sure about jumping to hard exits in a single-bound is wise though. It’ll likely break a lot of existing inputs. Perhaps warnings for now and a hard-exit option/flag?

smattr · January 6, 2022, 6:31am

I’m not sure about jumping to hard exits in a single-bound is wise though. It’ll likely break a lot of existing inputs. Perhaps warnings for now and a hard-exit option/flag?

There is some use case for pumping garbage bytes into Graphviz? I am not aware of any use case that would be broken by bailing out immediately on obvious syntax errors.

My comment about hard exits being bad was that in general calling exit in a library is not good. The callee has no idea what the caller’s use case/context is. But in Graphviz many of the functions have no mechanism for indicating failure. Our options are:

continue on bad input and crash (current de facto behavior)
exit
break API

(3) is not impossible, but it is much more invasive and has compounding downstream effects.

Mark-Atwell · January 6, 2022, 7:02am

I’d go for 1a: generate a warning and continue (for now)?

smattr · January 6, 2022, 4:20pm

There are already warnings for the cases I was thinking about.

Topic		Replies	Views
Strange graphviz syntax in pydot parser -- valid/useful anywhere? Dev	8	205	July 13, 2024
Grammaar in document icorrect Dev	7	792	February 1, 2025
Comma separated list of node IDs Help	6	950	March 27, 2022
Dot.exe no longer works Help	7	1287	March 23, 2023
Bison switching to .gv Graphviz output Show and Tell	2	1298	July 25, 2020

File reading is quite forgiving - is this a good thing?

Related topics