I’ve used Graphviz (GV) on & off for quite a few years, but I recently encountered a problem reading a file using GraphStream (GS). Their DOT file-format reader barfed on a few things where GV itself ‘swallowed happily’. I did eventually correct the input-file, but it’s really down to that GS seems to follow the defined format ‘to the letter’, whereas GV (ironically) doesn’t. I’m not sure that the latter is a good thing. Whilst I’d be wary of jumping to strict enforcement, likely breaking a number of previously working input-files (how many ‘broken’ web-pages existed because IE was just too forgiving?!), perhaps a ‘strict’ option could be introduced to ease transition and a diagnostic/warnings system to highlight the problems?
GS has a ‘JJ’ file that it uses to drive the tokenization and parsing. Perhaps it could be leveraged? See: In gs-core org.graphstream.stream.file.dot.DOTParser
Sorry, should have included some examples. GS barfed on this, which isn’t strictly correct, but GV was happy enough with:
SomeId [shape=“ellipse” label=“Feature: Some feature or other” color=red];
GS was happy with this (and, strictly, it is correct):
SomeId [shape=ellipse,label=“Feature: Some feature or other”, color=red];
IIRC, GS was complaining about expecting a comma or a semicolon. I removed the apparently-unnecessary double-quotes because GS hadn’t (so far) complained about them, but again, they don’t strictly adhere to the language-spec (I think?!).
I don’t have a lot of context for this (what is GraphStream?), but you are right that the Graphviz parsing logic is forgiving to a fault. It attempts to continue in the face of almost anything and everything. You can see this in the Gitlab issues filed by Google Autofuzz. The parser will continue in the face of things like invalid unicode and then crash rather noisily.
I’ve been meaning to enumerate these cases and turn them into hard exits. This isn’t great behavior in library code, but it’s better than attempting to continue with a garbage input stream.
Note that the grammar as described on the website is also not exactly what the implementation accepts. The implementation is much more loose, in ways that probably should be examined and rethought.
I think that GraphStream has probably gone for the more restrictive input because, at the end of the day, it’s likely easier to handle. To that end, it’s not a particularly important problem. But as smattr indicates, perhaps some tightening-up makes sense? I’m not sure about jumping to hard exits in a single-bound is wise though. It’ll likely break a lot of existing inputs. Perhaps warnings for now and a hard-exit option/flag?
I’m not sure about jumping to hard exits in a single-bound is wise though. It’ll likely break a lot of existing inputs. Perhaps warnings for now and a hard-exit option/flag?
There is some use case for pumping garbage bytes into Graphviz? I am not aware of any use case that would be broken by bailing out immediately on obvious syntax errors.
My comment about hard exits being bad was that in general calling exit in a library is not good. The callee has no idea what the caller’s use case/context is. But in Graphviz many of the functions have no mechanism for indicating failure. Our options are:
continue on bad input and crash (current de facto behavior)
exit
break API
(3) is not impossible, but it is much more invasive and has compounding downstream effects.