Grammar questions

DerMolly · September 13, 2020, 2:35pm

Hey,

I’m currently working on a parser for the DOT Language. For that I used the language info page, and it seems to work all right. Then I searched for some examples to test the parser, and I found some that work with the dot CLI tool but seem to be wrong when compared with the grammar.

digraph D {
    node [fontname="Arial"];

    node_A [shape=record    label="shape=record|{above|middle|below}|right"];
    node_B [shape=plaintext label="shape=plaintext|{curly|braces and|bars without}|effect"];
}

which the dot CLI renders without errors.

I’m puzzled by node_A and node_B, since node is stated to be a keyword and the documentation says:

Obviously, to use a keyword as an ID, it must be quoted.

My parser identifies node as the keyword and does not know what to do with the ID _A or _B, since the rules specify a [ after the keyword node. I guess the whole thing gets interpreted as a node_stmt (node_stmt : node_id [ attr_list ]), and node_A and node_B are node_ids.

digraph D {
  A -> {B, C, D} -> {F}
}

which outputs

     A
  /  |  \
 B   C  D
  \  |  /
     F

(As a new user I can’t upload more than one image.)

Here the problem seems to be the , separating the different node_ids in the subgraph ({B, C, D}). According to the grammar, the only separator between stmts (or node_ids in this case) should be ; (or just whitespace). Commas are only used to separate entries in an a_list, alongside ; (a_list : ID '=' ID [ (';' | ',') ] [ a_list ]).
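For comparison, the a_list rule quoted above can be sketched as a small Python function. This is a hedged illustration, not Graphviz code: the token regex and the names `parse_a_list` and `unquote` are hypothetical, HTML strings (`<...>`) are not handled, and only the `\"` escape is undone. It shows that the separator after each `ID '=' ID` pair is optional, which is why `[shape=record    label="..."]` in the first example, with only whitespace between the two attributes, is accepted.

```python
import re

# Hypothetical tokenizer for the text between '[' and ']':
# a double-quoted string (with backslash escapes), one of the a_list
# punctuation characters, or a bare run of other non-space characters.
# Simplification: HTML strings <...> are not recognised.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[=;,]|[^\s=;,"\[\]]+')
PUNCT = {"=", ";", ","}


def unquote(tok):
    # Quoted IDs lose their quotes; \" becomes " (other escapes are kept).
    if tok.startswith('"'):
        return tok[1:-1].replace('\\"', '"')
    return tok


def parse_a_list(body):
    """Parse  a_list : ID '=' ID [ (';' | ',') ] [ a_list ]
    and return a dict of attribute name -> value."""
    toks = TOKEN_RE.findall(body)
    attrs = {}
    i = 0
    while i < len(toks):
        # Each entry needs three tokens: ID '=' ID.
        if len(toks) - i < 3 or toks[i + 1] != "=":
            raise SyntaxError(f"expected ID '=' ID at token {i}")
        name, value = toks[i], toks[i + 2]
        if name in PUNCT or value in PUNCT:
            raise SyntaxError(f"expected ID '=' ID at token {i}")
        attrs[unquote(name)] = unquote(value)
        i += 3
        # The ';' or ',' separator is optional.
        if i < len(toks) and toks[i] in (";", ","):
            i += 1
    return attrs
```

Whitespace, commas, and semicolons are therefore all legal between attributes, but a comma is never consumed anywhere else by this rule.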

Am I missing something, or is the grammar outdated, or is the dot CLI tool simply more lenient than the grammar?

smattr · September 13, 2020, 6:35pm

I’m not sure I fully understand your questions, but I’ll attempt an answer.

node is a keyword when it appears as a word itself, not as a prefix of other identifiers. There is nothing special about the identifiers node_A and node_B. You could equally well call these foo_A and bar_B.
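On the lexer side, the usual way to get this behaviour is maximal munch: read the longest run of identifier characters first, then check whether the whole word is a keyword. A minimal Python sketch, assuming a regex-based tokenizer; the token kind names are made up for illustration, `\w` only approximates DOT's ID character set, and HTML strings are not handled.

```python
import re

# Keywords from the DOT language page; they are case-independent.
KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}

# One named alternative per token kind. The ID pattern approximates
# "letters, digits, underscores, not starting with a digit" using
# Python's Unicode-aware \w (an assumption, not DOT's exact byte ranges).
TOKEN_RE = re.compile(
    r"""
    (?P<ws>\s+)
  | (?P<edgeop>->|--)
  | (?P<id>[^\W\d]\w*)
  | (?P<number>-?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?))
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<punct>[{}\[\];,=:])
    """,
    re.VERBOSE,
)


def tokenize(src):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue
        # Maximal munch: the id pattern has already consumed the whole
        # word, so "node_A" is one ID; only an exact (case-insensitive)
        # match against the keyword set turns it into a keyword.
        if kind == "id" and text.lower() in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens
```

With this ordering, `node_A` can never be split into the keyword `node` followed by `_A`, while a bare `node` (or `NODE`) is still reported as a keyword.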
I think the webpage is inaccurate about the grammar. The source itself contains the following rules:

stmtlist : stmtlist stmt | stmt ;

optsemi  : ';' | ;

stmt     : attrstmt optsemi
         | compound optsemi
         ;

compound : simple rcompound optattr
               {if ($2) endedge(); else endnode();}
         ;

simple   : nodelist | subgraph ;

So I think B, C, D in your example is parsed as a nodelist.
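To make the nodelist/compound rules concrete, here is a hypothetical recursive-descent sketch in Python covering only the rules quoted above. It assumes nodelist is a comma-separated list of node IDs (ports omitted) and ignores attributes, keywords, and quoting; the class and method names are made up. An edge between two groups connects every tail to every head, which reproduces the A, {B, C, D}, F shape from the first post.

```python
import re
from itertools import product

# Deliberately tiny tokenizer for this sketch: edge operators, braces,
# commas, semicolons and bare IDs only.
TOKEN_RE = re.compile(r"->|--|[{},;]|\w+")


class EdgeParser:
    """Hypothetical sketch of:
    stmtlist : (compound optsemi)+
    compound : simple (edgeop simple)*
    simple   : nodelist | '{' stmtlist '}'
    nodelist : ID (',' ID)*     (assumed shape, ports omitted)
    """

    def __init__(self, src):
        self.toks = TOKEN_RE.findall(src)
        self.pos = 0
        self.edges = []

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def node_id(self):
        tok = self.take()
        if not re.fullmatch(r"\w+", tok):
            raise SyntaxError(f"expected a node ID, got {tok!r}")
        return tok

    def stmt_list(self, end=None):
        # Statements, each optionally followed by ';'. Returns the nodes
        # mentioned, deduplicated in first-seen order.
        members = []
        while self.peek() not in (end, None):
            members += self.compound()
            if self.peek() == ";":
                self.take(";")
        return list(dict.fromkeys(members))

    def compound(self):
        left = self.simple()
        members = list(left)
        while self.peek() in ("->", "--"):
            self.take()
            right = self.simple()
            # Every tail in the left group gets an edge to every head.
            self.edges += product(left, right)
            members += right
            left = right
        return members

    def simple(self):
        if self.peek() == "{":  # anonymous subgraph
            self.take("{")
            nodes = self.stmt_list(end="}")
            self.take("}")
            return nodes
        nodes = [self.node_id()]  # nodelist: commas only live here
        while self.peek() == ",":
            self.take(",")
            nodes.append(self.node_id())
        return nodes
```

Usage: `p = EdgeParser("A -> {B, C, D} -> {F}")`, then `p.stmt_list()`, leaves the six edges A to B, C, D and B, C, D to F in `p.edges`. In this sketch `{B C D}` and `{B; C; D}` give the same members as `{B, C, D}`, because whitespace and `;` separate statements while the comma is handled inside a single nodelist.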

I’m curious about your parser. Could you say more about what you’re building and/or your aims?

DerMolly · September 13, 2020, 7:28pm

After discussing it with the rest of the team, we came to the same conclusion and are currently fixing our lexer to support identifiers that begin with a keyword.
What is a nodelist? Is it a nonterminal missing from the website's grammar? The example you copied also shows that only a ‘;’, not a ‘,’, should appear between statements. Digging deeper, I found that the grammar in the source file is quite different from the one on the website. Did you by any chance change the grammar in an update and forget to update the documentation page?

I’m curious about your parser. Could you say more about what you’re building and/or your aims?

I’m part of the team working on HedgeDoc (formerly CodiMD), a collaborative markdown editor. We’re currently rebuilding our frontend for 2.0, and in the process we are examining all the diagram types we support. Until 1.6, one of them was Graphviz, but since the JS library used to generate the SVG can be called a hack at the best of times, we want to remove it if possible. Unfortunately, there are no good TypeScript libraries that generate images from DOT, so we’re trying to parse the DOT language and generate Mermaid code (another diagram type we support) to minimize breakage in 2.0.
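As an illustration of that last step, a few lines of Python are enough to turn (tail, head) edge pairs, such as the ones a DOT parser would produce, into Mermaid flowchart source. This is a sketch under assumptions: `to_mermaid` is a hypothetical helper, node IDs are assumed to already be valid Mermaid identifiers, and DOT attributes (shapes, labels) are simply dropped.

```python
def to_mermaid(edges, directed=True):
    """Emit Mermaid flowchart source from (tail, head) pairs.
    Assumes node IDs are already safe Mermaid identifiers."""
    # '-->' draws an arrow (digraph); '---' draws a plain link (graph).
    link = "-->" if directed else "---"
    lines = ["graph TD"]
    lines += [f"    {tail} {link} {head}" for tail, head in edges]
    return "\n".join(lines)
```

For the `A -> {B, C, D} -> {F}` example, feeding in its six edges yields a `graph TD` block with lines such as `A --> B` and `D --> F`. Anything beyond plain nodes and edges (record shapes, ports, styling) would need a separate mapping, and some DOT features have no Mermaid equivalent.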

smattr · September 13, 2020, 8:26pm

Any changes to the grammar in either the source or website occurred before my time. I’ll have to leave it to one of the more senior maintainers to comment on the situation there. However, as a reference for building your own parser, I would use the Graphviz source code, not the website.

Interesting. Maybe @magjac can comment on the solution space here? I know he did some nice work to make this forum able to directly render DOT sources as SVGs.

magjac · September 13, 2020, 8:50pm

I have no idea if this is helpful or not, but…

I use my own library d3-graphviz, which is based on @hpcc-js/wasm. It’s JavaScript, not TypeScript, although there is @types/d3-graphviz.

In the graphviz-visual-editor React app there’s a parser based on pegjs, which is an extended version of dotparser.

Here’s an example of that:

[dot verbose=true]

digraph D {
    node [fontname="Arial"];

    node_A [shape=record    label="shape=record|{above|middle|below}|right"];
    node_B [shape=plaintext label="shape=plaintext|{curly|braces and|bars without}|effect"];
}

[/dot]

DerMolly · September 13, 2020, 9:16pm

Yeah, you’re right. The website’s definition looked so good, precise, and complete that I thought it would suffice.

@magjac I will need to have a look at those, maybe there’s something we can use. Thanks!

erg · September 14, 2020, 9:30pm

The grammar description in the website documentation is meant to be an easily understandable grammar, unlike the real yacc grammar. Anything you write using the former will be valid dot, and it probably captures most of what’s doable using the yacc grammar. One might consider adding commas as well as semicolons in stmt_list, but that would not give legal dot. To get a comma-separated node list, one would have to start making the grammar more complicated, which would defeat its pedagogical purpose. The bottom line: if you want to capture the complete dot grammar, you’ll have to use the yacc grammar, as noted.
