Grammaar in document icorrect

I just started with Graphviz and am reading the DOT documentation. The online grammar is a considerable improvement over the previous pdf version, but it still appears incorrect. Here are some comments and a ‘corrected’ grammar. Note that the ‘corrected’ version is different in some notable aspects with the online grammar, and hence may not be acceptable, and I have not run it through a LR(1) parser generator (Bison/[ABJ]Yacc) or a LALR(1) parser (ANTLR), so the new grammar may be incorrect (sigh). I don’t know how to attach a PDF document so I am providing a text version.

Existing grammar Note:
1. Only one graph can be seen. You can not have two graphs represented, as graph1 immediate;y followed by graph2.

 2. stmt-list requires that either there is no statement or thate are at least two statements, as in:

stmt; stmt or stmt stmt

 3. A valid attr_list is [ ]  [ ]  [x]  

 4. Valid a-list statement: id = id . Id = id id

 5. Valid a-list statement:    3.14 = 2

 6. Valid a-list statement:    <https://website.com> = 2

 7. From graph, stmt, edge_stmt, subgraph, edgeRHS the following is legal: 
   { ? {} }

 8. Making compass_pt part of your grammar interfers with the definition of id. However , if your grammar is strictly discursive then it appears ok but without extending the lexical definitions to include each of the compass points individually is confusing, and it is confusing in any case. It is my belief that the implementation probably makes compass-pt a semantic check rather than a syntactic check.

  9. The grammar can not recognize A -> { B C }
  1. The lexical definition of a number is incorrect.
    a) By your definition both ‘-’ and ‘’ are numbers.
    b) A corrected syntax using your syntax is:
    c) [-]?(.[0-9]+ | [0-9]+.[0-9]+))

       where ‘.’ is the literal period
    

New Grammar:
DOT Syntax
graph_list : graph_list graph
| graph
;

*graph*         :  [*strict*] ( *digraph*) | *graph* ) [ *lhs* ] ‘{‘ *stmt-list* ‘}’
            ;

*stmt-list*   :  *stmt-lis*t *stmt* [ ‘;’ ]
            |    *stmt* [ ‘;’ ]
            |
            ;

*stmt*        :  *node_stmt*
            | *edge_stmt*
            | *attr_stmt*
            | *lhs* ‘=’ *rhs*
            ;

*attr-stmt*   :  ( *graph* | *node* | *edge* ) *attr-list*
            ;

*attr-list*   :  *attr-list* ‘[‘ [ *a-list* ] ‘]’
            : ‘[‘ [ *a-list* ] ‘]’
            ;

*a-list*      :  *a-list* *lhs* ‘=’ *lhs* [ ( ‘.’ | ‘;’ ) ]
            | *lhs* ‘=’ *lhs* [ ( ‘.’ | ‘;’ ) ]
            ;

*edge-stmt*   :  ( *node-id* | *subgraph* ) *edgeRHS* attr*-*list
            ;

*edgeRHS*     :  *edgeRHS* *edgeop* ( *node-id* | *subgraph* )
            | *edgeop* ( *node-id* | *subgraph* )
            ;

*node-stmt*   :  *node-id* *attr-list*
            | *node-id*
            ;

*node-id*     :  *lhs* *port*
            | *lhs*
            ;

*port*        :  ‘:’ *lhs* [‘:’ compass_pt ]
            |
            ;

*port-location* :  ‘:’ *lhs*
            | ‘:’ ‘(‘ *lhs* ‘,’l*hs* ‘)’
            | ‘:’ *compass_pt*
            ;

*subgraph*    :  *subgraph_prefix* ‘{‘ *stmt-list* ‘}’
            ;

*subgraph_prefix*: **subgraph** *lhs*
            | **subgraph**

*compass_pt* :  (n | ne | e | se | s | sw | w | nw | c | _ )
           ;

*edgeop*     :  **->** | **-**
           ;

*comments*    :  ‘/*’ text ‘*/’
            | ‘//’ text
            ;

*lhs*         :  ***alpha*** *alpha_list*
            | *string*
            ;

*alpha_list* :  *alpha_list* ( ***alpha*** | ***numeric*** |  **_** )
            | ( alpha | numeric | ‘_’ )
            |
            ;

*string*      :***"*** *string_list* ***"***
            ;

*string_list* :   *string_list* *string_prim*
            | *string_prim*
            ;

*string_prim*   : ‘ ‘
            | ***alpha***
            | ***numeric***
            ;

*html*        : ‘< ***html_expression*** ‘>’
            ;

***html_expression***: lexical construct;

***alpha***      : lexical construct;

***numeric***    : lexical construct;

Meta-Language
‘ ‘    literal string, part of input language
:       separator for first element of grammar expressions
;       grammar statement termination
bold    literal string, part of input language
bold     italic  lexeme
italic   grammatical element
( )     grouping, an item within the grouping is required
[ ]     grouping, items witing the grouping are optional
|       alternation, a logical ‘or’ between options

Thank you for doing all this. I hadn’t thought about digraph G { 3.14 = 2 } but now it makes perfect sense. You wouldn’t think any of us had been “language people” but I think arguably Emden was a language person, though maybe more on the implementation side. e.g. the standard environment for SML-NJ, and an entire X windows client library in SML written from the atoms up. And what do people remember you for? 3.14 = 2. One of life’s lessons.

What should we do with this revised grammar? This is in a markup language?

Hi;

I am in the process of reading the Graphviz DOT manual. When I’m done, I’ll be an “expert” (yeah, sure). If you’d like, I can create a viable grammar in BNF which is suitable, at least suitable in the sense it can be tweaked to satisfy the DOT developer community.

I have sort-of started this with the grammar I sent. But, alas, it is probably defective and needs to be changed (see above). So I wouldn’t get too enthusiastic about it just yet.

What I’d like to propose is some mechanism where I can send the grammar to the development/documentation teams. This has the advantage that the BNF will look ‘pretty’. I am creating the grammar using LibreOffice, which makes it possible to produce a pdf, Microsoft doc(x), XML or text version of it.

In answer to your question, what I hope is that I can be useful to you, at least insofar as modification of the documentation is concerned. Any grammar produced can be used or not used by Graphviz in any manner they see fit. But with the existing grammar flawed something should really be done.

And, oh sigh, if this is agreeable I can also but the grammar in an acceptable format for any of the popular compiler generators. When I have done this on my own projects I have found that code generation (or addressing implementation issues) becomes easier. Basically, a change in the grammar (an enhancement) becomes an easy issue of implementation.

Don’t know what else to say. I use Doxygen, which uses Graphviz, and what Doxygen/Graphviz does is outstanding.

art

@scnorth

As I read the DOT document I may have some questions. How do I get them answered?

For example, looking at ** Subgraphs and Clusters** I see:

A -> {B C}

and

subgraph { 
  rank = same; A; B; C; 
} 

Does this mean that A -> subgraph { rank = same; A; B; C; } is legal? If we add an id such as subgraph id { } then is A -> id legal? And is A -> { rank = same; A; B; C } legal? (Note: because A is in the subgraph this is a cycle, but if the graph is not a digraph it should be legal. But (semantically) it may not make much sense. See below where subgraph { B; C; } is presented.

And on a separate vein, would the documentation make more sense to use subgraph { B, C; } rather than subgraph { A; B; C; }

@scnorth
Here is a good version of the initial grammar. There is still some confusion on my part as to what is required, and I haven’t run this through a parser generator yet. But I think (most) of the intent of original grammar is retained.

  1. The ports syntax is just a guess.
  2. Changing ‘compass_pt’ to id removes all sorts of issues and makes checking a semantic issue.
  3. Not having keywords in the language will create issues in that things like {graph, digraph, strict, subgraph, etc) have to have special care. Making them a semantic issue makes sense.
  4. I didn’t know if NODE [attr] -> was allowed so I just made it allowed. It’s easy enough to change in any case.
  5. I saw one reference to ‘cluster’ but it wasn’t in the original grammar so I left it out.

If there is any continuing interest I will change the formats for a parser generator and generate code to recognize graphs with their attributes.
art

=====================================
An alternate grammar (BNF extensions are not used):
DOT Syntax
graph_list : graph_list [ ‘;’ ] graph
| graph
;

graph : [strict] [ digraph) | graph ] [ id ] ‘{‘ stmt-list ‘}’
;

stmt-list : stmt-list stmt [ ‘;’ ]
| stmt [ ‘;’ ]
|
;

stmt : edge_stmt
| attr_stmt
| lhs ‘=’ rhs
;

node_stmt : node_id ‘[‘ attr_list ‘]’
| node_id
;

attr-stmt : ( graph | node | edge ) ‘[‘ attr-list ‘]’
;

attr-list : attr-list [ ‘;’ | ‘,’ ] a_list
| attr
;

attr : id ‘=’ rhs
| id ‘=’ a_list
;

a_list : a_list [ ‘;’ | ‘,’ ] rhs
| rhs
;

edge-stmt : node_stmt edgeop edgeRHS
;

edgeRHS : node-id
// | subgraph_id // semantic definition
| subgraph
;

subgraph : subgraph [ id ] ‘{ subgraph_list ‘}
| subgraph [ id ] ‘{ attr_stmt subgraph_list ‘}’
;

subgraph_list : subgraph_list [ ‘;’ | ‘,’ ] node_stmt
| node_stmt
;

node-id : id port
| id
;

port : ‘:’ id [‘:’ id ]
;

// compass_pt : (n | ne | e | se | s | sw | w | nw | c | _ )
// ;

edgeop : ‘?’ | ‘–’
;

Lexemes – the lexer recognizes the stuff
comments : ‘/’ text ‘/’ // not passed to parser
| ‘//’ text
;

id : alpha alpha_list // id passed to parser
| string
;

alpha_list : alpha_list ( alpha | numeric | ‘’ )
| ( alpha | numeric | ‘
’ )
|
;

string : “ string_list “ // string w/o ‘”’ passed to parser
;

string_list : string_list string_prime
| string_prime
;

string_prime : ‘ ‘
| alpha
| numeric
;

html : ‘< html_expression ‘>’ // html expression passed to parser
;

html_expression: lexical construct;

alpha : lexical construct;

numeric : lexical construct;

Meta-Language
‘ ‘ literal string, part of input language
: separator for first element of grammar expressions
; grammar statement termination
bold literal string, part of input language
bold italic lexeme
italic grammatical element
( ) grouping, an item within the grouping is required
grouping, items witing the grouping are optional
| alternation, a logical ‘or’ between options

I won’t speak to legal (not a developer, just a user), but

  • A -> subgraph { rank = same; A; B; C; },
    A -> { rank = same; A; B; C; } and
    A -> subgraph ZZZ { rank = same; A; B; C; }
    all produce the same graph:
    subgraphSyntax0

  • subgraph ZZ { rank = same; A; B; C; } A -> ZZ
    produces a different graph:
    subgraphSyntax11

note 1: cyclical digraphs are OK with dot, see Section 1 of https://www.graphviz.org/pdf/dotguide.pdf
note 2:

Semicolons and commas aid readability but are not required. Also, any amount of whitespace may be inserted between terminals.

@steveroush
Thanks. But what I am trying to understand is the grammar. My thought was that the grammar would tell me what’s legal, and I then have a ‘guide’ to read the documentation. But the grammar is faulty, and the lexical definition for a number is faulty.

The grammar appears inconsistent with the parser generator syntax descriptions. For example, port only has a compass_pt in the grammar but graphviz-2.50.0/cmd/lefty/dot2l/dotparse.y has considerably more granularity. So which is correct?

And there are four *parse.y files in the source listing. I’ve glanced at graphviz-2.50.0/cmd/lefty/parse.c and it does not appear to be code that was generated from a parser generator, it looks like a hand-crafted Floyd-Evans parser, so what is the purpose of the *parse.y files?

I am confused. My guess is that the grammar is a throw-away and the only thing to do is read dotguide.pdf and ignore the grammar. But the online documentation states that dotguide.pdf is not current, obsolete. So how do I learn the language?