Grammaar in document icorrect

slipbits · December 28, 2021, 11:08pm

I just started with Graphviz and am reading the DOT documentation. The online grammar is a considerable improvement over the previous pdf version, but it still appears incorrect. Here are some comments and a ‘corrected’ grammar. Note that the ‘corrected’ version is different in some notable aspects with the online grammar, and hence may not be acceptable, and I have not run it through a LR(1) parser generator (Bison/[ABJ]Yacc) or a LALR(1) parser (ANTLR), so the new grammar may be incorrect (sigh). I don’t know how to attach a PDF document so I am providing a text version.

Existing grammar Note:
1. Only one graph can be seen. You can not have two graphs represented, as graph1 immediate;y followed by graph2.

 2. stmt-list requires that either there is no statement or thate are at least two statements, as in:

stmt; stmt or stmt stmt

 3. A valid attr_list is [ ]  [ ]  [x]  

 4. Valid a-list statement: id = id . Id = id id

 5. Valid a-list statement:    3.14 = 2

 6. Valid a-list statement:    <https://website.com> = 2

 7. From graph, stmt, edge_stmt, subgraph, edgeRHS the following is legal: 
   { ? {} }

 8. Making compass_pt part of your grammar interfers with the definition of id. However , if your grammar is strictly discursive then it appears ok but without extending the lexical definitions to include each of the compass points individually is confusing, and it is confusing in any case. It is my belief that the implementation probably makes compass-pt a semantic check rather than a syntactic check.

  9. The grammar can not recognize A -> { B C }

The lexical definition of a number is incorrect.
a) By your definition both ‘-’ and ‘’ are numbers.
b) A corrected syntax using your syntax is:
c) [-]?(.[0-9]+ | [0-9]+.[0-9]+))
```
   where ‘.’ is the literal period
```

New Grammar:
DOT Syntax
graph_list : graph_list graph
| graph
;

*graph*         :  [*strict*] ( *digraph*) | *graph* ) [ *lhs* ] ‘{‘ *stmt-list* ‘}’
            ;

*stmt-list*   :  *stmt-lis*t *stmt* [ ‘;’ ]
            |    *stmt* [ ‘;’ ]
            |
            ;

*stmt*        :  *node_stmt*
            | *edge_stmt*
            | *attr_stmt*
            | *lhs* ‘=’ *rhs*
            ;

*attr-stmt*   :  ( *graph* | *node* | *edge* ) *attr-list*
            ;

*attr-list*   :  *attr-list* ‘[‘ [ *a-list* ] ‘]’
            : ‘[‘ [ *a-list* ] ‘]’
            ;

*a-list*      :  *a-list* *lhs* ‘=’ *lhs* [ ( ‘.’ | ‘;’ ) ]
            | *lhs* ‘=’ *lhs* [ ( ‘.’ | ‘;’ ) ]
            ;

*edge-stmt*   :  ( *node-id* | *subgraph* ) *edgeRHS* attr*-*list
            ;

*edgeRHS*     :  *edgeRHS* *edgeop* ( *node-id* | *subgraph* )
            | *edgeop* ( *node-id* | *subgraph* )
            ;

*node-stmt*   :  *node-id* *attr-list*
            | *node-id*
            ;

*node-id*     :  *lhs* *port*
            | *lhs*
            ;

*port*        :  ‘:’ *lhs* [‘:’ compass_pt ]
            |
            ;

*port-location* :  ‘:’ *lhs*
            | ‘:’ ‘(‘ *lhs* ‘,’l*hs* ‘)’
            | ‘:’ *compass_pt*
            ;

*subgraph*    :  *subgraph_prefix* ‘{‘ *stmt-list* ‘}’
            ;

*subgraph_prefix*: **subgraph** *lhs*
            | **subgraph**

*compass_pt* :  (n | ne | e | se | s | sw | w | nw | c | _ )
           ;

*edgeop*     :  **->** | **-**
           ;

*comments*    :  ‘/*’ text ‘*/’
            | ‘//’ text
            ;

*lhs*         :  ***alpha*** *alpha_list*
            | *string*
            ;

*alpha_list* :  *alpha_list* ( ***alpha*** | ***numeric*** |  **_** )
            | ( alpha | numeric | ‘_’ )
            |
            ;

*string*      :***"*** *string_list* ***"***
            ;

*string_list* :   *string_list* *string_prim*
            | *string_prim*
            ;

*string_prim*   : ‘ ‘
            | ***alpha***
            | ***numeric***
            ;

*html*        : ‘< ***html_expression*** ‘>’
            ;

***html_expression***: lexical construct;

***alpha***      : lexical construct;

***numeric***    : lexical construct;

Meta-Language
‘ ‘    literal string, part of input language
:       separator for first element of grammar expressions
;       grammar statement termination
bold    literal string, part of input language
bold     italic  lexeme
italic   grammatical element
( )     grouping, an item within the grouping is required
[ ]     grouping, items witing the grouping are optional
|       alternation, a logical ‘or’ between options

scnorth · December 30, 2021, 8:04pm

Thank you for doing all this. I hadn’t thought about digraph G { 3.14 = 2 } but now it makes perfect sense. You wouldn’t think any of us had been “language people” but I think arguably Emden was a language person, though maybe more on the implementation side. e.g. the standard environment for SML-NJ, and an entire X windows client library in SML written from the atoms up. And what do people remember you for? 3.14 = 2. One of life’s lessons.

What should we do with this revised grammar? This is in a markup language?

slipbits · December 30, 2021, 8:32pm

Hi;

I am in the process of reading the Graphviz DOT manual. When I’m done, I’ll be an “expert” (yeah, sure). If you’d like, I can create a viable grammar in BNF which is suitable, at least suitable in the sense it can be tweaked to satisfy the DOT developer community.

I have sort-of started this with the grammar I sent. But, alas, it is probably defective and needs to be changed (see above). So I wouldn’t get too enthusiastic about it just yet.

What I’d like to propose is some mechanism where I can send the grammar to the development/documentation teams. This has the advantage that the BNF will look ‘pretty’. I am creating the grammar using LibreOffice, which makes it possible to produce a pdf, Microsoft doc(x), XML or text version of it.

In answer to your question, what I hope is that I can be useful to you, at least insofar as modification of the documentation is concerned. Any grammar produced can be used or not used by Graphviz in any manner they see fit. But with the existing grammar flawed something should really be done.

And, oh sigh, if this is agreeable I can also but the grammar in an acceptable format for any of the popular compiler generators. When I have done this on my own projects I have found that code generation (or addressing implementation issues) becomes easier. Basically, a change in the grammar (an enhancement) becomes an easy issue of implementation.

Don’t know what else to say. I use Doxygen, which uses Graphviz, and what Doxygen/Graphviz does is outstanding.

art

slipbits · December 30, 2021, 9:14pm

@scnorth

As I read the DOT document I may have some questions. How do I get them answered?

For example, looking at ** Subgraphs and Clusters** I see:

A -> {B C}

and

subgraph { 
  rank = same; A; B; C; 
}

Does this mean that A -> subgraph { rank = same; A; B; C; } is legal? If we add an id such as subgraph id { } then is A -> id legal? And is A -> { rank = same; A; B; C } legal? (Note: because A is in the subgraph this is a cycle, but if the graph is not a digraph it should be legal. But (semantically) it may not make much sense. See below where subgraph { B; C; } is presented.

And on a separate vein, would the documentation make more sense to use subgraph { B, C; } rather than subgraph { A; B; C; }

slipbits · December 31, 2021, 12:45am

@scnorth
Here is a good version of the initial grammar. There is still some confusion on my part as to what is required, and I haven’t run this through a parser generator yet. But I think (most) of the intent of original grammar is retained.

The ports syntax is just a guess.
Changing ‘compass_pt’ to id removes all sorts of issues and makes checking a semantic issue.
Not having keywords in the language will create issues in that things like {graph, digraph, strict, subgraph, etc) have to have special care. Making them a semantic issue makes sense.
I didn’t know if NODE [attr] -> was allowed so I just made it allowed. It’s easy enough to change in any case.
I saw one reference to ‘cluster’ but it wasn’t in the original grammar so I left it out.

If there is any continuing interest I will change the formats for a parser generator and generate code to recognize graphs with their attributes.
art

=====================================
An alternate grammar (BNF extensions are not used):
DOT Syntax
graph_list : graph_list [ ‘;’ ] graph
| graph
;

graph : [strict] [ digraph) | graph ] [ id ] ‘{‘ stmt-list ‘}’
;

stmt-list : stmt-list stmt [ ‘;’ ]
| stmt [ ‘;’ ]
|
;

stmt : edge_stmt
| attr_stmt
| lhs ‘=’ rhs
;

node_stmt : node_id ‘[‘ attr_list ‘]’
| node_id
;

attr-stmt : ( graph | node | edge ) ‘[‘ attr-list ‘]’
;

attr-list : attr-list [ ‘;’ | ‘,’ ] a_list
| attr
;

attr : id ‘=’ rhs
| id ‘=’ a_list
;

a_list : a_list [ ‘;’ | ‘,’ ] rhs
| rhs
;

edge-stmt : node_stmt edgeop edgeRHS
;

edgeRHS : node-id
// | subgraph_id // semantic definition
| subgraph
;

subgraph : subgraph [ id ] ‘{ subgraph_list ‘}
| subgraph [ id ] ‘{ attr_stmt subgraph_list ‘}’
;

subgraph_list : subgraph_list [ ‘;’ | ‘,’ ] node_stmt
| node_stmt
;

node-id : id port
| id
;

port : ‘:’ id [‘:’ id ]
;

// compass_pt : (n | ne | e | se | s | sw | w | nw | c | _ )
// ;

edgeop : ‘?’ | ‘–’
;

Lexemes – the lexer recognizes the stuff
comments : ‘/’ text ‘/’ // not passed to parser
| ‘//’ text
;

id : alpha alpha_list // id passed to parser
| string
;

alpha_list : alpha_list ( alpha | numeric | ‘’ )
| ( alpha | numeric | ‘’ )
|
;

string : “ string_list “ // string w/o ‘”’ passed to parser
;

string_list : string_list string_prime
| string_prime
;

string_prime : ‘ ‘
| alpha
| numeric
;

html : ‘< html_expression ‘>’ // html expression passed to parser
;

html_expression: lexical construct;

alpha : lexical construct;

numeric : lexical construct;

Meta-Language
‘ ‘ literal string, part of input language
: separator for first element of grammar expressions
; grammar statement termination
bold literal string, part of input language
bold italic lexeme
italic grammatical element
( ) grouping, an item within the grouping is required
grouping, items witing the grouping are optional
| alternation, a logical ‘or’ between options

steveroush · January 1, 2022, 7:30pm

I won’t speak to legal (not a developer, just a user), but

A -> subgraph { rank = same; A; B; C; },
A -> { rank = same; A; B; C; } and
A -> subgraph ZZZ { rank = same; A; B; C; }
all produce the same graph:
subgraph ZZ { rank = same; A; B; C; } A -> ZZ
produces a different graph:

note 1: cyclical digraphs are OK with dot, see Section 1 of https://www.graphviz.org/pdf/dotguide.pdf
note 2:

Semicolons and commas aid readability but are not required. Also, any amount of whitespace may be inserted between terminals.

slipbits · January 1, 2022, 8:38pm

@steveroush
Thanks. But what I am trying to understand is the grammar. My thought was that the grammar would tell me what’s legal, and I then have a ‘guide’ to read the documentation. But the grammar is faulty, and the lexical definition for a number is faulty.

The grammar appears inconsistent with the parser generator syntax descriptions. For example, port only has a compass_pt in the grammar but graphviz-2.50.0/cmd/lefty/dot2l/dotparse.y has considerably more granularity. So which is correct?

And there are four *parse.y files in the source listing. I’ve glanced at graphviz-2.50.0/cmd/lefty/parse.c and it does not appear to be code that was generated from a parser generator, it looks like a hand-crafted Floyd-Evans parser, so what is the purpose of the *parse.y files?

I am confused. My guess is that the grammar is a throw-away and the only thing to do is read dotguide.pdf and ignore the grammar. But the online documentation states that dotguide.pdf is not current, obsolete. So how do I learn the language?

Vithanco · February 1, 2025, 7:57pm

Hi @slipbits! This is (was?) an interesting thread. Did you continue in any way?
My understanding/reeading of above is that the language was never “bullet-proof” and not enough time was spent on harden the grammar. Instead, it was based on a “good enough” approach (which worked quite well). But i think an improved grammar could help… (speaking from the outside, so no weight behind my words)

Topic		Replies	Views
Grammar questions Help	6	743	September 14, 2020
Strange graphviz syntax in pydot parser -- valid/useful anywhere? Dev	8	196	July 13, 2024
Comma separated list of node IDs Help	6	944	March 27, 2022
Graph / subgraph attributes Help	5	1563	May 23, 2021
Ports & compass points in node statements? Dev	3	575	May 3, 2024

Grammaar in document icorrect

Related topics