LL1 grammar analysis. Yet another top-down parser generator.
The calculator checks LL1 grammar correctness, parses a text using the grammar, shows FIRST, FOLLOW and FIRST PLUS sets, parsing tree and gives PLANETCALC parsing code.
The following calculator generates the parser code by the grammar given in EBNF form. We use classic math expression grammar as an example.
The result can be used to create LL(1) parser using the PCP library available on this site.
Code example:
var parser = new PCP.parser
( [/* put generated code here */] );
//creates parsing tree
var tree = parser.parse( text );
//result traversal
tree.visit( {
/* define custom traversal
functions here with names
according your grammar
(nt1 and nt2 names are choosen
here for illustration purposes only);*/
"nt1": function ( v ) {
// use to get expression value:
// v.getValue()
}
,"nt2": function ( v ) {
// use for children traversal:
// v.visit( childvisitor );
}
});
Some details regarding EBNF syntax and this calculator usage are below the calculator.
Extended Backus-Naur Form (EBNF)
EBNF has many advantages comparing to simple BNF:
-
EBNF rule can contain a few text lines. The rule end is marked by the semicolon (;).
terminal character
= letter
| decimal digit
| concatenate symbol
| defining symbol
| definition separator symbol
| end comment symbol
| end group symbol
| end option symbol
| end repeat symbol
| except symbol
| first quote symbol
| repetition symbol
| second quote symbol
| special sequence symbol
| start comment symbol
| start group symbol
| start option symbol
| start repeat symbol
| terminator symbol
| other character; -
Nonterminal symbols must not be marked by angle brackets (but all terminal symbols must be in single or double quotes).
decimal digit = ’0’ | ’1’ | ’2’ | ’3’ | ’4’ | ’5’ | ’6’ | ’7’| ’8’ | ’9’;
-
EBNF has a special syntax for Kleene closure - any expression in braces can be repeated zero or more times:
syntax= syntax rule, {syntax rule};
- Exceptions can be defined using a minus sign. For example, the following excerption from the EBNF standard defines any symbol except a single quotation mark:
first terminal character= terminal character - first quote symbol;
Full RBNF description can be found in ISO/IEC 14977 standard.
EBNF standard extensions
A few extensions to the EBNF standard are implemented in the calculator above. As a special sequence, you can use a regular expression or hexadecimal symbol code. E.g., for line feed, you may use the following rule, defined by two hexadecimal symbols:
lineend= ?0xd?,?0xa?;
E.g., the following regular expression-based rule can be used for non-printable symbols matching:
sp=?/\s+/?;
Grammar analysis
The calculator performs simple grammar analysis. It detects the following errors and unacceptable conditions:
- Grammar syntax errors
- Root rule missing
- Single root rule check
- Left recursion
- Back-track free parsing condition
Scanning and parsing combined
In classic compiler theory there is scanning step which produces tokens from the input stream. During scanning, all unnecessary symbols e.g. unprintable ones and comments get separated from tokens, which is used further on parsing step. Commonly scanners are based on regular expressions. This calculator does scanning and parsing in the same way using single grammar. Commonly, if you define a single grammar for scanning and parsing, the grammar become very complex. To simplify the grammar we allow to split whole grammar in multiple parts each of which has its own root rule. Remove rules input box defines the rules to be removed during preliminary parsing (scanning) steps. To see an example grammar with two parsing steps and tokens to be removed on preliminary parsing step click the following link:
Two step grammar exampleLiterature:
Comentarios