Definitions
As seen in the syntax, there are five (or six) different definition mechanisms. They are listed below.
- prod A production, which starts with the keyword prod. Productions with the same name can be separated by | or given by simply writing the name again.
- prec This keyword is placed before any precedence definition. To define a precedence, write prec "symbol" prec_level, where prec_level is an int. The higher the level, the stronger the binding.
- assoc As with prec, this keyword defines a property of a symbol, here its associativity. To define it, write assoc "left/right" : tuple("symbol"), where the associativity is either left or right. Several target symbols can be given as a tuple, ex. assoc "left" : "times","plus","minus"
- token Defines tokens, written as token tuple("name") as tuple("regex"). The tuples on each side must have the same size, ex. token "plus","minus" as "+","-". The flag -cap can be placed just after the token keyword. If so, the token is captured by the lexer for later use in the syntax tree, ex. token -cap "id" as "[a-zA-Z][a-zA-Z0-9_]*".
- !token Written as !token tuple("regex"), this gives a tuple of regular expressions to be ignored by the lexer.
- group Written as group {tuple("symbol")}, this gives a set of symbols to be grouped together in order to create syntax errors.
The minimum needed to create a parser is a set of productions. If any operator ambiguities are present, precedence and associativity definitions are also needed. If a token is not defined, the generator will insert the token name itself in the regular expression.
The tokens are placed in the regular expression in the order they appear in the source code. This is crucial, since a regular expression that matches a prefix of what a later regular expression matches can cancel out the latter, ex. ab|abcd. In this case the string abcd will be matched by the first alternative of the regex (ab), so the second alternative is never tried and cd is left unmatched.
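The ordering effect can be reproduced in any regex engine that tries alternatives from left to right. The following is a minimal sketch using Python's re module. It is not the generator's own lexer, which may differ in details, and the variable names are chosen only for illustration.

```python
import re

# Shorter alternative first: "ab" wins, and "cd" is left unmatched.
shadowed = re.match(r"ab|abcd", "abcd")
print(shadowed.group())  # ab

# Longer alternative first: the whole input is matched.
ordered = re.match(r"abcd|ab", "abcd")
print(ordered.group())  # abcd
```

The practical rule is to define the more specific (longer) token before the more general one.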
If the input contains a string that is not defined as a token in one way or another, the lexer will terminate with a "garbage in expression" error.
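To make the lexer behaviour described above concrete, here is a rough sketch in Python of how such a lexer could work. This is an assumption about the general technique, not the generator's actual implementation. The names TOKENS, IGNORED and tokenize are hypothetical, and the plus regex is escaped because Python's re module requires it.

```python
import re

# Hypothetical token table, in definition order: (name, regex, captured?)
TOKENS = [
    ("plus", r"\+", False),
    ("minus", r"-", False),
    ("id", r"[a-zA-Z][a-zA-Z0-9_]*", True),  # plays the role of -cap
]
IGNORED = [r"\s+"]  # plays the role of !token


def tokenize(text):
    """Return (name, value) pairs; value is None unless the token is captured."""
    # Build one alternation, keeping the order in which tokens were defined.
    parts = [f"(?P<{name}>{regex})" for name, regex, _ in TOKENS]
    parts += [f"(?P<_skip{i}>{regex})" for i, regex in enumerate(IGNORED)]
    master = re.compile("|".join(parts))
    captured = {name for name, _, cap in TOKENS if cap}

    result, pos = [], 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None or m.end() == pos:
            # No token or ignored pattern matches here.
            raise SyntaxError(f"garbage in expression at position {pos}")
        name = m.lastgroup
        if not name.startswith("_skip"):
            result.append((name, m.group() if name in captured else None))
        pos = m.end()
    return result


print(tokenize("a + b"))
```

In this sketch an input such as "a $ b" raises the error at the position of the $ character, since no token or ignored pattern matches it.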
The parser will generate syntax errors when the input string does not conform to the syntax.