Parsing input (SLR parser)
- Preface
- Parsing input (SLR parser)
- Parser F# code
- Well-formed syntax
- Semantics of propositional logic
- Constructing tables
- The application
For this projects I'm going to construct a SLR parser. These differs from LL(1) parsing in that we do not have to rewrite the grammar before we can create the parser. Furthermore we do not have to incorporate precedence and associativity in the grammar. These attributes can be set afterwards.
Lexing
For the parser we need a set of tokens
imp = "=>" | "->" biimp = "<=>" | "<->" or = "||" | "\/" and = "&&" | "/\" not = "~" lpar = "(" rpar = ")" atom = [a-z] | "T" | "F"An atom can be a variable or one of the constants T,F. The operators are called connectives. The first four can be expressed in two different ways. This is so as to accommodate the syntax of most programming languages along the syntax of the Coq Proof Assistant
Grammar
The grammar is as follows. Again we do not need to care about precedence and associativity for now.
Exp \rightarrow Exp imp Exp Exp \rightarrow Exp biimp Exp Exp \rightarrow Exp or Exp Exp \rightarrow Exp and Exp Exp \rightarrow not Exp Exp \rightarrow Atom Atom \rightarrow lpar Exp rpar Atom \rightarrow atomSLR parsing
When I started building this parser I skipped all the theory of SLR-parsing and just followed the recipe. I think this might be a good approach. The recipe is
- Add the production S' \rightarrow S to the grammar. Here S is the start production.
- Construct a NFA for the right hand side of each production.
- In the NFA: if a state has an outgoing transition on a non-terminal, add ε-transitions to every production of this non-terminal.
- Convert the NFA to a DFA.
- Build a table from the DFA.
- Calculate Follow on the grammar obtained in (1). Use this Follow-set to add reduce actions to the table obtained in 5.
- Solve conflicts in the table.
This is the outline. Lets now go into depth.
NFA and DFA
The inner machinery of a SLR-parser is a parse table, and this table is constructed using a DFA. We construct the DFA by writing an NFA. For this to happen we first add a start production. Furthermore we number the productions:
(0) Exp' \rightarrow Exp (1) Exp \rightarrow Exp imp Exp (2) Exp \rightarrow Exp biimp Exp (3) Exp \rightarrow Exp or Exp (4) Exp \rightarrow Exp and Exp (5) Exp \rightarrow not Exp (6) Exp \rightarrow Atom (7) Atom \rightarrow lpar Exp rpar (8) Atom \rightarrow atomThen for each production we add transitions based on the right sides as thus (using syntax of nfa2dfa converter):
#Transitions # [0] Exp' -> Exp (A) -Exp>(B) ((B)) # [1] Exp -> Exp imp Exp (C) -Exp>(D) (D) -imp>(E) (E) -Exp>(F) ((F)) # [2] Exp -> Exp biimp Exp (G) -Exp>(H) (H) -biimp>(I) (I) -Exp>(J) ((J)) # [3] Exp -> Exp or Exp (K) -Exp>(L) (L) -or>(M) (M) -Exp>(N) ((N)) # [4] Exp -> Exp and Exp (O) -Exp>(P) (P) -and>(Q) (Q) -Exp>(R) ((R)) # [5] Exp -> not Exp (S) -not>(T) (T) -Exp>(U) ((U)) # [6] Exp -> Atom (V) -Atom>(W) ((W)) # [7] Atom -> lpar Exp rpar (A1) -lpar> (B1) (B1) -Exp>(C1) (C1) -rpar>(D1) ((D1)) # [8] Atom -> atom (F1) -atom>(G1) ((G1))The alphabet ran empty, so I have started over with the naming. The names are not important. To this we add every transition a state can make to a non-terminal, as an ε-transition. Ex. from (A) we have the transition Exp, so we add an ε-transition to every production of Exp, that is [1],[2],[3],[4],[5],[6]. We get
# [0] Exp' -> Exp start(A) -eps>(C) -eps>(G) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(B) ((B)) # [1] Exp -> Exp imp Exp (C) -eps>(G) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(D) (D) -imp>(E) (E) -eps>(C) -eps>(G) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(F) ((F)) # [2] Exp -> Exp biimp Exp (G) -eps>(C) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(H) (H) -biimp>(I) (I) -eps>(C) -eps>(G) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(J) ((J)) # [3] Exp -> Exp or Exp (K) -eps>(C) -eps>(G) -eps>(O) -eps>(S) -eps>(V) -Exp>(L) (L) -or>(M) (M) -eps>(C) -eps>(G) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(N) ((N)) # [4] Exp -> Exp and Exp (O) -eps>(C) -eps>(G) -eps>(K) -eps>(S) -eps>(V) -Exp>(P) (P) -and>(Q) (Q) -eps>(C) -eps>(G) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(R) ((R)) # [4] Exp -> not Exp (S) -not>(T) (T) -eps>(C) -eps>(G) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(U) ((U)) # [6] Exp -> Atom (V) -eps>(A1) -eps>(F1) -Atom>(W) ((W)) # [7] Atom -> lpar Exp rpar (A1) -lpar> (B1) (B1) -eps>(C) -eps>(G) -eps>(K) -eps>(O) -eps>(S) -eps>(V) -Exp>(C1) (C1) -rpar>(D1) ((D1)) # [8] Atom -> atom (F1) -atom>(G1) ((G1))To convert the above epsilon riddled NFA we just use the nfa2dfa converter. We obtain the following
start (s0) = {A,A1,C,F1,G,K,O,S,V} -Atom>(s1) -Exp>(s2) -atom>(s3) -lpar>(s4) -not>(s5) ((s1)) = {W} ((s2)) = {B,D,H,L,P} -and>(s6) -biimp>(s7) -imp>(s8) -or>(s9) ((s3)) = {G1} (s4) = {A1,B1,C,F1,G,K,O,S,V} -Atom>(s1) -Exp>(s10) -atom>(s3) -lpar>(s4) -not>(s5) (s5) = {A1,C,F1,G,K,O,S,T,V} -Atom>(s1) -Exp>(s11) -atom>(s3) -lpar>(s4) -not>(s5) (s6) = {A1,C,F1,G,K,O,Q,S,V} -Atom>(s1) -Exp>(s12) -atom>(s3) -lpar>(s4) -not>(s5) (s7) = {A1,C,F1,G,I,K,O,S,V} -Atom>(s1) -Exp>(s13) -atom>(s3) -lpar>(s4) -not>(s5) (s8) = {A1,C,E,F1,G,K,O,S,V} -Atom>(s1) -Exp>(s14) -atom>(s3) -lpar>(s4) -not>(s5) (s9) = {A1,C,F1,G,K,M,O,S,V} -Atom>(s1) -Exp>(s15) -atom>(s3) -lpar>(s4) -not>(s5) (s10) = {C1,D,H,L,P} -and>(s6) -biimp>(s7) -imp>(s8) -or>(s9) -rpar>(s16) ((s11)) = {D,H,L,P,U} -and>(s6) -biimp>(s7) -imp>(s8) -or>(s9) ((s12)) = {D,H,L,P,R} -and>(s6) -biimp>(s7) -imp>(s8) -or>(s9) ((s13)) = {D,H,J,L,P} -and>(s6) -biimp>(s7) -imp>(s8) -or>(s9) ((s14)) = {D,F,H,L,P} -and>(s6) -biimp>(s7) -imp>(s8) -or>(s9) ((s15)) = {D,H,L,N,P} -and>(s6) -biimp>(s7) -imp>(s8) -or>(s9) ((s16)) = {D1}Parsing Table
The above DFA we list as a table (* is for contains accepting). For every transition on a terminal in the above DFA we add the sn (where n is a state number). These are called shift actions. For every non-terminal we add gn (n is again a state number). These are called goto actions. In the obtained table the names of NFA-states are only there to help our understanding. These are of no practical use.
DFA state | NFA states | Transitions | |||||||||
imp | biimp | or | and | not | lpar | rpar | atom | Exp | Atom | ||
s0 | A,A1,C,F1,G,K,O,S,V | s5 | s4 | s3 | g2 | g1 | |||||
s1* | W | ||||||||||
s2* | B,D,H,L,P | s8 | s7 | s9 | s6 | ||||||
s3* | G1 | ||||||||||
s4 | A1,B1,C,F1,G,K,O,S,V | s5 | s4 | s3 | g10 | g1 | |||||
s5 | A1,C,F1,G,K,O,S,T,V | s5 | s4 | s3 | g11 | g1 | |||||
s6 | A1,C,F1,G,K,O,Q,S,V | s5 | s4 | s3 | g12 | g1 | |||||
s7 | A1,C,F1,G,I,K,O,S,V | s5 | s4 | s3 | g13 | g1 | |||||
s8 | A1,C,E,F1,G,K,O,S,V | s5 | s4 | s3 | g14 | g1 | |||||
s9 | A1,C,F1,G,K,M,O,S,V | s5 | s4 | s3 | g15 | g1 | |||||
s10 | C1,D,H,L,PC1,D,H,L,P | s8 | s7 | s9 | s6 | s16 | |||||
s11* | D,H,L,P,U | s8 | s7 | s9 | s6 | ||||||
s12* | D,H,L,P,R | s8 | s7 | s9 | s6 | ||||||
s13* | D,H,J,L,P | s8 | s7 | s9 | s6 | ||||||
s14* | D,F,H,L,P | s8 | s7 | s9 | s6 | ||||||
s15* | D,H,L,N,P | s8 | s7 | s9 | s6 | ||||||
s16* | D1 |
Reduce
We need the Follow-set of the expanded grammar in order to calculate what is called reduce actions. Let's use grammar2set with the input
prod Exp' -> Exp prod Exp -> Exp "imp" Exp prod Exp -> Exp "biimp" Exp prod Exp -> Exp "or" Exp prod Exp -> Exp "and" Exp prod Exp -> "not" Exp prod Exp -> Atom prod Atom -> "(" Exp ")" prod Atom -> "atom"For which we obtain
Follow(Exp') = {$} Follow(Exp) = {),and,or,biimp,imp,$} Follow(Atom) = {),and,or,biimp,imp,$}Say we have some production, N \rightarrow \alpha, with production number p. When a DFA state contains an accepting NFA state marked with p, add a reduce p action on alle symbols in Follow(N) in the table. If p = 0, then add an accept action instead. Hence where shift and goto target states, reduce target productions. Here we go
accepting(s1) = {W} W.p = 6 prod(6) = Exp \rightarrow Atom Follow(Exp) = {),and,or,biimp,imp,$} (reduce) accepting(s2) = {B} B.p = 0 prod(0) = Exp' \rightarrow Exp Follow(Exp') = {$} (accept) accepting(s3) = {G1} G1.p = 8 prod(9) = Atom \rightarrow atom Follow(Atom) = {),and,or,biimp,imp,$} (reduce) accepting(s11) = {U} U.p = 5 prod(5) = Exp \rightarrow not Exp Follow(Exp) = {),and,or,biimp,imp,$} (reduce) accepting(s12) = {R} R.p = 4 prod(4) = Exp \rightarrow Exp and Exp Follow(Exp) = {),and,or,biimp,imp,$} (reduce) accepting(s13) = {J} J.p = 2 prod(2) = Exp \rightarrow Exp biimp Exp Follow(Exp) = {),and,or,biimp,imp,$} (reduce) accepting(s14) = {F} F.p = 1 prod(1) = Exp \rightarrow Exp imp Exp Follow(Exp) = {),and,or,biimp,imp,$} (reduce) accepting(s15) = {N} N.p = 3 prod(3) = Exp \rightarrow Exp or Exp Follow(Exp) = {),and,or,biimp,imp,$} (reduce) accepting(s16) = {D1} D1.p = 7 prod(7) = Atom \rightarrow lpar Exp rpar Follow(Atom) = {),and,or,biimp,imp,$} (reduce)We add the reduce and accept actions to the table along side a column for the end marker (terminal) $.
DFA state | NFA states | Transitions | ||||||||||
imp | biimp | or | and | not | lpar | rpar | atom | Exp | Atom | $ | ||
s0 | A,A1,C,F1,G,K,O,S,V | s5 | s4 | s3 | g2 | g1 | ||||||
s1* | W | r6 | r6 | r6 | r6 | r6 | r6 | |||||
s2* | B,D,H,L,P | s8 | s7 | s9 | s6 | accept | ||||||
s3* | G1 | r8 | r8 | r8 | r8 | r8 | r8 | |||||
s4 | A1,B1,C,F1,G,K,O,S,V | s5 | s4 | s3 | g10 | g1 | ||||||
s5 | A1,C,F1,G,K,O,S,T,V | s5 | s4 | s3 | g11 | g1 | ||||||
s6 | A1,C,F1,G,K,O,Q,S,V | s5 | s4 | s3 | g12 | g1 | ||||||
s7 | A1,C,F1,G,I,K,O,S,V | s5 | s4 | s3 | g13 | g1 | ||||||
s8 | A1,C,E,F1,G,K,O,S,V | s5 | s4 | s3 | g14 | g1 | ||||||
s9 | A1,C,F1,G,K,M,O,S,V | s5 | s4 | s3 | g15 | g1 | ||||||
s10 | C1,D,H,L,PC1,D,H,L,P | s8 | s7 | s9 | s6 | s16 | ||||||
s11* | D,H,L,P,U | s8/r5 | s7/r5 | s9/r5 | s6/r5 | r5 | r5 | |||||
s12* | D,H,L,P,R | s8/r4 | s7/r4 | s9/r4 | s6/r4 | r4 | r4 | |||||
s13* | D,H,J,L,P | s8/r2 | s7/r2 | s9/r2 | s6/r2 | r2 | r2 | |||||
s14* | D,F,H,L,P | s8/r1 | s7/r1 | s9/r1 | s6/r1 | r1 | r1 | |||||
s15* | D,H,L,N,P | s8/r3 | s7/r3 | s9/r3 | s6/r3 | r3 | r3 | |||||
s16* | D1 | r7 | r7 | r7 | r7 | r7 | r7 |
Solving Conflicts
In the new table we have what is called conflicts. In the row of state s11 to state s15 four of the terminals have a conflict between shift and reduce. For every conflicting action we choose according to the following. We have a last read input, which we obtain by looking for a terminal in the production the reduce action targets, and we have the next input which is the terminal named in the column. Now
- In the case where last and current has same precedence level, we have two situations
- The operators are left-associative: we keep reduce.
- The operators are right-associative: we keep shift.
- In the case we have a + b * c where * binds tighter than +, we keep the shift action.
- In the case we have a * b + c where again * binds tighter than +, we keep the reduce action.
In order for this to be of any use we need some formal syntax rules for propositional logic. I found some in the book Logic in Computer Science, Convention 1.3, page 5: ¬ binds tighter than ∧ and ∨ which binds tighter than →. To the implication level we just add ↔. Both implication and biimplication is right associative. The rest for now is left associative. Thus we have p \rightarrow (q \lrarrow r) and ((\not p) \land q) \rightarrow r
With this in mind let us have a look at the conflicts in s11*
- Under imp we look up the accepting state (U) in prod(5) = Exp \rightarrow not Exp. Hence we see a not before the imp. This is a matter of precedence, we want it to be parsed as (not p) imp q. We are in case 3, we keep the reduce.
- Under biimp we have the same situation as above. We keep reduce.
- Under or we have the same situation as above. We keep reduce.
- Under and we have the same situation. We keep reduce.
We move on to s12* with reduction/preceding production prod(4) = Exp \rightarrow Exp and Exp
- Under imp we have that the preceding and binds tighter: (p and q) imp r. Thus we are in case 3. We keep reduce.
- Under biimp we have as above. We keep reduce.
- Under or we have the situation where the two binds with same strength, and that the operators are left associative: (p and q) or r. We are in case 1.1, and we keep reduce.
- Under and: same as above. We keep reduce.
For s13* we have the preceding production prod(2) = Exp biimp Exp
- Under imp we have same binding strength, but right associativity, we are in case 1.2, we keep shift.
- Under biimp we have as above. We keep shift.
- Under or: we have p biimp (q or r) - or binds tighter, so we keep shift.
- Under and we have the same as above. We keep shift.
For s14* we have the preceding production prod(1) = Exp \rightarrow Exp imp Exp
- Under imp we have the same binding strength and a right associative operator. We are in case 1.2, we keep shift.
- Under biimp we have as above. We keep shift.
- Under or we have p imp (q or r), thus we keep shift.
- Under and we have as above. We keep shift.
Lastly for s15* we have the preceding production prod(3) = Exp \rightarrow Exp or Exp
- Under imp we have (p or q) imp r. Case 3, we keep reduction.
- Under biimp we have as above. We keep reduction
- Under or we have the same tightness with a left associative operator (p or q) or r. We keep reduce.
- Under and we have as above. We keep reduce.
And we can create the final table
DFA state | NFA states | Actions | GOTO | |||||||||
imp | biimp | or | and | not | lpar | rpar | atom | $ | Exp | Atom | ||
s0 | A,A1,C,F1,G,K,O,S,V | s5 | s4 | s3 | g2 | g1 | ||||||
s1* | W | r6 | r6 | r6 | r6 | r6 | r6 | |||||
s2* | B,D,H,L,P | s8 | s7 | s9 | s6 | accept | ||||||
s3* | G1 | r8 | r8 | r8 | r8 | r8 | r8 | |||||
s4 | A1,B1,C,F1,G,K,O,S,V | s5 | s4 | s3 | g10 | g1 | ||||||
s5 | A1,C,F1,G,K,O,S,T,V | s5 | s4 | s3 | g11 | g1 | ||||||
s6 | A1,C,F1,G,K,O,Q,S,V | s5 | s4 | s3 | g12 | g1 | ||||||
s7 | A1,C,F1,G,I,K,O,S,V | s5 | s4 | s3 | g13 | g1 | ||||||
s8 | A1,C,E,F1,G,K,O,S,V | s5 | s4 | s3 | g14 | g1 | ||||||
s9 | A1,C,F1,G,K,M,O,S,V | s5 | s4 | s3 | g15 | g1 | ||||||
s10 | C1,D,H,L,PC1,D,H,L,P | s8 | s7 | s9 | s6 | s16 | ||||||
s11* | D,H,L,P,U | r5 | r5 | r5 | r5 | r5 | r5 | |||||
s12* | D,H,L,P,R | r4 | r4 | r4 | r4 | r4 | r4 | |||||
s13* | D,H,J,L,P | s8 | s7 | s9 | s6 | r2 | r2 | |||||
s14* | D,F,H,L,P | s8 | s7 | s9 | s6 | r1 | r1 | |||||
s15* | D,H,L,N,P | r3 | r3 | r3 | r3 | r3 | r3 | |||||
s16* | D1 | r7 | r7 | r7 | r7 | r7 | r7 |
I have rearranged so that $ is located after atom. And then marked the terminals as Actions and the non-terminals as GOTO.
Pseudo Code
For the sake of our understanding we can try to run the parser on some string by hand. The parser works by maintaining a stack. We have for shift, reduce, accept and error the following
- Shift: Shift/push the next input symbol onto the stack.
- Reduce: The right end of the string must be located on top of the stack. Locate the left end of the string within the stack and decide with what non-terminal to replace the string. This is done by popping the length of the right hand side of the non-terminal from the stack. Then read the top of the stack, use this read to locate a goto on the non-terminal, and push this on top of stack.
- Accept: Announce successful completion of the parsing.
- Error: Happens in any of the blank fields in the table. Decide on some proper error handling.
So lets say we have the input p or (q imp r)$ - where p,q,r are atom. Initialize the stack with state 0. We write the stack on the right side, the input string in the middle and the chosen action on the right side as follows
Stack | Input | Action |
0 | p or (q imp r)$ | shift with 3 |
0 3 | or (q imp r)$ | reduce by 8: Atom -> atom |
0 1 | or (q imp r)$ | reduce by 6: Exp -> Atom |
0 2 | or (q imp r)$ | shift with 9 |
0 2 9 | (q imp r)$ | shift with 4 |
0 2 9 4 | q imp r)$ | shift with 3 |
0 2 9 4 3 | imp r)$ | reduce by 8: Atom -> atom |
0 2 9 4 1 | imp r)$ | reduce by 6: Exp -> Atom |
0 2 9 4 10 | imp r)$ | shift with 8 |
0 2 9 4 10 8 | r)$ | shift with 3 |
0 2 9 4 10 8 3 | )$ | reduce by 8: Atom -> atom |
0 2 9 4 10 8 1 | )$ | reduce by 6: Exp -> Atom |
0 2 9 4 10 8 14 | )$ | reduce by 1: Exp -> Exp imp Exp |
0 2 9 4 10 | )$ | shift with 16 |
0 2 9 4 10 16 | $ | reduce with 7: Atom -> lpar Exp rpar |
0 2 9 1 | $ | reduce by 6: Exp -> Exp imp Exp |
0 2 | $ | accept |
The whole algorithm is found in the dragon book. Initialize the stack with state0. In pseudo code we now get:
let a = first symbol of input while (true) let s = top of stack if action[s,a] == shift t: push t onto the stack let a = next input symbol elif action[s,a] == reduce (prod A -> B): pop |B| symbols off the stack let t = top of stack push goto[s,A] onto stack handle production A -> B elif action[s,a] == accept: break else handle errorIn practice both t and prod (A -> B) are number/indexes into the table. As are every entry of the GOTO-table. Thus the stack is of type int.