Parser F# code

Both the table and a version of the algorithm that prints hitting productions, can be coded in F# as follows

Discriminating unions

type Token = | Imp | Biimp | Or | And | Not | LPar | RPar | Atom of string | Dollar type Action = | Reduce of int | Shift of int | Goto of int | Error | Accept

Tables (where actionrow2dict and gotorow2dict converts the input array to a dict with the proper index keys matching the original table)

let actionTable = [| actionrow2dict [|Error;Error;Error;Error;Shift 5;Shift 4;Error;Shift 3;Error|] actionrow2dict [|Reduce 6;Reduce 6;Reduce 6;Reduce 6;Error;Error;Reduce 6;Error;Reduce 6|] actionrow2dict [|Shift 8;Shift 7;Shift 9;Shift 6;Error;Error;Error;Error;Accept|] actionrow2dict [|Reduce 8;Reduce 8;Reduce 8;Reduce 8;Error;Error;Reduce 8;Error;Reduce 8|] actionrow2dict [|Error;Error;Error;Error;Shift 5;Shift 4;Error;Shift 3;Error|] actionrow2dict [|Error;Error;Error;Error;Shift 5;Shift 4;Error;Shift 3;Error|] actionrow2dict [|Error;Error;Error;Error;Shift 5;Shift 4;Error;Shift 3;Error|] actionrow2dict [|Error;Error;Error;Error;Shift 5;Shift 4;Error;Shift 3;Error|] actionrow2dict [|Error;Error;Error;Error;Shift 5;Shift 4;Error;Shift 3;Error|] actionrow2dict [|Error;Error;Error;Error;Shift 5;Shift 4;Error;Shift 3;Error|] actionrow2dict [|Shift 8;Shift 7;Shift 9;Shift 6;Error;Error;Shift 16;Error;Error|] actionrow2dict [|Reduce 5;Reduce 5;Reduce 5;Reduce 5;Error;Error;Reduce 5;Error;Reduce 5|] actionrow2dict [|Reduce 4;Reduce 4;Reduce 4;Reduce 4;Error;Error;Reduce 4;Error;Reduce 4|] actionrow2dict [|Shift 8;Shift 7;Shift 9;Shift 6;Error;Error;Reduce 2;Error;Reduce 2|] actionrow2dict [|Shift 8;Shift 7;Shift 9;Shift 6;Error;Error;Reduce 1;Error;Reduce 1|] actionrow2dict [|Reduce 3;Reduce 3;Reduce 3;Reduce 3;Error;Error;Reduce 3;Error;Reduce 3|] actionrow2dict [|Reduce 7;Reduce 7;Reduce 7;Reduce 7;Error;Error;Reduce 7;Error;Reduce 7|] |] let gotoTable = [| gotorow2dict [|Some 2;Some 1|] gotorow2dict [|None;None|] gotorow2dict [|None;None|] gotorow2dict [|None;None|] gotorow2dict [|Some 10;Some 1|] gotorow2dict [|Some 11;Some 1|] gotorow2dict [|Some 12;Some 1|] gotorow2dict [|Some 13;Some 1|] gotorow2dict [|Some 14;Some 1|] gotorow2dict [|Some 15;Some 1|] gotorow2dict [|None;None|] gotorow2dict [|None;None|] gotorow2dict [|None;None|] gotorow2dict [|None;None|] gotorow2dict [|None;None|] gotorow2dict [|None;None|] gotorow2dict [|None;None|] |]

The parser

let parser tokens = let tLen = Array.length tokens let popOne = function | s::stack -> (s,stack) | _ -> failwith "parser: popOne error" let popN n stack = let rec exec n0 acc = function | stack when n0 = n -> (acc,stack) | s::stack -> exec (n0 + 1) (acc @ [s]) stack | [] -> failwith "parser: popN error" exec 0 [] stack let pushGoto stack = function | Some g -> g::stack | None -> stack let getNextFromInput i = if i < tLen then (i + 1,tokens.[i+1]) else failwith "parser: getNextFromInputError" let removeAtomName = function | Atom _ -> Atom "" | t -> t let rec exec (i,a) sStack = let (s,_) = popOne sStack match actionTable.[s].[removeAtomName a] with | Shift t -> let newStack = t::sStack let (i,a) = getNextFromInput i exec (i,a) newStack | Reduce r -> let (prod,rSide) = productions.[r] let betaLen = Array.length rSide let (_,newStack) = popN betaLen sStack let (t,_) = popOne newStack let newStack = pushGoto newStack gotoTable.[t].[prod] printfn "prodId = %d: %s -> %A" r prod rSide exec (i,a) newStack | Accept -> printfn "parsing done" printfn "stack : %A" sStack () | _ -> failwith "parser: error case reached" exec (0,tokens.[0]) [0]

The cool thing is that this parser is bottom-up, which means that for some input the leaves are parsed first. This cooperates very well with functional programming since we can create a new node with two already calculated notes. We do not have to mutate the tree.

Error handling

We can insert error messages into the table as follows

DFA stateActionsGOTO
impbiimporandnotlparrparatom$ExpAtom
s0e4e4e4e4s5s4e3s3e2g2g1
s1*r6r6r6r6e4e4r6e4r6
s2*s8s7s9s6e4 ...accept
s3*r8r8r8r8e4e4r8e4r8
s4e4e4e4e4s5s4e3s3e2g10g1
s5e4e4e4e4s5s4e3s3e2g11g1
s6e4e4e4e4s5s4e3s3e2g12g1
s7e4e4e4e4s5s4e3s3e2g13g1
s8e4e4e4e4s5s4e3s3e2g14g1
s9e4e4e4e4s5s4e3s3e2g15g1
s10s8s7s9s6s16e1
s11*r5r5r5r5e4e4r5e4r5
s12*r4r4r4r4e4e4r4e4r4
s13*s8s7s9s6e4e4r2e4r2
s14*s8s7s9s6e4e4r1e4r1
s15*r3r3r3r3e4e4r3e4r3
s16*r7r7r7r7e4e4r7e4r7

Where

The error system is quite intuitive. Look up s1. Here we see that it has actions on imp, biimp, or, and, rpar all of which in our syntax follows an atom.

For this to work we have to change the type Error to Error of string.

Share