Nullable

The Nullable-set is pretty straight forward, at least the formal version. It's given as

Nullable() = true Nullable(a) = false Nullable(\alpha \beta) = Nullable(\alpha) \land Nullable(\beta) Nullable(N) = Nullable(\alpha_{1}) \lor ... \lor Nullable(\alpha_{n}), where N \rightarrow \alpha_{1}, ..., N \rightarrow \alpha_{n}

But say we have some grammar for the regex a*b*

A \rightarrow 'a' A | B B \rightarrow 'b' B |

Both the productions A and B are nullable. But if we start computing on A, then we need the unresolved Nullable(B) to finish. We thus need to iterate when Nullable(B) has been found.

The grammar when parsed looks like this

("A",[[Term "a";NonTerm "A"];[NonTerm "B"]]) ("B",[[Term "b";NonTerm "B"];[]])

Where tuples are key * value into dictionaries. To calculate nullable we need a cooperation between logical ∧ and ∨ and nullable-values of the lists in the value of the tuple. Let for explanation

t_{a} = Term "a" t_{b} = Term "b" nonT_{A} = NonTerm "A" nonT_{B} = NonTerm "B" rightA_{1} = t_{a} nonT_{A} rightA_{2} = nonT_{B} rightB_{1} = t_{b} nonT_{B} rightB_{2} = \epsilon

Where ε be emtpy. Now

null_{A1} = Nullable(rightA_{1}) = Nullable(t_{a}) \land Nullable(nonT_{A}) null_{A2} = Nullable(rightA_{2}) = Nullable(nonT_{B}) null_{B1} = Nullable(rightB_{1}) = Nullable(t_{b}) \land Nullable(nonT_{B}) null_{B2} = Nullable(rightB_{2}) = true

These depend on conjunctions. Furthermore we get

Nullable(A) = null_{A1} \lor null_{A2} Nullable(B) = null_{B1} \lor null_{B2}

These depend on disjunctions. To optimize and better our understanding we can use short circuit evaluation. Say a production contains a set of right-sides. For every right-side, if it contains just one terminal, this right-side is not nullable. For every right-side, if just a single one is nullable, the whole production is nullable.

Since we have to iterate, we have to take into account grammars that are circular. Say we have the grammar

A \rightarrow B B \rightarrow A | 'b'

The alternate B-production containing the terminal 'b' is just there to point out that this could be a grammar. It isn't totally useless. I haven't allowed grammars like the circular one in this application. So I can use my own algorithm to calculate Nullable. The idea is taken from fixpoint iterations used to solve the next two sets. This one works as follows

For every production, P_{0} ,iterate through every right-side adding the following to an initIter-dictionary
- If current right-side is empty, then add true for whole P_{0}
- If current right-side is not empty, then iterate through every symbol of this right-side. If a terminal is found, add false for whole right-side. Else add dependency of found non-terminals.
Init empty solved and unsolved-dictionaries. For every production, P_{0}, iterate through right-sides as follows
- If true, then add Nullable(P_{0}) = true to solved. In this case do not add anything to unsolved. This happens on empty right-sides in (1).
- If dependencies are found, then for each dependency, d_{0}, do
  d_{0} \in solved \land solved(d_{0}) = false \Rightarrow (short-circuit) right-side = false empty deps d_{0} \in solved \land solved(d_{0}) = true \Rightarrow remove d_{0} from deps right-side = true d_{0} \notin solved \Rightarrow keep dep in deps
  If there aren't any dependencies left, add Nullable(P_{0}) = right-side to solved. Else add dependencies to unsolved.
Now
- If unsolved is empty, then return solved
- Else if unsolved.Count = initIter.Count, nothing has happened. Report error.
- Else goto 2 with initIter = unsolved, and with the obtained solved instead of initializing a new one.

The idea is to gradually remove any unsolved production using values from solved ones. This idea does not work with the above circular grammar since we can't obtain any Nullable-values and thus can't remove anything from unsolved: B \rightarrow 'b' isn't nullable, and we have to obtain Nullable(A) to close the disjunction.