The second set we need calced, is First. This is defined as

First() = \emptyset First(a) = {a} First(\alpha \beta) = | First(\alpha) \union First(\beta) if Nullable(\alpha) | First(\alpha) otherwise First(N) = First(\alpha_{1}) \union ... \union First(\alpha_{n}) where N \rightarrow \alpha_{1}, ... , N \rightarrow \alpha_{n}

For a, \alpha, \beta, N be terminal, set of grammar rules, same and nonterminal, respectively. Thus we need the Nullable-set. And we possibly need to iterate for grammars with productions with First-set dependent on productions defined later. For example observe the grammar

A \rightarrow "a" A | B B \rightarrow "b" B | C C \rightarrow A |

In this case First(A) = First(B) \union {"a"}.

My first though was to do something as done for Nullable. But this can't be done, I discovered after implementing. Some grammars lock with unsolved = newUnsolved even though the First-set is solvable. Instead I found what is called fixpoint iteration in the book Introduction to Compiler Design. The idea is as follows:

  1. Create an init-set with obvious (terminal) First-symbols along with dependencies.
  2. For every element in init add the resolved/obvious sets to solved. Also add dependencies if these are located in solved.
  3. Iterate until no new elements are added to solved.

This is a quite simple approach. The algorithm will stop since at some point no new elements can be added. After all these sets are finite. The above grammar will have init-sets as follows

First(A) = {"a"} \union First(B) First(B) = {"b"} \union First(C) First(C) = First(A) \union \emptyset

Hence First(A) contains dependencies union with the obvious {"a"}. And so on. With this grammar my Nullable-approach will surely fail since none of the init-sets can be added to solved.

When iterating we just keep adding elements

#Iteration1 First(A) = {"a"} \union {"b"} First(B) = {"b"} \union {"a"} First(C) = {"a","b"} #Iteration2 First(A) = {"a","b"} \union {"b","a"} First(B) = {"b","a"} \union {"a"} First(C) = {"a","b"} \union {"a","b"}

No new elements are added, and we are finished.

For maintaining sets I found the delicious data-structure HashSet<T> in .NET. It's almost as created particularly for this task. When adding an element already present in the hash-set, the Add-method simply returns false. Thus we iterate inserting elements element wise until for every insertion Add returns false. Simple as that!

One part that makes these sets hard to calculate, I think, is that third rule when Nullable non-terminals are present. Observe the following grammar

A \rightarrow B C "a" B \rightarrow "b" | C \rightarrow "c" |

Here we have that First(A) = {"a","b","c"}. We sort of recursively apply the third rule.