The second set we need calced, is First. This is defined as
First() = \emptyset First(a) = {a} First(\alpha \beta) = | First(\alpha) \union First(\beta) if Nullable(\alpha) | First(\alpha) otherwise First(N) = First(\alpha_{1}) \union ... \union First(\alpha_{n}) where N \rightarrow \alpha_{1}, ... , N \rightarrow \alpha_{n}For a, \alpha, \beta, N be terminal, set of grammar rules, same and nonterminal, respectively. Thus we need the Nullable-set. And we possibly need to iterate for grammars with productions with First-set dependent on productions defined later. For example observe the grammar
A \rightarrow "a" A | B B \rightarrow "b" B | C C \rightarrow A |In this case First(A) = First(B) \union {"a"}.
My first though was to do something as done for Nullable. But this can't be done, I discovered after implementing. Some grammars lock with unsolved = newUnsolved even though the First-set is solvable. Instead I found what is called fixpoint iteration in the book Introduction to Compiler Design. The idea is as follows:
This is a quite simple approach. The algorithm will stop since at some point no new elements can be added. After all these sets are finite. The above grammar will have init-sets as follows
First(A) = {"a"} \union First(B) First(B) = {"b"} \union First(C) First(C) = First(A) \union \emptysetHence First(A) contains dependencies union with the obvious {"a"}. And so on. With this grammar my Nullable-approach will surely fail since none of the init-sets can be added to solved.
When iterating we just keep adding elements
#Iteration1 First(A) = {"a"} \union {"b"} First(B) = {"b"} \union {"a"} First(C) = {"a","b"} #Iteration2 First(A) = {"a","b"} \union {"b","a"} First(B) = {"b","a"} \union {"a"} First(C) = {"a","b"} \union {"a","b"}No new elements are added, and we are finished.
For maintaining sets I found the delicious data-structure HashSet<T> in .NET. It's almost as created particularly for this task. When adding an element already present in the hash-set, the Add-method simply returns false. Thus we iterate inserting elements element wise until for every insertion Add returns false. Simple as that!
One part that makes these sets hard to calculate, I think, is that third rule when Nullable non-terminals are present. Observe the following grammar
A \rightarrow B C "a" B \rightarrow "b" | C \rightarrow "c" |Here we have that First(A) = {"a","b","c"}. We sort of recursively apply the third rule.