LCS - Longest Common Subsequence

05.06.2018

Dynamic programming has the following properties

• Typically used for optimizing problems.
• The problem must contain optimal substructure and overlapping sub problems.
• 2 ways to implement: top-down approach with memoization and bottom up.

LCS - Longest Common Subsequence

Longest common subsequence of letters in two strings of length $m$ and $n$. No necessarily coherent. There are $2^{n + m}$ permutations.

Example : ABC and ACB. Construct a table, let the left most column and the topmost row contain zeroes. Every time $x_i = y_j$, increase the value of the cell and add right-up arrow. Else we point arrows up or right to largest neighboring cell and adding the value. As follows.

 A B C 0 0 0 0 A 0 1↖ 1← 1← C 0 1↑ 1↑ 2↖ B 0 1↑ 2↖ 2↑

Now by following the arrows from the largest number we can obtain $A,C$

Optimal substructure of LCS.

Theorem 15.1. Let $X = (x_1, x_2, ... , x_m)$ and $Y = (y_1, y_2 , ... , y_n)$ be sequences, and let $Z = (z_1, z_2 , ..., z_k)$ be any LCS of $X$ and $Y$.

1. If $x_m = y_n$, then $z_k = x_m = y_n$ and $Z_{k - 1}$ is an LCS of $X_{m - 1}$ and $Y_{n - 1}$.
2. If $x_m \ne y_n \land z_k \ne x_m$, then $Z$ is an LCS of $X_{m - 1}$ and $Y$.
3. If $x_m \ne y_n \land z_k \ne y_n$, then $Z$ is an LCS of $X$ and $Y_{n - 1}$.

*Observe that we have exhausted all cases: if $x_m \ne y_n$, then we have 2 OR 3. We proof the three as follows

1. If $z_k \ne x_m$, then we could obtain a LCS longer than $Z$ by appending $x_m = y_n$ to it. This is absurd since we have assumed $Z$ to be LCS. Thus we have $z_k = x_m = y_n$. Now assume for contradiction that there exists an LCS, $W$ of $X_{m - 1}$ and $Y_{n - 1}$ that is longer than $k - 1$, that is minimum $k$. Now append $x_m = y_n$ to this, and we have a LCS with length greater than or equal to $k + 1$ in contradiction to the assumption that $Z$ is an LCS with length $k$.
2. If $z_k \ne x_m$, then $Z$ is a common subsequence of $X_{m - 1}$ and $Y$.If there where a common subsequence, $W$, of $X_{m - 1}$ and $Y_{n}$ with length greater than $k$, then $W$ would also be a common subsequence of $X_m$ and $Y_n$ in contradiction to the assumption that $Z$ is LCS.
3. The same as (2).

Overlapping subproblems In order to find LCS of $X$ and $Y$ we may need to find LCS of $X$ and $Y_{n - 1}$ and LCS of $X_{m - 1}$ and $Y$, each of which contains the subsproblem of finding LCS of $X_{m - 1}$ and $Y_{n - 1}$. Formalized that is

• $x_m, y_n$ → $x_{m - 1}, y_{n - 1}$ if equal/same.
• $x_m,y_n$ → $x_m,y_{n - 1}/x_{m - 1}, y_n$ → $x_{m - 1},y_{n - 1}$ in second iteration.

Running Time & Space usage.

The resulting running time has been reduced from exponential to polynomial. The alg. fills a table of size $m \cdot n$.

The algorithm requires $O(m \cdot n)$ space to operate. This can be reduced if no backtracking is required - that is if we only need the length of the LCS, not the LCS it self. In this case we can discard all but the two most recent rows.