pathterminuspages/notes/aboutcontactabout me

LCS - Longest Common Subsequence

05.06.2018

Dynamic programming has the following properties

  • Typically used for optimizing problems.
  • The problem must contain optimal substructure and overlapping sub problems.
  • 2 ways to implement: top-down approach with memoization and bottom up.

LCS - Longest Common Subsequence

Longest common subsequence of letters in two strings of length $m$ and $n$. No necessarily coherent. There are $2^{n + m}$ permutations.

Example : ABC and ACB. Construct a table, let the left most column and the topmost row contain zeroes. Every time $x_i = y_j$, increase the value of the cell and add right-up arrow. Else we point arrows up or right to largest neighboring cell and adding the value. As follows.

ABC
0000
A01↖1←1←
C01↑1↑2↖
B01↑2↖2↑

Now by following the arrows from the largest number we can obtain $A,C$

Optimal substructure of LCS.

Theorem 15.1. Let $X = (x_1, x_2, ... , x_m)$ and $Y = (y_1, y_2 , ... , y_n)$ be sequences, and let $Z = (z_1, z_2 , ..., z_k)$ be any LCS of $X$ and $Y$.

  1. If $x_m = y_n$, then $z_k = x_m = y_n$ and $Z_{k - 1}$ is an LCS of $X_{m - 1}$ and $Y_{n - 1}$.
  2. If $x_m \ne y_n \land z_k \ne x_m$, then $Z$ is an LCS of $X_{m - 1}$ and $Y$.
  3. If $x_m \ne y_n \land z_k \ne y_n$, then $Z$ is an LCS of $X$ and $Y_{n - 1}$.

*Observe that we have exhausted all cases: if $x_m \ne y_n$, then we have 2 OR 3. We proof the three as follows

  1. If $z_k \ne x_m$, then we could obtain a LCS longer than $Z$ by appending $x_m = y_n$ to it. This is absurd since we have assumed $Z$ to be LCS. Thus we have $z_k = x_m = y_n$. Now assume for contradiction that there exists an LCS, $W$ of $X_{m - 1}$ and $Y_{n - 1}$ that is longer than $k - 1$, that is minimum $k$. Now append $x_m = y_n$ to this, and we have a LCS with length greater than or equal to $k + 1$ in contradiction to the assumption that $Z$ is an LCS with length $k$.
  2. If $z_k \ne x_m$, then $Z$ is a common subsequence of $X_{m - 1}$ and $Y$.If there where a common subsequence, $W$, of $X_{m - 1}$ and $Y_{n}$ with length greater than $k$, then $W$ would also be a common subsequence of $X_m$ and $Y_n$ in contradiction to the assumption that $Z$ is LCS.
  3. The same as (2).

Overlapping subproblems In order to find LCS of $X$ and $Y$ we may need to find LCS of $X$ and $Y_{n - 1}$ and LCS of $X_{m - 1}$ and $Y$, each of which contains the subsproblem of finding LCS of $X_{m - 1}$ and $Y_{n - 1}$. Formalized that is

  • $x_m, y_n$ → $x_{m - 1}, y_{n - 1}$ if equal/same.
  • $x_m,y_n$ → $x_m,y_{n - 1}/x_{m - 1}, y_n$ → $x_{m - 1},y_{n - 1}$ in second iteration.

Running Time & Space usage.

The resulting running time has been reduced from exponential to polynomial. The alg. fills a table of size $m \cdot n$.

The algorithm requires $O(m \cdot n)$ space to operate. This can be reduced if no backtracking is required - that is if we only need the length of the LCS, not the LCS it self. In this case we can discard all but the two most recent rows.

CommentsGuest Name:Comment: