Sunday, February 10, 2019

Common Child - Hacker Rank Solution

A string is said to be a child of a another string if it can be formed by deleting 0 or more characters from the other string. Given two strings of equal length, what's the longest string that can be constructed such that it is a child of both?

For example, ABCD and ABDC have two children with maximum length 3, ABC and ABD. They can be formed by eliminating either the D or C from both strings. Note that we will not consider ABCD as a common child because we can't rearrange characters and ABCD  ABDC.
Function Description
Complete the commonChild function in the editor below. It should return the longest string which is a common child of the input strings.
commonChild has the following parameter(s):
  • s1, s2: two equal length strings
Input Format
There is one line with two space-separated strings,  and .
Constraints
  • All characters are upper case in the range ascii[A-Z].
Output Format
Print the length of the longest string , such that  is a child of both  and .
Sample Input
HARRY
SALLY
Sample Output
 2
Explanation
The longest string that can be formed by deleting zero or more characters from  and  is , whose length is 2.
Sample Input 1
AA
BB
Sample Output 1
0
Explanation 1
 and  have no characters in common and hence the output is 0.
Sample Input 2
SHINCHAN
NOHARAAA
Sample Output 2
3
Explanation 2
The longest string that can be formed between  and  while maintaining the order is .
Sample Input 3
ABCDEF
FBDAMN
Sample Output 3
2
Explanation 3
 is the longest child of the given strings.

Common Child - Hacker Rank Solution

Let's discuss the example of two anagrams

Example Case
TERRACED
CRATERED

This makes it a little more compact. What we are basically trying to do here is to find the length of the longest common subsequence of these two strings while maintaining order (which is why the anagram case isn't just trivial.)

I find the easiest way to visualise this is to form a grid with the first string along the top and with the other string along the side.

Grid Form
  T E R R A C E D
C - - - - - - - -
R - - - - - - - -
A - - - - - - - -
T - - - - - - - -
E - - - - - - - -
R - - - - - - - -
E - - - - - - - -
D - - - - - - - -
You then simply sweep along from top left to bottom right searching for matches and recording any that you do find

Row 1
  T E R R A C E D
C - - - - - C C C
R - - - - - - - -
A - - - - - - - -
T - - - - - - - -
E - - - - - - - -
R - - - - - - - -
E - - - - - - - -
D - - - - - - - -
Once you find a match you continue that to the end so that it can be used later

Row 2
  T E R R A C E D
C - - - - - C C C
R - - R R R R R R
A - - - - - - - -
T - - - - - - - -
E - - - - - - - -
R - - - - - - - -
E - - - - - - - -
D - - - - - - - -
Here we can only use one R as we only have one available in CR (although there are two by this point in TERR). To account for this I either use the information to the left or the information from above plus my current character. This prevents clashes from the right. For the last 3 letters in row two we could have used either C or R.

This method effectively stops us from trying to use CR as the C is not available to the top or left at the point which we match the R

Row 3
  T E R R  A  C  E  D
C - - - -  -  C  C  C
R - - R R  R  R  R  R
A - - R R  RA RA RA RA
T - - - -  -  -  -  -
E - - - -  -  -  -  -
R - - - -  -  -  -  -
E - - - -  -  -  -  -
D - - - -  -  -  -  -
We now have a string of length 2 and could no longer have included the C as it comes after both R and A in TERRACED

Row 4 & 5
  T E  R  R  A  C  E   D
C - -  -  -  -  C  C   C
R - -  R  R  R  R  R   R
A - -  R  R  RA RA RA  RA
T T T  R  R  RA RA RA  RA
E T TE TE TE RA RA RAE RAE
R - - - -  -  -  -   -
E - - - -  -  -  -   -
D - - - -  -  -  -   -
We are now up to a string of length 3 - note that we could not have TEE as that would use the E twice - looking to the left this problem does not occur.

Row 6
  T E  R   R   A   C  E   D
C - -  -   -   -   C  C   C
R - -  R   R   R   R  R   R
A - -  R   R   RA  RA RA  RA
T T T  R   R   RA  RA RA  RA
E T TE TE  TE  RA  RA RAE RAE
R T TE TER TER TER TER TER TER 
E - -  -   -   -   -   -   -
D - -  -   -   -   -   -   -
We now have a different string of length 3. We will be extending it shortly. It doesn't actually matter as the question only asks for the length of a string but it is generally better to favour strings from the left over those from above as it gives you more options as you go down.

Row 7
  T E  R   R   A   C   E    D
C - -  -   -   -   C   C    C
R - -  R   R   R   R   R    R
A - -  R   R   RA  RA  RA   RA
T T T  R   R   RA  RA  RA   RA
E T TE TE  TE  RA  RA  RAE  RAE
R T TE TER TER TER TER TER  TER 
E T TE TER TER TER TER TERE TERE 
D - -  -   -   -   -   -    -

This is an interesting line as it shows how to cope with clashes from above. In this case when we hit the first TE from above we can not add our E to it as it is already being used for the first E. If we have a match we actually have to add to the cell above and to the left (unless we are at the first column, it which case we can only have a match one letter long). Otherwise we will use more letters than we have. By the time we come to the second E we are fine to add it however and now have a string of length 4.

Final Row
  T E  R   R   A   C   E    D
C - -  -   -   -   C   C    C
R - -  R   R   R   R   R    R
A - -  R   R   RA  RA  RA   RA
T T T  R   R   RA  RA  RA   RA
E T TE TE  TE  RA  RA  RAE  RAE
R T TE TER TER TER TER TER  TER 
E T TE TER TER TER TER TERE TERE 
D T TE TER TER TER TER TERE TERED

On processing our final row we have the result. TERED - 5 letter common string. The answer appears in the bottom right cell. While this does work it would not be efficient for large strings. It is however easy to convert to a length count.

Converting to Length Counts
In this case we fill the array with zeros and on a match we take the maximum of either the value in the cell to the left or the cell above left plus one (for our match). If there is no match we simply take the maximum of the cell to the left and the cell above.

Our example therefore becomes

Number Version
  T E R R A C E D
C 0 0 0 0 0 1 1 1
R 0 0 1 1 1 1 1 1
A 0 0 1 1 2 2 2 2
T 1 1 1 1 2 2 2 2
E 1 2 2 2 2 2 3 3
R 1 2 3 3 3 3 3 3
E 1 2 3 3 3 3 4 4
D 1 2 3 3 3 3 4 5

A little harder to follow but much more concise.

My code for this is in Python - note that it actually times out in the python interpreter which really doesn't like for loops. However it runs in time with the PyPy interpreter. So I've felt justified to add it here.

I also don't usually dump code in one big file (well not huge) - but it did cut the time down significantly to not have the dp as a class variable.

In order to save space I have simply used two rows for the matrix. This is fine as we only ever look up one row anyway. This is often true in dynamic programming and can save a significant amount of space. (Say in a case where the string lengths are 5000)
Problem Setter's code:
def LCS(s1, s2):
    n, m = len(s1), len(s2)
    lcs = [[0] * (m + 1) for _ in xrange(n + 1)]

    for i, c1 in enumerate(s1):
        for j, c2 in enumerate(s2):
            if c1 == c2:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i][j - 1], lcs[i - 1][j])

    return lcs[n - 1][m - 1]


if __name__ == '__main__':
    a = raw_input()
    b = raw_input()

    print LCS(a, b)

3 comments:

  1. not work, time limit exceededhttps://allhackerranksolutions.blogspot.com/logout?d=https://www.blogger.com/logout-redirect.g?blogID%3D7495129068344190013%26postID%3D162726857358923022

    ReplyDelete
  2. if you change "lcs[i][j] = max(lcs[i][j - 1], lcs[i - 1][j])" to if-else structures, you can solve this problem.
    ex)
    problem
    lcs[i][j] = max(lcs[i][j - 1], lcs[i - 1][j])

    solution

    else:
    if arr[i][j-1] > arr[i-1][j]:
    arr[i][j] = arr[i][j-1]
    else:
    arr[i][j] = arr[i - 1][j]

    ReplyDelete

Powered by Blogger.