Sunday, February 3, 2019

Sherlock and Anagrams - Hacker Rank Solution

Two strings are anagrams of each other if the letters of one string can be rearranged to form the other string. Given a string, find the number of pairs of substrings of the string that are anagrams of each other.

For example , the list of all anagrammatic pairs is  at positions  respectively.
Function Description
Complete the function sherlockAndAnagrams in the editor below. It must return an integer that represents the number of anagrammatic pairs of substrings in .
sherlockAndAnagrams has the following parameter(s):
  • s: a string .
Input Format
The first line contains an integer , the number of queries.
Each of the next  lines contains a string  to analyze.
Constraints


String  contains only lowercase letters  ascii[a-z].
Output Format
For each query, return the number of unordered anagrammatic pairs.
Sample Input 0
2
abba
abcd
Sample Output 0
4
0
Explanation 0
The list of all anagrammatic pairs is  and  at positions  and  respectively.
No anagrammatic pairs exist in the second query as no character repeats.
Sample Input 1
2
ifailuhkqq
kkkk
Sample Output 1
3
10
Explanation 1
For the first query, we have anagram pairs  and  at positions  and respectively.
For the second query:
There are 6 anagrams of the form  at positions  and .
There are 3 anagrams of the form  at positions  and .
There is 1 anagram of the form  at position .
Sample Input 2
1
cdcd
Sample Output 2
5
Explanation 2
There are two anagrammatic pairs of length  and .
There are three anagrammatic pairs of length  at positions  respectively.

Sherlock and Anagrams - Hacker Rank Solution
Two string are anagrams if and only if for every letter occurring in any of them the number of its occurrences is equal in both the strings.
This definition is crucial and will lead to the solution. Since the only allowed letters are lowercase English letters, from  to , the alphabet size is constant and its size is . This allows us to assign a constant size signature to each of the substring of .
A signature of some string  will be a tuple of  elements where the -th element denotes the number of occurrences of the -th letter of the alphabet in .
So, for example, if  then its signature is , so the only non-zero elements are the ones corresponding to letter  with value of 2 and letter  with value of 1.
Notice, that any string that is an anagram of  will have the same signature as , and every string that is not an anagram of  will definitely have a different signature.
This concept of signatures allows the following approach.
Let's iterate over all substrings of  and for each fixed substring let's compute its signature and add that signature to signatures hashmap, where  denotes the number of substrings of  with a signature .
Finally, the only remaining thing to do is to get the number of pairs of substrings of  that are anagrams. It's easy to do having our hashmap. Notice that if there are  substrings of  with signature , then they can form  pairs of substrings with signature , so we can just iterate over all values in the hashmap and for each value  add  to the final result.
The below, commented code, in Python, illustrates this exact approach.
The time complexity is  since we iterate over all  substrings of  and for each substring we compute its signature in  time. It's worth to mention that each operation on hashmap has constant running time since our signatures have a constant size, i.e.  which is the size of our alphabet. Otherwise, if the alphabet size is not constant, this approach will have  time complexity.
import string

q = int(raw_input())

ALPHABET = string.ascii_lowercase

for _ in xrange(q):
    s = raw_input()
    signatures = {}

    signature = [0 for _ in ALPHABET]
    for letter in s:
        signature[ord(letter)-ord(ALPHABET[0])] += 1

    # iterate over all substrings of s
    for start in range(len(s)):
        for finish in range(start, len(s)):
            # initialize substring signature
            signature = [0 for _ in ALPHABET]
            for letter in s[start:finish+1]:
                signature[ord(letter)-ord(ALPHABET[0])] += 1
            # tuples are hashable in contrast to lists
            signature = tuple(signature)
            signatures[signature] = signatures.get(signature, 0) + 1

    res = 0
    for count in signatures.values():
        res += count*(count-1)/2
    print res

6 comments:

  1. Nice and clear explanation, thanks!

    ReplyDelete
  2. I have a little doubt at the loop where you are getting all the sub-strings considering "abcd" as our main string the sub strings that will be generated from the loop are "a", "ab", "abc", "abcd", "b", "bc", "bcd" and "c", "cd", "d". How does it cover the remaining substrings like "ac", "bd" etc.?

    ReplyDelete
  3. We don't need those substrings as only the ones in sequence are considered

    ReplyDelete
  4. #!/bin/python3

    import math
    import os
    import random
    import re
    import sys

    # Complete the sherlockAndAnagrams function below.
    def sherlockAndAnagrams(s):
    count=0
    for i in range(1,len(s)+1):
    for r in range(0,len(s)-i):
    for c in range(r+1,len(s)-i+1):
    if sorted(list(s[r:r+i]))==sorted(list(s[c:c+i])):
    count+=1
    return count
    if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')

    q = int(input())

    for q_itr in range(q):
    s = input()

    result = sherlockAndAnagrams(s)

    fptr.write(str(result) + '\n')

    fptr.close()

    ReplyDelete
  5. #!/bin/python3

    import math
    import os
    import random
    import re
    import sys

    # Complete the sherlockAndAnagrams function below.
    def sherlockAndAnagrams(s):
    count=0
    for i in range(1,len(s)+1):
    for r in range(0,len(s)-i):
    for c in range(r+1,len(s)-i+1):
    if sorted(list(s[r:r+i]))==sorted(list(s[c:c+i])):
    count+=1
    return count
    if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')

    q = int(input())

    for q_itr in range(q):
    s = input()

    result = sherlockAndAnagrams(s)

    fptr.write(str(result) + '\n')

    fptr.close()

    ReplyDelete

Powered by Blogger.