Two strings are anagrams of each other if the letters of one string can be rearranged to form the other string. Given a string, find the number of pairs of substrings of the string that are anagrams of each other.
For example , the list of all anagrammatic pairs is at positions respectively.
Function Description
Complete the function sherlockAndAnagrams in the editor below. It must return an integer that represents the number of anagrammatic pairs of substrings in .
sherlockAndAnagrams has the following parameter(s):
- s: a string .
Input Format
The first line contains an integer , the number of queries.
Each of the next lines contains a string to analyze.
Each of the next lines contains a string to analyze.
Constraints
String contains only lowercase letters ascii[a-z].
Output Format
For each query, return the number of unordered anagrammatic pairs.
Sample Input 0
2
abba
abcd
Sample Output 0
4
0
Explanation 0
The list of all anagrammatic pairs is and at positions and respectively.
No anagrammatic pairs exist in the second query as no character repeats.
Sample Input 1
2
ifailuhkqq
kkkk
Sample Output 1
3
10
Explanation 1
For the first query, we have anagram pairs and at positions and respectively.
For the second query:
There are 6 anagrams of the form at positions and .
There are 3 anagrams of the form at positions and .
There is 1 anagram of the form at position .
There are 6 anagrams of the form at positions and .
There are 3 anagrams of the form at positions and .
There is 1 anagram of the form at position .
Sample Input 2
1
cdcd
Sample Output 2
5
Explanation 2
There are two anagrammatic pairs of length : and .
There are three anagrammatic pairs of length : at positions respectively.
There are three anagrammatic pairs of length : at positions respectively.
Sherlock and Anagrams - Hacker Rank Solution
Two string are anagrams if and only if for every letter occurring in any of them the number of its occurrences is equal in both the strings.
This definition is crucial and will lead to the solution. Since the only allowed letters are lowercase English letters, from to , the alphabet size is constant and its size is . This allows us to assign a constant size signature to each of the substring of .
A signature of some string will be a tuple of elements where the -th element denotes the number of occurrences of the -th letter of the alphabet in .
So, for example, if then its signature is , so the only non-zero elements are the ones corresponding to letter with value of 2 and letter with value of 1.
Notice, that any string that is an anagram of will have the same signature as , and every string that is not an anagram of will definitely have a different signature.
This concept of signatures allows the following approach.
Let's iterate over all substrings of and for each fixed substring let's compute its signature and add that signature to signatures hashmap, where denotes the number of substrings of with a signature .
Finally, the only remaining thing to do is to get the number of pairs of substrings of that are anagrams. It's easy to do having our hashmap. Notice that if there are substrings of with signature , then they can form pairs of substrings with signature , so we can just iterate over all values in the hashmap and for each value add to the final result.
The below, commented code, in Python, illustrates this exact approach.
The time complexity is since we iterate over all substrings of and for each substring we compute its signature in time. It's worth to mention that each operation on hashmap has constant running time since our signatures have a constant size, i.e. which is the size of our alphabet. Otherwise, if the alphabet size is not constant, this approach will have time complexity.
import string
q = int(raw_input())
ALPHABET = string.ascii_lowercase
for _ in xrange(q):
s = raw_input()
signatures = {}
signature = [0 for _ in ALPHABET]
for letter in s:
signature[ord(letter)-ord(ALPHABET[0])] += 1
# iterate over all substrings of s
for start in range(len(s)):
for finish in range(start, len(s)):
# initialize substring signature
signature = [0 for _ in ALPHABET]
for letter in s[start:finish+1]:
signature[ord(letter)-ord(ALPHABET[0])] += 1
# tuples are hashable in contrast to lists
signature = tuple(signature)
signatures[signature] = signatures.get(signature, 0) + 1
res = 0
for count in signatures.values():
res += count*(count-1)/2
print res
Nice and clear explanation, thanks!
ReplyDeleteI have a little doubt at the loop where you are getting all the sub-strings considering "abcd" as our main string the sub strings that will be generated from the loop are "a", "ab", "abc", "abcd", "b", "bc", "bcd" and "c", "cd", "d". How does it cover the remaining substrings like "ac", "bd" etc.?
ReplyDeleteWe don't need those substrings as only the ones in sequence are considered
ReplyDelete#!/bin/python3
ReplyDeleteimport math
import os
import random
import re
import sys
# Complete the sherlockAndAnagrams function below.
def sherlockAndAnagrams(s):
count=0
for i in range(1,len(s)+1):
for r in range(0,len(s)-i):
for c in range(r+1,len(s)-i+1):
if sorted(list(s[r:r+i]))==sorted(list(s[c:c+i])):
count+=1
return count
if __name__ == '__main__':
fptr = open(os.environ['OUTPUT_PATH'], 'w')
q = int(input())
for q_itr in range(q):
s = input()
result = sherlockAndAnagrams(s)
fptr.write(str(result) + '\n')
fptr.close()
perfect
Delete#!/bin/python3
ReplyDeleteimport math
import os
import random
import re
import sys
# Complete the sherlockAndAnagrams function below.
def sherlockAndAnagrams(s):
count=0
for i in range(1,len(s)+1):
for r in range(0,len(s)-i):
for c in range(r+1,len(s)-i+1):
if sorted(list(s[r:r+i]))==sorted(list(s[c:c+i])):
count+=1
return count
if __name__ == '__main__':
fptr = open(os.environ['OUTPUT_PATH'], 'w')
q = int(input())
for q_itr in range(q):
s = input()
result = sherlockAndAnagrams(s)
fptr.write(str(result) + '\n')
fptr.close()