Sunday, February 3, 2019

Fraudulent Activity Notifications - Hacker Rank Solution

HackerLand National Bank has a simple policy for warning clients about possible fraudulent account activity. If the amount spent by a client on a particular day is greater than or equal to 2× the client's median spending for a trailing number of days, they send the client a notification about potential fraud.
The bank doesn't send the client any notifications until they have at least that trailing number of prior days' transaction data.
Given the number of trailing days d and a client's total daily expenditures for a period of n days, find and print the number of times the client will receive a notification over all n days.
For example, d = 3 and expenditures = [10, 20, 30, 40, 50]. On the first three days, they just collect spending data. At day 4, we have trailing expenditures of [10, 20, 30]. The median is 20 and the day's expenditure is 40. Because 40 ≥ 2 × 20, there will be a notice. The next day, our trailing expenditures are [20, 30, 40] and the expenditures are 50. This is less than 2 × 30 so no notice will be sent. Over the period, there was one notice sent.
Note: The median of a list of numbers can be found by arranging all the numbers from smallest to greatest. If there is an odd number of numbers, the middle one is picked. If there is an even number of numbers, median is then defined to be the average of the two middle values. (Wikipedia)
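The definition in the note translates almost directly into code. The sketch below is only an illustration of that definition (the helper name `median` is my own, not part of the problem's interface), and it sorts the whole list each time, which is too slow for the full constraints but handy as a reference.

```python
def median(values):
    """Median by sorting: the middle element, or the mean of the two middle ones."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two


print(median([2, 3, 4, 2, 3]))  # first trailing window of Sample Input 0 -> 3
print(median([1, 2, 3, 4]))     # trailing window of Sample Input 1 -> 2.5
```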
Function Description
Complete the function activityNotifications in the editor below. It must return an integer representing the number of client notifications.
activityNotifications has the following parameter(s):
  • expenditure: an array of integers representing daily expenditures
  • d: an integer, the lookback days for median spending
Input Format
The first line contains two space-separated integers n and d, the number of days of transaction data, and the number of trailing days' data used to calculate median spending.
The second line contains n space-separated non-negative integers where each integer i denotes expenditure[i].
Constraints
  • 1 ≤ n ≤ 2 × 10^5
  • 1 ≤ d ≤ n
  • 0 ≤ expenditure[i] ≤ 200
Output Format
Print an integer denoting the total number of times the client receives a notification over a period of n days.
Sample Input 0
9 5
2 3 4 2 3 6 8 4 5
Sample Output 0
2
Explanation 0
We must determine the total number of notifications the client receives over a period of n = 9 days. For the first five days, the customer receives no notifications because the bank has insufficient transaction data: notifications = 0.
On the sixth day, the bank has d = 5 days of prior transaction data, {2, 3, 4, 2, 3}, and a transaction median of 3 dollars. The client spends 6 dollars, which triggers a notification because 6 ≥ 2 × 3: notifications = 1.
On the seventh day, the bank has d = 5 days of prior transaction data, {3, 4, 2, 3, 6}, and a transaction median of 3 dollars. The client spends 8 dollars, which triggers a notification because 8 ≥ 2 × 3: notifications = 2.
On the eighth day, the bank has d = 5 days of prior transaction data, {4, 2, 3, 6, 8}, and a transaction median of 4 dollars. The client spends 4 dollars, which does not trigger a notification because 4 < 2 × 4.
On the ninth day, the bank has d = 5 days of prior transaction data, {2, 3, 6, 8, 4}, and a transaction median of 4 dollars. The client spends 5 dollars, which does not trigger a notification because 5 < 2 × 4.
Sample Input 1
5 4
1 2 3 4 4
Sample Output 1
0
Explanation 1
There are 4 days of data required so the first day a notice might go out is day 5. Our trailing expenditures are [1, 2, 3, 4] with a median of 2.5. The client spends 4, which is less than 2 × 2.5 = 5, so no notification is sent.


Fraudulent Activity Notifications - Hacker Rank Solution

In this problem, you need to find the running median. There can be several approaches to solving this. Notice that a client can spend at most $200 per day. We can take advantage of this small number.

Finding Median using Counting Sort

Let's see how we can find the median of an array using counting sort. Suppose the array holds non-negative integers and its maximum value is 7. If you write down the frequency of each number from 0 to 7, you get a frequency table with one entry per possible value.
If the array has n elements and n is odd, the median is the ((n + 1) / 2)-th number in the sorted array. You can loop over the frequency table, accumulating the counts, until you reach that position.
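The frequency-table scan can be sketched as below. The array `sample` is hypothetical data chosen for illustration (it is not the array from the original example), and the function name `median_by_counting` is my own. The sketch assumes an odd number of elements, so the median is a single value.

```python
def median_by_counting(values, max_value):
    """Median of an odd-length list of integers in the range [0, max_value]."""
    # Build the frequency table: freq[v] = how many times v occurs.
    freq = [0] * (max_value + 1)
    for v in values:
        freq[v] += 1

    target = (len(values) + 1) // 2   # 1-based rank of the median
    seen = 0
    for value, count in enumerate(freq):
        seen += count                 # elements <= value seen so far
        if seen >= target:
            return value


# Hypothetical data with maximum value 7.
sample = [5, 1, 7, 3, 3, 0, 6]
print(median_by_counting(sample, 7))  # sorted: 0 1 3 3 5 6 7 -> 3
```

No sorting of the array itself is needed; the cost is one pass over the values plus one pass over the table.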

Get back to the original problem

In the original problem, you need to maintain a frequency table for each window of size d in the array. You can do this by keeping track of the start and end points of the window. Suppose the current window starts at index i and ends at index j, with j - i + 1 = d, and assume you already have a frequency table for that window. When you move to the next window [i+1, j+1], you update the frequency table by decreasing the frequency of the element at index i and increasing the frequency of the element at index j+1. Using the frequency table, you can find the median, and the problem is solved.
The complexity is O(n × 200): each of the n days needs one pass over a frequency table with at most 201 entries.
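Put together, the sliding frequency table might look like the sketch below. The function name `activity_notifications` is a hypothetical snake_case counterpart of `activityNotifications`, and the `max_value=200` default is an assumption taken from the $200 daily limit mentioned above. To stay in integers, it compares the day's spending with twice the median, computed as the sum of the lower and upper middle elements (for odd d these are the same element).

```python
def activity_notifications(expenditure, d, max_value=200):
    """Count notifications with a sliding frequency table (a sketch)."""
    freq = [0] * (max_value + 1)
    for v in expenditure[:d]:          # table for the first window
        freq[v] += 1

    def kth(k):
        """k-th smallest value (1-based) in the current window."""
        seen = 0
        for value, count in enumerate(freq):
            seen += count
            if seen >= k:
                return value

    notices = 0
    for i in range(d, len(expenditure)):
        # Lower middle + upper middle = 2 * median, for odd and even d alike.
        twice_median = kth((d + 1) // 2) + kth(d // 2 + 1)
        if expenditure[i] >= twice_median:
            notices += 1
        freq[expenditure[i - d]] -= 1  # oldest day leaves the window
        freq[expenditure[i]] += 1      # current day enters the window
    return notices


print(activity_notifications([2, 3, 4, 2, 3, 6, 8, 4, 5], 5))  # Sample 0 -> 2
print(activity_notifications([1, 2, 3, 4, 4], 4))              # Sample 1 -> 0
```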

Tricky Part

Note that the median can be a floating point value when the size of the array is even. For example, for the trailing window [1, 2, 3, 4] of the 2nd sample, the median is 2.5. You will not pass the 2nd sample I/O if you don't handle it properly.
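One way to see the pitfall is to compute the median with integer division on the 2nd sample's window. The short sketch below (variable names are my own) contrasts that with comparing against the sum of the two middle values, which equals twice the median without any fractional part.

```python
window = [1, 2, 3, 4]   # trailing days in Sample Input 1
spent = 4               # expenditure on day 5

ordered = sorted(window)
lower, upper = ordered[1], ordered[2]   # the two middle values: 2 and 3

truncated = (lower + upper) // 2        # 2: integer division drops the .5
exact_twice = lower + upper             # 5: exactly 2 * 2.5

print(spent >= 2 * truncated)   # True: a spurious notification
print(spent >= exact_twice)     # False: agrees with Sample Output 1
```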

Another Solution

This can also be solved using two priority queues. This video explains it nicely.
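A rough sketch of the two-priority-queue idea follows. It keeps the smaller half of the window in a max-heap and the larger half in a min-heap, so the middle values sit at the two tops. Since heaps cannot delete arbitrary elements, this sketch marks expired values as stale and discards them when they reach a top (lazy deletion); that choice, along with every name used here, is my assumption and not necessarily what the video does.

```python
import heapq
from collections import defaultdict


def activity_notifications_heaps(expenditure, d):
    """Count notifications with a max-heap/min-heap pair and lazy deletion."""
    low, high = [], []            # low: max-heap via negation; high: min-heap
    stale = defaultdict(int)      # value -> copies waiting to be discarded
    size = {"low": 0, "high": 0}  # live (non-stale) element counts

    def prune_low():
        while low and stale[-low[0]] > 0:
            stale[-low[0]] -= 1
            heapq.heappop(low)

    def prune_high():
        while high and stale[high[0]] > 0:
            stale[high[0]] -= 1
            heapq.heappop(high)

    def rebalance():
        # Keep live sizes at low == high or low == high + 1.
        if size["low"] > size["high"] + 1:
            heapq.heappush(high, -heapq.heappop(low))
            size["low"] -= 1
            size["high"] += 1
            prune_low()
        elif size["low"] < size["high"]:
            heapq.heappush(low, -heapq.heappop(high))
            size["low"] += 1
            size["high"] -= 1
            prune_high()

    def add(value):
        if not low or value <= -low[0]:
            heapq.heappush(low, -value)
            size["low"] += 1
        else:
            heapq.heappush(high, value)
            size["high"] += 1
        rebalance()

    def remove(value):
        stale[value] += 1             # discard it once it surfaces at a top
        if value <= -low[0]:
            size["low"] -= 1
            prune_low()
        else:
            size["high"] -= 1
            prune_high()
        rebalance()

    for value in expenditure[:d]:
        add(value)

    notices = 0
    for i in range(d, len(expenditure)):
        # Twice the median, kept in integers.
        twice_median = 2 * -low[0] if d % 2 == 1 else -low[0] + high[0]
        if expenditure[i] >= twice_median:
            notices += 1
        add(expenditure[i])
        remove(expenditure[i - d])
    return notices


print(activity_notifications_heaps([2, 3, 4, 2, 3, 6, 8, 4, 5], 5))  # -> 2
```

Each day costs a few heap operations instead of a scan over all possible values, which matters when the expenditure range is not as small as it is here.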
Problem Setter's code:
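import sys
#sys.stdin = open("in", "r")
n, d = map(int, input().split())
arr = list(map(int, input().split()))

# dic[v] = number of times value v occurs in the current window of d days
dic = {}

def find(idx):
    # Return the idx-th smallest value (1-based) in the current window
    s = 0
    for i in range(0, 201):  # expenditures lie in [0, 200], 200 included
        freq = 0
        if i in dic:
            freq = dic[i]
        s = s + freq
        if s >= idx:
            return i

ans = 0
for i in range(0, n):
    val = arr[i]

    if i >= d:
        # lower middle element (the median itself when d is odd)
        med = find(d // 2 + d % 2)

        if d % 2 == 0:
            # even d: the sum of the two middle elements equals twice
            # the median, so no floating point is needed
            ret = find(d // 2 + 1)
            if val >= med + ret:
                ans = ans + 1
        else:
            if val >= med * 2:
                ans = ans + 1

    # add today's expenditure to the window
    if val not in dic: dic[val] = 0
    dic[val] = dic[val] + 1

    #print(i, dic)
    if i >= d:
        # drop the day that just left the window
        prev = arr[i - d]
        dic[prev] = dic[prev] - 1

print(ans)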
Problem Tester's code:
#include <iostream>
#include <cstdio>
#include <cmath>   // floor and ceil are used below
#include <algorithm>
#include <cstring>
#include <ctime>
#include <cassert>
using namespace std;
#define SZ(x) ((int)(x.size()))
#define FOR(i,n) for(int (i)=0;(i)<(n);++(i))
#define FOREACH(i,t) for (typeof(t.begin()) i=t.begin(); i!=t.end(); i++)
#define REP(i,a,b) for(int (i)=(a);(i)<=(b);++i)


typedef long long ll;
const int INF = 1e9;

const int N = 2e5;
const int V = 200;

int a[N];

int cnt[V+1];

int main()
{
    ios_base::sync_with_stdio(0);
    int n, d;
    cin >> n >> d;
    assert(n >= 1 && n <= N);
    assert(d >= 1 && d <= n);
    FOR(i, n) cin >> a[i];
    FOR(i, n) assert(a[i] >= 0 && a[i] <= V);
    int res = 0;

    FOR(i, d) cnt[a[i]]++;
    REP(i, d, n-1)
    {
        //SOLVE HERE
        int acc = 0;
        int low_median = -1, high_median = -1;
        REP(v, 0, V)
        {
            acc += cnt[v];
            if(low_median == -1 && acc >= int(floor((d+1)/2.0)))
            {
                low_median = v;
            }
            if(high_median == -1 && acc >= int(ceil((d+1)/2.0)))
            {
                high_median = v;
            }
        }
        assert(acc == d);
        int double_median = low_median + high_median;
        //cout << low_median << " " << high_median << " -> " << median << endl;
        if(a[i] >= double_median)
        {
            res++;
        }
        cnt[a[i-d]]--;
        cnt[a[i]]++;
    }
    cout << res << endl;
    return 0;
}
