Rabin Karp

Rabin Karp

A string-searching algorithm that uses rolling hash to find an exact match of a pattern string in a text.

The expected time complexity is O(S+T) where S and T are the lengths of the original string s and the pattern string t respectively.

Its worst-case time complexity is O(ST).

Algorithm

First get the hash ht of the pattern string t.

We keep a sliding window of size T and compute the hash hs of the substring of s within the window.

Whenever ht == hs, the part in the window might be the same as t. In case of hash conflict, we linearly check the equity of them.

Implementation

Version 1

  • The hashes are signed long long.

  • We need to use a number m to mod the hashes in case of overflow.

  • We need to do hs += m when hs < 0.

// OJ: https://leetcode.com/problems/implement-strstr/
// Author: github.com/lzl124631x
// Time: average O(S+T), worst O(ST)
// Space: O(1)
class Solution {
    typedef long long LL;

public:
    int strStr(string s, string t) {
        int S = s.size(), T = t.size(), d = 128; 
        if (!S || !T || T > S) return T ? -1 : 0;
        LL m = 1e9+7, p = 1, hs = s[0], ht = t[0];
        for (int i = 1; i < T; ++i) {
            p = (p * d) % m; // we need d^(T-1)
            ht = (ht * d + t[i]) % m;
            hs = (hs * d + s[i]) % m;
        }
        for (int i = 0; i <= S - T; ++i) { // Loop for each start/pop point
            if (hs == ht) { // in case of collision, check the equity.
                int j = 0;
                for (; j < T && s[i + j] == t[j]; ++j);
                if (j == T) return i;
            }
            if (i < S - T) hs = ((hs - s[i] * p) * d + s[i + T]) % m;
            if (hs < 0) hs += m; // we might get negative hs, converting it to positive
        }
        return -1;
    }
};

Version 2

  • This version is translated from standard golang implementation.

  • The hashes are unsigned int.

  • No mod number m is needed! It will automatically mod by 2^32 when overflow happens.

Version 3

Why 1099511628211?

1099511628211 is the FNV(Fowler–Noll–Vo) prime for 64 bit integer. When using FNV prime, the hash values produced are more scattered throughout the hash space.

See more in FNV primearrow-up-right

Application

Rabin Karp algorithm or rolling hash is often used in problems where you need to find repeated subarray/substring in a long array/string.

Examples:

The following function is used in 718. Maximum Length of Repeated Subarray (Medium)arrow-up-right to generate the rolling hash.

Prefix Hash Array

Example: 1698. Number of Distinct Substrings in a String (Medium)arrow-up-right

Problem

Reference

TODO

https://codeforces.com/blog/entry/4898arrow-up-right

Last updated

Was this helpful?