-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Rabin–Karp algorithm #413
Conversation
Codecov Report
@@ Coverage Diff @@
## master #413 +/- ##
=============================================
+ Coverage 98.622% 98.635% +0.013%
=============================================
Files 26 26
Lines 3411 3444 +33
=============================================
+ Hits 3364 3397 +33
Misses 47 47
|
pydatastructs/strings/algorithms.py
Outdated
if q == 0: | ||
for i in range(t): | ||
positions.append(i) | ||
return positions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if q == 0: | |
for i in range(t): | |
positions.append(i) | |
return positions | |
if q == 0 or t == 0: | |
return positions | |
pydatastructs/strings/algorithms.py
Outdated
@@ -107,3 +109,38 @@ def _do_match(string, query, kmp_table): | |||
k = k + 1 | |||
|
|||
return positions | |||
|
|||
def hash_str(string, p=137, m=1e9 + 7): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def hash_str(string, p=137, m=1e9 + 7): | |
def _hash_str(string, p=137, m=1e9 + 7): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this implementation of hash function same as in https://www.cs.cmu.edu/afs/cs/academic/class/15451-f14/www/lectures/lec6/karp-rabin-09-15-14.pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the implementation, I've used a similar hash function but not the exact same. Instead of doing T[i] * p^(j-i)
I've done T[i] * p^(i)
. So instead of iterating from len(T) to 0
, it is iterating from 0 to len(T)
Here is the reference https://cp-algorithms.com/string/string-hashing.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of repeating 1e9 + 7
, let's define it as a constant (PRIME_NUMBER = 10000000007
) somewhere at the starting of this file and then use it else where. We also need a Note in documentation of find
to explain the details of hash function like what value of p
and M
we are using and why.
Rest it's fine.
pydatastructs/strings/algorithms.py
Outdated
@@ -71,7 +73,7 @@ def _knuth_morris_pratt(text, query): | |||
|
|||
def _build_kmp_table(query): | |||
pos, cnd = 1, 0 | |||
kmp_table = OneDimensionalArray(int, len(query) + 1) | |||
kmp_table = OneDimensionalArray(int, len(query) + 5) # hacky fix to avoid error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is kmp_table = OneDimensionalArray(int, len(query) + 1)
then for the case where len(query) is 0
the line kmp_table[pos] = cnd
after the while statement will pop out an error index out of range
as pos is intialised with 1. I've increased the array size so it won't go out of range. kmp_table = OneDimensionalArray(int, len(query) + 2)
will also work though ;).
Rest the increased array size will not affect anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do this at the entry point of _kmp
.
Changes added:
|
Thanks @pratikgl |
Added Rabin–Karp algorithm to find API
References to other Issues or PRs or Relevant literature
Fixes #402
Brief description of what is fixed or changed
In the find API implemented in the string algorithm. I have added Rabin–Karp algorithm which finds the occurrence of a query string in a text using string hashing.
Other comments
Reference: https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm