Knuth-Morris-Pratt (KMP) Algorithm
ABABDABACDABABCABAB
How KMP Algorithm Works
The Knuth-Morris-Pratt (KMP) algorithm is an efficient string matching algorithm that uses the observation that when a mismatch occurs, the pattern itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.
The algorithm:
- Preprocess the pattern to create a Longest Proper Prefix which is also Suffix (LPS) array
- Use the LPS array to decide the next positions to match
- Skip characters that we know will anyway match
The KMP algorithm has a time complexity of O(n+m), where n is the length of the text and m is the length of the pattern. This makes it much more efficient than the naive approach, especially for large texts and patterns.
function KMP(text, pattern):
n = length of text
m = length of pattern
lps = computeLPSArray(pattern)
i = 0 // index for text
j = 0 // index for pattern
while i < n:
if pattern[j] == text[i]:
i = i + 1
j = j + 1
if j == m:
print "Pattern found at index", i - j
j = lps[j - 1]
elif i < n and pattern[j] != text[i]:
if j != 0:
j = lps[j - 1]
else:
i = i + 1
visualize(text, pattern, i, j, lps)
function computeLPSArray(pattern):
m = length of pattern
lps = array of size m
len = 0
i = 1
while i < m:
if pattern[i] == pattern[len]:
len = len + 1
lps[i] = len
i = i + 1
elif len != 0:
len = lps[len - 1]
else:
lps[i] = 0
i = i + 1
visualizeLPS(pattern, lps, i, len)
return lps
function visualize(text, pattern, i, j, lps):
// Update UI to show:
// 1. The full text with the current window highlighted
// 2. The pattern below the text, aligned with the current window
// 3. Highlight matching characters
// 4. Show the LPS array and its use in skipping comparisons
function visualizeLPS(pattern, lps, i, len):
// Update UI to show:
// 1. The pattern
// 2. The current state of the LPS array
// 3. Highlight the current comparison in the pattern