Boyer-Moore algorithm  
PROBLEM 
The text is an array T.1,T.2,...,T.N and the pattern is an array P.1,P.2,...,P.M. The elements of P. and T. are characters drawn from a finite alphabet Sigma., Sigma. is an array of R elements. We say that pattern P. occurs with shift S in text T. if 0<=S<=N-M and if  T.SpJ=P.J, where SpJ indicates S+J, for J=1,...,M. String-matching problem is  the problem of finding all valid shifts with which a given pattern occurs in a given text. 
   
 ALGORITHM 
The BOYER_MOORE_MATCHER algorithm was invented by Robert S. Boyer and J. Strother Moore. Its worst-case running time is O((N-M+1)*M+R). 
   
 PRACTICE 
When a text and a pattern are real strings then the fastest solution is by  REAL_STRING_MATCHER. In general case there is not an unambiguous winner. You have to test subroutines:   
 
on your data. Example, where M respectively R is number of elements in P. respectively in Sigma. and S is number of valid shifts: 
 
| Running time in seconds for N=100000 |  
| Algorithm | 
M=10,R=1999,S=50 | 
M=10,R=4,S=10000 | 
M=100,R=4,S=10000 | 
 
| Naive | 
16  | 
25  | 
150  | 
 
| KMP | 
21  | 
22  | 
23  | 
 
| Boyer-Moore | 
4  | 
15  | 
138  | 
 
 
IMPLEMENTATION 
Unit: 
internal subroutine
  
 Global variables: array T.1,...,T.N of strings, array P.1,...,P.M of strings, array Sigma.1,...,Sigma.R of strings
  
 Parameters:
 positive integer N - number of elements in T., positive integer M - number of elements in P., positive integer R - number of elements in Sigma.
  
 Output:
 displays on the screen Pattern occurs with shift S for all valid shift S
 
 
 
 
 BOYER_MOORE_MATCHER: 
 procedure expose T. P. Sigma.
 parse arg M, N, R 
 call COMPUTE_LAST_OCCURENCE_FUNCTION M, R 
 call COMPUTE_GOOD_SUFFIX_FUNCTION M 
 S = 0
 do while S <= N - M
   do J = M to 1 by -1
     SpJ = S + J
     if P.J <> T.SpJ then leave
   end
   if J = 0
     then do
       say "Pattern occurs at shift" S
       S = S + Gama.0
     end
     else do
       SpJ = S + J; TSpJ = T.SpJ
       S = S + MAX(Gama.J, J - Lambda.TSpJ)
     end
 end
 return
  
 COMPUTE_LAST_OCCURENCE_FUNCTION: 
 procedure expose P. Lambda. Sigma.
 parse arg M, R
 do I = 1 to R
   SigmaI = Sigma.I; Lambda.SigmaI = 0
 end
 do J = 1 to M
   PJ = P.J; Lambda.PJ = J
 end 
 return
  
 COMPUTE_GOOD_SUFFIX_FUNCTION:
 procedure expose Gama. P.
 parse arg M
 call COMPUTE_PREFIX_FUNCTION M
 do I = 1 to M
   PuvPi.I = Pi.I; PuvP.I = P.I
 end
 do I = M to 1 by -1
   MmIp1 = M - I + 1; P.I = PuvP.MmIp1
 end
 call COMPUTE_PREFIX_FUNCTION M
 do I = 1 to M; P.I = PuvP.I; end
 do J = 0 to M
   Gama.J = M - PuvPi.M
 end
 do L = 1 to M
   J = M - Pi.L
   if Gama.J > L - Pi.L 
     then Gama.J = L - Pi.L
 end
 return
  
 COMPUTE_PREFIX_FUNCTION: 
 procedure expose Pi. P.
 parse arg M
 Pi.1 = 0; K = 0
 do Q = 2 to M 
   Kp1 = K + 1
   do while K > 0 & P.Kp1 <> P.Q
     K = Pi.K; Kp1 = K + 1
   end
   Kp1 = K + 1
   if P.Kp1 = P.Q then K = K + 1
   Pi.Q = K
 end
 return 
  |   
  
CONNECTIONS 
 Literature 
Cormen T. H., Leiserson Ch. E., Rivest R. L. Introduction to Algorithms  The MIT Press, Cambridge, 1990 
  
  
 |