As we have seen in the previous section, a contiguous part of a word (obtained by erasing a prefix or/and a suffix) is a subword or factor. If we eliminate arbitrary letters from a word, what is obtained is a scattered subword, sometimes called subsequence. Special scattered subwords, in which the consecutive letters are at distance at least and at most in the original word, are called -subwords. More formally we give the following definition.
Let , , be positive integers, and let be a word over the alphabet . The word , where
, for ,
is a -subword of length of .
For example the -subwords of are: , , , , , , , , , , , , , , , , , , , , , , .
The number of different -subwords of a word is called -complexity and is denoted by .
For example, if , then .
Words with pairwise different letters are called rainbow words. The -complexity of a rainbow word of length does not depends on what letters it contains, and is denoted by .
To compute the -complexity of a rainbow word of length , let us consider the word (if , then ) and the corresponding digraph , with
For see Figure 26.11.
The adjacency matrix of the graph is defined by:
Because the graph has no directed cycles, the entry in row and column in (where , with ) will represent the number of -length directed paths from to . If is the identity matrix (with entries equal to 1 only on the first diagonal, and 0 otherwise), let us define the matrix :
The -complexity of a rainbow word is then
The matrix can be better computed using a variant of the well-known Warshall algorithm:
From we obtain easily . The time complexity of this algorithms is .
For example let us consider the graph in Figure 26.11. The corresponding adjacency matrix is:
After applying the Warshall algorithm:
and then , the sum of entries in .
The Warshall algorithm combined with the Latin square method can be used to obtain all nontrivial (with length at least 2) -subwords of a given rainbow word of length . Let us consider a matrix with the elements which are set of words. Initially this matrix is defined as
If and are sets of words, will be formed by the set of concatenation of each word from with each word from :
If is a word, let us denote by the word obtained from by erasing the first character: . Let us denote by the set in which we erase from each element the first character. In this case is a matrix with entries .
Starting with the matrix defined as before, the algorithm to obtain all nontrivial -subwords is the following:
The set of nontrivial -subwords is . The time complexity is also .
For , , , the initial matrix is:
Counting the one-letter subwords too, we obtain .
In this case instead of we will use . For a rainbow word, we will denote the number of -subwords which finish at the position . For
For simplicity, let us denote by . The -complexity of a rainbow word can be obtained by the formula
Because of (26.1) we can write in the case of
and the sequence is one of Fibonacci-type. For any we have and from this results. Therefore the numbers are defined by the following recurrence equations:
These numbers can be generated by the following generating function:
The -complexity can be expressed with these numbers by the following formula:
where is the generating function of the Fibonacci numbers (with , ). Then, from this formula we have
Figure 26.13 contains the values of for and .
The following theorem gives the value of in the case :
The main step in the proof is based on the formula
The value of can be also obtained by computing the number of sequences of length of 's and 's, with no more than adjacent zeros. In such a sequence one 1 represents the presence, one 0 does the absence of a letter of the word in a given -subword. Let denote the number of -length sequences of zeros and ones, in which the first and last position is 1, and the number of adjacent zeros is at most . Then it can be proved easily that
, for ,
, for all ,
because any such sequence of length () can be continued in order to obtain a similar sequence of length in only one way (by adding a sequence of the form on the right). For the following formula also can be derived:
If we add one 1 or 0 at an internal position (e.g at the position) of each sequences, then we obtain sequences of length , but from these, sequences will have adjacent zeros.
The generating function corresponding to is
By adding zeros on the left and/or on the right to these sequences, we can obtain the number , as the number of all these sequences. Thus
( zeros can be added in ways to these sequences: on the left and on the right, on the left and on the right, and so on).
From the above formula, the generating function corresponding to the complexities can be obtained as a product of the two generating functions and , thus:
In the sequel instead of we will use . In this case the distance between two letters picked up to be neighbours in a subword is at least .
Let us denote by the number of -subwords which begin at the position in a rainbow word of length . Using our previous example (abcdef), we can see that , , , , , and .
The following formula immediately results:
for , and ,
For simplicity, will be denoted by .
The -complexity of rainbow words can be computed by the formula:
This can be expressed also as
because of the formula
In the case the complexity can be computed easily: .
From (26.2) we get the following algorithm for the computation of . The numbers for a given and are obtained in the array . Initially all these elements are equal to . The call for the given and and the desired is:
BArray is a global one. Output
The recursive algorithm is the following:
B5 6 7
This algorithm is a linear one.
If the call is , the elements will be obtained in the following order: , , , , , , and .
Proof. Let us consider a rainbow word and let us count all its -subwords which begin with . If we change for in each -subword which begin with , we obtain -subwords too. If we add in front of each -subword which begin with , we obtain -subwords too. Thus
So is a Fibonacci number, and because , we obtain that .
If we use the notation , because of the formula
a generalized middle sequence will be obtained:
Let us call this sequence -middle sequence. Because of the equality , the -middle sequence can be considered as a generalization of the Fibonacci sequence.
Then next linear algorithm computes , by using an array to store the necessary previous elements:
DO6 print 7
Using the generating function , the following closed formula can be obtained:
This can be used to compute the sum , which is the coefficient of in the expansion of the function
So . Therefore
Proof. Let us consider the generating function . Then, taking into account the equation (26.6) we obtain . The general term in this expansion is equal to
and the coefficient of is equal to
The coefficient of is
By Theorem 26.26 , and an easy computation yields