## 26.2. 26.2 Generalized complexity measures

As we have seen in the previous section, a contiguous part of a word (obtained by erasing a prefix or/and a suffix) is a subword or factor. If we eliminate arbitrary letters from a word, what is obtained is a scattered subword, sometimes called subsequence. Special scattered subwords, in which the consecutive letters are at distance at least and at most in the original word, are called -subwords. More formally we give the following definition.

Let , , be positive integers, and let be a word over the alphabet . The word , where

,

, for ,

,

is a -subword of length of .

For example the -subwords of are: , , , , , , , , , , , , , , , , , , , , , , .

The number of different -subwords of a word is called -complexity and is denoted by .

For example, if , then .

### 26.2.1. 26.2.1 Rainbow words

Words with pairwise different letters are called rainbow words. The -complexity of a rainbow word of length does not depends on what letters it contains, and is denoted by .

To compute the -complexity of a rainbow word of length , let us consider the word (if , then ) and the corresponding digraph , with

,

.

For see Figure 26.11.

The adjacency matrix of the graph is defined by:

Because the graph has no directed cycles, the entry in row and column in (where , with ) will represent the number of -length directed paths from to . If is the identity matrix (with entries equal to 1 only on the first diagonal, and 0 otherwise), let us define the matrix :

The -complexity of a rainbow word is then

The matrix can be better computed using a variant of the well-known Warshall algorithm:

Warshall( )

  1     2
FOR

TO

3
DO

FOR

TO

4
DO

FOR

TO

5
DO

6
RETURN



From we obtain easily . The time complexity of this algorithms is .

For example let us consider the graph in Figure 26.11. The corresponding adjacency matrix is:

After applying the Warshall algorithm:

and then , the sum of entries in .

The Warshall algorithm combined with the Latin square method can be used to obtain all nontrivial (with length at least 2) -subwords of a given rainbow word of length . Let us consider a matrix with the elements which are set of words. Initially this matrix is defined as

If and are sets of words, will be formed by the set of concatenation of each word from with each word from :

If is a word, let us denote by the word obtained from by erasing the first character: . Let us denote by the set in which we erase from each element the first character. In this case is a matrix with entries .

Starting with the matrix defined as before, the algorithm to obtain all nontrivial -subwords is the following:

Warshall-Latin( )

  1     2
FOR

TO

3
DO

FOR

TO

4
DO

FOR

TO

5
DO

IF

and    6
THEN

7
RETURN



The set of nontrivial -subwords is . The time complexity is also .

For , , , the initial matrix is:

and

Counting the one-letter subwords too, we obtain .

#### 26.2.1.1.  The case

In this case instead of we will use . For a rainbow word, we will denote the number of -subwords which finish at the position . For

For simplicity, let us denote by . The -complexity of a rainbow word can be obtained by the formula

Because of (26.1) we can write in the case of

Denoting

we get

and the sequence is one of Fibonacci-type. For any we have and from this results. Therefore the numbers are defined by the following recurrence equations:

for ,

for .

These numbers can be generated by the following generating function:

The -complexity can be expressed with these numbers by the following formula:

and

or

If then

where is the generating function of the Fibonacci numbers (with , ). Then, from this formula we have

and

Figure 26.13 contains the values of for and .

The following theorem gives the value of in the case :

Theorem 26.23 For we have

The main step in the proof is based on the formula

The value of can be also obtained by computing the number of sequences of length of 's and 's, with no more than adjacent zeros. In such a sequence one 1 represents the presence, one 0 does the absence of a letter of the word in a given -subword. Let denote the number of -length sequences of zeros and ones, in which the first and last position is 1, and the number of adjacent zeros is at most . Then it can be proved easily that

, for ,

,

, for all ,

because any such sequence of length () can be continued in order to obtain a similar sequence of length in only one way (by adding a sequence of the form on the right). For the following formula also can be derived:

If we add one 1 or 0 at an internal position (e.g at the position) of each sequences, then we obtain sequences of length , but from these, sequences will have adjacent zeros.

The generating function corresponding to is

By adding zeros on the left and/or on the right to these sequences, we can obtain the number , as the number of all these sequences. Thus

( zeros can be added in ways to these sequences: on the left and on the right, on the left and on the right, and so on).

From the above formula, the generating function corresponding to the complexities can be obtained as a product of the two generating functions and , thus:

#### 26.2.1.2.  The case

In the sequel instead of we will use . In this case the distance between two letters picked up to be neighbours in a subword is at least .

Let us denote by the number of -subwords which begin at the position in a rainbow word of length . Using our previous example (abcdef), we can see that , , , , , and .

The following formula immediately results:

for , and ,

For simplicity, will be denoted by .

The -complexity of rainbow words can be computed by the formula:

This can be expressed also as

because of the formula

In the case the complexity can be computed easily: .

From (26.2) we get the following algorithm for the computation of . The numbers for a given and are obtained in the array . Initially all these elements are equal to . The call for the given and and the desired is:

Input



FOR

TO

DO

B

Array  is a global one.       Output


The recursive algorithm is the following:

B( )

  1     2
FOR

TO

3
DO

IF

4
THEN

B

5       6     7
RETURN



This algorithm is a linear one.

If the call is , the elements will be obtained in the following order: , , , , , , and .

Lemma 26.24 , where is the th Fibonacci number.

Proof. Let us consider a rainbow word and let us count all its -subwords which begin with . If we change for in each -subword which begin with , we obtain -subwords too. If we add in front of each -subword which begin with , we obtain -subwords too. Thus

So is a Fibonacci number, and because , we obtain that .

Theorem 26.25 , where is the th Fibonacci number.

Proof. From equation (26.4) and Lemma 26.24:

If we use the notation , because of the formula

a generalized middle sequence will be obtained:

Let us call this sequence -middle sequence. Because of the equality , the -middle sequence can be considered as a generalization of the Fibonacci sequence.

Then next linear algorithm computes , by using an array to store the necessary previous elements:

Middle( )

  1     2
FOR

TO

3
DO

4
FOR

TO

5
DO

6       print    7
RETURN



Using the generating function , the following closed formula can be obtained:

This can be used to compute the sum , which is the coefficient of in the expansion of the function

So . Therefore

Theorem 26.26 , where and is the th elements of -middle sequence.

Proof. The proof is similar to that in Theorem 26.25 taking into account the equation (26.7).

Theorem 26.27 , for , .

Proof. Let us consider the generating function . Then, taking into account the equation (26.6) we obtain . The general term in this expansion is equal to

and the coefficient of is equal to

The coefficient of is

By Theorem 26.26 , and an easy computation yields

### 26.2.2. 26.2.2 General words

The algorithm Warshall-Latin can be used for nonrainbow words too, with the remark that repeating subwords must be eliminated. For the word and , the result is: , and with and we have .