12.3 Eigenvalue problems

The set of complex $n$-vectors will be denoted by $\mathbb{C}^n$. Similarly, $\mathbb{C}^{n \times n}$ denotes the set of complex $n \times n$ matrices.

Definition 12.24 Let $A \in \mathbb{C}^{n \times n}$ be an arbitrary matrix. The number $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ if there is a vector $x \neq 0$ ($x \in \mathbb{C}^n$) such that

$$Ax = \lambda x .$$

The vector $x$ is called the (right) eigenvector of $A$ that belongs to the eigenvalue $\lambda$.

The equation $Ax = \lambda x$ can be written in the equivalent form $(A - \lambda I)x = 0$, where $I$ is the unit matrix of appropriate size. The latter homogeneous linear system has a nonzero solution $x$ if and only if

$$\det(A - \lambda I) = 0 . \qquad (12.38)$$

Equation (12.38) is called the characteristic equation of the matrix $A$. The roots of this equation are the eigenvalues of $A$. Expanding $\det(A - \lambda I)$ we obtain a polynomial of degree $n$ in $\lambda$:

$$p(\lambda) = \det(A - \lambda I) = (-1)^n \lambda^n + p_{n-1} \lambda^{n-1} + \cdots + p_1 \lambda + p_0 .$$

This polynomial is called the characteristic polynomial of $A$. It follows from the fundamental theorem of algebra that any matrix $A \in \mathbb{C}^{n \times n}$ has exactly $n$ eigenvalues, counted with multiplicities. The eigenvalues may be complex or real, so in general one needs complex arithmetic for eigenvalue calculations. If the matrix $A$ is real and the computations are done in real arithmetic, the complex eigenvalues and eigenvectors can be determined only with special techniques.

If $x$ is an eigenvector and $c \neq 0$ ($c \in \mathbb{C}$), then $cx$ is also an eigenvector. The number of linearly independent eigenvectors that belong to an eigenvalue $\lambda$ does not exceed the multiplicity of $\lambda$ in the characteristic equation (12.38). Eigenvectors that belong to different eigenvalues are linearly independent.

The following results give estimates for the size and location of the eigenvalues.

Theorem 12.25 Let $\lambda$ be any eigenvalue of the matrix $A$. Then the upper estimate $|\lambda| \le \|A\|$ holds in any induced matrix norm.

Theorem 12.26 (Gersgorin) Let $A \in \mathbb{C}^{n \times n}$,

$$r_i = \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}| \qquad (i = 1, \ldots, n)$$

and

$$D_i = \{ z \in \mathbb{C} : |z - a_{ii}| \le r_i \} \qquad (i = 1, \ldots, n) .$$

Then for any eigenvalue $\lambda$ of $A$ we have $\lambda \in \cup_{i=1}^{n} D_i$.
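
As an illustration, the following Python sketch (a minimal example using NumPy; the test matrix is chosen arbitrarily here and does not come from the text) computes the Gersgorin discs of a matrix and checks that every eigenvalue lies in their union.

    import numpy as np

    def gersgorin_discs(A):
        """Return the centres a_ii and radii r_i of the Gersgorin discs of A."""
        A = np.asarray(A, dtype=complex)
        centres = np.diag(A)
        radii = np.abs(A).sum(axis=1) - np.abs(centres)  # row sums without the diagonal entry
        return centres, radii

    # Arbitrary example matrix (for illustration only).
    A = np.array([[4.0,  1.0, 0.5],
                  [0.2, -3.0, 1.0],
                  [0.1,  0.3, 1.5]])

    centres, radii = gersgorin_discs(A)
    for lam in np.linalg.eigvals(A):
        # Every eigenvalue lies in at least one disc |z - a_ii| <= r_i.
        assert np.any(np.abs(lam - centres) <= radii + 1e-12)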

For certain matrices the solution of the characteristic equation (12.38) is very easy. For example, if $A$ is a triangular matrix, then its eigenvalues are the entries of the main diagonal. In most cases, however, the computation of all eigenvalues and eigenvectors is a very difficult task. Those transformations of matrices that keep the eigenvalues unchanged therefore have practical significance for this problem. Later we will see that the eigenvalue problem of the transformed matrices is simpler.

Definition 12.27 The matrices $A, B \in \mathbb{C}^{n \times n}$ are similar if there is a nonsingular matrix $T$ such that $B = T^{-1} A T$. The mapping $A \to T^{-1} A T$ is said to be a similarity transformation of $A$.

Theorem 12.28 Assume that $B = T^{-1} A T$. Then the eigenvalues of $A$ and $B$ are the same. If $x$ is an eigenvector of $A$, then $T^{-1} x$ is an eigenvector of $B$ belonging to the same eigenvalue.

Similar matrices have the same eigenvalues.

The difficulty of the eigenvalue problem also stems from the fact that the eigenvalues and eigenvectors are very sensitive (unstable) to changes in the matrix entries. The eigenvalues of $A$ and of the perturbed matrix $A + \delta A$ may differ from each other significantly. Moreover, the multiplicity of the eigenvalues may also change under perturbation. The following theorems and examples show this high sensitivity of the eigenvalue problem.

Theorem 12.29 (Ostrowski, Elsner) For every eigenvalue $\lambda$ of the matrix $A \in \mathbb{C}^{n \times n}$ there exists an eigenvalue $\mu$ of the perturbed matrix $A + \delta A$ such that

$$|\lambda - \mu| \le \left( \|A\|_2 + \|A + \delta A\|_2 \right)^{1 - \frac{1}{n}} \|\delta A\|_2^{\frac{1}{n}} .$$

We can observe that the eigenvalues change continuously and that the size of the change is proportional to the $n$-th root of $\|\delta A\|_2$.

Example 12.12 Consider the following perturbed Jordan matrix of size $r \times r$:

$$\begin{bmatrix} \mu & 1 & & \\ & \mu & \ddots & \\ & & \ddots & 1 \\ \varepsilon & & & \mu \end{bmatrix} .$$

The characteristic equation is $(\lambda - \mu)^r = \varepsilon$, which gives the $r$ different eigenvalues

$$\lambda_k = \mu + \varepsilon^{1/r} \left( \cos\frac{2k\pi}{r} + i \sin\frac{2k\pi}{r} \right) \qquad (k = 0, \ldots, r-1)$$

instead of the original eigenvalue $\mu$ of multiplicity $r$. The size of the change is $\varepsilon^{1/r}$, which corresponds to Theorem 12.29. If $\mu = 0$, $r = 16$ and $\varepsilon = 10^{-16}$, then the perturbation size of the eigenvalues is $\varepsilon^{1/r} = 10^{-1}$. This is a significant change relative to the input perturbation $\varepsilon$.
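
The phenomenon is easy to reproduce numerically. The following NumPy sketch (with the parameters $\mu = 0$, $r = 16$ and $\varepsilon = 10^{-16}$ of the example) builds the perturbed Jordan block and prints the moduli of its computed eigenvalues, which are indeed of magnitude about $\varepsilon^{1/r} = 0.1$.

    import numpy as np

    r, mu, eps = 16, 0.0, 1e-16

    # Jordan block with eigenvalue mu and a perturbation eps in the lower left corner.
    J = mu * np.eye(r) + np.diag(np.ones(r - 1), k=1)
    J[r - 1, 0] = eps

    print(np.abs(np.linalg.eigvals(J) - mu))  # all roughly eps**(1/r)
    print(eps ** (1.0 / r))                   # = 0.1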

For special matrices and perturbations we may have much better perturbation bounds.

Theorem 12.30 (Bauer, Fike) Assume that $A \in \mathbb{C}^{n \times n}$ is diagonalisable, that is, a matrix $X$ exists such that $X^{-1} A X = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Denote by $\mu$ an eigenvalue of $A + \delta A$. Then

$$\min_{1 \le i \le n} |\lambda_i - \mu| \le \mathrm{cond}_2(X) \|\delta A\|_2 .$$

This result is better than that of Ostrowski and Elsner. Nevertheless $\mathrm{cond}_2(X)$, which is generally unknown, can be very large.

The eigenvalues are continuous functions of the matrix entries. This is also true for the normalized eigenvectors if the eigenvalues are simple. The following example shows that this property does not hold for multiple eigenvalues.

Example 12.13 Let

$$A(t) = \begin{bmatrix} 1 + t\cos(2/t) & t\sin(2/t) \\ t\sin(2/t) & 1 - t\cos(2/t) \end{bmatrix} \qquad (t \neq 0) .$$

The eigenvalues of $A(t)$ are $1 + t$ and $1 - t$. The vector $[\cos(1/t), \sin(1/t)]^T$ is the eigenvector belonging to $1 + t$. The vector $[-\sin(1/t), \cos(1/t)]^T$ is the eigenvector belonging to $1 - t$. If $t \to 0$, then

$$A(t) \to I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$

while the eigenvectors have no limit.

We study the numerical solution of the eigenvalue problem in the next section. Unfortunately it is very difficult to estimate the accuracy of numerical approximations. From the fact that $A\tilde{x} \approx \tilde{\lambda}\tilde{x}$ holds with a certain error we cannot conclude anything in general.

Example 12.14 Consider the matrix

$$A(\varepsilon) = \begin{bmatrix} 1 & 1 \\ \varepsilon & 1 \end{bmatrix},$$

where $\varepsilon$ is small. The eigenvalues of $A(\varepsilon)$ are $1 \pm \sqrt{\varepsilon}$, while the corresponding eigenvectors are $[1, \pm\sqrt{\varepsilon}]^T$. Let $\tilde{\lambda} = 1$ be an approximation of the eigenvalues and let $\tilde{x} = [1, 0]^T$ be the approximate eigenvector. Then

$$\| A\tilde{x} - \tilde{\lambda}\tilde{x} \|_2 = \left\| \begin{bmatrix} 0 \\ \varepsilon \end{bmatrix} \right\|_2 = \varepsilon .$$

If $\varepsilon = 10^{-10}$, then the residual error $\varepsilon = 10^{-10}$ underestimates the true eigenvalue error $\sqrt{\varepsilon} = 10^{-5}$ by five orders of magnitude.
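
The gap between the small residual and the much larger eigenvalue error can be checked directly; a minimal NumPy sketch with $\varepsilon = 10^{-10}$ as in the example:

    import numpy as np

    eps = 1e-10
    A = np.array([[1.0, 1.0],
                  [eps, 1.0]])

    lam_approx = 1.0
    x_approx = np.array([1.0, 0.0])

    residual = np.linalg.norm(A @ x_approx - lam_approx * x_approx)   # = eps = 1e-10
    true_error = np.min(np.abs(np.linalg.eigvals(A) - lam_approx))    # about sqrt(eps) = 1e-5
    print(residual, true_error)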

Remark 12.31 For a simple eigenvalue $\lambda$ we can define its condition number:

$$\mathrm{cond}(\lambda) = \frac{\|x\|_2 \|y\|_2}{|y^H x|},$$

where $x$ and $y$ are the right and left eigenvectors belonging to $\lambda$, respectively. For multiple eigenvalues the condition number is not finite.
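
The formula can be evaluated directly once the right and left eigenvectors are available. The sketch below (an illustration only; it assumes that all eigenvalues are simple and uses the rows of $X^{-1}$ as left eigenvectors, so that $y_i^H x_i = 1$) shows how the condition numbers blow up for a nearly defective matrix.

    import numpy as np

    def eigenvalue_condition_numbers(A):
        """cond(lambda_i) = ||x_i||_2 ||y_i||_2 / |y_i^H x_i| for simple eigenvalues."""
        lam, X = np.linalg.eig(A)       # columns of X: right eigenvectors
        Y = np.linalg.inv(X).conj().T   # columns of Y: left eigenvectors with y_i^H x_i = 1
        return lam, np.linalg.norm(X, axis=0) * np.linalg.norm(Y, axis=0)

    eps = 1e-6
    A = np.array([[1.0, 1.0],
                  [eps, 1.0]])          # nearly defective: eigenvalues 1 +/- sqrt(eps)
    print(eigenvalue_condition_numbers(A))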

12.3.1 Iterative solutions of the eigenvalue problem

We investigate only the real eigenvalues and eigenvectors of real matrices. The methods under consideration can be extended to the complex case with appropriate modifications.

The power method.

This method is due to von Mises. Assume that $A \in \mathbb{R}^{n \times n}$ has exactly $n$ different real eigenvalues. Then the eigenvectors $x_1, \ldots, x_n$ belonging to the corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$ are linearly independent. Assume that the eigenvalues satisfy the condition

$$|\lambda_1| > |\lambda_2| \ge \cdots \ge |\lambda_n|$$

and let $v^{(0)} \in \mathbb{R}^n$ be a given vector. This vector is a unique linear combination of the eigenvectors, that is $v^{(0)} = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n$. Assume that $\alpha_1 \neq 0$ and compute the sequence $v^{(k)} = A v^{(k-1)} = A^k v^{(0)}$ ($k = 1, 2, \ldots$). The initial assumptions imply that

$$v^{(k)} = A^k v^{(0)} = \sum_{i=1}^{n} \alpha_i \lambda_i^k x_i = \lambda_1^k \left( \alpha_1 x_1 + \sum_{i=2}^{n} \alpha_i \left( \frac{\lambda_i}{\lambda_1} \right)^k x_i \right) .$$

Let $y \in \mathbb{R}^n$ be an arbitrary vector such that $y^T x_1 \neq 0$. Then

$$\frac{y^T A^{k+1} v^{(0)}}{y^T A^k v^{(0)}} = \frac{y^T v^{(k+1)}}{y^T v^{(k)}} \to \lambda_1 \qquad (k \to \infty) .$$

Given the initial vector $v^{(0)}$, the power method has the following form.

Power-Method(A, v^(0))

  1  k ← 0
  2  WHILE exit condition = FALSE
  3    DO k ← k + 1
  4       z^(k) ← A v^(k−1)
  5       Select vector y^(k) such that (y^(k))^T v^(k−1) ≠ 0
  6       γ_k ← (y^(k))^T z^(k) / (y^(k))^T v^(k−1)
  7       v^(k) ← z^(k) / ‖z^(k)‖
  8  RETURN γ_k, v^(k)

It is clear that

$$\gamma_k \to \lambda_1, \qquad v^{(k)} \to x_1 .$$

The convergence $v^{(k)} \to x_1$ here means that the action line of $v^{(k)}$ tends to the action line of $x_1$. There are various strategies to select $y^{(k)}$. We can select $y^{(k)} = e_i$, where $i$ is defined by $|v_i^{(k)}| = \|v^{(k)}\|_\infty$. If we select $y^{(k)} = v^{(k-1)}$, then $\gamma_k$ will be identical with the Rayleigh quotient $(v^{(k-1)})^T A v^{(k-1)} / (v^{(k-1)})^T v^{(k-1)}$. This choice gives an approximation of $\lambda_1$ that has the minimal residual norm (Example 12.14 shows that this choice is not necessarily the best option).

The speed of convergence depends on the quotient $|\lambda_2 / \lambda_1|$. The method is very sensitive to the choice of the initial vector $v^{(0)}$. If $\alpha_1 = 0$, then the process does not converge to the dominant eigenvalue $\lambda_1$. For certain matrix classes the power method converges with probability $1$ if the initial vector is chosen randomly. In the case of complex eigenvalues, or several eigenvalues of the same maximal modulus, we have to use modified versions of the algorithm. The speed of convergence can be accelerated if the method is applied to the shifted matrix $A - \sigma I$, where $\sigma$ is an appropriately chosen number. The shifted matrix $A - \sigma I$ has the eigenvalues $\lambda_i - \sigma$ and the corresponding convergence factor $|\lambda_2 - \sigma| / |\lambda_1 - \sigma|$. The latter quotient can be made smaller than $|\lambda_2 / \lambda_1|$ with a proper selection of $\sigma$.
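
A direct transcription of the Power-Method pseudocode into Python might look as follows (a sketch rather than a production implementation: the Rayleigh-quotient choice $y^{(k)} = v^{(k-1)}$ is used for $\gamma_k$, the iteration is simply capped, and the test matrix is an arbitrary example).

    import numpy as np

    def power_method(A, v0, tol=1e-12, max_iter=1000):
        """Approximate the dominant eigenvalue and eigenvector of A."""
        v = v0 / np.linalg.norm(v0)
        gamma = 0.0
        for _ in range(max_iter):
            z = A @ v                     # line 4: z^(k) <- A v^(k-1)
            gamma_new = v @ z             # Rayleigh quotient, since ||v||_2 = 1
            v = z / np.linalg.norm(z)     # line 7: normalisation
            if abs(gamma_new - gamma) <= tol * abs(gamma_new):
                return gamma_new, v
            gamma = gamma_new
        return gamma, v

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    print(power_method(A, np.array([1.0, 0.0])))   # dominant eigenvalue (5 + sqrt(5))/2 = 3.618...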

The usual exit condition of the power method is $\|r_k\| = \|A v^{(k)} - \gamma_k v^{(k)}\| \le \varepsilon$.

If we simultaneously apply the power method to the transposed matrix $A^T$ and obtain the approximations $w^{(k)}$ of the left eigenvector $y_1$, then the quantity

$$c_k = \frac{\|v^{(k)}\|_2 \|w^{(k)}\|_2}{|(w^{(k)})^T v^{(k)}|}$$

gives an estimate for the condition number of $\lambda_1$ (see Remark 12.31). In such a case we use the exit condition $c_k \|r_k\| \le \varepsilon$.

The power method is very useful for large sparse matrices. It is often used to determine the largest and the smallest eigenvalues. We can approximate the smallest eigenvalue as follows. The eigenvalues of $A^{-1}$ are $1/\lambda_1, \ldots, 1/\lambda_n$. Among these, $1/\lambda_n$ will be the eigenvalue with the largest modulus. We can approximate this value by applying the power method to $A^{-1}$. This requires only a small modification of the algorithm. We replace line 4 with the following:

  4       Solve the equation A z^(k) = v^(k−1) for z^(k)

The modified algorithm is called the inverse power method. It is clear that $\gamma_k \to 1/\lambda_n$ and $v^{(k)} \to x_n$ hold under appropriate conditions. If we use the $LU$-method to solve $A z^{(k)} = v^{(k-1)}$, we can avoid the explicit inversion of $A$.

If the inverse power method is applied to the shifted matrix $A - \sigma I$, then the eigenvalues of $(A - \sigma I)^{-1}$ are $1/(\lambda_i - \sigma)$. If $\sigma$ approaches, say, $\lambda_t$, then $1/|\lambda_t - \sigma|$ becomes much larger than the other moduli. Hence the inequality

$$\frac{1}{|\lambda_t - \sigma|} > \frac{1}{|\lambda_i - \sigma|} \qquad (i \neq t)$$

holds for the eigenvalues of the shifted matrix. The speed of convergence is determined by the quotient

$$q = \frac{|\lambda_t - \sigma|}{\min_{i \neq t} |\lambda_i - \sigma|} .$$

If $\sigma$ is close enough to $\lambda_t$, then $q$ is very small and the inverse power iteration converges very fast. This property can be exploited in the calculation of approximate eigenvectors if an approximate eigenvalue, say $\tilde{\lambda}$, is known. Assuming that $\sigma = \tilde{\lambda} \approx \lambda_t$, we apply the inverse power method to the shifted matrix $A - \sigma I$. In spite of the fact that the matrix $A - \sigma I$ is nearly singular and the linear equation $(A - \sigma I) z^{(k)} = v^{(k-1)}$ cannot be solved with high precision, the algorithm very often gives good approximations of the eigenvectors.
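
The shifted inverse power method can be sketched in a few lines of Python (an illustration only: the shift $\sigma$ is assumed to be a known approximate eigenvalue, the matrix $A - \sigma I$ is factorised once by the $LU$-method and reused in every step, and a fixed number of iterations replaces the exit condition).

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def inverse_iteration(A, sigma, v0, steps=20):
        """Approximate the eigenvalue of A closest to sigma and its eigenvector."""
        n = A.shape[0]
        lu_piv = lu_factor(A - sigma * np.eye(n))   # factorise A - sigma*I once
        v = v0 / np.linalg.norm(v0)
        for _ in range(steps):
            z = lu_solve(lu_piv, v)                 # line 4: solve (A - sigma*I) z = v
            v = z / np.linalg.norm(z)
        lam = v @ (A @ v)                           # Rayleigh quotient of the original matrix
        return lam, v

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    print(inverse_iteration(A, sigma=1.4, v0=np.array([1.0, 1.0])))  # eigenvalue (5 - sqrt(5))/2 = 1.382...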

Finally we note that, in principle, the von Mises method can be modified to determine all eigenvalues and eigenvectors.

12.3.1.1 Orthogonalisation processes.

We need the following definition and theorem.

Definition 12.32 The matrix $Q \in \mathbb{R}^{n \times n}$ is said to be orthogonal if $Q^T Q = I$.

Theorem 12.33 ($QR$-decomposition) Every matrix $A \in \mathbb{R}^{m \times n}$ ($m \ge n$) having linearly independent column vectors can be decomposed in the product form $A = QR$, where the matrix $Q$ has orthonormal columns and $R$ is an upper triangular matrix.

We note that the $QR$-decomposition can be applied for solving linear systems of equations, similarly to the $LU$-decomposition. If the $QR$-decomposition $A = QR$ is known, then the equation $Ax = b$ can be written in the equivalent form $Rx = Q^T b$. Thus we have to solve only an upper triangular linear system.
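
For example, with NumPy and SciPy the solution of $Ax = b$ through the $QR$-decomposition can be sketched as follows (the triangular system $Rx = Q^T b$ is solved by back substitution; the data are arbitrary).

    import numpy as np
    from scipy.linalg import solve_triangular

    A = np.array([[4.0, 3.0, 0.0],
                  [6.0, 3.0, 2.0],
                  [0.0, 1.0, 5.0]])
    b = np.array([1.0, 2.0, 3.0])

    Q, R = np.linalg.qr(A)            # A = QR, Q orthogonal, R upper triangular
    x = solve_triangular(R, Q.T @ b)  # solve R x = Q^T b by back substitution
    print(np.allclose(A @ x, b))      # True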

There are several methods to determine the $QR$-decomposition of a matrix. In practice the Givens, the Householder and the MGS methods are used.

The MGS (Modified Gram-Schmidt) method is a stabilised, but algebraically equivalent, version of the classical Gram-Schmidt orthogonalisation algorithm. The basic problem is the following: we seek an orthonormal basis of the subspace

$$\mathcal{L}\{a_1, \ldots, a_m\} = \left\{ \sum_{i=1}^{m} \lambda_i a_i : \lambda_i \in \mathbb{R} \right\},$$

where $a_1, \ldots, a_m \in \mathbb{R}^n$ ($m \le n$) are linearly independent vectors. That is, we determine linearly independent vectors $q_1, \ldots, q_m \in \mathbb{R}^n$ such that

$$\mathcal{L}\{a_1, \ldots, a_m\} = \mathcal{L}\{q_1, \ldots, q_m\}$$

and

$$q_i^T q_j = \delta_{ij} \qquad (i, j = 1, \ldots, m) .$$

The basic idea of the classical Gram-Schmidt method is the following:

Let $q_1 = a_1 / \|a_1\|_2$. Assume that the orthonormal vectors $q_1, \ldots, q_{k-1}$ are already computed. We seek the vector $\tilde{q}_k = a_k - \sum_{i=1}^{k-1} r_{ik} q_i$ such that $\tilde{q}_k \perp q_1, \ldots, q_{k-1}$, that is $q_i^T \tilde{q}_k = 0$ holds for $i = 1, \ldots, k-1$. Since $q_1, \ldots, q_{k-1}$ are orthonormal, $q_i^T q_j = 0$ ($i \neq j$) and therefore $r_{ik} = q_i^T a_k$ ($i = 1, \ldots, k-1$). After normalisation we obtain $q_k = \tilde{q}_k / \|\tilde{q}_k\|_2$.

The algorithm is formalised as follows.

CGS-Orthogonalization(m, a_1, …, a_m)

  1  FOR k ← 1 TO m
  2    DO FOR i ← 1 TO k − 1
  3         DO r_ik ← a_i^T a_k
  4            a_k ← a_k − r_ik a_i
  5       r_kk ← ‖a_k‖_2
  6       a_k ← a_k / r_kk
  7  RETURN a_1, …, a_m

The algorithm overwrites the vectors $a_i$ by the orthonormal vectors $q_i$. The connection with the $QR$-decomposition follows from the relation $a_k = \sum_{i=1}^{k} r_{ik} q_i$. Since

$$a_1 = r_{11} q_1, \quad a_2 = r_{12} q_1 + r_{22} q_2, \quad \ldots, \quad a_m = r_{1m} q_1 + r_{2m} q_2 + \cdots + r_{mm} q_m,$$

we can write that

$$A = [a_1, \ldots, a_m] = [q_1, \ldots, q_m] \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1m} \\ 0 & r_{22} & \cdots & r_{2m} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & r_{mm} \end{bmatrix} = QR .$$

The numerically stable MGS method is given in the following form

MGS-Orthogonalisation(m, a_1, …, a_m)

  1  FOR k ← 1 TO m
  2    DO r_kk ← ‖a_k‖_2
  3       a_k ← a_k / r_kk
  4       FOR j ← k + 1 TO m
  5         DO r_kj ← a_k^T a_j
  6            a_j ← a_j − r_kj a_k
  7  RETURN a_1, …, a_m

The algorithm overwrites the vectors $a_i$ by the orthonormal vectors $q_i$. The MGS method is numerically more stable than the CGS algorithm. Björck proved that the computed matrix $\hat{Q} = [\hat{q}_1, \ldots, \hat{q}_m]$ satisfies

$$\hat{Q}^T \hat{Q} = I + E, \qquad \|E\|_2 \cong u\,\mathrm{cond}_2(A),$$

where $u$ is the unit roundoff.
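
The MGS pseudocode translates almost verbatim into Python. In the sketch below (an illustration, not a library-quality routine) the columns of the input matrix are overwritten by the orthonormal vectors and the coefficients $r_{kj}$ are collected into an upper triangular matrix, so that the result is a $QR$-decomposition.

    import numpy as np

    def mgs(A):
        """Modified Gram-Schmidt QR decomposition (columns of A assumed independent)."""
        Q = np.array(A, dtype=float, copy=True)
        m = Q.shape[1]
        R = np.zeros((m, m))
        for k in range(m):
            R[k, k] = np.linalg.norm(Q[:, k])
            Q[:, k] /= R[k, k]                # lines 2-3: normalise the k-th column
            for j in range(k + 1, m):
                R[k, j] = Q[:, k] @ Q[:, j]   # line 5: coefficient against the updated a_j
                Q[:, j] -= R[k, j] * Q[:, k]  # line 6: remove the q_k component at once
        return Q, R

    A = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])
    Q, R = mgs(A)
    print(np.allclose(Q @ R, A), np.linalg.norm(Q.T @ Q - np.eye(3)))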

The QR-method.

Today the QR-method is the most important numerical algorithm for computing all eigenvalues of a general matrix. It can be shown that the QR-method is a generalisation of the power method. The basic idea of the method is the following: starting from $A_1 = A$ we compute the sequence $A_{k+1} = Q_k^T A_k Q_k$ ($k = 1, 2, \ldots$), where $Q_k$ is orthogonal, $A_{k+1}$ is orthogonally similar to $A_k$ and the lower triangular part of $A_k$ tends to a diagonal matrix whose entries will be the eigenvalues of $A$. Here $Q_k$ is the orthogonal factor of the $QR$-decomposition $A_k = Q_k R_k$. Therefore $A_{k+1} = Q_k^T (Q_k R_k) Q_k = R_k Q_k$. The basic algorithm is given in the following form.

QR-Method(A)

  1  k ← 1
  2  A_1 ← A
  3  WHILE exit condition = FALSE
  4    DO compute the QR-decomposition A_k = Q_k R_k
  5       A_{k+1} ← R_k Q_k
  6       k ← k + 1
  7  RETURN A_k

The following result holds.

Theorem 12.34 (Parlett) If the matrix $A$ is diagonalisable, $M^{-1} A M = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, the eigenvalues satisfy

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| > 0$$

and $M^{-1}$ has an $LU$-decomposition, then the lower triangular part of $A_k$ converges to a diagonal matrix whose entries are the eigenvalues of $A$.
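
Under these assumptions the behaviour of the iteration is easy to observe numerically. The following Python sketch of the basic (unshifted) QR-method uses a fixed number of steps in place of the exit condition; the test matrix is an arbitrary symmetric example with distinct eigenvalues.

    import numpy as np

    def qr_method(A, steps=200):
        """Basic QR iteration; returns the final iterate A_k."""
        Ak = np.array(A, dtype=float, copy=True)
        for _ in range(steps):
            Q, R = np.linalg.qr(Ak)   # A_k = Q_k R_k
            Ak = R @ Q                # A_{k+1} = R_k Q_k = Q_k^T A_k Q_k
        return Ak

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    print(np.sort(np.diag(qr_method(A))))   # approximate eigenvalues
    print(np.sort(np.linalg.eigvals(A)))    # for comparison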

In general, the matrices $A_k$ do not necessarily converge to a given matrix. If $A$ has $p$ eigenvalues of the same modulus, the matrices $A_k$ converge to a form that is upper triangular apart from a $p \times p$ diagonal block, denoted by $*$, whose entries do not converge. However, the eigenvalues of this submatrix will converge. This submatrix can be identified and handled properly. A real matrix may have real and complex eigenvalues. If there is a complex eigenvalue, then its complex conjugate is also an eigenvalue. For pairs of complex conjugate eigenvalues $p$ is at least $2$. Hence the sequence $A_k$ will show this phenomenon.

The $QR$-decomposition is very expensive: its cost is $\Theta(n^3)$ flops for a general $n \times n$ matrix. If $A$ has upper Hessenberg form, the cost of the $QR$-decomposition is only $\Theta(n^2)$ flops.

Definition 12.35 The matrix $A \in \mathbb{R}^{n \times n}$ has upper Hessenberg form, if $a_{ij} = 0$ for $i \ge j + 2$.

The following theorem guarantees that if $A$ has upper Hessenberg form, then every iterate $A_k$ of the $QR$-method also has upper Hessenberg form.

Theorem 12.36 If $A$ has upper Hessenberg form and $A = QR$, then $RQ$ also has upper Hessenberg form.

We can transform a matrix $A$ into a similar matrix of upper Hessenberg form in many ways. One of the cheapest ways, which costs $O(n^3)$ flops, is based on the Gauss elimination method. Considering the advantages of the upper Hessenberg form, an efficient implementation of the $QR$-method requires first the similarity transformation of $A$ to upper Hessenberg form.

The convergence of the $QR$-method, similarly to the power method, depends on the quotients $|\lambda_{i+1} / \lambda_i|$. The eigenvalues of the shifted matrix $A - \sigma I$ are $\lambda_1 - \sigma, \ldots, \lambda_n - \sigma$, and the corresponding eigenvalue ratios are $|\lambda_{i+1} - \sigma| / |\lambda_i - \sigma|$. A proper selection of $\sigma$ can speed up the convergence.

The usual form of the $QR$-method includes the transformation to upper Hessenberg form and shifting.

Shifted-QR-Method(A)

  1  H_1 ← U^(−1) A U  (H_1 is of upper Hessenberg form)
  2  k ← 1
  3  WHILE exit condition = FALSE
  4    DO compute the QR-decomposition H_k − σ_k I = Q_k R_k
  5       H_{k+1} ← R_k Q_k + σ_k I
  6       k ← k + 1
  7  RETURN H_k

In practice the $QR$-method is used in shifted form. There are various strategies to select the shift $\sigma_k$ in the $k$-th step. The most often used selection is $\sigma_k = h_{nn}^{(k)}$, the lower right diagonal entry of the actual matrix $H_k$.
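
A sketch of the shifted variant in Python (for illustration only: scipy.linalg.hessenberg performs the initial similarity transformation, the shift is taken as the lower right diagonal entry, and deflation as well as a proper exit condition are omitted for brevity).

    import numpy as np
    from scipy.linalg import hessenberg

    def shifted_qr(A, steps=100):
        """Shifted QR iteration on the upper Hessenberg form of A (no deflation)."""
        H = hessenberg(np.array(A, dtype=float))   # H_1 = U^{-1} A U, upper Hessenberg
        n = H.shape[0]
        I = np.eye(n)
        for _ in range(steps):
            sigma = H[n - 1, n - 1]                # shift: lower right diagonal entry
            Q, R = np.linalg.qr(H - sigma * I)     # H_k - sigma_k I = Q_k R_k
            H = R @ Q + sigma * I                  # H_{k+1} = R_k Q_k + sigma_k I
        return H

    A = np.array([[4.0, 1.0, 2.0],
                  [1.0, 3.0, 0.0],
                  [2.0, 0.0, 1.0]])
    print(np.sort(np.diag(shifted_qr(A))))
    print(np.sort(np.linalg.eigvals(A)))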

The eigenvectors of $A$ can also be determined by the $QR$-method. For this we refer to the literature.

Exercises

12.3-1 Apply the power method to the matrix $A$ with the initial vector $v^{(0)}$. What is the result of the $k$-th step?

12.3-2 Apply the power method, the inverse power method and the $QR$-method to the matrix

12.3-3 Apply the shifted $QR$-method to the matrix of the previous exercise with the choice $\sigma_k \equiv \sigma$ ($\sigma$ is fixed).