The basic Schwarz domain decomposition iteration converges slowly because, for practical reasons, in computation fluid dynamics minimal overlap is usually used. Therefore, some acceleration procedure for this iterative procedure is needed. Krylov subspace methods are frequently used to accelerate domain decomposition methods, see for example [1] and many of the papers on iterative substructuring methods in [15,7,8,16,19]. This section presents both the acceleration procedure used with accurate solution of subdomains and the procedure used with inaccurate solution of subdomains.
Basically, the domain decomposition iteration is of the following form
with A the global discretization matrix over all domains and an approximation to the inverse of the block diagonal or block lower-diagonal matrix of A. The matrix N is called the block Jacobi and Gauss-Seidel matrix of A respectively. The matrix A is divided into blocks where each block corresponds with all unknowns in a single subdomain. For instance, for a decomposition into 3 blocks we have
In [5] we assumed that the subdomains were solved accurately so that is the exact inverse of the block diagonal or block lower-diagonal matrix of A, so
with the Gauss-Seidel version and the Jacobi version of N. The Gauss-Seidel version is suitable for implementation on a single processor and leads to the sequential or multiplicative algorithm. The Jacobi version is suitable for parallelization and is called the parallel or additive version.
It can be seen that the left-hand side of (2) only depends on the values of near an interface. This means that if we split the vector u into vectors w en v with w the non-interface unknowns and v the interface unknowns, see Figure 1, we have
with an injection operator from v into . From (5), it follows that so that the non-interface unknowns do not contribute to the computation of in (2).
Figure 1: Interface variables v and non-interface variables w for
a 5-point discretization stencil and a decomposition into
two blocks
By substituting (5) into (2) and by premultiplying with we get
Since we are interested in the stationary solution v of this iteration process we get
which is equivalent to
Therefore, accurate solution of subdomains finally leads to a system concerning only the interface equations. Accelerated domain decomposition in [5] amount to solving the interface equations (8) using GMRESR [24]. The required matrix-vector product can be computed by doing one domain decomposition iteration, see [5] for details.
Inaccurate solution of subdomains implies that we no longer take the exact inverse of N but use some approximation to N, so
with the Gauss-Seidel (sequential) version and the Jacobi (parallel) version of . In the literature, much focus is on parallel algorithms. However, parallel implementations are not immediately available and one can imagine situations where parallel execution is not possible. Suppose for instance that there is only one workstation to do computations on. Therefore, to obtain an efficient sequential domain decomposition algorithm we pay much attention the the Gauss-Seidel version of N.
The matrix vector product of is computed like
Here represents the approximate solution in subdomain i up to a low accuracy using GMRES. A special case occurs when we take to be some incomplete LU factorization of . This paper also investigates this case for ILUD factorization, see further on.
The GMRES subdomain solution implicitly constructs a polynomial of the subdomain matrix such that the final residual is minimal in the norm. Specifically, with initial guess and right-hand side , we get for the final subdomain solution . Since the polynomial, depends on both the required accuracy and the right-hand side (initial residual), the matrix can be different for each v. Therefore, the GMRES acceleration procedure cannot be used since the preconditioner varies in each step.
The GCR [14] method can be easily adapted to cope with variable preconditioners. Because of its simplicity and for completeness, we derive the GCR method here. The GCR method is based on maintaining two subspaces, a subspace for storing the search directions and a subspace with . In every operation of GCR the property is preserved. The GCR method minimizes the residual over . Clearly, if the form an orthonormal basis, we can obtain the solution by projecting onto the space . So we must find such that for , therefore,
Since we have
and by substituting (12) into (11) we get so that
Since , we have
so that . This gives and with we get . The GCR algorithm proceeds by choosing a new search direction (preferably such that approximates the residual ) and computes the vector . A modified Gram-Schmidt procedure is used to make orthogonal to (). The same linear combinations of vectors are applied to the space of search directions to ensure that still holds for all i. Figure 2 shows the resulting GCR algorithm.
Figure 2: The GCR algorithm with general search directions without
restart and with a relative stopping criterion [24]
For the special case of the search direction , we obtain the classical GCR algorithm, which is equivalent to GMRES [23]. For this choice of search direction, the space is called the Krylov space. The difference between GCR and GMRES is that, at the benefit of allowing more general search directions, GCR requires twice the storage of GMRES and times the number of floating point operations for orthogonalization. However GCR can be combined with truncation strategies, for instance the Jackson & Robinson [17] strategy, whereas GMRES can only be restarted. Because of this, GCR may converge faster than GMRES.
Recent developments have led to a more flexible GMRES algorithm which allows more general search directions, so called FGMRES [22]. Also, the amount of work in GCR can be reduced to approximately that of GMRES if restarted GCR is used, see [25]. The resulting algorithm is comparable to FGMRES both in memory requirements and work. The GCR method achieves the same goal as FGMRES in a more understandable way.
The present paper considers only restarted GCR to compare with subdomain solution (which uses restarted GMRES). The optimizations of [25] are not used in this paper but will certainly be considered for domain decomposition for the incompressible Navier-Stokes equations.
If we compute the search direction using some GMRES iterations for solving we obtain the GMRESR [24] algorithm. In the present paper, we use . If the subdomains are solved (inaccurately) using GMRES, this method reduces to GMRESR for the single domain case. If the subdomains are approximately solved using some ILU factorization we obtain a blocked version of the subdomain ILU postconditioning called IBLU, which was investigated for parallel implementation in (Incomplete Block LU) [13,18,10,11]. In this paper we also investigate the sequential version of IBLU. For a single domain, this method is equivalent to GMRES with an ILU postconditioner.
For the special case where corresponds to incomplete LU factorization, the preconditioner is constant and the GMRES acceleration procedure may also be applied. The only difference between the general method, based on GCR, and the GMRES acceleration is that GMRES requires less vector updates and less memory. Both methods of acceleration will be considered for IBLU postconditioning.
An important remark is that the stopping criterion for accurate solution differs from that for inaccurate solution. With accurate solution, the stopping criterion is based on the preconditioned residual of only the interface unknowns. On the other hand, with inaccurate solution, it is based on the unpreconditioned residual r = f - A u of all unknowns. Therefore, a comparison between the two methods is difficult since differences in computing time can either be caused by a difference in convergence behavior or by the difference in the definition of the residual. In this paper, we assume that the difference in definition of the residual does not give different accuracies of the final solution when using the relative stopping criterion .