
DFG-Schwerpunktprogramm 1324

"Extraktion quantifizierbarer Information aus komplexen Systemen"

Multilevel preconditioning for sparse optimization of functionals with nonconvex fidelity terms

S. Dahlke, M. Fornasier, U. Friedrich, T. Raasch

Preprint 155


Edited by

AG Numerik/Optimierung
Fachbereich 12 - Mathematik und Informatik
Philipps-Universität Marburg
Hans-Meerwein-Str.
35032 Marburg



The consecutive numbering of the publications is determined by their chronological order.

The aim of this preprint series is to make new research rapidly available for scientific discussion. The responsibility for the contents therefore lies solely with the authors. The publications will be distributed by the authors.


Multilevel preconditioning for sparse optimization

of functionals with nonconvex fidelity terms∗

S. Dahlke, M. Fornasier, U. Friedrich, T. Raasch

Abstract

This paper is concerned with the development of numerical schemes for the minimization of functionals involving sparsity constraints and nonconvex fidelity terms. These functionals appear in a natural way in the context of Tikhonov regularization of nonlinear inverse problems with $\ell_1$ penalty terms. Our method of minimization is based on a generalized conditional gradient scheme. It is well known that these algorithms might converge quite slowly in practice. Therefore, we propose an acceleration which is based on a decreasing thresholding strategy. Its efficiency relies on certain spectral properties of the problem at hand. We show that under certain boundedness and contraction conditions the resulting algorithm is linearly convergent to a global minimizer and that the iteration is monotone with respect to the functional. We study important classes of operator equations to which our analysis can be applied. Moreover, we introduce a certain multilevel preconditioning strategy which in practice promotes the aforementioned spectral properties for problems where the nonlinearity is a perturbation of a linear operator.

MSC 2010: 65K10, 65J15, 41A25, 65N12, 65T60, 47J06, 47J25.
Key words: conditional gradient method, non-convex optimization, sparse minimization, (nonlinear) operator equations, iterative thresholding, multilevel preconditioning, wavelets.

1 Introduction

The aim of this paper is to derive an efficient numerical algorithm for the global minimization of functionals of the form

$$
\Gamma_\alpha(u) := \|K(u) - y\|_Y^2 + 2\|u\|_{\ell_{1,\alpha}(\mathcal{J})}, \qquad u \in \ell_2(\mathcal{J}), \tag{1}
$$

where $K : \ell_2(\mathcal{J}) \to Y$ is a nonlinear, continuously Fréchet differentiable operator acting between the sequence space $\ell_2(\mathcal{J})$ over the countable index

∗December 13, 2013. This work has been supported by Deutsche Forschungsgemeinschaft (DFG), grant DA 360/12-2, and the LOEWE Center for Synthetic Microbiology (Synmikro), Marburg.


set $\mathcal{J}$ and a separable Hilbert space $Y$. Here $y \in Y$ is a given datum, and $\|u\|_{\ell_{1,\alpha}(\mathcal{J})} := \sum_{\mu \in \mathcal{J}} \alpha_\mu |u_\mu|$ denotes the weighted $\ell_1$-norm of $u$ with respect to a positive weight sequence $\alpha \in \mathbb{R}_+^{\mathcal{J}}$. We shall assume that there exists $\underline{\alpha} > 0$ such that $\alpha_\mu \geq \underline{\alpha}$ for all $\mu \in \mathcal{J}$. Whenever the index set $\mathcal{J}$ is fixed and clear from the context, we will drop it in the notation and simply write $\ell_2$ and $\ell_{1,\alpha}$, respectively.

Typical examples where minimization problems of the form (1) arise are Tikhonov regularizations of nonlinear operator equations

$$
K(u) = y \tag{2}
$$

when the forward operator $K : X \to Y$ maps a separable Hilbert space $X$ into $Y$. We refer, e.g., to [16] for a detailed discussion of Tikhonov regularization schemes. If the unknown solution is guaranteed to have a sparse expansion with respect to some suitable countable Riesz basis $\Psi := \{\psi_\mu\}_{\mu \in \mathcal{J}}$ for $X$, it makes sense to exploit the fact that the weighted $\ell_1$-norm promotes sparse solutions. Denoting the linear synthesis operator associated with $\Psi$ by

$$
u = \sum_{\mu \in \mathcal{J}} u_\mu \psi_\mu =: F(u), \qquad u \in \ell_2(\mathcal{J}),
$$

and setting $K := K \circ F$, the minimization of (1) will produce a sparsely populated coefficient array $u$ with $K(u) \approx y$. The modeling motivation is the search for the "simplest" (here modeled by the "sparsest") explanation of the given datum $y$ resulting from the nonlinear process $K$, in the spirit of Occam's razor. Moreover, it is known that, under certain smoothness conditions, the global minimizers of (1) are regularizers for the problem.

By now there is a vast literature concerning sparse regularization of nonlinear inverse problems [2, 3, 19, 23]. For most of the results in the literature related to minimizing algorithms for functionals of the type (1), usually only convergence to critical points is shown. Unfortunately, in contrast to global minimizers, almost nothing is known concerning the regularization properties of critical points, which significantly questions the relevance of such convergence results.

The starting point of our present discussion is a generalized conditional gradient method which is known to guarantee the computation of subsequences converging to critical points of (1). The scope of this paper is to show under which sufficient conditions on $K$ one may expect linear convergence of a suitable modification of this algorithm towards a global minimizer, hence guaranteeing regularization properties.

Several authors have independently proposed such an algorithm; see [14, 17, 20, 21] for the case of linear operators $K$ and [2, 3] for the generalization to the nonlinear case. The general setting can be described as follows. One


introduces an auxiliary parameter $\lambda \in \mathbb{R}_+$ and considers the splitting

$$
\Gamma_\alpha(u) = \underbrace{\|K(u) - y\|_Y^2 - \lambda\|u\|_{\ell_2}^2}_{=:\Gamma_\lambda^{(1)}(u)} + \underbrace{\lambda\|u\|_{\ell_2}^2 + 2\|u\|_{\ell_{1,\alpha}}}_{=:\Gamma_{\lambda,\alpha}^{(2)}(u)}. \tag{3}
$$

Then $\Gamma_\lambda^{(1)}$ is continuously Fréchet differentiable and $\Gamma_{\lambda,\alpha}^{(2)}$ is convex, lower semicontinuous, and coercive with respect to $\|\cdot\|_{\ell_2}$, so that all the necessary properties to set up a generalized conditional gradient method are satisfied. The algorithm is given by

Algorithm 1.1.
1. Choose $u^{(0)} \in \ell_{1,\alpha}$; $n := 0$.
2. Determine the descent direction $v^{(n)}$,
$$
v^{(n)} \in \arg\min_{v \in \ell_2} \Bigl\{ 2\bigl\langle (K'(u^{(n)}))^* \bigl(K(u^{(n)}) - y\bigr) - \lambda u^{(n)}, v \bigr\rangle_{\ell_2} + \lambda \|v\|_{\ell_2}^2 + 2\|v\|_{\ell_{1,\alpha}} \Bigr\}; \tag{4}
$$
3. Determine the step size $s^{(n)}$,
$$
s^{(n)} \in \arg\min_{s \in [0,1]} \Gamma_\alpha\bigl(u^{(n)} + s(v^{(n)} - u^{(n)})\bigr); \tag{5}
$$
4. Set $u^{(n+1)} := u^{(n)} + s^{(n)}(v^{(n)} - u^{(n)})$; $n := n + 1$; return to step 2.

Here $(K'(u^{(n)}))^* \in \mathcal{L}(Y, \ell_2)$ denotes the adjoint mapping of $K'(u^{(n)}) \in \mathcal{L}(\ell_2, Y)$. We refer to [2] for a detailed discussion and convergence analysis of Algorithm 1.1. If the parameter $\lambda$ is chosen large enough, it is possible to choose $s^{(n)} = 1$ and to omit the third step of the algorithm; see Lemma 2.4 in [2]. Throughout this paper we always make this assumption; hence we focus on the minimization problem (4) in the following. Observe that, by expanding the quadratic term, (4) is equivalent to

$$
\arg\min_{v \in \ell_2} \Bigl\{ \Bigl\| v - \Bigl( u^{(n)} - \frac{1}{\lambda} (K'(u^{(n)}))^* \bigl(K(u^{(n)}) - y\bigr) \Bigr) \Bigr\|_{\ell_2}^2 + \frac{2}{\lambda}\|v\|_{\ell_{1,\alpha}} \Bigr\}. \tag{6}
$$

The minimizer of such a functional, combining an $\ell_2$-norm fidelity term and a weighted $\ell_1$-norm penalization, can be computed directly by a soft thresholding operation; see [4, 13]. Defining

$$
S_\alpha(x) := \begin{cases} x - \alpha, & x > \alpha, \\ 0, & |x| \leq \alpha, \\ x + \alpha, & x < -\alpha, \end{cases}
$$

and $(S_\alpha(u))_\mu := S_{\alpha_\mu}(u_\mu)$, it holds that

$$
S_\alpha(a) = \arg\min_{v \in \ell_2} \|v - a\|_{\ell_2}^2 + 2\|v\|_{\ell_{1,\alpha}}. \tag{7}
$$


Consequently, through (6), we obtain that (4) is uniquely solved by

$$
v^{(n)} = S_{\frac{\alpha}{\lambda}}\Bigl( u^{(n)} + \frac{1}{\lambda} (K'(u^{(n)}))^* \bigl(y - K(u^{(n)})\bigr) \Bigr). \tag{8}
$$

This explains why Algorithm 1.1 is also known as the iterated soft-thresholding algorithm (ISTA) or the thresholded Landweber iteration.
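For a linear operator $K$, one step of (8) reduces to a thresholded Landweber step. The following sketch iterates it on a synthetic sparse recovery problem; the Gaussian matrix, uniform weight, and choice $\lambda \geq \|K\|^2$ are all illustrative:

```python
import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

rng = np.random.default_rng(0)
K = rng.standard_normal((20, 10)) / np.sqrt(20)  # toy linear forward operator
u_true = np.zeros(10)
u_true[[2, 7]] = [1.0, -0.5]                     # sparse coefficient array
y = K @ u_true                                   # noiseless synthetic datum

alpha = 1e-3                                     # small uniform weight
lam = np.linalg.norm(K, 2) ** 2                  # lambda >= ||K||^2 gives a descent step
u = np.zeros(10)
for _ in range(1000):                            # iteration (8) with constant threshold
    u = soft_threshold(u + K.T @ (y - K @ u) / lam, alpha / lam)

assert np.linalg.norm(u - u_true) < 0.1          # close to the sparse ground truth
```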

The convergence of Algorithm 1.1 for nonlinear operators $K$ was studied in [3]. There it was shown that the sequence $(u^{(n)})_{n \in \mathbb{N}}$ has subsequences which are guaranteed to converge to a stationary point $u^*$ of (1), i.e.,

$$
u^* \in \arg\min_{v \in \ell_2} \Bigl\{ 2\bigl\langle (K'(u^*))^* \bigl(K(u^*) - y\bigr) - \lambda u^*, v \bigr\rangle_{\ell_2} + \lambda \|v\|_{\ell_2}^2 + 2\|v\|_{\ell_{1,\alpha}} \Bigr\},
$$

or, equivalently, $0 \in \partial\Gamma_\alpha(u^*)$. However, it is known from the linear case that the algorithm in its most basic form converges rather slowly. Strategies to accelerate the convergence of the method are necessary for its applicability. In [10], three of us considered the case of linear operators $K$ and proposed to choose a decreasing thresholding strategy for the parameters $\alpha^{(n)}$. In this case $u^* = u^*_\alpha$ is a global minimizer of (1). Moreover, it has been possible to show that the resulting scheme is guaranteed to converge linearly, under spectral conditions on $K$, the so-called restricted isometry property; see (60) below. Furthermore, this property is obtainable for certain classes of operators by means of multilevel preconditioners; we refer to [10] for details. This paper is concerned with the generalization of this strategy to nonlinear operators $K$. That is, we are interested in the convergence analysis of the iteration

$$
u^{(n+1)} := S_{\frac{1}{\lambda}\alpha^{(n)}}\Bigl( u^{(n)} + \frac{1}{\lambda} (K'(u^{(n)}))^* \bigl(y - K(u^{(n)})\bigr) \Bigr), \tag{9}
$$

where $\alpha^{(n)} \in \mathbb{R}_+^{\mathcal{J}}$ is an entrywise decreasing sequence with $\lim_{n \to \infty} \alpha^{(n)} = \alpha$ and $\alpha_\mu^{(n)}, \alpha_\mu \geq \underline{\alpha} \in \mathbb{R}_+$, $\mu \in \mathcal{J}$.
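The iteration (9) differs from the plain thresholded Landweber iteration only through the threshold used at step $n$. A minimal sketch with a toy linear $K$ and a geometrically decreasing $\alpha^{(n)} \to \alpha$ (all concrete choices illustrative):

```python
import numpy as np

def soft_threshold(u, t):
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

rng = np.random.default_rng(3)
K = rng.standard_normal((30, 12)) / np.sqrt(30)  # toy linear operator
u_true = np.zeros(12)
u_true[[1, 5, 9]] = [1.0, -1.0, 0.5]             # sparse ground truth
y = K @ u_true

lam = np.linalg.norm(K, 2) ** 2
alpha = 1e-3                                     # limit threshold weight
u = np.zeros(12)
for n in range(300):
    alpha_n = alpha + 0.5 * 0.9 ** n             # entrywise decreasing alpha^(n) -> alpha
    u = soft_threshold(u + K.T @ (y - K @ u) / lam, alpha_n / lam)

assert np.count_nonzero(np.abs(u) > 0.1) == 3    # iterates stay sparse
assert np.linalg.norm(u - u_true) < 0.1
```

Large early thresholds keep the iterates highly sparse, which is exactly what the support-size arguments in Section 2 exploit.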

The basic convergence analysis is outlined in Section 2. Our analysis relies on two fundamental assumptions. We need that the operator $K$ satisfies certain boundedness and Lipschitz continuity conditions; see (22). Moreover, we have to assume that the operator

$$
T : \ell_2 \to \ell_2, \qquad v \mapsto T(v) := v + \frac{1}{\lambda} (K'(v))^* \bigl(y - K(v)\bigr)
$$

is a contraction on a sufficiently small ball around a critical point $u^*$ of the functional $\Gamma_\alpha$, which will turn out to be the unique global minimizer there. Then the iteration is linearly convergent, i.e.,

$$
\|u^* - u^{(n)}\|_{\ell_2} \leq \gamma^n \|u^*\|_{\ell_2}, \qquad \text{for some } \gamma < 1.
$$

Moreover, the iteration is monotone in the sense that

$$
\Gamma_{\alpha^{(n+1)}}(u^{(n+1)}) < \Gamma_{\alpha^{(n)}}(u^{(n)}), \tag{10}
$$


provided that $u^{(n)}$ is not a critical point of $\Gamma_{\alpha^{(n)}}$. These properties are specified in Theorem 2.4, which is the main result of this paper.

The local contraction condition (42) on $T$ may be hard to verify. In Sections 2.2 and 2.3 we discuss in detail two classes of operators where it is satisfied. The first class consists of operators with bounded second derivatives and a first derivative that satisfies the contraction property. The second class is given by nonlinear perturbations of linear operators satisfying the restricted isometry property (60). As already shown in [10], for large classes of linear operators $K$ where (60) fails, it can actually be restored by preconditioning. Details will be outlined in Section 3 for the case of a nonlinear $K$ which is a mild perturbation of a linear operator.

The analysis in this paper is performed in an infinite-dimensional setting. In this general setting, the operators $K$ and $K'$ clearly cannot be evaluated exactly. Therefore, in Section 4, we discuss strategies to solve the infinite-dimensional problem by turning it into a finite-dimensional one, using the expected sparsity of the minimizer. If implementable approximations of the actions of $K$ and $K'$ up to prescribed tolerances are applied, then the resulting inexact, but implementable, version of the algorithm is again linearly convergent. If the underlying Riesz basis is of wavelet type, then the desired approximations are known in the literature for certain classes of nonlinearities [8, 12, 18].

2 Convergence analysis

In this section we analyze the convergence properties of the iteration (9). As a first step we show that, under relatively mild assumptions, $\Gamma_{\alpha^{(n)}}(u^{(n)})$ decreases monotonically. It is known that in the case of constant thresholding parameters $\alpha^{(n)} = \alpha$, $n \in \mathbb{N}$, the sequence $(u^{(n)})_{n \in \mathbb{N}}$ has a convergent subsequence, and every convergent subsequence converges to a stationary point of (1). However, we are in particular interested in the global minimizer of (1). Therefore, we prove that under more restrictive assumptions and for decreasing thresholding parameters $\alpha^{(n)}$ the iterates converge linearly to the global minimizer of (1). In the remainder of this section we present examples of settings where our analysis can be applied. In Section 2.2 we describe how our assumptions can be fulfilled under smoothness conditions on the nonlinear operator $K$ and its derivative. In Section 2.3 we present the important special case where $K$ can be expressed as the sum of a linear operator satisfying the restricted isometry property and a small nonlinear perturbation.

2.1 A general convergence result

We are particularly interested in computing approximations to solutions of (2) with the smallest possible number of nonzero entries. As a benchmark,


we recall that the most economical approximations of a given vector $v \in \ell_2$ are provided by the best $N$-term approximations $v_N$, defined by discarding in $v$ all but the $N \in \mathbb{N}_0$ largest coefficients in absolute value. The error of best $N$-term approximation is defined as

$$
\sigma_N(v) := \|v - v_N\|_{\ell_2}. \tag{11}
$$

The subspace of all $\ell_2$ vectors with best $N$-term approximation rate $s > 0$, i.e., $\sigma_N(v) \lesssim N^{-s}$ for some decay rate $s > 0$, is commonly referred to as the weak $\ell_\tau$ space $\ell_\tau^w(\mathcal{J})$, for $\tau = (s + \frac{1}{2})^{-1}$, which, endowed with

$$
|v|_{\ell_\tau^w(\mathcal{J})} := \sup_{N \in \mathbb{N}_0} (N + 1)^s \sigma_N(v), \tag{12}
$$

becomes the quasi-Banach space $(\ell_\tau^w(\mathcal{J}), |\cdot|_{\ell_\tau^w(\mathcal{J})})$. Moreover, for any $0 < \epsilon \leq 2 - \tau$, we have the continuous embeddings $\ell_\tau(\mathcal{J}) \hookrightarrow \ell_\tau^w(\mathcal{J}) \hookrightarrow \ell_{\tau+\epsilon}(\mathcal{J})$, justifying why $\ell_\tau^w(\mathcal{J})$ is called weak $\ell_\tau(\mathcal{J})$. As before, we omit the dependency on the index set $\mathcal{J}$ whenever it is clear from the context.
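The quantities (11) and (12) are straightforward to evaluate for finite vectors. The sketch below uses an illustrative vector whose entries decay like $\mu^{-1}$, so that $\sigma_N(v) \sim N^{-1/2}$, i.e., $s = 1/2$ and $\tau = 1$, and computes $\sigma_N(v)$ together with a truncated version of the quasi-norm (12):

```python
import numpy as np

def best_n_term_error(v, N):
    # sigma_N(v): l2 norm of all but the N largest entries in magnitude
    discarded = np.sort(np.abs(v))[: max(len(v) - N, 0)]
    return np.sqrt(np.sum(discarded ** 2))

v = 1.0 / np.arange(1, 2001)   # entries ~ mu^{-1} (illustrative)
s = 0.5
quasi_norm = max((N + 1) ** s * best_n_term_error(v, N) for N in range(200))

assert np.isclose(best_n_term_error(v, 0), np.linalg.norm(v))
assert best_n_term_error(v, len(v)) == 0.0
assert quasi_norm < 1.5        # the truncated weak-l1 quasi-norm stays bounded
```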

When it comes to the concrete computation of good approximations with a small number of active coefficients, one frequently utilizes certain thresholding procedures. Here small entries of a given vector are simply discarded, whereas the large entries may be slightly modified. In this paper, we will make use of the soft thresholding already introduced in (7). It is well known that $S_\alpha$ is non-expansive for any $\alpha \in \mathbb{R}_+^{\mathcal{J}}$. Moreover, for any fixed $x \in \mathbb{R}$, the mapping $\beta \mapsto S_\beta(x)$ is Lipschitz continuous with

$$
\bigl| S_\beta(x) - S_{\beta'}(x) \bigr| \leq |\beta - \beta'|, \qquad \text{for all } \beta, \beta' \geq 0. \tag{13}
$$

We readily infer the following technical estimate (for the proof we refer the reader to [10]).

Lemma 2.1. Assume $v \in \ell_2$ and $\alpha, \beta \in \mathbb{R}_+^{\mathcal{J}}$ such that $0 < \underline{\alpha} = \min\bigl\{\inf_\mu \alpha_\mu, \inf_\mu \beta_\mu\bigr\}$, and define $\Lambda_{\underline{\alpha}}(v) := \bigl\{\mu \in \mathcal{J} : |v_\mu| > \underline{\alpha}\bigr\}$. Then

$$
\bigl\| S_\alpha(v) - S_\beta(v) \bigr\|_{\ell_2} \leq \bigl( \#\Lambda_{\underline{\alpha}}(v) \bigr)^{1/2} \max_{\mu \in \Lambda_{\underline{\alpha}}(v)} |\alpha_\mu - \beta_\mu|. \tag{14}
$$

In the sequel, we shall also use the following support size estimate, whose proof follows the lines of Lemma 5.1 in [6]; more details are provided in [10].

Lemma 2.2. Let $v \in \ell_\tau^w$ and $w \in \ell_2$ with $\|v - w\|_{\ell_2} \leq \epsilon$. Assume $\alpha = (\alpha_\mu)_{\mu \in \mathcal{J}} \in \mathbb{R}_+^{\mathcal{J}}$ and $\inf_\mu \alpha_\mu = \underline{\alpha} > 0$. Then it holds that

$$
\#\mathrm{supp}\, S_\alpha(w) \leq \#\Lambda_{\underline{\alpha}}(w) \leq \frac{4\epsilon^2}{\underline{\alpha}^2} + 4C|v|_{\ell_\tau^w}^\tau \underline{\alpha}^{-\tau}, \tag{15}
$$

where $C = C(\tau) > 0$. In particular, if $v \in \ell_0(\mathcal{J})$, then the estimate refines to

$$
\#\mathrm{supp}\, S_\alpha(w) \leq \#\Lambda_{\underline{\alpha}}(w) \leq \frac{4\epsilon^2}{\underline{\alpha}^2} + \|v\|_{\ell_0(\mathcal{J})}. \tag{16}
$$


For the analysis of the iteration (9), we will always assume that the datum $y \in Y$ is fixed and contained in a bounded set, i.e.,

$$
\|y\|_Y \leq C_Y < \infty. \tag{17}
$$

In this setting, we define the operator

$$
T : \ell_2 \to \ell_2, \qquad v \mapsto T(v) := v + \frac{1}{\lambda} (K'(v))^* \bigl(y - K(v)\bigr). \tag{18}
$$

In the following we want to show the convergence of the algorithm (9) to stationary points of $\Gamma_\alpha$ and to estimate the rate of convergence. In order to do so, we shall in particular show that, under certain local contraction properties of the operator $T$, the stationary point is actually unique in a predetermined ball around $0$ and coincides with the global minimizer of $\Gamma_\alpha$. First of all, we need to characterize the ball in which the interesting stationary points should be sought.

To this end, we recall that for all $\alpha \in \mathbb{R}_+^{\mathcal{J}}$ with $\alpha_\mu \geq \underline{\alpha} > 0$, $\mu \in \mathcal{J}$, the related energy functional $\Gamma_\alpha$ defined by (1) is coercive, i.e., $\Gamma_\alpha(v) \to \infty$ as $\|v\|_{\ell_2} \to \infty$. In particular this implies that

$$
R := \sup\bigl\{ \|v\|_{\ell_2} : \Gamma_\alpha(v) \leq \Gamma_{\alpha^{(0)}}(0) \bigr\} \tag{19}
$$

is finite, and we define

$$
B(R) := \bigl\{ u \in \ell_2 : \|u\|_{\ell_2} \leq R \bigr\}. \tag{20}
$$

Notice that for $v \in \ell_2$ such that $\Gamma_\alpha(v) \leq \Gamma_{\alpha^{(0)}}(0)$, we have

$$
2\underline{\alpha}\|v\|_{\ell_2} \leq 2\|v\|_{\ell_{1,\alpha}(\mathcal{J})} \leq \Gamma_\alpha(v) \leq \Gamma_{\alpha^{(0)}}(0),
$$

hence

$$
R \leq \frac{\Gamma_{\alpha^{(0)}}(0)}{2\underline{\alpha}}. \tag{21}
$$

For the remainder of this section we will make the following additional standing hypothesis: the operators $K$ and $K'$ are Lipschitz continuous on closed bounded sets, i.e., for all closed and bounded $\mathcal{O} \subset \ell_2$ we assume

$$
\|K(u) - K(v)\|_Y \leq C_K^{\mathrm{Lip}}(\mathcal{O})\|u - v\|_{\ell_2}, \qquad u, v \in \mathcal{O},
$$
$$
\|K'(u) - K'(v)\|_{\mathcal{L}(\ell_2, Y)} \leq C_{K'}^{\mathrm{Lip}}(\mathcal{O})\|u - v\|_{\ell_2}, \qquad u, v \in \mathcal{O}. \tag{22}
$$

With a slight abuse of notation we denote the Lipschitz constants of $K$ and $K'$ on $B(R)$ by $C_K^{\mathrm{Lip}}(R)$ and $C_{K'}^{\mathrm{Lip}}(R)$, respectively. In particular, (22) implies that $K$ and $K'$ are bounded on closed and bounded sets. Indeed, let $\mathcal{O} \subset \ell_2$ and $v_0 \in \mathcal{O}$. Then we may bound $K$ by estimating

$$
\sup_{v \in \mathcal{O}} \|K(v)\|_Y \leq \sup_{v \in \mathcal{O}} \|K(v) - K(v_0)\|_Y + \|K(v_0)\|_Y \leq C_K^{\mathrm{Lip}}(\mathcal{O}) \sup_{v \in \mathcal{O}} \|v - v_0\|_{\ell_2} + \|K(v_0)\|_Y < \infty, \tag{23}
$$


and $K'$ by a similar estimate. We introduce the abbreviations

$$
C_K^{\mathrm{bnd}}(R) := \sup_{v \in B(R)} \|K(v)\|_Y, \qquad C_{K'}^{\mathrm{bnd}}(R) := \sup_{v \in B(R)} \|K'(v)\|_{\mathcal{L}(\ell_2, Y)}. \tag{24}
$$

With these preliminaries, we can formulate the following proposition, which generalizes Lemma 2.4 in [2].

Proposition 2.3. Suppose that (17) and (22) hold. For some $\lambda_0 > 0$ and $R$ as in (19) we define

$$
R' := R + \frac{1}{\lambda_0} C_{K'}^{\mathrm{bnd}}(R)\bigl(C_K^{\mathrm{bnd}}(R) + C_Y\bigr). \tag{25}
$$

Then $\|K(\cdot) - y\|_Y^2$ is locally Lipschitz. We choose in (3)

$$
\lambda > \lambda_{\min} := \max\Bigl\{\lambda_0,\; C_{K'}^{\mathrm{Lip}}(R')\bigl(C_K^{\mathrm{bnd}}(R') + C_Y\bigr) + C_{K'}^{\mathrm{bnd}}(R')\,C_K^{\mathrm{Lip}}(R')\Bigr\} \tag{26}
$$

and denote by $(u^{(n)})_{n \in \mathbb{N}}$ the iterates of the decreasing thresholding iteration (9) starting from $u^{(0)} = 0$. Then it holds that

$$
\Gamma_{\alpha^{(n+1)}}(u^{(n+1)}) < \Gamma_{\alpha^{(n)}}(u^{(n)}), \tag{27}
$$

provided that $u^{(n)}$ is not yet a critical point of $\Gamma_{\alpha^{(n)}}$. Furthermore, the iterates fulfill the bound

$$
\|u^{(n)}\|_{\ell_2} \leq R, \qquad n \in \mathbb{N}. \tag{28}
$$

Proof. We shall prove by induction over $n$ that

$$
\|u^{(n)}\|_{\ell_2} \leq R \quad \text{and} \quad \Gamma_{\alpha^{(n)}}(u^{(n)}) \leq \Gamma_{\alpha^{(0)}}(0). \tag{29}
$$

We will show that if $\lambda$ is chosen according to (26) and $u^{(n)} \neq u^{(n+1)}$, which is the case if $u^{(n)}$ is not a critical point of $\Gamma_{\alpha^{(n)}}$, then this implies

$$
\Gamma_{\alpha^{(n)}}(u^{(n+1)}) < \Gamma_{\alpha^{(n)}}(u^{(n)}). \tag{30}
$$

From the fact that $\alpha^{(n)}$ decreases to $\alpha$, together with (30) and (29), we would obtain

$$
\Gamma_\alpha(u^{(n+1)}) \leq \Gamma_{\alpha^{(n+1)}}(u^{(n+1)}) \leq \Gamma_{\alpha^{(n)}}(u^{(n+1)}) < \Gamma_{\alpha^{(n)}}(u^{(n)}) \leq \Gamma_{\alpha^{(0)}}(0).
$$

By (19) this would also imply the validity of (29) for $n \to n + 1$ and simultaneously of (27) and (28) for all $n \in \mathbb{N}$.

Notice that (29) in particular holds for $n = 0$. We begin by proving $u^{(n+1)} \in B(R')$, where $R'$ is defined in (25). We use the fact that soft shrinkage is nonexpansive, together with (24) and (17), to estimate

$$
\|u^{(n+1)}\|_{\ell_2} \leq \Bigl\| u^{(n)} + \frac{1}{\lambda} (K'(u^{(n)}))^* \bigl(y - K(u^{(n)})\bigr) \Bigr\|_{\ell_2} \leq \|u^{(n)}\|_{\ell_2} + \frac{1}{\lambda} C_{K'}^{\mathrm{bnd}}(R)\bigl(C_K^{\mathrm{bnd}}(R) + C_Y\bigr) \leq R'. \tag{31}
$$


Hence it follows that $u^{(n)}, u^{(n+1)} \in B(R')$. To prove (30) we shall use (4), reformulated for $u^{(n)}$ and $\Gamma_\alpha(u^{(n)})$. In (3) we introduced the splitting $\Gamma_\alpha(u) = \Gamma_\lambda^{(1)}(u) + \Gamma_{\lambda,\alpha}^{(2)}(u)$, where $\Gamma_\lambda^{(1)}$ is continuously Fréchet differentiable. The derivative of $\Gamma_\lambda^{(1)}$ was already implicitly stated in (4) and may be reformulated by means of $T$ as follows:

$$
\bigl(\Gamma_\lambda^{(1)}\bigr)'(u)v = 2\bigl\langle (K'(u))^* \bigl(K(u) - y\bigr) - \lambda u, v \bigr\rangle_{\ell_2} = -2\lambda \langle T(u), v \rangle_{\ell_2}. \tag{32}
$$

Recall that by means of (7), (6), and (32), the definition of $u^{(n+1)}$ in (9) can be reformulated as

$$
u^{(n+1)} = \arg\min_{v \in \ell_2} \Bigl\{ \|v - T(u^{(n)})\|_{\ell_2}^2 + 2\|v\|_{\ell_{1,\frac{1}{\lambda}\alpha^{(n)}}} \Bigr\}
= \arg\min_{v \in \ell_2} \Bigl\{ -2\lambda \langle T(u^{(n)}), v \rangle_{\ell_2} + \lambda\|v\|_{\ell_2}^2 + 2\|v\|_{\ell_{1,\alpha^{(n)}}} \Bigr\}
= \arg\min_{v \in \ell_2} \Bigl\{ \bigl(\Gamma_\lambda^{(1)}\bigr)'(u^{(n)})v + \Gamma_{\lambda,\alpha^{(n)}}^{(2)}(v) \Bigr\}.
$$

In particular it follows that

$$
\bigl(\Gamma_\lambda^{(1)}\bigr)'(u^{(n)})u^{(n+1)} + \Gamma_{\lambda,\alpha^{(n)}}^{(2)}(u^{(n+1)}) \leq \bigl(\Gamma_\lambda^{(1)}\bigr)'(u^{(n)})u^{(n)} + \Gamma_{\lambda,\alpha^{(n)}}^{(2)}(u^{(n)})
$$

holds, which is equivalent to

$$
\bigl(\Gamma_\lambda^{(1)}\bigr)'(u^{(n)})(u^{(n+1)} - u^{(n)}) \leq \Gamma_{\lambda,\alpha^{(n)}}^{(2)}(u^{(n)}) - \Gamma_{\lambda,\alpha^{(n)}}^{(2)}(u^{(n+1)}). \tag{33}
$$

Next, we apply the fundamental theorem of calculus to $\Gamma_\lambda^{(1)}$ and write

$$
\Gamma_\lambda^{(1)}(u^{(n+1)}) - \Gamma_\lambda^{(1)}(u^{(n)}) = \int_0^1 \bigl(\Gamma_\lambda^{(1)}\bigr)'\bigl(u^{(n)} + \tau(u^{(n+1)} - u^{(n)})\bigr)(u^{(n+1)} - u^{(n)})\,d\tau
$$
$$
= \int_0^1 \Bigl[ \bigl(\Gamma_\lambda^{(1)}\bigr)'\bigl(u^{(n)} + \tau(u^{(n+1)} - u^{(n)})\bigr) - \bigl(\Gamma_\lambda^{(1)}\bigr)'(u^{(n)}) \Bigr](u^{(n+1)} - u^{(n)})\,d\tau + \bigl(\Gamma_\lambda^{(1)}\bigr)'(u^{(n)})(u^{(n+1)} - u^{(n)}).
$$

This, together with (33) and (32), yields

$$
\Gamma_{\alpha^{(n)}}(u^{(n+1)}) - \Gamma_{\alpha^{(n)}}(u^{(n)}) = \Bigl[\Gamma_\lambda^{(1)}(u^{(n+1)}) + \Gamma_{\lambda,\alpha^{(n)}}^{(2)}(u^{(n+1)})\Bigr] - \Bigl[\Gamma_\lambda^{(1)}(u^{(n)}) + \Gamma_{\lambda,\alpha^{(n)}}^{(2)}(u^{(n)})\Bigr]
$$
$$
\leq \int_0^1 \Bigl[ \bigl(\Gamma_\lambda^{(1)}\bigr)'\bigl(u^{(n)} + \tau(u^{(n+1)} - u^{(n)})\bigr) - \bigl(\Gamma_\lambda^{(1)}\bigr)'(u^{(n)}) \Bigr](u^{(n+1)} - u^{(n)})\,d\tau
$$
$$
= \int_0^1 2\Bigl\langle \bigl(K'(u^{(n)} + \tau(u^{(n+1)} - u^{(n)}))\bigr)^* \bigl(K(u^{(n)} + \tau(u^{(n+1)} - u^{(n)})) - y\bigr) - \bigl(K'(u^{(n)})\bigr)^* \bigl(K(u^{(n)}) - y\bigr),\, u^{(n+1)} - u^{(n)} \Bigr\rangle_{\ell_2}\,d\tau - \lambda\|u^{(n+1)} - u^{(n)}\|_{\ell_2}^2. \tag{34}
$$


Moreover, by the assumptions (22) on $K$ and $K'$, we can estimate for $u, v \in B(R')$

$$
\bigl\| (K'(u))^*(K(u) - y) - (K'(v))^*(K(v) - y) \bigr\|_{\ell_2} = \bigl\| \bigl[(K'(u))^* - (K'(v))^*\bigr](K(u) - y) + (K'(v))^*\bigl(K(u) - K(v)\bigr) \bigr\|_{\ell_2} \leq \lambda_{\min}\|u - v\|_{\ell_2}.
$$

We apply this inequality in the special case $u = u^{(n)} + \tau(u^{(n+1)} - u^{(n)})$, $v = u^{(n)}$, to further estimate (34) as follows:

$$
\Gamma_{\alpha^{(n)}}(u^{(n+1)}) - \Gamma_{\alpha^{(n)}}(u^{(n)}) \leq \int_0^1 2\tau\lambda_{\min}\|u^{(n+1)} - u^{(n)}\|_{\ell_2}^2\,d\tau - \lambda\|u^{(n+1)} - u^{(n)}\|_{\ell_2}^2 = (\lambda_{\min} - \lambda)\|u^{(n+1)} - u^{(n)}\|_{\ell_2}^2. \tag{35}
$$

Furthermore, the right-hand side is negative if $\lambda$ is chosen according to (26) and $u^{(n)} \neq u^{(n+1)}$, which is the case if $u^{(n)}$ is not a critical point of $\Gamma_{\alpha^{(n)}}$. This shows (30) and concludes the proof.

Notice that we decided to start our iteration from $u^{(0)} = 0$. On the one hand, this choice is motivated by the fact that a priori we do not dispose of any information on potentially interesting stationary points, and an arbitrary choice of the initial iterate has to be made. On the other hand, as we will show below, under certain assumptions we will be able to identify in this way the unique global minimizer of the functional $\Gamma_\alpha$. As we are seeking stationary points which are limits of the sequence $(u^{(n)})_{n \in \mathbb{N}}$ of the iterates of the decreasing thresholding iteration (9) starting from $u^{(0)} = 0$, in view of Proposition 2.3 we can assume without loss of generality that interesting stationary points $u^*$ belong to the ball $B(R)$. This assumption is not void, because a global minimizer $u^\circ$ of $\Gamma_\alpha$ necessarily has to lie in the ball $B(R)$, since $\Gamma_\alpha(u^\circ) \leq \Gamma_\alpha(0)$. We shall also show below that all the iterates $(u^{(n)})_{n \in \mathbb{N}}$ are actually additionally confined within the ball

$$
B := \bigl\{ v \in \ell_2 : \|u^* - v\|_{\ell_2} \leq \|u^*\|_{\ell_2} \bigr\}, \tag{36}
$$

where $u^*$ is an arbitrary stationary point of $\Gamma_\alpha$ within $B(R)$. Hence, under the regularity assumptions made so far for the operators $K$ and $K'$, the reference domain of the iterates of the algorithm is $B \cap B(R)$. Within this setting we assume the following hypothesis: $T$ satisfies the Lipschitz condition

$$
\|T(u^*) - T(v)\|_{\ell_2} \leq C_T^{\mathrm{Lip}}\|u^* - v\|_{\ell_2}, \qquad v \in B \cap B(R),\; y \in Y,\; \|y\|_Y \leq C_Y, \tag{37}
$$


for any fixed stationary point $u^*$. (As we shall see below, such a condition is not so strong, as we shall apply it to only one stationary point.) Furthermore, we define for some fixed $\lambda_0 > 0$ the analogue of (25) on $B \cap B(R)$, that is,

$$
R'' := R + \frac{1}{\lambda_0} C_{K'}^{\mathrm{bnd}}(B \cap B(R))\bigl(C_K^{\mathrm{bnd}}(B \cap B(R)) + C_Y\bigr). \tag{38}
$$

Then the following convergence theorem holds.

Theorem 2.4. Let $u^*$ be a stationary point of (1) that satisfies $T(u^*) \in \ell_\tau^w$ for some $0 < \tau < 2$. For some $\lambda_0 > 0$ and $R''$ as in (38) we choose

$$
\lambda > \max\Bigl\{\lambda_0,\; C_{K'}^{\mathrm{Lip}}(B \cap B(R''))\bigl(C_K^{\mathrm{bnd}}(B \cap B(R'')) + C_Y\bigr) + C_{K'}^{\mathrm{bnd}}(B \cap B(R''))\,C_K^{\mathrm{Lip}}(B \cap B(R''))\Bigr\}. \tag{39}
$$

Furthermore, let $\alpha^{(n)}, \alpha \in \mathbb{R}_+^{\mathcal{J}}$ with $\alpha_\mu^{(n)} \geq \alpha_\mu \geq \underline{\alpha} \in \mathbb{R}_+$, $\mu \in \mathcal{J}$. We set

$$
L := \frac{4(C_T^{\mathrm{Lip}})^2\|u^*\|_{\ell_2}^2\lambda^2}{\underline{\alpha}^2} + 4C|T(u^*)|_{\ell_\tau^w}^\tau\Bigl(\frac{\underline{\alpha}}{\lambda}\Bigr)^{-\tau}, \tag{40}
$$

with $C$ as in Lemma 2.2 and $C_T^{\mathrm{Lip}}$ as in (37). Moreover, we define the set

$$
B_L := \bigl\{ v \in \ell_2 : \|u^* - v\|_{\ell_2} \leq \|u^*\|_{\ell_2},\; \#\mathrm{supp}\,v \leq L \bigr\}. \tag{41}
$$

Let us assume that there exists $0 < \gamma_0 < 1$ such that for all $v \in B_L$ and $\mathrm{supp}(v) \subset \Lambda \subset \mathcal{J}$ with $\#\Lambda \leq 2L$,

$$
\bigl\| \bigl(T(u^*) - T(v)\bigr)|_{S^* \cup \Lambda} \bigr\|_{\ell_2(S^* \cup \Lambda)} \leq \gamma_0\|u^* - v\|_{\ell_2}, \tag{42}
$$

where $S^* := \mathrm{supp}\,u^*$. Then, for any $\gamma_0 < \gamma < 1$, the sequence $(u^{(n)})_{n \in \mathbb{N}}$ obtained by (9) fulfills

$$
(u^{(n)})_{n \in \mathbb{N}} \subset B_L \cap B(R) \tag{43}
$$

and converges to $u^*$ at a linear rate,

$$
\|u^* - u^{(n)}\|_{\ell_2} \leq \epsilon^{(n)} := \gamma^n\|u^*\|_{\ell_2}, \tag{44}
$$

whenever the $\alpha^{(n)}$ are chosen according to

$$
\max_{\mu \in \mathcal{J}} |\alpha_\mu^{(n)} - \alpha_\mu| \leq \lambda L^{-\frac{1}{2}}(\gamma - \gamma_0)\epsilon^{(n)}. \tag{45}
$$

Moreover, the iteration is monotone,

$$
\Gamma_{\alpha^{(n+1)}}(u^{(n+1)}) < \Gamma_{\alpha^{(n)}}(u^{(n)}),
$$

provided that $u^{(n)}$ is not yet a critical point of $\Gamma_{\alpha^{(n)}}$.


Remark 2.5. Before proving this result, let us comment on the following fundamental implication: under the assumptions of Lipschitz continuity (37) and the local contraction property (42), the iterates $(u^{(n)})_{n \in \mathbb{N}}$ of the algorithm starting from $u^{(0)} = 0$ must converge to any stationary point $u^* \in B(R)$, hence implying automatically its uniqueness! In fact, if there were another stationary point, it would also coincide with the limit of this sequence. In particular, the global minimizer $u^\circ$ of $\Gamma_\alpha$ necessarily lies in the ball $B(R)$ and is a stationary point of $\Gamma_\alpha$, and we have linear convergence of the iterates to $u^\circ$. Let us now address the proof of Theorem 2.4.

Proof. The proof is performed by induction over $n$. There is nothing to show for $n = 0$. The first step is to prove that $u^{(n+1)}$ is indeed contained in $B_L$. Let $u^{(n)} \in B_L \cap B(R)$; then, since $\alpha^{(n)}$ is decreasing to $\alpha$, it holds that

$$
\mathrm{supp}\,u^{(n+1)} = \mathrm{supp}\,S_{\frac{1}{\lambda}\alpha^{(n)}}\bigl(T(u^{(n)})\bigr) \subset \mathrm{supp}\,S_{\frac{1}{\lambda}\alpha}\bigl(T(u^{(n)})\bigr). \tag{46}
$$

The Lipschitz property (37) implies that

$$
\|T(u^*) - T(u^{(n)})\|_{\ell_2} \leq C_T^{\mathrm{Lip}}\|u^* - u^{(n)}\|_{\ell_2}.
$$

By Lemma 2.2 we can estimate

$$
\#\mathrm{supp}\,S_{\frac{1}{\lambda}\alpha}\bigl(T(u^{(n)})\bigr) \leq \#\Lambda_{\frac{\underline{\alpha}}{\lambda}}\bigl(T(u^{(n)})\bigr) \leq \frac{4(C_T^{\mathrm{Lip}})^2\|u^* - u^{(n)}\|_{\ell_2}^2\lambda^2}{\underline{\alpha}^2} + 4C|T(u^*)|_{\ell_\tau^w}^\tau\Bigl(\frac{\underline{\alpha}}{\lambda}\Bigr)^{-\tau} \leq L. \tag{47}
$$

We conclude that $\#\mathrm{supp}\,u^{(n+1)} \leq L$. Let us denote $S^{(n)} := \mathrm{supp}\,u^{(n)}$, $S^* := \mathrm{supp}\,u^*$, and $\Lambda^{(n)} := S^* \cup S^{(n)} \cup S^{(n+1)}$. Notice that $\#(S^{(n)} \cup S^{(n+1)}) \leq 2L$. By the thresholding properties it is clear that, after restriction to $\Lambda^{(n)}$,

$$
u^*|_{\Lambda^{(n)}} = S_{\frac{1}{\lambda}\alpha}\bigl(T(u^*)|_{\Lambda^{(n)}}\bigr) \tag{48}
$$

and

$$
u^{(n+1)}|_{\Lambda^{(n)}} = S_{\frac{1}{\lambda}\alpha^{(n)}}\bigl(T(u^{(n)})|_{\Lambda^{(n)}}\bigr) \tag{49}
$$

hold. This, together with the nonexpansiveness of soft thresholding, Lemma 2.1 in the second inequality, and (42) together with $\#\mathrm{supp}\,u^{(n+1)} \leq L$ in the third inequality, yields

$$
\|u^* - u^{(n+1)}\|_{\ell_2} = \bigl\|u^*|_{\Lambda^{(n)}} - u^{(n+1)}|_{\Lambda^{(n)}}\bigr\|_{\ell_2(\Lambda^{(n)})}
$$
$$
= \bigl\|S_{\frac{1}{\lambda}\alpha}\bigl(T(u^*)|_{\Lambda^{(n)}}\bigr) - S_{\frac{1}{\lambda}\alpha^{(n)}}\bigl(T(u^{(n)})|_{\Lambda^{(n)}}\bigr)\bigr\|_{\ell_2(\Lambda^{(n)})}
$$
$$
\leq \bigl\|S_{\frac{1}{\lambda}\alpha}\bigl(T(u^*)|_{\Lambda^{(n)}}\bigr) - S_{\frac{1}{\lambda}\alpha}\bigl(T(u^{(n)})|_{\Lambda^{(n)}}\bigr)\bigr\|_{\ell_2(\Lambda^{(n)})} + \bigl\|S_{\frac{1}{\lambda}\alpha}\bigl(T(u^{(n)})|_{\Lambda^{(n)}}\bigr) - S_{\frac{1}{\lambda}\alpha^{(n)}}\bigl(T(u^{(n)})|_{\Lambda^{(n)}}\bigr)\bigr\|_{\ell_2(\Lambda^{(n)})}
$$
$$
\leq \bigl\|T(u^*)|_{\Lambda^{(n)}} - T(u^{(n)})|_{\Lambda^{(n)}}\bigr\|_{\ell_2(\Lambda^{(n)})} + \frac{\bigl(\#\Lambda_{\frac{\underline{\alpha}}{\lambda}}(T(u^{(n)}))\bigr)^{1/2}}{\lambda} \max_{\mu \in \Lambda_{\frac{\underline{\alpha}}{\lambda}}(T(u^{(n)}))} |\alpha_\mu - \alpha_\mu^{(n)}|
$$
$$
\leq \gamma_0\|u^* - u^{(n)}\|_{\ell_2} + \frac{L^{1/2}}{\lambda} \max_{\mu \in \Lambda_{\frac{\underline{\alpha}}{\lambda}}(T(u^{(n)}))} |\alpha_\mu - \alpha_\mu^{(n)}|
$$
$$
\leq \gamma_0\epsilon^{(n)} + (\gamma - \gamma_0)\epsilon^{(n)} = \gamma\epsilon^{(n)} = \epsilon^{(n+1)}.
$$

The last inequality is a consequence of the induction hypothesis and (45). This proves $u^{(n+1)} \in B_L$. Obviously $u^{(n+1)} \in B(R)$ because of the monotonicity of the iteration:

$$
\Gamma_\alpha(u^{(n+1)}) \leq \Gamma_{\alpha^{(n+1)}}(u^{(n+1)}) \leq \Gamma_{\alpha^{(n)}}(u^{(n+1)}) < \Gamma_{\alpha^{(n)}}(u^{(n)}) \leq \Gamma_{\alpha^{(0)}}(0).
$$

2.2 Nonlinear operators with bounded second derivatives

In this section we state smoothness conditions on the nonlinear operator $K$ which imply that the operator $T$ defined in (18) fulfills (37) and (42). In the following we assume that $S^*$ is the support of a global minimizer $u^*$ of $\Gamma_\alpha$ in $B(R)$. As discussed above, once we prove that $T$ fulfills (37) and (42), then by Theorem 2.4 we automatically have that $u^*$ is actually the unique stationary point of $\Gamma_\alpha$ in $B(R)$.

Theorem 2.6. Let the data fulfill assumption (17). Assume that $K$ is twice continuously differentiable on an open set that contains $B$ and, together with its derivative $K'$, is bounded on $B$. Furthermore, assume that there exist $0 < \gamma_2 \leq \gamma_1 < 1$ such that for all $\Lambda \subset \mathcal{J}$, $\#\Lambda \leq 2L$, and $\zeta \in B$, $\mathrm{supp}\,\zeta \subset S^* \cup \Lambda$, the following local contraction property

$$
\Bigl\| \Bigl(\mathrm{Id} - \frac{1}{\lambda}(K'(\zeta))^*K'(\zeta)\Bigr)\Big|_{S^* \cup \Lambda \times S^* \cup \Lambda} \Bigr\|_{\mathcal{L}(\ell_2(S^* \cup \Lambda), \ell_2(S^* \cup \Lambda))} \leq \gamma_2 \tag{50}
$$

holds. Moreover, let us assume that the uniform spectral gap condition

$$
\Bigl\| \frac{1}{\lambda}\bigl(K''(\zeta)(\cdot)\bigr)^*\bigl(y - K(\zeta)\bigr)\Big|_{S^* \cup \Lambda \times S^* \cup \Lambda} \Bigr\|_{\mathcal{L}(\ell_2(S^* \cup \Lambda), \ell_2(S^* \cup \Lambda))} \leq \gamma_1 - \gamma_2 \tag{51}
$$


holds. Then T defined in (18) fulfills (37) and (42).

Proof. The proof is an application of the mean value theorem. In order to compute the derivative of $T$ we introduce the auxiliary operator

$$
G : (u, v) \mapsto u + \frac{1}{\lambda}(K'(u))^*\bigl(y - K(v)\bigr), \qquad (u, v) \in \ell_2 \times \ell_2,
$$

and observe $T = G \circ (\mathrm{Id}, \mathrm{Id})^\top$. We compute

$$
T'(\zeta)z = \Bigl(\frac{\partial G}{\partial u}, \frac{\partial G}{\partial v}\Bigr)\bigl((\zeta, \zeta)\bigr) \circ (\mathrm{Id}, \mathrm{Id})^\top z
$$
$$
= \Bigl(\mathrm{Id} + \frac{1}{\lambda}\bigl(K''(\zeta)(\cdot)\bigr)^*\bigl(y - K(\zeta)\bigr),\; -\frac{1}{\lambda}(K'(\zeta))^*K'(\zeta)(\cdot)\Bigr) \circ (\mathrm{Id}, \mathrm{Id})^\top z
$$
$$
= z + \frac{1}{\lambda}\bigl(K''(\zeta)z\bigr)^*\bigl(y - K(\zeta)\bigr) - \frac{1}{\lambda}(K'(\zeta))^*K'(\zeta)z. \tag{52}
$$

Observe that $K : \ell_2 \to Y$, $K' : \ell_2 \to \mathcal{L}(\ell_2, Y)$, and $K'' : \ell_2 \to \mathcal{L}(\ell_2, \mathcal{L}(\ell_2, Y))$. Therefore $K''(\zeta)z \in \mathcal{L}(\ell_2, Y)$ holds, and consequently $\bigl(K''(\zeta)z\bigr)^* \in \mathcal{L}(Y, \ell_2)$, so that the composition in (52) is well defined. By our assumptions, $K$, $K'$, and $K''$ are bounded on the bounded set $B$. This, together with (17), implies $\sup_{\xi \in B} \|T'(\xi)\|_{\mathcal{L}(\ell_2, \ell_2)} < \infty$. Since $B$ is convex, we can use the mean value theorem to conclude the Lipschitz property (37). In order to prove (42), let $v \in B_L$ and $\mathrm{supp}(v) \subset \Lambda \subset \mathcal{J}$ with $\#\Lambda \leq 2L$. Then, by (52), (50), and (51), the estimate

$$
\bigl\| (T'(\zeta))|_{S^* \cup \Lambda \times S^* \cup \Lambda} \bigr\|_{\mathcal{L}(\ell_2(S^* \cup \Lambda), \ell_2(S^* \cup \Lambda))}
$$
$$
\leq \Bigl\| \Bigl(\mathrm{Id} - \frac{1}{\lambda}(K'(\zeta))^*K'(\zeta)\Bigr)\Big|_{S^* \cup \Lambda \times S^* \cup \Lambda} \Bigr\|_{\mathcal{L}(\ell_2(S^* \cup \Lambda), \ell_2(S^* \cup \Lambda))} + \Bigl\| \frac{1}{\lambda}\bigl(K''(\zeta)(\cdot)\bigr)^*\bigl(y - K(\zeta)\bigr)\Big|_{S^* \cup \Lambda \times S^* \cup \Lambda} \Bigr\|_{\mathcal{L}(\ell_2(S^* \cup \Lambda), \ell_2(S^* \cup \Lambda))} \leq \gamma_1
$$

holds. The restriction B|S∗∪Λ := {u|S∗∪Λ,u ∈ B} of B onto the index setS∗ ∪ Λ is a convex set in �2(S

∗ ∪ Λ). Hence, we can apply the mean valuetheorem again to finalize the proof as follows

��T (u∗)− T (v)

�|S∗∪Λ��2(S∗∪Λ)

≤ supζ∈B|S∗∪Λ

��T �(ζ)

�|S∗∪Λ×S∗∪Λ�L(�2(S∗∪Λ),�2(S∗∪Λ))�u∗ − v��2

≤ γ1�u∗ − v��2 .
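When $K$ is linear, $K(u) = Ku$, the second-derivative term in (51) vanishes and the contraction estimate reduces to the block condition (50) on the Gram matrix. The following sketch is a purely illustrative numerical check of (50); the random matrix, the choice $\lambda = \|K\|^2$, and the sampled index sets $\Lambda$ are assumptions made for the example only, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 40, 60
K = rng.standard_normal((m, n)) / np.sqrt(m)   # toy linear forward map
lam = np.linalg.norm(K, 2) ** 2                # lambda >= ||K'||^2

def block_norm(idx):
    # operator norm of (Id - K^T K / lam) restricted to idx x idx, cf. (50)
    M = np.eye(len(idx)) - K[:, idx].T @ K[:, idx] / lam
    return np.linalg.norm(M, 2)

# worst case over a few random small index sets Lambda
gammas = [block_norm(rng.choice(n, size=5, replace=False)) for _ in range(100)]
gamma2 = max(gammas)
print(gamma2 < 1.0)   # small well-conditioned blocks contract
```

For small index sets of a generic random matrix the restricted Gram blocks are well conditioned, so the empirical $\gamma_2$ stays below $1$; for large blocks (comparable to $n$) the condition fails, which is precisely the motivation for the preconditioning of Section 3.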


2.3 Nonlinear perturbation of linear operators

In this section we discuss the validity of (42) for the special case that $K$ is given by the sum of a linear operator and a nonlinear perturbation. To be specific, we consider
\[
K_\sigma = A + \sigma N, \tag{53}
\]
where $\sigma \in \mathbb{R}_+$, $A \in \mathcal{L}(\ell_2, Y)$, and $N : \ell_2 \to Y$ is a nonlinear perturbation with the property that
\[
N : \ell_2 \to Y, \quad v \mapsto N(v), \qquad \text{and} \qquad N' : \ell_2 \to \mathcal{L}(\ell_2, Y), \quad v \mapsto N'(v), \tag{54}
\]
are Lipschitz continuous on closed bounded sets. Similarly to (22) and (24), we denote the respective Lipschitz constants and suprema on $B(R)$ by $C^{\mathrm{Lip}}_N(R)$, $C^{\mathrm{Lip}}_{N'}(R)$, $C^{\mathrm{bnd}}_N(R)$, and $C^{\mathrm{bnd}}_{N'}(R)$.

We begin by deriving uniform bounds for those constants. We denote
\[
R(\sigma) := \sup \big\{ \|v\|_{\ell_2} : \Gamma_{\alpha,\sigma}(v) \le \Gamma_{\alpha^{(0)},\sigma}(0) \big\},
\]
where $\Gamma_{\alpha,\sigma}$ is the functional $\Gamma_\alpha$ for $K = K_\sigma$, depending on $\sigma$. Accordingly we denote
\[
R'(\sigma) := R(\sigma) + \frac{1}{\lambda_0}\, C^{\mathrm{bnd}}_{K'_\sigma}(R(\sigma)) \big( C^{\mathrm{bnd}}_{K_\sigma}(R(\sigma)) + C_Y \big).
\]
We denote by $\mathcal{C}(\sigma)$ the set of critical points of $\Gamma_{\alpha,\sigma}$ in $B(R(\sigma))$. For any $u^*(\sigma) \in \mathcal{C}(\sigma)$ we denote
\[
L(u^*(\sigma)) := \frac{4 (C^{\mathrm{Lip}}_T)^2 \|u^*(\sigma)\|^2_{\ell_2} \lambda^2}{\underline{\alpha}^2} + 4C\, |T(u^*(\sigma))|^\tau_{\ell^w_\tau} \Big(\frac{\underline{\alpha}}{\lambda}\Big)^{-\tau}. \tag{55}
\]

Lemma 2.7. Let the data fulfill assumption (17). Further let $\sigma_0 \in \mathbb{R}_+$ and let $K_\sigma$, $\sigma \in [0, \sigma_0]$, be of the form (53). Suppose that the assumptions (54) hold. Then it holds that
\[
\begin{gathered}
R_0 := \sup_{\sigma \in [0,\sigma_0]} R(\sigma) < \infty, \qquad
R'_0 := \sup_{\sigma \in [0,\sigma_0]} R'(\sigma) < \infty, \\
\sup_{\sigma \in [0,\sigma_0]} C^{\mathrm{Lip}}_{K_\sigma}(R'(\sigma)) < \infty, \qquad
\sup_{\sigma \in [0,\sigma_0]} C^{\mathrm{Lip}}_{K'_\sigma}(R'(\sigma)) < \infty.
\end{gathered} \tag{56}
\]
Under the additional assumption
\[
\sup_{\sigma \in [0,\sigma_0]} \sup_{u^*(\sigma) \in \mathcal{C}(\sigma)} |T_\sigma(u^*(\sigma))|_{\ell^w_\tau} < \infty, \tag{57}
\]
it further holds that
\[
L_0 := \sup_{\sigma \in [0,\sigma_0]} \sup_{u^*(\sigma) \in \mathcal{C}(\sigma)} L(u^*(\sigma)) < \infty. \tag{58}
\]

Remark 2.8. Before proving this lemma, let us comment on the condition (57), which requires taking a supremum over the set $\mathcal{C}(\sigma)$; a priori, this set can be very large. As we will show later, the boundedness of the quantities in this lemma, together with an additional spectral property of the operator $A$, the so-called restricted isometry property, see formula (60) below, will imply that the operator $T$ fulfills (42). As already stated above, under this condition the set $\mathcal{C}(\sigma)$ consists of only one point, namely the global minimizer of $\Gamma_{\alpha,\sigma}$. Hence the condition (57) will turn out to be much less restrictive than it seems at first glance. Let us now prove the lemma.

Proof. It is immediate to see that (21) implies
\[
R(\sigma) \le \frac{1}{2\underline{\alpha}} \sup_{\sigma' \in [0,\sigma_0]} \Gamma_{\alpha^{(0)},\sigma'}(0).
\]
Furthermore, the term $\sup_{\sigma' \in [0,\sigma_0]} \Gamma_{\alpha^{(0)},\sigma'}(0)$ is finite, as
\[
\sigma' \mapsto \Gamma_{\alpha^{(0)},\sigma'}(0) = \|\sigma' N(0) - y\|^2_Y
\]
is continuous and hence bounded on $[0, \sigma_0]$, because of the assumptions (17) and (54). Hence, we conclude the boundedness of $R_0$ in (56). By the assumption (54) we may bound the Lipschitz constant of $K_\sigma$ on $B(R(\sigma))$ as follows:
\[
C^{\mathrm{Lip}}_{K_\sigma}(R(\sigma)) \le \|A\|_{\mathcal{L}(\ell_2,Y)} + \sigma_0 C^{\mathrm{Lip}}_N(R_0). \tag{59}
\]
The constant $C^{\mathrm{Lip}}_{K'_\sigma}(R(\sigma))$ may be bounded analogously. By the same reasoning as in (23) it follows that $C^{\mathrm{bnd}}_{K_\sigma}(R(\sigma))$ and $C^{\mathrm{bnd}}_{K'_\sigma}(R(\sigma))$ may be bounded independently of $\sigma \in [0, \sigma_0]$. This proves the existence of uniform bounds for $C^{\mathrm{bnd}}_{K_\sigma}(R(\sigma))$ and $C^{\mathrm{bnd}}_{K'_\sigma}(R(\sigma))$, and consequently of $R'_0$ in (56). Using assumption (54) on $B(R'_0)$ allows us to establish, similarly to (59), uniform bounds for $C^{\mathrm{Lip}}_{K_\sigma}(R'(\sigma))$ and $C^{\mathrm{Lip}}_{K'_\sigma}(R'(\sigma))$. It remains to prove the finiteness of $L_0$ in (58). To this end, observe that the Lipschitz properties of $K$ and $K'$ on $B(R_0)$ imply that $T_\sigma$, defined in (18), is Lipschitz on $B(R_0)$ and that the corresponding Lipschitz constants may be uniformly bounded in $\sigma$. The remaining terms in (55) are bounded uniformly in $\sigma$ by assumption (57) and the estimate $\|u^*(\sigma)\|_{\ell_2} \le R(\sigma) \le R_0$.

We are now able to state conditions under which the fundamental contraction property (42) of the operators $T_\sigma$ defined by (18) can be ensured uniformly in $\sigma$, provided $\sigma_0$ is sufficiently small.

Lemma 2.9. Let the assumptions of Lemma 2.7 hold. Fix $\sigma_0 \in \mathbb{R}_+$. For all $\sigma \in [0, \sigma_0]$ we fix $u^*(\sigma) \in \mathcal{C}(\sigma)$ and denote $S^*_\sigma := \operatorname{supp} u^*(\sigma)$. We make the assumption that the linear part $A$ of $K_\sigma$ fulfills the restricted isometry property
\[
\big\|(\mathrm{Id} - A^*A)\big|_{\Lambda^\circ \times \Lambda^\circ}\big\|_{\mathcal{L}(\ell_2(\Lambda^\circ),\,\ell_2(\Lambda^\circ))} \le \gamma_1 < 1 \tag{60}
\]
for all $\Lambda^\circ \subset \mathcal{J}$ with $\#\Lambda^\circ \le 3L_0$. Using the notation of (56), the constant
\[
\begin{aligned}
C := {} & \|A\|_{\mathcal{L}(\ell_2,Y)} C^{\mathrm{Lip}}_N(R_0) + C^{\mathrm{Lip}}_{N'}(R_0) \big( \|A\|_{\mathcal{L}(\ell_2,Y)} R_0 + \sigma_0 C^{\mathrm{bnd}}_N(R_0) + C_Y \big) \\
& + C^{\mathrm{bnd}}_{N'}(R_0) \big( \|A\|_{\mathcal{L}(\ell_2,Y)} + \sigma_0 C^{\mathrm{Lip}}_N(R_0) \big)
\end{aligned} \tag{61}
\]
is bounded, and for all $\sigma \in [0, \min(\sigma_0, (1-\gamma_1)C^{-1}))$ the following holds: for all $v \in B(R_0)$ with $\#\operatorname{supp} v \le L_0$ and $\operatorname{supp}(v) \subset \Lambda \subset \mathcal{J}$ with $\#\Lambda \le 2L_0$, the contraction property
\[
\big\| \big(T_\sigma(u^*(\sigma)) - T_\sigma(v)\big)\big|_{S^*_\sigma \cup \Lambda} \big\|_{\ell_2(S^*_\sigma \cup \Lambda)} \le \gamma_0 \|u^*(\sigma) - v\|_{\ell_2} \tag{62}
\]
holds with $\gamma_0 := \gamma_1 + \sigma C < 1$.

Proof. We begin by proving that the constant $C$ in (61) is bounded. To this end we apply Lemma 2.7 and observe that the Lipschitz properties of $N$ and $N'$ on $B(R_0)$ imply, similarly to (23), that $N$ and $N'$ are also bounded on $B(R_0)$.

Now fix $\sigma \in [0, \sigma_0]$, let $v \in B(R_0)$ with $\#\operatorname{supp} v \le L_0$ and $\operatorname{supp} v \subset \Lambda \subset \mathcal{J}$ with $\#\Lambda \le 2L_0$, and denote $\Lambda^\circ := S^*_\sigma \cup \Lambda$. We use the splitting
\[
\begin{aligned}
T_\sigma(v) - T_\sigma(u^*(\sigma)) = {} & v - u^*(\sigma) - A^*A(v - u^*(\sigma)) - \sigma A^*\big(N(v) - N(u^*(\sigma))\big) \\
& - \sigma \big(N'(v) - N'(u^*(\sigma))\big)^* \big((A + \sigma N)(v) - y\big) \\
& - \sigma \big(N'(u^*(\sigma))\big)^* \big((A + \sigma N)(v) - (A + \sigma N)(u^*(\sigma))\big)
\end{aligned}
\]
together with the assumption (60) to estimate
\[
\begin{aligned}
& \big\| \big(T_\sigma(v) - T_\sigma(u^*(\sigma))\big)\big|_{\Lambda^\circ} \big\|_{\ell_2(\Lambda^\circ)} \\
& \quad \le \big\| \big((\mathrm{Id} - A^*A)(v - u^*(\sigma))\big)\big|_{\Lambda^\circ} \big\|_{\ell_2(\Lambda^\circ)}
+ \sigma \Big( \big\| \big(A^*\big(N(v) - N(u^*(\sigma))\big)\big)\big|_{\Lambda^\circ} \big\|_{\ell_2(\Lambda^\circ)} \\
& \qquad + \big\| \big(\big(N'(v) - N'(u^*(\sigma))\big)^*\big((A + \sigma N)(v) - y\big)\big)\big|_{\Lambda^\circ} \big\|_{\ell_2(\Lambda^\circ)} \\
& \qquad + \big\| \big(\big(N'(u^*(\sigma))\big)^*\big((A + \sigma N)(v) - (A + \sigma N)(u^*(\sigma))\big)\big)\big|_{\Lambda^\circ} \big\|_{\ell_2(\Lambda^\circ)} \Big) \\
& \quad \le \Big( \gamma_1 + \sigma \Big( \|A\|_{\mathcal{L}(\ell_2,Y)} C^{\mathrm{Lip}}_N(R_0) + C^{\mathrm{Lip}}_{N'}(R_0)\big(\|A\|_{\mathcal{L}(\ell_2,Y)} R_0 + \sigma_0 C^{\mathrm{bnd}}_N(R_0) + C_Y\big) \\
& \qquad + C^{\mathrm{bnd}}_{N'}(R_0)\big(\|A\|_{\mathcal{L}(\ell_2,Y)} + \sigma_0 C^{\mathrm{Lip}}_N(R_0)\big) \Big) \Big) \|v - u^*(\sigma)\|_{\ell_2} \\
& \quad = (\gamma_1 + \sigma C)\, \|v - u^*(\sigma)\|_{\ell_2},
\end{aligned}
\]
which implies that the contraction property (62) holds for $\sigma < (1 - \gamma_1)C^{-1}$.
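The mechanism of Lemma 2.9 can be observed numerically. In the toy sketch below (all sizes, the orthogonal matrix $A$, the quadratic nonlinearity, and the choice $\lambda = 1$ are illustrative assumptions), $A$ is orthogonal, so $\gamma_1 = 0$ in (60), and the empirically estimated Lipschitz constant of $T_\sigma$ grows from $0$ with $\sigma$ while remaining below $1$ for small $\sigma$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
A, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal => gamma_1 = 0
B = 0.1 * rng.standard_normal((n, n))              # kernel of quadratic term

def N(v):  return B @ (v * v)                      # nonlinearity N(v) = B|v|^2
def dN(v): return B @ np.diag(2.0 * v)             # its Jacobian N'(v)

y = rng.standard_normal(n) / np.sqrt(n)

def T(v, sigma):
    # fixed-point map (18) with lambda = 1 and K_sigma = A + sigma * N
    Kv  = A @ v + sigma * N(v)
    dKv = A + sigma * dN(v)
    return v - dKv.T @ (Kv - y)

def lip(sigma, trials=200):
    # crude empirical Lipschitz constant of T_sigma on the unit ball
    worst = 0.0
    for _ in range(trials):
        u, v = rng.standard_normal((2, n)) / np.sqrt(n)
        worst = max(worst, np.linalg.norm(T(u, sigma) - T(v, sigma))
                           / np.linalg.norm(u - v))
    return worst

g_lin, g_pert = lip(0.0), lip(0.05)
print(g_lin < 1e-10, g_pert < 1.0)   # sigma = 0: exact contraction; small sigma: still contractive
```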

The last lemma established the contraction property (62) uniformly in $\sigma$. Therefore, for the current choice of $K = K_\sigma$ as in (53), we can apply Theorem 2.4 directly. Let us summarize the result as follows.

Theorem 2.10. Let the assumptions of Lemma 2.9 hold. Then, for all $\sigma \in [0, \min(\sigma_0, (1 - \gamma_1)C^{-1}))$ and $\gamma_0 < \gamma < 1$, if we choose $(\alpha^{(n)})_{n \in \mathbb{N}}$ according to (45) and $\lambda$ such that
\[
\lambda > \max\big\{ \lambda_0,\; C^{\mathrm{Lip}}_{K'_\sigma}(R'_0)\big(C^{\mathrm{bnd}}_{K_\sigma}(R'_0) + C_Y\big) + C^{\mathrm{bnd}}_{K'_\sigma}(R'_0)\, C^{\mathrm{Lip}}_{K_\sigma}(R'_0) \big\},
\]
the sequence $(u^{(n)}(\sigma))_{n \in \mathbb{N}}$ defined by (9) with initial guess $u^{(0)} = 0$ satisfies
\[
(u^{(n)}(\sigma))_{n \in \mathbb{N}} \subset B(R_0). \tag{63}
\]
Furthermore, it converges to any $u^*(\sigma) \in \mathcal{C}(\sigma)$ at a linear rate, i.e.,
\[
\|u^*(\sigma) - u^{(n)}(\sigma)\|_{\ell_2} \le \gamma^n \|u^*(\sigma) - u^{(0)}(\sigma)\|_{\ell_2}, \tag{64}
\]
and moreover
\[
\Gamma_{\alpha^{(n+1)},\sigma}(u^{(n+1)}(\sigma)) < \Gamma_{\alpha^{(n)},\sigma}(u^{(n)}(\sigma)),
\]
provided that $u^{(n)}(\sigma)$ is not yet a critical point of $\Gamma_{\alpha^{(n)},\sigma}$. In particular, $u^*(\sigma) \in \mathcal{C}(\sigma)$ has to be the only critical point of $\Gamma_{\alpha,\sigma}$ in $B(R_0)$ with $\#\operatorname{supp} u^*(\sigma) \le L_0$; it is actually its unique global minimizer in $B(R_0)$.

3 Preconditioning

The convergence analysis of Section 2 for the iteration (9) relies on the contraction property (42) of the operator $T$ defined in (18). This property also ensures that, although $\Gamma_\alpha$ is a nonconvex functional, it nevertheless has a unique global minimizer in a prescribed ball centered at $0$, and that the iteration (9) is guaranteed to converge to it at a linear rate. Unfortunately, we cannot expect this powerful property to hold in general, even in the case where the underlying operator $K$ is linear and compact. Therefore, in this section, we show how preconditioning can be applied to promote property (42) when $K$ is a nonlinear perturbation of a linear operator. One should imagine the action of this preconditioning as a sort of "stretching" of the functional $\Gamma_\alpha$, so that no local minimizers or stationary points remain around $0$ other than a unique global minimizer. Preconditioning also changes the topology of the minimization problem related to (1). Therefore, in Section 3.1, we begin by discussing the related topological issues. In Section 3.2 we present a preconditioning strategy and state conditions under which the restricted isometry property (60) is satisfied. Finally, in Section 3.3, we apply our findings to an interesting class of operators.


3.1 General setting

We shall consider the following modified functional:
\[
\big(\Gamma_\alpha \circ D^{-1}\big)(z) = \|(K \circ D^{-1})(z) - y\|^2_Y + 2\|D^{-1}z\|_{\ell_{1,\alpha}}, \qquad z \in \operatorname{Ran}(D), \tag{65}
\]
where $D : \ell_2 \to \operatorname{Ran}(D)$ is a suitable preconditioning matrix with well-defined formal inverse $D^{-1} : \operatorname{Ran}(D) \to \ell_2$. Moreover, we assume that $D$ maps finitely supported vectors to finitely supported vectors and that
\[
\|D^{-1}z\|_{\ell_{1,\alpha}} \sim \|\operatorname{diag}(D^{-1})z\|_{\ell_{1,\alpha}}. \tag{66}
\]
Note that preconditioning of the energy functional (1) changes the topology of the associated minimization problem. Moreover, the preconditioning operator $D$ may be unbounded in the topology of $\ell_2$. However, as we will see below, this is not an issue here. Indeed, Theorem 4.3, which will be proved later in Section 4, enables us to reduce the setting to a finite dimensional one whenever needed, so that we can use the equivalence of norms on finite dimensional vector spaces.

To this end, we begin with the observation that any stationary point $u^*$ of (1) can be characterized by the subdifferential inclusion
\[
0 \in \partial \Gamma_\alpha(u^*).
\]
An analogous characterization holds for the stationary points $z^*$ of (65). By the chain rule for subdifferentials, see [15, Proposition I.5.7], we have
\[
0 \in \partial\big(\Gamma_\alpha \circ D^{-1}\big)(z^*) = \big(D^{-1}\big)' \,\partial\Gamma_\alpha(D^{-1}z^*),
\]
where $(D^{-1})'$ is the dual mapping of $D^{-1}$. In other words, there is a one-to-one relationship between the stationary points of (1) and those of (65). Moreover, by our assumptions on $D$, if $u^*$ is a finitely supported stationary point of (1), the related stationary point $z^* = Du^*$ of (65) is also finitely supported.

We will use the assumption (66) to simplify the preconditioned energy functional $\Gamma_\alpha \circ D^{-1}$. Indeed, motivated by the observation that $\|\operatorname{diag}(D^{-1})z\|_{\ell_{1,\alpha}} = \|z\|_{\ell_{1,\operatorname{diag}(D^{-1})\alpha}}$, and with a slight abuse of notation, we will consider the modified energy functional
\[
\Gamma^D_\alpha(z) := \|(K \circ D^{-1})(z) - y\|^2_Y + 2\|z\|_{\ell_{1,\operatorname{diag}(D^{-1})\alpha}}, \qquad z \in \operatorname{Ran}(D), \tag{67}
\]
and the resulting minimization problem.

We avoid dealing with the topology of $\operatorname{Ran}(D)$ in the following way. Let $z^*$ be a fixed stationary point of the preconditioned energy functional (67) with finite support, and let $\Lambda_0 \subset \mathcal{J}$ be an arbitrary finite set such that $\operatorname{supp} z^* \subset \Lambda_0$. The restriction of (67) onto $\Lambda_0$ is then given by
\[
\Gamma^D_{\alpha,\Lambda_0}(z) := \|(K \circ D^{-1})|_{\Lambda_0}(z) - y\|^2_Y + 2\|z\|_{\ell_{1,(\operatorname{diag}(D^{-1})\alpha)|_{\Lambda_0}}(\Lambda_0)}, \qquad z \in \mathbb{R}^{\Lambda_0}. \tag{68}
\]
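The identity behind the rewriting in (67) is elementary: moving the positive diagonal factors of $D^{-1}$ from the argument into the weight leaves the weighted $\ell_1$ norm unchanged. A minimal numerical illustration (the dimensions and the positive diagonal entries below are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 8
alpha = rng.uniform(0.5, 2.0, n)     # weight sequence alpha_mu > 0
dinv  = rng.uniform(0.1, 1.0, n)     # diagonal of D^{-1} (assumed positive)
z     = rng.standard_normal(n)

def l1w(v, w):
    # weighted l1 norm ||v||_{l1,w} = sum_mu w_mu |v_mu|
    return float(np.sum(w * np.abs(v)))

lhs = l1w(dinv * z, alpha)           # || diag(D^{-1}) z ||_{l1,alpha}
rhs = l1w(z, dinv * alpha)           # || z ||_{l1, diag(D^{-1}) alpha}
print(abs(lhs - rhs) < 1e-12)        # the two norms coincide
```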


The minimization problem can now be considered in $\mathbb{R}^{\Lambda_0}$ endowed with the Euclidean norm. We denote the restriction of $z^*$ onto $\Lambda_0$ by $z^*|_{\Lambda_0}$, by $E_{\Lambda_0} : \ell_2(\Lambda_0) \to \ell_2$ the trivial extension by $0$, and by $(E_{\Lambda_0})'$ its adjoint, which is actually the restriction operator, $z^*|_{\Lambda_0} = (E_{\Lambda_0})' z^*$. Then it follows by the chain rule for subdifferentials that
\[
0 \in (E_{\Lambda_0})' \,\partial\Gamma^D_\alpha(z^*)
= (E_{\Lambda_0})' \,\partial\Gamma^D_\alpha(E_{\Lambda_0} z^*|_{\Lambda_0})
= \partial\big(\Gamma^D_\alpha \circ E_{\Lambda_0}\big)(z^*|_{\Lambda_0})
= \partial\Gamma^D_{\alpha,\Lambda_0}(z^*|_{\Lambda_0}).
\]
Consequently, $z^*|_{\Lambda_0}$ is also a stationary point of the finite dimensional energy functional (68). Unfortunately, the converse is not valid: since $(E_{\Lambda_0})'$ is not injective, a stationary point of $\Gamma^D_{\alpha,\Lambda_0}$ does not necessarily correspond a priori to the restriction to a finite set $\Lambda_0$ of a stationary point of $\Gamma^D_\alpha$ in $\operatorname{Ran}(D)$.

Nevertheless, if one could assume that $\Gamma^D_{\alpha,\Lambda_0}$ has only one critical point in $\mathbb{R}^{\Lambda_0}$ for any choice of a finite set $\Lambda_0 \subset \mathcal{J}$, then we could argue the uniqueness of the critical point of $\Gamma^D_\alpha$ in $\operatorname{Ran}(D)$ as well. In fact, if there were two critical points $z^*_1$ and $z^*_2$ of $\Gamma^D_\alpha$ in $\operatorname{Ran}(D)$, their supports could be included in a finite set $\Lambda'_0$ of indices. Without loss of generality, this set could be assumed to be a subset of $\Lambda_0$ for the latter large enough. Hence, the assumed uniqueness of the critical point of $\Gamma^D_{\alpha,\Lambda_0}$ in $\mathbb{R}^{\Lambda_0}$ would immediately imply that $(z^*_1)|_{\Lambda_0} = (z^*_2)|_{\Lambda_0}$ or, equivalently, that $z^*_1 = z^*_2$. In turn this means that, in the situation of a unique critical point in finite dimensions, the minimization of the finite dimensional problem is actually equivalent to the minimization of the infinite dimensional one.

In Section 4 we will present an implementable numerical scheme which solves the finite dimensional minimization problem related to (68). We shall also show that a priori knowledge of the set $\Lambda_0$ is not needed; in fact, it will be constructed on the fly by the presented adaptive scheme.

3.2 Multilevel preconditioning

In Section 2.3 we considered the case that K consists of a dominant lin-ear part A and a nonlinear perturbation. In this setting, we were able toshow that the contraction assumption (42) can be guaranteed if the linearpart of the equation fulfills the restricted isometry property (60). In gen-eral this condition will fail to hold, even if A is a compact linear operator.Nevertheless, in this section, we show that this issue can be solved by apreconditioning strategy. To this end, we partly follow the lines of [10] andrecall the corresponding results as far as they are needed for our purposes.

In the following we will assume that $\Omega \subset \mathbb{R}^d$ is a bounded Lipschitz domain and that $\Psi := \{\psi_\mu\}_{\mu \in \mathcal{J}}$ is a compactly supported wavelet basis or frame of wavelet type for $L_2(\Omega)$, see e.g. [5, §2.12]. Every $\mu \in \mathcal{J}$ is of the form $\mu = (j, k, e)$, where $j \in \mathbb{Z}$ is the scale, often denoted by $|\mu|$, $k \in \mathbb{Z}^d$ is the spatial location, and $e$ is the type of $\psi_\mu$. We refer to [5, 7] for further details concerning this notation. We do not go into construction details concerning these bases or the alternative of wavelet frames. In fact, we simply assume the following properties for all $\mu \in \mathcal{J}$; for ease of presentation, we formulate them for the case of an orthogonal wavelet basis on $\Omega = (0,1)^d$:

(W1) The support $\Omega_\mu := \operatorname{supp} \psi_\mu$ fulfills $|\Omega_\mu| \sim 2^{-|\mu|d}$. Furthermore, there exists a suitable cube $Q$, centered at the origin, such that $\Omega_\mu \subset 2^{-|\mu|}k + 2^{-|\mu|}Q$, see [5, §2.12].

(W2) The basis has the cancellation property $\int_\Omega \xi^\beta \psi_\mu(\xi)\,d\xi = 0$ for $|\beta| = 0, \ldots, d^* \in \mathbb{N}$.

(W3) $\|\psi_\mu\|_{L_\infty(\Omega)} \le C\, 2^{\frac{d}{2}|\mu|}$.
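For the univariate Haar basis ($d = 1$, one vanishing moment, i.e. $d^* = 0$ in (W2)) the properties (W1)-(W3) can be checked directly. The following sketch is only an illustration with an assumed $L_2$ normalization:

```python
import numpy as np

def haar(j, k, x):
    # psi_{j,k}(x) = 2^{j/2} psi(2^j x - k), the L2-normalized Haar wavelet
    t = 2.0**j * x - k
    return 2.0**(j / 2) * (((0 <= t) & (t < 0.5)).astype(float)
                           - ((0.5 <= t) & (t < 1)).astype(float))

j, k = 3, 5
x = np.arange(200_001) / 200_000          # uniform grid on [0, 1]
psi = haar(j, k, x)

moment  = np.sum(psi) / 200_000           # (W2): zeroth moment vanishes
supnorm = np.abs(psi).max()               # (W3): equals 2^{jd/2} = 2^{3/2}
support = x[psi != 0]                     # (W1): inside 2^{-j}(k + [0, 1])
print(abs(moment) < 1e-9, abs(supnorm - 2**1.5) < 1e-12)
print(support.min() >= 0.625, support.max() < 0.75)
```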

Examples of wavelet bases satisfying these conditions can be found in [11]. In this setting the synthesis map related to $\Psi$ reads as
\[
F : \ell_2 \to L_2(\Omega), \qquad F(u) := \sum_{\mu \in \mathcal{J}} u_\mu \psi_\mu, \quad u \in \ell_2. \tag{69}
\]
Its adjoint is given by
\[
F^* : L_2(\Omega) \to \ell_2, \qquad F^*(u) := \big(\langle u, \psi_\mu \rangle_{L_2(\Omega)}\big)_{\mu \in \mathcal{J}}. \tag{70}
\]

Let $\mathcal{A} \in \mathcal{L}(L_2(\Omega), Y)$ be a linear operator and consider its discretization $A := \mathcal{A}F$. In this section we aim at stating conditions under which (60) can be ensured by means of a preconditioning strategy. We will make technical assumptions on the matrix $G = (G_{\mu,\nu})_{\mu,\nu \in \mathcal{J}}$ given by
\[
G := A^*A = \big( \langle \mathcal{A}^*\mathcal{A}\psi_\nu, \psi_\mu \rangle_{L_2(\Omega)} \big)_{\mu,\nu \in \mathcal{J}}. \tag{71}
\]
To be specific, we will assume that there exist constants $c_1, c_2, c_3, s, \eta, r \in \mathbb{R}_+$, $r > d$, such that the following conditions hold for all $\mu = (j,k,e)$, $\nu = (j',k',e') \in \mathcal{J}$:

• The entries of $G$ satisfy the decay estimate
\[
|G_{\mu,\nu}| \le c_1 \frac{2^{-s\,||\mu|-|\nu||}\; 2^{-\eta \min(|\mu|,|\nu|)}}{\big(1 + 2^{\min(|\mu|,|\nu|)} \operatorname{dist}(\Omega_\mu, \Omega_\nu)\big)^r}. \tag{72}
\]

• On the diagonal, i.e., for $\mu = \nu$, it holds that
\[
|G_{\mu,\mu}| \ge c_2\, 2^{-\eta|\mu|}. \tag{73}
\]

• On the same scale, i.e., for $|\mu| = |\nu|$, the entries satisfy
\[
|G_{\mu,\nu}| \le c_3 \frac{2^{-2\eta|\mu|}}{(1 + |k - k'|)^r}. \tag{74}
\]

Under these conditions the following holds.

Theorem 3.1 ([10, Thm. 4.6]). Suppose that $G$ fulfills (72), (73), and (74) with $c_2 > c_3/(r - d)$. Let $D$ be the block-diagonal matrix consisting of the diagonal level blocks of $G$, i.e.,
\[
D_{\mu,\nu} := \begin{cases} G_{\mu,\nu}, & |\mu| = |\nu|, \\ 0, & \text{otherwise.} \end{cases} \tag{75}
\]
Then there exists a constant $C = C(c_1, c_2, c_3, r, d)$ such that for each finite set $\Lambda \subset \mathcal{J}$ with $|\Lambda| \le 2^s C^{-1}$ the sub-matrix $(D^{-1/2}GD^{-1/2})|_{\Lambda \times \Lambda}$ satisfies
\[
\big\| \big(\mathrm{Id} - D^{-1/2}GD^{-1/2}\big)\big|_{\Lambda \times \Lambda} \big\| < C\, 2^{-(s - \frac{\eta}{2})} |\Lambda|
\]
and
\[
\kappa\big( (D^{-1/2}GD^{-1/2})|_{\Lambda \times \Lambda} \big) \le \frac{1 + C\, 2^{-(s - \frac{\eta}{2})} |\Lambda|}{1 - C\, 2^{-(s - \frac{\eta}{2})} |\Lambda|}.
\]
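The effect of Theorem 3.1 can be illustrated on a small synthetic matrix. In the sketch below (the level structure, the random core, and the scaling $2^{-j}$ are assumptions made only for illustration), the symmetric preconditioner $D^{-1/2} G D^{-1/2}$ built from the diagonal level blocks (75) removes the level-dependent scaling and improves the condition number:

```python
import numpy as np

rng = np.random.default_rng(2)
levels = np.concatenate([np.full(2**j, j) for j in range(5)])  # |mu| per index
n = len(levels)                                                # 31 unknowns

# synthetic Gram matrix G with level scaling ~ 2^{-|mu|}, mimicking (73)
B = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # well-conditioned core
S = np.diag(2.0 ** (-levels.astype(float)))
G = S @ (B.T @ B) @ S

# D: the diagonal level blocks of G, cf. (75)
D = np.where(levels[:, None] == levels[None, :], G, 0.0)

w, V = np.linalg.eigh(D)                 # D is SPD and block-diagonal
D_isqrt = V @ np.diag(w ** -0.5) @ V.T   # D^{-1/2}
G_pre = D_isqrt @ G @ D_isqrt

cond_G, cond_pre = np.linalg.cond(G), np.linalg.cond(G_pre)
print(cond_pre < cond_G)                 # preconditioning flattens the scaling
```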

3.3 Integral operators with Schwartz kernels on disjoint domains

In this section we study a class of operators which fits into the setting of Section 2.3. Let $\Omega, \tilde\Omega \subset \mathbb{R}^d$ be two bounded Lipschitz domains with $\operatorname{dist}(\Omega, \tilde\Omega) = \delta > 0$. For fixed $t \in \mathbb{R}_+$ we consider
\[
\mathcal{K} = \mathcal{A} + \sigma \mathcal{N} : L_2(\Omega) \to H^t(\tilde\Omega),
\]
where $\sigma \in \mathbb{R}_+$, $\mathcal{A} \in \mathcal{L}(L_2(\Omega), H^t(\tilde\Omega))$ is linear, and $\mathcal{N} : L_2(\Omega) \to H^t(\tilde\Omega)$ is a nonlinear operator. Furthermore, we assume that the linear part $\mathcal{A}$ is an integral operator with a Schwartz kernel. To be specific, we assume that $\mathcal{A}$ is given by
\[
v \mapsto \mathcal{A}v := \int_\Omega \Phi(\cdot, \xi)\, v(\xi)\, d\xi, \tag{76}
\]
where $\Phi : \tilde\Omega \times \Omega \to \mathbb{R}$ is a kernel of Schwartz type, i.e.,
\[
|\partial^\alpha_x \partial^\beta_\xi \Phi(x, \xi)| \le c_{\alpha,\beta}\, |x - \xi|^{-(d + 2t + |\alpha| + |\beta|)}, \qquad \alpha, \beta \in \mathbb{N}^d_0, \tag{77}
\]
holds. Concerning the nonlinear perturbation $\mathcal{N}$, we assume that it is given by
\[
v \mapsto \mathcal{N}(v) := \int_\Omega \tilde\Phi(\cdot, \xi)\, |v(\xi)|^2\, d\xi,
\]

where $\tilde\Phi$ also fulfills (77). This condition implies that $\mathcal{A}$ and $\mathcal{N}$ are well defined as operators mapping into $H^t(\tilde\Omega)$. Moreover, the nonlinear perturbation $\mathcal{N}$ is twice continuously differentiable, and consequently $\mathcal{N}$ and $\mathcal{N}'$ are indeed Lipschitz continuous on bounded closed sets. To see this, we write $\mathcal{N} = \mathcal{N}_1 \circ \mathcal{N}_2$ with
\[
\mathcal{N}_1 : L_1(\Omega) \to H^t(\tilde\Omega), \quad v \mapsto \int_\Omega \tilde\Phi(\cdot, \xi)\, v(\xi)\, d\xi, \qquad
\mathcal{N}_2 : L_2(\Omega) \to L_1(\Omega), \quad v \mapsto |v|^2.
\]
Here the operator $\mathcal{N}_1$, as well as the derivative of $\mathcal{N}_2$, i.e.,
\[
\mathcal{N}_2' : v \mapsto 2\langle v, \cdot \rangle_{L_2(\Omega)},
\]
are linear. Recall that the synthesis map $F$ associated to $\Psi$ is given by (69). It is linear and hence Lipschitz. Together with the Lipschitz properties of $\mathcal{N}$, this implies that the discretized version of the nonlinear part, i.e., $N = \mathcal{N}F$, fulfills (54).

Let us now assume that the linear part $A = \mathcal{A}F$ of $K = A + \sigma N$ does not already fulfill (60). We want to show that setting
\[
K \circ D = A \circ D + \sigma N \circ D,
\]
for a suitable preconditioning matrix $D$, will allow us to fulfill it for $A \circ D$. Moreover, the new nonlinear perturbation $N \circ D$ will again satisfy the Lipschitz continuity conditions (54), as soon as we remember that, eventually, the problem will be turned into a finite dimensional one. We shall construct the preconditioning matrix $D$ by using the multilevel techniques presented in Section 3.2. To be specific, the remainder of this section is dedicated to the proof of property (72) for the matrix $G := (\mathcal{A}F)^*\mathcal{A}F$. (The other required properties, (73) and (74), may be difficult to show, but they are often verified in practice.) To this end we follow the lines of [9]. To be explicit, with (71), the entries of $G$ are given by
\[
G_{\mu,\nu} = \langle \mathcal{A}^*\mathcal{A}\psi_\nu, \psi_\mu \rangle_{L_2(\Omega)} = \langle \mathcal{A}\psi_\nu, \mathcal{A}\psi_\mu \rangle_{H^t(\tilde\Omega)}. \tag{78}
\]

We begin with the special case $t = 0$ and apply Taylor's formula to the kernel $\Phi$ around a point $\xi_0 \in \Omega_\mu$. For every $\xi \in \Omega_\mu$ there exists $\theta \in [0,1]$ such that
\[
\Phi(x, \xi) = \sum_{|\beta| \le d^*} \frac{\partial^\beta_\xi \Phi(x, \xi_0)}{\beta!} (\xi - \xi_0)^\beta
+ \sum_{|\beta| = d^*+1} \frac{\partial^\beta_\xi \Phi(x, \xi_0 + \theta(\xi - \xi_0))}{\beta!} (\xi - \xi_0)^\beta. \tag{79}
\]

With (77) we can estimate
\[
\begin{aligned}
\Big| \sum_{|\beta| = d^*+1} \frac{\partial^\beta_\xi \Phi(x, \xi_0 + \theta(\xi - \xi_0))}{\beta!} (\xi - \xi_0)^\beta \Big|
& \le \sum_{|\beta| = d^*+1} \frac{1}{\beta!} |(\xi - \xi_0)^\beta| \sup_{\xi' \in \Omega_\mu} |\partial^\beta_\xi \Phi(x, \xi')| \\
& \le \sum_{|\beta| = d^*+1} \frac{1}{\beta!} |(\xi - \xi_0)^\beta|\, c_{0,\beta} \operatorname{dist}(x, \Omega_\mu)^{-(d + 2t + d^* + 1)}.
\end{aligned} \tag{80}
\]

The cancellation property (W2) of $\psi_\mu \in \Psi$, together with (79) and (80), yields
\[
\begin{aligned}
|\mathcal{A}\psi_\mu(x)| &= \Big| \int_{\Omega_\mu} \Phi(x, \xi)\, \psi_\mu(\xi)\, d\xi \Big| \\
&\le \sum_{|\beta| = d^*+1} \frac{1}{\beta!}\, c_{0,\beta} \operatorname{dist}(x, \Omega_\mu)^{-(d + 2t + d^* + 1)} \int_{\Omega_\mu} |(\xi - \xi_0)^\beta|\, |\psi_\mu(\xi)|\, d\xi.
\end{aligned} \tag{81}
\]
By our assumptions (W1) and (W3) on the wavelets, i.e., $\Omega_\mu \subset 2^{-|\mu|}k + 2^{-|\mu|}Q$ and $\|\psi_\mu\|_{L_\infty(\Omega)} \le C 2^{\frac{d}{2}|\mu|}$, together with $\xi_0 \in \Omega_\mu$, it holds that
\[
\int_{\Omega_\mu} |(\xi - \xi_0)^\beta|\, |\psi_\mu(\xi)|\, d\xi
\le C 2^{\frac{d}{2}|\mu|} \int_{\Omega_\mu} |(\xi - \xi_0)^\beta|\, d\xi
\le C 2^{\frac{d}{2}|\mu|} \int_Q |(2^{-|\mu|}(\xi' + k) - \xi_0)^\beta|\, 2^{-d|\mu|}\, d\xi'
\le C' 2^{-|\mu|(\frac{d}{2} + |\beta|)}.
\]
The combination of (81) and the last estimate implies
\[
|\mathcal{A}\psi_\mu(x)| \le C_{d^*} \operatorname{dist}(x, \Omega_\mu)^{-(d + 2t + d^* + 1)}\; 2^{-|\mu|(\frac{d}{2} + d^* + 1)}. \tag{82}
\]

Since we assumed that $\Omega$ and $\tilde\Omega$ are disjoint domains, it holds for $\xi, \xi' \in \Omega$ with $\xi \neq \xi'$ and $\|\xi - x\|_2, \|\xi' - x\|_2 \ge \delta$ that $\frac{1}{|\xi - x||\xi' - x|} \le C_{x,\delta} \frac{1}{|\xi - \xi'|}$; moreover, $C_{x,\delta}$ may be bounded independently of $x$. With (82) we immediately prove, for $\mu \neq \nu$ with $\operatorname{dist}(\Omega_\mu, \Omega_\nu) > 0$, the estimate
\[
\begin{aligned}
|\langle \mathcal{A}\psi_\mu, \mathcal{A}\psi_\nu \rangle_{L_2(\tilde\Omega)}|
&\le (C_{d^*})^2\, 2^{-(|\mu|+|\nu|)(\frac{d}{2} + d^* + 1)} \int_{\tilde\Omega} \big( \operatorname{dist}(x, \Omega_\mu) \operatorname{dist}(x, \Omega_\nu) \big)^{-(d + 2t + d^* + 1)} dx \\
&\le (C_{d^*})^2 |\tilde\Omega|\; 2^{-(|\mu|+|\nu|)(\frac{d}{2} + d^* + 1)} \operatorname{dist}(\Omega_\mu, \Omega_\nu)^{-(d + 2t + d^* + 1)} \\
&= (C_{d^*})^2 |\tilde\Omega|\; \frac{2^{-||\mu|-|\nu||(\frac{d}{2} + d^* + 1)}\; 2^{-\min(|\mu|,|\nu|)(d^* + 1 - 2t)}}{\big( 2^{\min(|\mu|,|\nu|)} \operatorname{dist}(\Omega_\mu, \Omega_\nu) \big)^{d + 2t + d^* + 1}}.
\end{aligned} \tag{83}
\]


For the case $\operatorname{dist}(\Omega_\mu, \Omega_\nu) = 0$ we use again (82) and apply $\operatorname{dist}(x, \Omega_\mu) \ge \delta$ directly to derive the simpler estimate
\[
|\langle \mathcal{A}\psi_\mu, \mathcal{A}\psi_\nu \rangle_{L_2(\tilde\Omega)}| \le (C_{d^*})^2 |\tilde\Omega|\; 2^{-(|\mu|+|\nu|)(\frac{d}{2} + d^* + 1)}\; \delta^{-2(d + 2t + d^* + 1)}. \tag{84}
\]
Together, (83) and (84) imply that $\big( \langle \mathcal{A}\psi_\mu, \mathcal{A}\psi_\nu \rangle_{L_2(\tilde\Omega)} \big)_{\mu,\nu \in \mathcal{J}}$ fulfills the assumption (72).

For the general case $t > 0$, we consider $\big( \langle \partial^\alpha_x(\mathcal{A}\psi_\mu), \partial^\alpha_x(\mathcal{A}\psi_\nu) \rangle_{L_2(\tilde\Omega)} \big)_{\mu,\nu \in \mathcal{J}}$, $\alpha \in \mathbb{N}^d_0$. In this setting,
\[
\partial^\alpha_x(\mathcal{A}\psi_\mu) = \int_\Omega \partial^\alpha_x \Phi(\cdot, \xi)\, \psi_\mu(\xi)\, d\xi
\]
is again an integral with a Schwartz kernel. Indeed, an argumentation analogous to the case $t = 0$ yields condition (72) for $t \in \mathbb{N}$, and consequently for $t \in \mathbb{R}_+$.
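The decay rate in (82) can be observed numerically in a one-dimensional toy example with Haar wavelets ($d = 1$, $d^* = 0$, $t = 0$) and the kernel $\Phi(x,\xi) = (x - \xi)^{-1}$, evaluated at a point $x$ with distance at least $1$ from $\Omega = (0,1)$; the predicted factor per level is $2^{-(d/2 + d^* + 1)} = 2^{-3/2} \approx 0.35$. All concrete choices below are illustrative assumptions:

```python
import numpy as np

def haar(j, k, t):
    # L2-normalized Haar wavelet psi_{j,k} on (0, 1)
    s = 2.0**j * t - k
    return 2.0**(j / 2) * (((0 <= s) & (s < 0.5)).astype(float)
                           - ((0.5 <= s) & (s < 1)).astype(float))

x = 2.5                                      # dist(x, Omega) = 1.5 >= delta
N = 2**21                                    # dyadic grid: every cell lies
xi = (np.arange(N) + 0.5) / N                # inside one piece of psi, j <= 8

vals = []
for j in range(2, 8):
    psi = haar(j, 0, xi)
    # midpoint rule for (A psi_mu)(x) = int psi(xi) / (x - xi) dxi
    vals.append(abs(np.sum(psi / (x - xi)) / N))

ratios = [vals[i + 1] / vals[i] for i in range(len(vals) - 1)]
print(all(0.25 < r < 0.45 for r in ratios))  # close to 2^{-1.5} ~ 0.354
```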

4 Equivalence to an inexact finite dimensional scheme

In practice, whenever we deal with infinite dimensional problems, the operators $K$ and $K'$ cannot be evaluated exactly, and one has to replace their output by suitable numerical approximations. In this section we study the convergence behavior of the resulting inexact algorithm for solving the preconditioned minimization problem (67). Although the original problem is in general posed in infinite dimensions, adaptive approximations will allow us to show the confinement of the iteration within a well-determined finite dimensional space. In particular, in Theorem 4.3 below, we show that the global support of all iterates is contained in a finite set $\Lambda_0$. From a practical point of view, there is thus no difference between the iterates produced by the adaptive scheme over the whole index set $\mathcal{J}$ and those obtained by restricting the set of possible indices to the (a priori unknown) set $\Lambda_0$. Therefore, by arguing as in Section 3.1, the combination of preconditioning and adaptive solvers yields an iterative scheme for the minimization of the unpreconditioned functional $\Gamma_\alpha$.

We focus on the error introduced by the inexact evaluation of the nonlinear functional $K$ and the linear operator $(K'(\cdot))^*$. To this end, let us assume that for given tolerances $\rho, \delta > 0$ there exist approximation schemes which, for every $v \in \ell_2$ and every pair $(v, w) \in \ell_2 \times Y$, respectively, compute finite dimensional approximations $[K(v)]_\rho$ and $[(K'(v))^*(w)]_\delta$ such that
\[
\|K(v) - [K(v)]_\rho\|_Y \le \rho, \qquad
\big\| (K'(v))^*(w) - [(K'(v))^*(w)]_\delta \big\|_{\ell_2} \le \delta. \tag{85}
\]

This assumption is realistic, e.g., if the exact application of $K$ and $(K'(\cdot))^*$ involves the solution of partial differential or integral equations, and the numerical approximations can be computed by means of adaptive discretization schemes. Let us mention two prominent examples from the context of adaptive wavelet schemes for linear and nonlinear operator equations.

Example 4.1. Let $\Psi_X = \{\psi_{X,\mu}\}_{\mu \in \mathcal{J}_X}$ and $\Psi_Y = \{\psi_{Y,\mu}\}_{\mu \in \mathcal{J}_Y}$ be wavelet Riesz bases for $X$ and $Y$, respectively, such that the assumptions of Section 3.2 are satisfied. We denote the associated synthesis operators by $F_X$ and $F_Y$. Furthermore, let $K = \mathcal{K} \circ F_X : \ell_2 \to Y$ for some nonlinear operator $\mathcal{K} : X \to Y$.

(i) For the efficient approximate application of the linear operator $(K'(v))^*$ to a given $w \in Y$, it is advantageous if the coefficient array $\mathbf{w} \in \ell_2(\mathcal{J}_Y)$ of $w = F_Y(\mathbf{w})$, or at least good approximations of it, has a fast decay [7]. In that case, one may exploit the representation
\[
(K'(v))^*(w) = \big( F^*_X \circ (\mathcal{K}'(F_X(v)))^* \circ F_Y \big)(\mathbf{w}) =: A_v \mathbf{w}
\]
and the compressibility of the stiffness matrix $A_v \in \mathcal{L}(\ell_2(\mathcal{J}_Y), \ell_2(\mathcal{J}_X))$. In fact, if $A_v \in \mathcal{L}(\ell^w_\tau(\mathcal{J}_Y), \ell^w_\tau(\mathcal{J}_X))$ for all $0 < \tau_0 < \tau < 2$, then the second inequality in (85) can be ensured by suitable matrix compression techniques. In the special case of wavelet Riesz bases $\Psi_X$, $\Psi_Y$ and $\mathcal{K}'(F_X(v))$ being a differential operator or an integral operator with a Schwartz kernel, e.g., we can expect the stiffness matrix $A_v$ to be $s^*$-compressible, i.e., there exist biinfinite matrices $A_{v,j}$ with at most a constant multiple of $2^j$ nontrivial entries per row and column such that $\|A_v - A_{v,j}\|_{\mathcal{L}(\ell_2,\ell_2)} \le C_s 2^{-js}$, $0 < s < s^*$. This property implies that $A_v$ boundedly maps $\ell^w_\tau(\mathcal{J}_Y)$ into $\ell^w_\tau(\mathcal{J}_X)$. We refer to [7, 22] and related works on the compressibility of operators in wavelet coordinates and the concrete realization of the associated adaptive matrix-vector multiplications.

(ii) The approximate evaluation of the nonlinearity $K$ itself at a given input $v \in \ell_2$ is enabled under additional assumptions on the type of the nonlinearity. In the context of nonlinear operators, tree approximation techniques play an important role; here a tree structure is imposed on the coefficient array of the output argument. For example, in the special case that $X$ is a closed subspace of $H^s(\Omega)$, $s \ge 0$, $\Omega \subset \mathbb{R}^d$ a bounded domain, $Y = X'$, and $\mathcal{K}$ decomposes into $\mathcal{K} = \mathcal{A} + \mathcal{N}$ with a linear, boundedly invertible operator $\mathcal{A} : X \to X'$ and a Nemytskii-type nonlinearity
\[
\mathcal{N} : X \to X', \qquad (\mathcal{N}(v))(x) = f\big( \partial^{\beta_1} v(x), \ldots, \partial^{\beta_k} v(x) \big), \quad \beta_j \in \mathbb{N}^d_0,
\]
adaptive wavelet tree approximation techniques have been developed and implemented in [1, 8, 12, 18].


For simplicity, we will assume in the sequel that $y$ is given exactly. For convenience, we define the analogue of (18) by
\[
T_{\rho,\delta}(v) := v - \frac{1}{\lambda} \big[ \big(K'(v)\big)^* \big( [K(v)]_\rho - y \big) \big]_\delta. \tag{86}
\]
An implementable version of ISTA with decreasing threshold parameters $\alpha^{(n)}$, i.e., of (9), is then given by
\[
u^{(n+1)} = S_{\frac{1}{\lambda}\alpha^{(n)}} \big( T_{\rho^{(n)},\delta^{(n)}}(u^{(n)}) \big). \tag{87}
\]

The following theorem shows that if the parameters ρ(n), δ(n) are suitablychosen, the overall algorithm is still linearly convergent.
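A minimal sketch of the thresholded iteration (87) for a toy problem with a purely linear $K$ and exact operator evaluations (i.e. $\rho^{(n)} = \delta^{(n)} = 0$); the matrix, the sparse target, and the geometric threshold schedule are illustrative assumptions, not the adaptive scheme itself:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 100
K = rng.standard_normal((m, n)) / np.sqrt(m)        # toy linear forward map
u_true = np.zeros(n); u_true[[3, 17, 42]] = [1.5, -2.0, 1.0]
y = K @ u_true

def soft(v, t):
    # componentwise soft-thresholding S_t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

lam = np.linalg.norm(K, 2) ** 2                     # lambda >= ||K'||^2
alpha_min, u = 1e-3, np.zeros(n)
for step in range(300):
    alpha_n = max(alpha_min, 0.5 * 0.9**step)       # decreasing thresholds
    # exact evaluations stand in for [K(u)]_rho and [(K'(u))^*(.)]_delta
    u = soft(u - K.T @ (K @ u - y) / lam, alpha_n / lam)

err = np.linalg.norm(u - u_true)
nnz = np.count_nonzero(np.abs(u) > 1e-8)
print(err < 0.1, nnz <= 10)                         # accurate and sparse
```

The decreasing schedule keeps the early iterates very sparse, which is exactly the mechanism exploited in the support estimates of Theorems 4.2 and 4.3.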

Theorem 4.2. Let $u^*$ be a stationary point of (1) that satisfies $T(u^*) \in \ell^w_\tau(\mathcal{J})$ for some $0 < \tau < 2$. Furthermore, let $\alpha^{(n)}, \alpha \in \mathbb{R}^{\mathcal{J}}_+$ with $\alpha^{(n)}_\mu \ge \alpha_\mu \ge \underline{\alpha}$ for some $\underline{\alpha} \in \mathbb{R}_+$ and all $\mu \in \mathcal{J}$. We set $u^{(0)} = 0$ and assume that $T$ fulfills condition (37). With $C^{\mathrm{Lip}}_T$ as therein and $C$ as in Lemma 2.2, we set
\[
L := \frac{4 \big( (C^{\mathrm{Lip}}_T + \bar\gamma - \gamma) \|u^*\|_{\ell_2} \big)^2 \lambda^2}{\underline{\alpha}^2} + 4C\, |T(u^*)|^\tau_{\ell^w_\tau(\mathcal{J})} \Big( \frac{\underline{\alpha}}{\lambda} \Big)^{-\tau}
\]
and define $B_L$ analogously to (41). Let us assume that there exists $0 < \gamma_0 < 1$ such that for all $v \in B_L$ and $\operatorname{supp}(v) \subset \Lambda \subset \mathcal{J}$ with $\#\Lambda \le 2L$,
\[
\big\| \big( T(u^*) - T(v) \big)\big|_{S^* \cup \Lambda} \big\|_{\ell_2(S^* \cup \Lambda)} \le \gamma_0 \|u^* - v\|_{\ell_2}. \tag{88}
\]
For the operator $K'$ we assume
\[
C^{\mathrm{bnd}}_{K'}(B_L) := \sup_{v \in B_L} \|K'(v)\|_{\mathcal{L}(\ell_2, Y)} < \infty. \tag{89}
\]
Then, for any $\gamma_0 < \gamma < \bar\gamma < 1$, the inexact thresholded iteration (87) fulfills
\[
(u^{(n)})_{n \in \mathbb{N}} \subset B_L
\]
and converges to $u^*$ at a linear rate,
\[
\|u^* - u^{(n)}\|_{\ell_2} \le \varepsilon^{(n)} := \bar\gamma^n \|u^*\|_{\ell_2}, \tag{90}
\]
whenever the parameters and tolerances are chosen according to
\[
\frac{1}{\lambda} \big( C^{\mathrm{bnd}}_{K'}(B_L)\, \rho^{(n)} + \delta^{(n)} \big) \le (\bar\gamma - \gamma)\, \varepsilon^{(n)}, \tag{91}
\]
\[
\max_{\mu \in \mathcal{J}} |\alpha^{(n)}_\mu - \alpha_\mu| \le \lambda L^{-\frac{1}{2}} (\gamma - \gamma_0)\, \varepsilon^{(n)}. \tag{92}
\]


Proof. The proof is an induction over $n$. The case $n = 0$ is covered by the assumptions. Now let $u^{(n)} \in B_L$ and let (90) hold for some $n \in \mathbb{N}$. We begin by proving $\#\operatorname{supp} u^{(n+1)} \le L$. To this end, we use the standing assumption (85) on the inexact operator evaluations, together with the assumption (91), to estimate for $v \in B_L$
\[
\begin{aligned}
\|T(v) - T_{\rho,\delta}(v)\|_{\ell_2}
&= \frac{1}{\lambda} \big\| \big(K'(v)\big)^*(K(v) - y) - \big[ \big(K'(v)\big)^*([K(v)]_\rho - y) \big]_\delta \big\|_{\ell_2} \\
&\le \frac{1}{\lambda} \Big( \big\| \big(K'(v)\big)^*(K(v) - y) - \big(K'(v)\big)^*([K(v)]_\rho - y) \big\|_{\ell_2} \\
&\qquad + \big\| \big(K'(v)\big)^*([K(v)]_\rho - y) - \big[ \big(K'(v)\big)^*([K(v)]_\rho - y) \big]_\delta \big\|_{\ell_2} \Big) \\
&\le \frac{1}{\lambda} \Big( \big\| \big(K'(v)\big)^* \big\|_{\mathcal{L}(Y,\ell_2)}\, \rho + \delta \Big) \\
&\le (\bar\gamma - \gamma)\, \varepsilon^{(n)}.
\end{aligned} \tag{93}
\]
This inequality, applied for $v = u^{(n)} \in B_L$, implies together with the Lipschitz continuity assumption (37) that
\[
\|T(u^*) - T_{\rho^{(n)},\delta^{(n)}}(u^{(n)})\|_{\ell_2}
= \|T(u^*) - T(u^{(n)}) + T(u^{(n)}) - T_{\rho^{(n)},\delta^{(n)}}(u^{(n)})\|_{\ell_2}
\le (C^{\mathrm{Lip}}_T + \bar\gamma - \gamma)\, \varepsilon^{(n)}.
\]
By invoking Lemma 2.2 we can conclude
\[
\#\operatorname{supp}(u^{(n+1)}) = \#\operatorname{supp} S_{\frac{1}{\lambda}\alpha^{(n)}} \big( T_{\rho^{(n)},\delta^{(n)}}(u^{(n)}) \big) \le L. \tag{94}
\]
For the second part of the proof we set $\Lambda^{(n)} := S^* \cup \operatorname{supp} u^{(n)} \cup \operatorname{supp} u^{(n+1)}$. Notice that $\#(\operatorname{supp} u^{(n)} \cup \operatorname{supp} u^{(n+1)}) \le 2L$. Because shrinkage is nonexpansive, and by the assumption (88), we may estimate
\[
\big\| u^*|_{\Lambda^{(n)}} - S_{\frac{1}{\lambda}\alpha} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) \big\|_{\ell_2(\Lambda^{(n)})}
= \big\| S_{\frac{1}{\lambda}\alpha} \big( T(u^*)|_{\Lambda^{(n)}} \big) - S_{\frac{1}{\lambda}\alpha} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) \big\|_{\ell_2(\Lambda^{(n)})}
\le \gamma_0\, \varepsilon^{(n)}. \tag{95}
\]
Moreover, we may use the Lipschitz assumption (37) and invoke Lemma 2.2 directly to conclude
\[
\#\operatorname{supp} S_{\frac{1}{\lambda}\alpha} \big( T(u^{(n)}) \big)
\le \frac{4 \|T(u^*) - T(u^{(n)})\|^2_{\ell_2} \lambda^2}{\underline{\alpha}^2} + 4C\, |T(u^*)|^\tau_{\ell^w_\tau(\mathcal{J})} \Big( \frac{\underline{\alpha}}{\lambda} \Big)^{-\tau}
\le \frac{4 (C^{\mathrm{Lip}}_T \varepsilon^{(n)})^2 \lambda^2}{\underline{\alpha}^2} + 4C\, |T(u^*)|^\tau_{\ell^w_\tau(\mathcal{J})} \Big( \frac{\underline{\alpha}}{\lambda} \Big)^{-\tau}
\le L. \tag{96}
\]
Since $\alpha^{(n)}$ is decreasing to $\alpha$, it holds that $\operatorname{supp} S_{\frac{1}{\lambda}\alpha^{(n)}}(T(u^{(n)})) \subset \operatorname{supp} S_{\frac{1}{\lambda}\alpha}(T(u^{(n)}))$. This, together with (92), gives
\[
\big\| S_{\frac{1}{\lambda}\alpha} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) - S_{\frac{1}{\lambda}\alpha^{(n)}} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) \big\|_{\ell_2(\Lambda^{(n)})}
\le \big( \#\operatorname{supp} S_{\frac{1}{\lambda}\alpha}(T(u^{(n)})) \big)^{\frac{1}{2}}\, \frac{1}{\lambda} \max_{\mu \in \mathcal{J}} |\alpha^{(n)}_\mu - \alpha_\mu|
\le (\gamma - \gamma_0)\, \varepsilon^{(n)}. \tag{97}
\]
Finally, we use that shrinkage is nonexpansive, together with (93) applied for $u^{(n)}$, for the estimate
\[
\big\| S_{\frac{1}{\lambda}\alpha^{(n)}} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) - S_{\frac{1}{\lambda}\alpha^{(n)}} \big( T_{\rho^{(n)},\delta^{(n)}}(u^{(n)})|_{\Lambda^{(n)}} \big) \big\|_{\ell_2(\Lambda^{(n)})} \le (\bar\gamma - \gamma)\, \varepsilon^{(n)}. \tag{98}
\]
The combination of (95), (97), and (98) finalizes the proof:
\[
\begin{aligned}
\|u^* - u^{(n+1)}\|_{\ell_2}
&\le \big\| u^*|_{\Lambda^{(n)}} - S_{\frac{1}{\lambda}\alpha} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) \big\|_{\ell_2(\Lambda^{(n)})} \\
&\quad + \big\| S_{\frac{1}{\lambda}\alpha} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) - S_{\frac{1}{\lambda}\alpha^{(n)}} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) \big\|_{\ell_2(\Lambda^{(n)})} \\
&\quad + \big\| S_{\frac{1}{\lambda}\alpha^{(n)}} \big( T(u^{(n)})|_{\Lambda^{(n)}} \big) - u^{(n+1)}|_{\Lambda^{(n)}} \big\|_{\ell_2(\Lambda^{(n)})} \\
&\le \gamma_0\, \varepsilon^{(n)} + (\gamma - \gamma_0)\, \varepsilon^{(n)} + (\bar\gamma - \gamma)\, \varepsilon^{(n)} = \varepsilon^{(n+1)}.
\end{aligned}
\]

We have shown that the support size of each iterate $u^{(n)}$ can be bounded by a uniform constant. As it turns out, there also exists a finite set $\Lambda_0 \subset \mathcal{J}$ that contains all those supports.

Theorem 4.3. Let the assumptions of Theorem 4.2 hold. Let $N \in \mathbb{N}$ be large enough such that there exists $\delta > 0$ with
\[
\varepsilon^{(N+1)} + \delta \le \frac{1}{\lambda} \inf_{\mu \in \mathcal{J}} \alpha_\mu. \tag{99}
\]
Then it holds that
\[
\operatorname{supp}(u^{(n)}) \subset \Lambda_\delta(T(u^*)), \qquad n \ge N,
\]
and consequently
\[
\operatorname{supp}(u^{(n)}) \subset \Big( \bigcup_{j=0}^{N} \operatorname{supp}(u^{(j)}) \Big) \cup \Lambda_\delta(T(u^*)) =: \Lambda_0, \qquad n \in \mathbb{N}.
\]

Proof. We prove that for any fixed $n \ge N$ and all $\mu \in \mathcal{J} \setminus \Lambda_\delta(T(u^*))$ it holds that $(u^{(n+1)})_\mu = 0$. To this end, let $0 < \gamma_0 < \gamma < \bar\gamma < 1$ be as in Theorem 4.2 and denote $\Lambda^{(n)} := S^* \cup \operatorname{supp} u^{(n)} \cup \operatorname{supp} u^{(n+1)}$. Recall that, by estimating as in equation (93), it holds that
\[
\big\| T(u^{(n)})|_{\Lambda^{(n)}} - T_{\rho^{(n)},\delta^{(n)}}(u^{(n)})|_{\Lambda^{(n)}} \big\|_{\ell_2(\Lambda^{(n)})} \le (\bar\gamma - \gamma)\, \varepsilon^{(n)}, \tag{100}
\]
and further that, since $\#\operatorname{supp} u^{(n)} \le L$, we can use (42) to estimate
\[
\big\| T(u^*)|_{\Lambda^{(n)}} - T(u^{(n)})|_{\Lambda^{(n)}} \big\|_{\ell_2(\Lambda^{(n)})} \le \gamma_0\, \varepsilon^{(n)}. \tag{101}
\]
By definition, $\mu \in \Lambda^{(n)} \setminus \Lambda_\delta(T(u^*))$ implies that $|(T(u^*))_\mu| \le \delta$. Therefore, for such $\mu$, we may use (100), (101), (99), and the fact that $\alpha^{(n)}$ is decreasing to $\alpha$ to estimate
\[
\big| \big( T_{\rho^{(n)},\delta^{(n)}}(u^{(n)}) \big)_\mu \big|
\le \big\| T(u^*)|_{\Lambda^{(n)}} - T_{\rho^{(n)},\delta^{(n)}}(u^{(n)})|_{\Lambda^{(n)}} \big\|_{\ell_2(\Lambda^{(n)})} + \big| \big( T(u^*) \big)_\mu \big|
\le \varepsilon^{(N+1)} + \delta
\le \frac{1}{\lambda} \inf_{\nu \in \mathcal{J}} \alpha_\nu
\le \frac{1}{\lambda} \inf_{\nu \in \mathcal{J}} \alpha^{(n)}_\nu.
\]
Finally, by the definition of $S_{\frac{1}{\lambda}\alpha^{(n)}}$, it follows that $(u^{(n+1)})_\mu = 0$.


Stephan Dahlke, Ulrich Friedrich
Philipps-Universität Marburg
FB Mathematik und Informatik, AG Numerik
Hans-Meerwein-Strasse
35032 Marburg, Germany
{dahlke, friedrich}@mathematik.uni-marburg.de

Massimo Fornasier
TU München
Fakultät für Mathematik, Einheit M15: Angewandte Numerische Analysis
Boltzmannstrasse 3
85748 Garching b. München, Germany
[email protected]

Thorsten Raasch
Johannes Gutenberg-Universität Mainz
Institut für Mathematik, AG Numerische Mathematik
Staudingerweg 9
55099 Mainz, Germany
[email protected]
