## Inversion theory: optimization by combining norms and misfit

### Contents and objectives

On this page: Introduction | Combining misfit and model norms | The role of Beta | Specifying misfit to constrain the optimization | Application to the applet | Effect of specifying different Beta values | Inversion results

Now it is time to gather the pieces together and carry out an inversion. Study this page carefully because it explains, in a conceptual manner, how the ideas of misfit and model norm are used to perform an inversion. It also highlights the effects of making poor choices when specifying how the algorithm should perform.

### Introduction

The foundations for inversion have now been laid. Referring to the flow chart, we have outlined what is required before inversion can be carried out, we have mentioned briefly how the earth is discretized, and we have discussed the two fundamental components used to control the inversion process: misfit and model norms. The next step is to draw these concepts together into a useful inversion scheme.

### Combining misfit and model norms

We have mentioned before that inversion can be described as a process for automatically deciding which of infinitely many possible models to select, based upon measured data and prior understanding about the problem. The choice should be an optimal model, so we are trying to solve an optimization problem. There are two components to this optimization problem - (1) misfit and (2) model norms.

(1) Misfit: Ideas about the data, errors, predicted data, and misfit were introduced in the previous section. For now, we will assume that errors on the data are independent and that we have some estimate of their standard deviations. The appropriate misfit function (equation #3.3.8) is $\phi_d = \sum_{i=1}^{N} \left( (d_i^{obs} - d_i^{pre}) / \epsilon_i \right)^2$.

Acceptable models are those for which the misfit is approximately equal to its expected value $N$; we will let $\phi_d^*$ denote a "target" value for our misfit.
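As a small numerical sketch (all values hypothetical, not taken from the Applet), this misfit can be computed directly; when predictions match the noise-free data, its value scatters around $N$:

```python
import numpy as np

def misfit(d_obs, d_pred, sd):
    """Sum of squared, error-normalized residuals (chi-squared style misfit)."""
    return float(np.sum(((d_obs - d_pred) / sd) ** 2))

rng = np.random.default_rng(0)
N = 20
d_clean = np.linspace(0.0, 1.0, N)        # noise-free "predicted" data
sd = 0.05 * np.ones(N)                    # known standard deviations
d_obs = d_clean + rng.normal(0.0, sd)     # noisy observations

# Predictions equal to the noise-free data give phi_d scattered around N.
phi_d = misfit(d_obs, d_clean, sd)
print(round(phi_d, 2))
```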

(2) Model norm: Ideas about the model norm, $\phi_m$, were also introduced in the previous section. It is usual for this function to contain elements representing closeness to a reference model, and elements representing the amount of structure in the spatial directions. Recall that we generally want to find a model that minimizes $\phi_m$. Such a solution would be close to a reference model, which represents our prior knowledge about the earth, and it would also have minimum structure. Minimum-structure models are useful starting points for interpretation because they can be expected to capture the important large-scale features of the earth, even though detail may be missed. The argument is that arbitrarily complicated models can always be found that reproduce the data, but arbitrarily simple models CANNOT be found. For this reason the general inverse problem is usually posed in terms of minimizing a model norm.

Combining these two points allows us to clearly state the inverse problem as:

Find the model $m$ that

- minimizes the model norm, and
- produces an acceptably small misfit;

or, equivalently:

Find the model $m$ that minimizes $\phi_m$ subject to $\phi_d = \phi_d^*$.

To solve such problems, the optimization with its constraint is often re-cast mathematically as a single optimization. The model norm and the misfit are combined in a single objective function and the problem is expressed as:

Find a model that minimizes $\phi$, where $\phi = \phi_d + \beta \phi_m$.

The quantity $\beta$ (Beta) is called the regularization parameter or Tikhonov parameter. Its purpose is to control the relative importance attached to making the misfit small and to reducing the value of the model norm. Its value is not known when the inversion begins. Rather, a value of $\beta$ is sought so that, when $\phi$ is minimized, the computed model has a misfit equal to some predetermined target value, or less than some tolerance. To help consolidate intuition about the role of $\beta$, we present a simple analogy that should be familiar to everyone.
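The effect of $\beta$ on the minimizer can be seen in a minimal linear sketch (illustrative values only, not the Applet's internals). For a linear problem $d = Gm$ with $\phi_m = \|m\|^2$ and no reference model, the minimizer of $\phi = \phi_d + \beta \phi_m$ has a closed form:

```python
import numpy as np

# Hypothetical linear problem d = G m with Gaussian noise of standard
# deviation sd. The minimizer of phi = phi_d + beta*phi_m solves
# (G^T G / sd^2 + beta*I) m = G^T d_obs / sd^2.
rng = np.random.default_rng(1)
N, M = 15, 30
G = rng.normal(size=(N, M))
m_true = np.zeros(M)
m_true[10:14] = 1.0
sd = 0.1
d_obs = G @ m_true + rng.normal(0.0, sd, N)

def recover(beta):
    A = G.T @ G / sd**2 + beta * np.eye(M)
    return np.linalg.solve(A, G.T @ d_obs / sd**2)

for beta in (1e-2, 1.0, 1e2):
    m = recover(beta)
    phi_d = float(np.sum(((G @ m - d_obs) / sd) ** 2))
    phi_m = float(m @ m)
    print(f"beta={beta:<6g} phi_d={phi_d:8.2f} phi_m={phi_m:.3f}")
```

As $\beta$ grows, the misfit rises and the model norm falls, tracing out the trade-off discussed next.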

### The role of Beta

In our formulation of the inverse problem, we have two quantities that we want to make small: we want to minimize the model norm and we also want to minimize the misfit. Here is an example of another problem in which it is desirable to simultaneously minimize two quantities. Suppose a traveller is attempting to go from point A to point B on the map to the right. The two-part optimization problem is that you want to find a speed that will minimize the time taken on the trip, and you would also like that speed to result in minimum fuel consumption. Both time and fuel consumption are functions of speed, so we can express this problem in the same form as our inversion problem:

minimize $\phi = \text{time} + \beta \times \text{fuel}$

The objective is to find a speed that will minimize $\phi$. If we set $\beta = 0$, then the minimization will ignore fuel consumption and find the speed that minimizes time. On the other hand, if we set $\beta$ to be large, the minimization will recover a speed that keeps fuel consumption to a minimum regardless of the time taken. Using optimization to manage this decision-making process can be illustrated with the "trade-off" curve shown here. The result we end up choosing depends upon the choice of $\beta$: large values of $\beta$ result in low, efficient speeds, while smaller values of $\beta$ result in high, inefficient speeds. Now, which value of $\beta$ should we choose?

One good way of resolving this travel problem is to specify that we want to
- minimize fuel consumption
- subject to getting there within 14 hours.

Conceptually, we could perform the minimization several times using a range of $\beta$ values (shown in the figure above). For each $\beta$, the minimization recovers a speed with an associated travel time (vertical axis) and fuel consumption (horizontal axis). Our final choice of speed is the one which, when plugged into the "time" equation, gives a result of 14 hours. According to this trade-off graph, any inversion that uses a value of $\beta$ larger than the one which yielded our optimal speed will result in a longer time and lower fuel consumption; an inversion using a smaller $\beta$ will yield the opposite result. In other words, the choice of $\beta$ determines which of the two components in our optimization problem has the strongest influence on the outcome of the minimization.
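The travel analogy can be sketched numerically (all numbers are invented for illustration): time and fuel are simple functions of speed, and minimizing $\phi = \text{time} + \beta \times \text{fuel}$ over a grid of candidate speeds shows how $\beta$ steers the choice:

```python
import numpy as np

# Toy travel problem: combine time and fuel as phi = time + beta*fuel
# and minimize over speed. All constants are hypothetical.
distance = 1000.0                          # km
speeds = np.linspace(40.0, 140.0, 1001)    # candidate speeds, km/h

time = distance / speeds                   # hours
fuel = distance * (0.04 + 1e-5 * speeds**2)  # litres; rises with speed

def best_speed(beta):
    phi = time + beta * fuel
    return float(speeds[np.argmin(phi)])

# A larger beta weights fuel more heavily, so the chosen speed drops.
print(best_speed(0.05), best_speed(0.5))
```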

### Specifying misfit to constrain the optimization

Let's return to our inverse problem, where we had defined a model norm $\phi_m$ and a data misfit $\phi_d$. We combine these into one objective function $\phi = \phi_d + \beta \phi_m$ and minimize $\phi$.

Carrying out the minimization for a range of $\beta$'s produces the Tikhonov curve plotted at the right, named after Andrey Tikhonov, the Russian mathematician who advocated its use. The question remains as to which solution we want. In the previous section we showed that if the errors associated with the data are Gaussian and have known standard deviations, the expected value of the misfit function (equation #3.3.8) is $N$, the number of data. So we can find a preferred model by performing several inversions using a range of $\beta$'s and selecting the result which satisfies $\phi_d = N$. The preferred value of misfit, or target misfit, is specified as part of the inversion, and it will be called $\phi_d^*$ (see the figure).
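Because $\phi_d$ increases monotonically with $\beta$, the search for the $\beta$ that hits the target misfit can be done with a simple bisection on $\log\beta$. The sketch below (an illustrative setup, not the Applet's internal algorithm) finds the $\beta$ whose minimizer gives $\phi_d = N$, i.e. chifact = 1:

```python
import numpy as np

# Hypothetical linear problem; phi_d(beta) is the misfit of the model
# that minimizes phi = phi_d + beta*||m||^2 for a given beta.
rng = np.random.default_rng(2)
N, M = 20, 40
G = rng.normal(size=(N, M))
sd = 0.1
d_obs = G @ np.sin(np.linspace(0.0, 3.0, M)) + rng.normal(0.0, sd, N)

def phi_d(beta):
    m = np.linalg.solve(G.T @ G / sd**2 + beta * np.eye(M),
                        G.T @ d_obs / sd**2)
    return float(np.sum(((G @ m - d_obs) / sd) ** 2))

lo, hi = 1e-8, 1e8             # bracket: badly overfit vs badly underfit
for _ in range(80):            # bisection on log(beta)
    mid = np.sqrt(lo * hi)
    if phi_d(mid) < N:
        lo = mid               # misfit too small -> beta must grow
    else:
        hi = mid
beta_star = np.sqrt(lo * hi)
print(round(phi_d(beta_star), 3))   # close to the target N = 20
```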

What kinds of models would be obtained if the value of $\beta$ is more or less than the one which produces the preferred model? Following the trade-off curve helps to anticipate what happens. Choosing the result obtained when $\beta$ was less than optimal means the misfit will be smaller and the model norm will be larger. Consider what this means:

- A smaller misfit means predictions based on the recovered model look more like the observed data, including its noise.
- Values of the model norm will be larger, meaning our measurement of the model (our "ruler") is larger. That means the model is more complicated, or in geologic terms, there is more "structure" in the model.
- When predictions are closer to observations (misfit is less than the statistical expected value) and the model norm is large, it is likely that some of that "excess structure" is there to account for noise in the data. In other words, a model that produces too small a misfit probably has excess, erroneous features.

Conversely, larger values of $\beta$ will yield models with larger misfit values: predictions will look less like the observations. Model norm values will also be smaller, meaning the model will be simpler than our "optimal" model. Figures at the end of this section illustrate these effects using results from the Linear Inversion Applet.

### Application to the applet

Let us proceed with inversion using the UBC-GIF Linear Inversion Applet. As a reminder, all of the necessary steps are in the following list (the steps are shown in the figure below). Only the last two steps were not covered in the previous section:

- a model has been defined (green curve in the Applet's top-left window);
- kernels have been specified;
- noise on the data has been specified (the noise button);
- data have been generated (shown in the Applet's data window); and
- a model norm has been specified by setting values for the two $\alpha$'s in equation #3.3.7.

The last step is to specify how $\beta$ should be chosen. The four options will be explained in a later chapter. For now we will follow the arguments above and specify that $\phi_d$ should equal $N$; that is, the target value of misfit should equal the number of data. This is achieved by checking the "chifact" option and setting its value to 1. Setting this chifact value to 2 would effectively ask for a target misfit value of $2N$; again, details are in a later chapter. Finally, everything necessary for performing an inversion is ready, and you can go ahead and click the "invert" button.

The inversion result, or recovered model, will be displayed as a red line over the true (green) model's graph. Predicted data will be displayed in the data window, and two other graphs will be plotted - these are explained later. Also, some numbers will be listed under the buttons summarizing values obtained for misfit, model norm, and other parameters.

### Effect of specifying different Beta values

Three pairs of figures below illustrate the kinds of models obtained when different values of target misfit are specified. This is done by changing the "chifact" value, which corresponds to selecting an inversion result that uses a value of $\beta$ larger (or smaller) than the one used when chifact = 1.

1. The result when chifact = 1 was specified. The model chosen is the one obtained when $\beta$ is such that the misfit function (evaluated using the recovered model) equals the target value of $N$.
2. If a $\beta$ value that is too small is used, the misfit will be over-emphasized in the optimization. The recovered model will yield a misfit value that is too small (predictions are too close to the observations), and the model norm will be large. The model will contain far too much structure, which occurs because the system is trying to fit noise in the data. Observe in the figure how closely the predicted data (red X's) follow the noisy data.
3. If a $\beta$ value that is too large is used, the model norm will be over-emphasized in the optimization. The recovered model will yield a small model norm (as expected from a minimization), but the data will be poorly fit, so the best model has probably not been found. The model has very little structure, but it is not a good representation of the true model. Compare the predicted data values to those obtained for the "optimal" model.

### Inversion results

A question that often arises is, "Why isn't the inversion hard-wired to find a solution that yields a misfit equal to $N$?" The answer has two parts. Firstly, even in synthetic examples where known Gaussian noise is added, the misfit between the true data and the observations is likely to differ somewhat from $N$. More importantly, the misfit measure in equation #3.3.8 assumes that the errors are Gaussian, independent, and have known standard deviations. In field examples we do not have the luxury of knowing the standard errors of the data, and hence we have to make a guess. Being able to run an inversion with different values of $\beta$ allows one to compensate for incorrect estimates of the data errors.
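The first point is easy to verify numerically. With known, independent Gaussian noise of unit standard deviation, $\phi_d$ computed against the true data is a chi-squared variable: its mean is $N$ but individual realizations scatter with standard deviation $\sqrt{2N}$ (values below are a quick illustrative simulation):

```python
import numpy as np

# Simulate the misfit between true data and many noisy realizations:
# each realization gives phi_d = sum of N squared standard-normal errors.
rng = np.random.default_rng(3)
N, trials = 20, 5000
phi = np.sum(rng.normal(size=(trials, N)) ** 2, axis=1)  # phi_d per trial
print(round(phi.mean(), 1), round(phi.std(), 1))  # mean near N, std near sqrt(2N)
```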

So how should an inversion be inspected to see if it was successful?

- The resulting model should be "geologically sensible," and in agreement with all available prior knowledge.
- The misfit value should be checked; it should be reasonably close to the expected value (the target, whether greater than, less than, or equal to $N$).
- The model norm can also be checked and compared with the true model norm if that is known. For synthetic examples, the constructed model should have a smaller norm than that of the true model (otherwise the optimization procedure hasn't worked). However, in most practical instances we don't know enough about the true model norm to make much use of this number.

Deciding exactly how to proceed with specific data sets takes some experience and understanding of the problem, the data set, and the methodology. We will discuss these issues in the sixth section of this chapter, and in other sections that discuss how to use or apply inversion.

### Conclusion so far

On this page and the previous one, we have introduced the primary concepts that underlie inversion. These two sections (3.3 and 3.4) are very dense with new concepts and information, and it is challenging to grasp it all quickly. In the last two sections of this chapter we show how these concepts are put into practice to solve real problems in geophysical inversion. But first, the next section gives a brief summary using interactive figures to re-emphasize these key concepts.