Use cases

This section provides a few ideas of how basic operation of the FitPy package may look like. It focusses on recipe-driven data analysis as its user-friendly interface that does not require the spectroscopist to know anything about programming and allows to fully focus on the actual fitting.

As a user, you write “recipes” in form of human-readable YAML files telling the application which tasks to perform on what datasets. This allows for fully unattended, automated and scheduled fitting. At the same time, it allows you to analyse the data without need for actual programming.

Important

Currently, this section is used by the developers to get an idea of how to design the interface of FitPy. Therefore, different, not yet implemented scenarios are listed as recipes. Assume the interface to change frequently for now, as it is still in the initial design phase.

Fitting of single datasets

Most basic fitting

Generally, models are fitted to data of a dataset. While the datasets are loaded as usual, models are created using a model task, while fitting is performed using an analysis task from the FitPy package. The latter inherits from the ASpecD analysis task and hence returns a calculated dataset with the fitted model as its data.

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: Gaussian
11    properties:
12      parameters:
13        position: 1.5
14        width: 0.5
15    from_dataset: /path/to/dataset
16    output: model
17    result: gaussian_model
18
19  - kind: fitpy.singleanalysis
20    type: SimpleFit
21    properties:
22      model: gaussian_model
23      parameters:
24        fit:
25          amplitude:
26            start: 5
27            range: [3, 7]
28    result: fitted_gaussian

Settings for the algorithm

The algorithm used for fitting (the method) as well as other settings regarding the algorithm need to be controllable by the user.

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: Gaussian
11    properties:
12      parameters:
13        position: 1.5
14        width: 0.5
15    from_dataset: /path/to/dataset
16    output: model
17    result: gaussian_model
18
19  - kind: fitpy.singleanalysis
20    type: SimpleFit
21    properties:
22      model: gaussian_model
23      parameters:
24        fit:
25          amplitude:
26            start: 5
27            range: [3, 7]
28        algorithm:
29          method: leastsq
30    result: fitted_gaussian

Omitting parts of the dataset

Often, real data contain parts that cannot be described by a certain model, but can safely be ignored, or they contain outliers that shall not be fitted. Therefore, fitting needs to provide means to specify regions of the dataset to be ignored during fitting.

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: Gaussian
11    properties:
12      parameters:
13        position: 1.5
14        width: 0.5
15    from_dataset: /path/to/dataset
16    output: model
17    result: gaussian_model
18
19  - kind: fitpy.singleanalysis
20    type: SimpleFit
21    properties:
22      model: gaussian_model
23      parameters:
24        fit:
25          amplitude:
26            start: 5
27            range: [3, 7]
28        algorithm:
29          method: leastsq
30        cut_range:
31          - [5, 6]
32          - [9, 10]
33    result: fitted_gaussian

Robust fitting (sampling of starting conditions, LHS)

One crucial aspect of the FitPy package is to provide simple means to perform optimisation starting from different starting conditions via a Latin Hypercube Sampling (LHS). Here, both, the number of samples per parameter as well as the interval the starting conditions should be sampled from for each parameter need to be provided.

One problem occurring with sampling algorithms is that the result is no longer a single dataset, at least not trivially. It might still be a single dataset, but the information from the different runs needs to be available for analysis of the goodness of the eventual fit.

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: Gaussian
11    properties:
12      parameters:
13        position: 1.5
14        width: 0.5
15    from_dataset: /path/to/dataset
16    output: model
17    result: gaussian_model
18
19  - kind: fitpy.singleanalysis
20    type: LHSFit
21    properties:
22      model: gaussian_model
23      parameters:
24        fit:
25          amplitude:
26            lhs_range: [1, 10]
27        lhs:
28          points: 5
29    result: fitted_gaussian

Fitting multiple species to one dataset

Different to global fitting, where one model is fitted to several independent datasets, fitting multiple species to one dataset is nothing special from a fitting perspective, as a rather complex composite model is used in this case.

There are, however, a few minor differences with respect to the parameter definitions: As the parameters will often have the same name, as they stem from the same model, the corresponding fit parameter will get lists for initial guesses, ranges, and alike. Furthermore, the weighting for the different models of the composite model needs to be fitted as well.

Usually, as the number of parameters increases dramatically with more than one species, robust fitting shall be applied.

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: CompositeModel
11    from_dataset: /path/to/dataset
12    properties:
13      models:
14        - Gaussian
15        - Gaussian
16      parameters:
17        - position: 5
18        - position: 8
19    output: model
20    result: multiple_gaussians
21
22  - kind: fitpy.singleanalysis
23    type: MultipleSpeciesFit
24    properties:
25      model: multiple_gaussians
26      parameters:
27        fit:
28          position:
29            start:
30              - 5
31              - 8
32            range:
33              - [3, 7]
34              - [6, 9]
35          weights:
36            start:
37              - 1
38            range:
39              - [0.5, 2]
40    result: fitted_gaussians

Global fitting

Global fitting covers multiple independent datasets to which models with a joint set of parameters are fitted. This is different to multiple species fitted to one dataset.

As such, the fitting inherits from aspecd.analysis.MultiAnalysisStep, and for each dataset a model needs to be provided, as the datasets cannot be restricted to have the same dimensions and ranges of their axes.

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/first/dataset
 7  - /path/to/second/dataset
 8
 9tasks:
10  - kind: model
11    type: Gaussian
12    properties:
13      parameters:
14        position: 1.5
15        width: 0.5
16    from_dataset: /path/to/first/dataset
17    output: model
18    result: gaussian_model_1
19
20  - kind: model
21    type: Gaussian
22    properties:
23      parameters:
24        position: 1.5
25        width: 0.5
26    from_dataset: /path/to/second/dataset
27    result: gaussian_model_2
28
29  - kind: fitpy.multianalysis
30    type: GlobalFit
31    properties:
32      models:
33        - gaussian_model_1
34        - gaussian_model_2
35      parameters:
36        fit:
37          amplitude:
38            start: 5
39            range: [3, 7]
40    result: fitted_gaussian

Questions to address:

  • How to deal with constraints for parameters for the multiple datasets?

    Example: Data have been recorded in an angular-dependent fashion, and while the angle offset between datasets is known with some accuracy, the initial offset shall be fitted.

    In such case, one probably would want to provide the offsets, let the fitting adjust the offsets within a given range, and let the initial offset to be varied in a much wider range.

Graphical visualisation of fit results

Graphical visualisation of fit results is of crucial importance. The lmfit package provides straightforward and compelling means for most standard situations, and these can be used to inspire similar solutions based on the functionality provided by the ASpecD framework.

Comparing data and fitted model

Basically, data, model, and perhaps the residual should be shown.

As the results of a fit are not contained in the original experimental dataset, but rather in a calculated dataset that is returned by the fitting step, the plotters reconstruct the data of the original dataset by adding the residual to the fitted model.

In the simplest way of a 1D dataset, the complete procedure, including model creation and fitting, may look as follows:

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: Gaussian
11    properties:
12      parameters:
13        position: 1.5
14        width: 0.5
15    from_dataset: /path/to/dataset
16    output: model
17    result: gaussian_model
18
19  - kind: fitpy.singleanalysis
20    type: SimpleFit
21    properties:
22      model: gaussian_model
23      parameters:
24        fit:
25          amplitude:
26            start: 5
27            range: [3, 7]
28    result: fitted_gaussian
29
30  - kind: fitpy.singleplot
31    type: SinglePlotter1D
32    properties:
33      filename: fit_result.pdf
34    apply_to: fitted_gaussian

Robustness of sampling strategies

When sampling starting conditions, it is important to graphically display the results for the different samples, to evaluate the robustness of the fit and the applicability of the grid used.

Key here is to extract the statistical criterion from the result of a LHSFit. As the result is a calculated dataset, a standard plotter from the ASpecD framework can be used to diplay the results:

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: Gaussian
11    properties:
12      parameters:
13        position: 1.5
14        width: 0.5
15    from_dataset: /path/to/dataset
16    output: model
17    result: gaussian_model
18
19  - kind: fitpy.singleanalysis
20    type: LHSFit
21    properties:
22      model: gaussian_model
23      parameters:
24        fit:
25          amplitude:
26            lhs_range: [1, 10]
27        lhs:
28          points: 5
29    result: fitted_gaussian
30
31  - kind: fitpy.singleanalysis
32    type: ExtractLHSStatistics
33    properties:
34      parameters:
35        criterion: reduced_chi_square
36    result: reduced_chi_squares
37    apply_to: fitted_gaussian
38
39  - kind: singleplot
40    type: SinglePlotter1D
41    properties:
42      properties:
43        drawing:
44          marker: 'o'
45          linestyle: 'none'
46      filename: 'reduced_chi_squares.pdf'
47    apply_to: reduced_chi_squares

Fit reports

The importance of sensible reports cannot be overrated, and TSim is the key to the success of much of the own research, allowing a skilled student with few hours of introduction to perform fits to data without much need of further supervision besides discussing the results together.

Thanks to the report generating capabilities of the ASpecD framework, generating reports should be straight-forward. Key here is not how to generate reports, but to provide sensible templates and, where necessary and sensible, generate the necessary information to be added to the reports.

Shall reports automatically generate certain figures if these are not provided? May be sensible, but would include functionality from plotters in reports. An alternative would be to provide recipe templates for specifying the plots that can be adapted by the user upon need. For the time being, the approach will be to automatically create a plot with standard appearance.

The full process, from loading data to reporting the final fit results, and including model definition and actual fitting, may look as follows:

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: Gaussian
11    properties:
12      parameters:
13        position: 1.5
14        width: 0.5
15    from_dataset: /path/to/dataset
16    output: model
17    result: gaussian_model
18
19  - kind: fitpy.singleanalysis
20    type: SimpleFit
21    properties:
22      model: gaussian_model
23      parameters:
24        fit:
25          amplitude:
26            start: 5
27            range: [3, 7]
28    result: fitted_gaussian
29
30  - kind: fitpy.report
31    type: LaTeXFitReporter
32    properties:
33      template: simplefit.tex
34      filename: fit_result.tex
35    compile: true
36    apply_to: fitted_gaussian

In this particular case, the fit report will be saved to the file fit_result.tex and automatically compiled into a PDF file, as the compile flag is set to true.

Pipelines

Inspired by packages such as sklearn, it might prove useful to be able to define entire pipelines and employ a series of fitting strategies.

The question remains: Is this a separate task, or could this reasonably be done using recipe-driven data analysis and providing well-crafted example recipes?