Use cases
This section provides a few ideas of what basic operation of the FitPy package may look like. It focusses on recipe-driven data analysis as the user-friendly interface: the spectroscopist does not need to know anything about programming and can fully focus on the actual fitting.
As a user, you write “recipes” in the form of human-readable YAML files telling the application which tasks to perform on which datasets. This allows for fully unattended, automated and scheduled fitting. At the same time, it allows you to analyse the data without any need for actual programming.
Important
Currently, this section is used by the developers to get an idea of how to design the interface of FitPy. Therefore, several scenarios that are not yet implemented are listed as recipes. Expect the interface to change frequently for now, as it is still in the initial design phase.
Fitting of single datasets
Most basic fitting
Generally, models are fitted to the data of a dataset. Datasets are loaded as usual, models are created using a model task, and fitting is performed using an analysis task from the FitPy package. The latter inherits from the ASpecD analysis task and hence returns a calculated dataset with the fitted model as its data.
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/dataset

tasks:
  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/dataset
    output: model
    result: gaussian_model

  - kind: fitpy.singleanalysis
    type: SimpleFit
    properties:
      model: gaussian_model
      parameters:
        fit:
          amplitude:
            start: 5
            range: [3, 7]
    result: fitted_gaussian
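To make the role of the fit range concrete, here is a minimal Python sketch of what such a fit amounts to when only the amplitude is varied. It is not FitPy's implementation: the Gaussian parametrisation and the function names gaussian and fit_amplitude are assumptions made for illustration. Because the model is linear in the amplitude, the least-squares optimum has a closed form, so no start value is needed here; an iterative algorithm would begin from the start value given in the recipe.

```python
import math


def gaussian(x, amplitude, position, width):
    """Gaussian profile; this parametrisation is an assumption."""
    return amplitude * math.exp(-((x - position) ** 2) / (2 * width ** 2))


def fit_amplitude(xs, ys, position, width, bounds):
    """Least-squares amplitude with position and width held fixed.

    With only the amplitude free, the optimum is
    sum(y * g) / sum(g * g), with g the unit-amplitude Gaussian.
    Clipping to the bounds gives the bounded optimum because the
    objective is quadratic in the amplitude.
    """
    shape = [gaussian(x, 1.0, position, width) for x in xs]
    unbounded = (
        sum(y * g for y, g in zip(ys, shape)) / sum(g * g for g in shape)
    )
    lower, upper = bounds
    return min(max(unbounded, lower), upper)


# Synthetic, noise-free data with a known amplitude of 4.2
xs = [i * 0.1 for i in range(31)]
ys = [gaussian(x, 4.2, 1.5, 0.5) for x in xs]

amplitude = fit_amplitude(xs, ys, position=1.5, width=0.5, bounds=(3, 7))
```

If the range excludes the unbounded optimum, for example bounds=(5, 7) for the data above, the result is pinned to the nearest limit, which is usually a hint that the range or the model should be reconsidered.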
Settings for the algorithm
The algorithm used for fitting (the method) as well as other settings regarding the algorithm need to be controllable by the user.
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/dataset

tasks:
  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/dataset
    output: model
    result: gaussian_model

  - kind: fitpy.singleanalysis
    type: SimpleFit
    properties:
      model: gaussian_model
      parameters:
        fit:
          amplitude:
            start: 5
            range: [3, 7]
        algorithm:
          method: leastsq
    result: fitted_gaussian
Omitting parts of the dataset
Often, real data contain parts that cannot be described by a certain model, but can safely be ignored, or they contain outliers that shall not be fitted. Therefore, fitting needs to provide means to specify regions of the dataset to be ignored during fitting.
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/dataset

tasks:
  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/dataset
    output: model
    result: gaussian_model

  - kind: fitpy.singleanalysis
    type: SimpleFit
    properties:
      model: gaussian_model
      parameters:
        fit:
          amplitude:
            start: 5
            range: [3, 7]
        algorithm:
          method: leastsq
        cut_range:
          - [5, 6]
          - [9, 10]
    result: fitted_gaussian
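Conceptually, omitting regions means removing the affected points before the residuals are computed. The following Python sketch illustrates this under two assumptions that the recipe does not settle: the ranges are given in axis units (not indices), and their limits are inclusive. The helper apply_cut_range is hypothetical and not part of FitPy.

```python
def apply_cut_range(xs, ys, cut_ranges):
    """Drop all points whose x value lies inside any of the cut ranges.

    Treating the range limits as inclusive is an assumption.
    """
    kept = [
        (x, y)
        for x, y in zip(xs, ys)
        if not any(low <= x <= high for low, high in cut_ranges)
    ]
    return [x for x, _ in kept], [y for _, y in kept]


xs = list(range(12))  # axis values 0, 1, ..., 11
ys = [x ** 2 for x in xs]

# Same ranges as in the recipe above
kept_x, kept_y = apply_cut_range(xs, ys, [[5, 6], [9, 10]])
```

Only the remaining points would enter the fit, while the fitted model can still be evaluated on the full axis afterwards.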
Robust fitting (sampling of starting conditions, LHS)
A crucial aspect of the FitPy package is to provide simple means to run the optimisation from different starting conditions obtained via Latin Hypercube Sampling (LHS). Both the number of samples per parameter and the interval from which the starting conditions are sampled for each parameter need to be provided.
One problem occurring with sampling algorithms is that the result is no longer a single dataset, at least not trivially. It might still be a single dataset, but the information from the different runs needs to be available for analysis of the goodness of the eventual fit.
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/dataset

tasks:
  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/dataset
    output: model
    result: gaussian_model

  - kind: fitpy.singleanalysis
    type: LHSFit
    properties:
      model: gaussian_model
      parameters:
        fit:
          amplitude:
            lhs_range: [1, 10]
        lhs:
          points: 5
    result: fitted_gaussian
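For readers unfamiliar with Latin Hypercube Sampling, the following Python sketch shows the idea behind lhs_range and points: each parameter interval is divided into equally wide strata, and every stratum is used exactly once. This is a generic textbook variant written with the standard library only, not the sampler FitPy uses; the function name latin_hypercube and its signature are assumptions.

```python
import random


def latin_hypercube(ranges, points, seed=None):
    """Return `points` starting conditions, one dict per sample.

    `ranges` maps parameter names to (low, high) intervals. Each
    interval is split into `points` equally wide strata; every stratum
    is used exactly once per parameter, and the strata are shuffled
    independently for each parameter.
    """
    rng = random.Random(seed)
    samples = [{} for _ in range(points)]
    for name, (low, high) in ranges.items():
        width = (high - low) / points
        strata = list(range(points))
        rng.shuffle(strata)
        for sample, stratum in zip(samples, strata):
            # Random position within the assigned stratum
            sample[name] = low + (stratum + rng.random()) * width
    return samples


# Same settings as in the recipe above: five points between 1 and 10
starts = latin_hypercube({"amplitude": (1, 10)}, points=5, seed=42)
```

Each entry of starts would then serve as the starting condition of one independent fit, and the results of all runs need to be kept for later inspection.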
Fitting multiple species to one dataset
In contrast to global fitting, where one model is fitted to several independent datasets, fitting multiple species to one dataset is nothing special from a fitting perspective: a (rather complex) composite model is used in this case.
There are, however, a few minor differences with respect to the parameter definitions. As the parameters stem from the same model, they will often have the same name, and the corresponding fit parameter therefore takes lists for initial guesses, ranges, and the like. Furthermore, the weighting of the different models of the composite model needs to be fitted as well.
Usually, as the number of parameters increases dramatically with more than one species, robust fitting shall be applied.
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/dataset

tasks:
  - kind: model
    type: CompositeModel
    from_dataset: /path/to/dataset
    properties:
      models:
        - Gaussian
        - Gaussian
      parameters:
        - position: 5
        - position: 8
    output: model
    result: multiple_gaussians

  - kind: fitpy.singleanalysis
    type: MultipleSpeciesFit
    properties:
      model: multiple_gaussians
      parameters:
        fit:
          position:
            start:
              - 5
              - 8
            range:
              - [3, 7]
              - [6, 9]
        weights:
          start:
            - 1
          range:
            - [0.5, 2]
    result: fitted_gaussians
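The list-valued settings can be read as one entry per species, in the order of the models in the composite model. The Python sketch below illustrates this reading and how a weighted composite of Gaussians may be evaluated. Both helpers (expand_fit_parameters and composite) are hypothetical, the Gaussian parametrisation is assumed, and using one weight per component is a simplification chosen for the sketch, not necessarily what the recipe's weights key means.

```python
import math


def expand_fit_parameters(fit):
    """Turn per-parameter lists into one entry per species.

    The species index is the position in the lists, assumed to match
    the order of the models in the composite model.
    """
    expanded = []
    for name, settings in fit.items():
        pairs = zip(settings["start"], settings["range"])
        for species, (start, bounds) in enumerate(pairs):
            expanded.append(
                {"species": species, "name": name,
                 "start": start, "range": tuple(bounds)}
            )
    return expanded


def composite(x, positions, weights, width=1.0):
    """Weighted sum of unit-amplitude Gaussians (assumed parametrisation)."""
    return sum(
        weight * math.exp(-((x - position) ** 2) / (2 * width ** 2))
        for position, weight in zip(positions, weights)
    )


# Fit settings as given in the recipe above
fit = {"position": {"start": [5, 8], "range": [[3, 7], [6, 9]]}}
per_species = expand_fit_parameters(fit)

# Composite evaluated at the position of the first species
value_at_first_peak = composite(5, positions=[5, 8], weights=[1.0, 0.5])
```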
Global fitting
Global fitting covers multiple independent datasets to which models with a joint set of parameters are fitted. This is different from fitting multiple species to one dataset.
As such, the fitting inherits from aspecd.analysis.MultiAnalysisStep, and a model needs to be provided for each dataset, as the datasets cannot be restricted to have the same dimensions and axis ranges.
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/first/dataset
  - /path/to/second/dataset

tasks:
  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/first/dataset
    output: model
    result: gaussian_model_1

  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/second/dataset
    output: model
    result: gaussian_model_2

  - kind: fitpy.multianalysis
    type: GlobalFit
    properties:
      models:
        - gaussian_model_1
        - gaussian_model_2
      parameters:
        fit:
          amplitude:
            start: 5
            range: [3, 7]
    result: fitted_gaussian
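The essence of a global fit is that the residuals of all datasets enter one common objective while the fit parameters are shared. The Python sketch below illustrates this for a single shared amplitude and two datasets with different axes, which is why one model per dataset is evaluated. It is an illustration only: the function names and the Gaussian parametrisation are assumptions, and the closed-form solution works only because the amplitude enters linearly.

```python
import math


def gaussian_shape(x, position, width):
    """Unit-amplitude Gaussian; this parametrisation is an assumption."""
    return math.exp(-((x - position) ** 2) / (2 * width ** 2))


def global_amplitude(datasets, position, width):
    """Shared amplitude minimising the summed squared residuals.

    `datasets` is a list of (xs, ys) pairs. The sums run over all
    datasets, so every dataset contributes to the same parameter.
    """
    numerator = 0.0
    denominator = 0.0
    for xs, ys in datasets:
        # One model per dataset, evaluated on that dataset's own axis
        shape = [gaussian_shape(x, position, width) for x in xs]
        numerator += sum(y * g for y, g in zip(ys, shape))
        denominator += sum(g * g for g in shape)
    return numerator / denominator


# Two synthetic datasets with different axes and a common amplitude of 5.5
first_axis = [i * 0.1 for i in range(31)]
second_axis = [i * 0.25 for i in range(5, 10)]
datasets = [
    (axis, [5.5 * gaussian_shape(x, 1.5, 0.5) for x in axis])
    for axis in (first_axis, second_axis)
]

shared = global_amplitude(datasets, position=1.5, width=0.5)
```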
Questions to address:
How to deal with constraints for parameters for the multiple datasets?
Example: Data have been recorded in an angular-dependent fashion, and while the angle offset between datasets is known with some accuracy, the initial offset shall be fitted.
In such a case, one would probably want to provide the offsets, let the fitting adjust them within a given narrow range, and let the initial offset vary within a much wider range.
Graphical visualisation of fit results
Graphical visualisation of fit results is of crucial importance. The lmfit package provides straightforward and compelling means for most standard situations, and these can be used to inspire similar solutions based on the functionality provided by the ASpecD framework.
Comparing data and fitted model
Basically, data, model, and perhaps the residual should be shown.
As the results of a fit are not contained in the original experimental dataset, but rather in a calculated dataset that is returned by the fitting step, the plotters reconstruct the data of the original dataset by adding the residual to the fitted model.
In the simplest case of a 1D dataset, the complete procedure, including model creation and fitting, may look as follows:
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/dataset

tasks:
  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/dataset
    output: model
    result: gaussian_model

  - kind: fitpy.singleanalysis
    type: SimpleFit
    properties:
      model: gaussian_model
      parameters:
        fit:
          amplitude:
            start: 5
            range: [3, 7]
    result: fitted_gaussian

  - kind: fitpy.singleplot
    type: SinglePlotter1D
    properties:
      filename: fit_result.pdf
    apply_to: fitted_gaussian
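The reconstruction of the original data mentioned above is a simple pointwise sum. A minimal Python sketch, assuming the residual is defined as data minus fitted model (the sign convention is an assumption, as is the function name):

```python
def reconstruct_data(fitted_model, residual):
    """Recover the original data as fitted model plus residual."""
    return [m + r for m, r in zip(fitted_model, residual)]


# Made-up values for illustration only
data = [0.1, 0.9, 2.1, 0.8, 0.0]
fitted_model = [0.0, 1.0, 2.0, 1.0, 0.0]
residual = [d - m for d, m in zip(data, fitted_model)]

reconstructed = reconstruct_data(fitted_model, residual)
```

This is why the calculated dataset returned by the fitting step has to carry the residual alongside the fitted model: without it, a plotter applied to the fit result could not show the data.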
Robustness of sampling strategies
When sampling starting conditions, it is important to graphically display the results for the different samples, to evaluate the robustness of the fit and the applicability of the grid used.
Key here is to extract the statistical criterion from the result of an LHSFit. As the result is a calculated dataset, a standard plotter from the ASpecD framework can be used to display the results:
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/dataset

tasks:
  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/dataset
    output: model
    result: gaussian_model

  - kind: fitpy.singleanalysis
    type: LHSFit
    properties:
      model: gaussian_model
      parameters:
        fit:
          amplitude:
            lhs_range: [1, 10]
        lhs:
          points: 5
    result: fitted_gaussian

  - kind: fitpy.singleanalysis
    type: ExtractLHSStatistics
    properties:
      parameters:
        criterion: reduced_chi_square
    result: reduced_chi_squares
    apply_to: fitted_gaussian

  - kind: singleplot
    type: SinglePlotter1D
    properties:
      properties:
        drawing:
          marker: 'o'
          linestyle: 'none'
      filename: 'reduced_chi_squares.pdf'
    apply_to: reduced_chi_squares
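As a reminder of what is plotted here: the reduced chi-square of an unweighted fit is the sum of squared residuals divided by the degrees of freedom, i.e. the number of data points minus the number of varied parameters. The Python sketch below computes it for the residuals of several LHS samples and picks the best one. The residual values are made up for illustration, and the function name is hypothetical.

```python
def reduced_chi_square(residual, n_varied):
    """Sum of squared residuals divided by the degrees of freedom.

    Assumes unweighted residuals and more data points than varied
    parameters.
    """
    return sum(r ** 2 for r in residual) / (len(residual) - n_varied)


# Made-up residuals of two fits started from different LHS samples
residuals_per_sample = [
    [0.1, -0.1, 0.1, -0.1, 0.0],
    [0.5, -0.5, 0.5, -0.5, 0.0],
]

criteria = [reduced_chi_square(r, n_varied=1) for r in residuals_per_sample]
best_sample = min(range(len(criteria)), key=criteria.__getitem__)
```

If most samples converge to nearly the same value, the fit can be regarded as robust with respect to the starting conditions; a wide spread suggests local minima or an unsuitable sampling interval.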
Fit reports
The importance of sensible reports cannot be overstated. TSim is key to the success of much of the developers' own research, allowing a skilled student with a few hours of introduction to perform fits to data without much need for further supervision beyond discussing the results together.
Thanks to the report-generating capabilities of the ASpecD framework, generating reports should be straightforward. The key point is not how to generate reports, but to provide sensible templates and, where necessary, to generate the additional information to be added to the reports.
Should reports automatically generate certain figures if these are not provided? This may be sensible, but would mean including plotter functionality in the reports. An alternative would be to provide recipe templates specifying the plots, which the user can adapt as needed. For the time being, the approach will be to automatically create a plot with standard appearance.
The full process, from loading data to reporting the final fit results, and including model definition and actual fitting, may look as follows:
format:
  type: ASpecD recipe
  version: '0.2'

datasets:
  - /path/to/dataset

tasks:
  - kind: model
    type: Gaussian
    properties:
      parameters:
        position: 1.5
        width: 0.5
    from_dataset: /path/to/dataset
    output: model
    result: gaussian_model

  - kind: fitpy.singleanalysis
    type: SimpleFit
    properties:
      model: gaussian_model
      parameters:
        fit:
          amplitude:
            start: 5
            range: [3, 7]
    result: fitted_gaussian

  - kind: fitpy.report
    type: LaTeXFitReporter
    properties:
      template: simplefit.tex
      filename: fit_result.tex
    compile: true
    apply_to: fitted_gaussian
In this particular case, the fit report will be saved to the file fit_result.tex and automatically compiled into a PDF file, as the compile flag is set to true.
Pipelines
Inspired by packages such as sklearn, it might prove useful to be able to define entire pipelines and employ a series of fitting strategies.
The question remains: Is this a separate task, or could this reasonably be done using recipe-driven data analysis and providing well-crafted example recipes?