You're reading the documentation for a development version. For the latest released version, please have a look at v0.1.

fitpy.analysis module

Actual fitting in form of analysis steps derived from the ASpecD framework.

Fitting of a model to (experimental) data can always be seen as an analysis step in context of the ASpecD framework, resulting in a calculated dataset.

Introduction

Fitting in context of the FitPy framework is always a two-step process:

  1. define the model, and

  2. define the fitting task.

The model is an instance of aspecd.model.Model, and the fitting task one of the analysis steps contained in this module. They are, in turn, instances of aspecd.analysis.AnalysisStep.

A first, simple but complete example of a recipe performing a fit on experimental data, is given below.

 1format:
 2  type: ASpecD recipe
 3  version: '0.2'
 4
 5datasets:
 6  - /path/to/dataset
 7
 8tasks:
 9  - kind: model
10    type: Gaussian
11    properties:
12      parameters:
13        position: 1.5
14        width: 0.5
15    from_dataset: /path/to/dataset
16    output: model
17    result: gaussian_model
18
19  - kind: fitpy.singleanalysis
20    type: SimpleFit
21    properties:
22      model: gaussian_model
23      parameters:
24        fit:
25          amplitude:
26            start: 5
27            range: [3, 7]
28    result: fitted_gaussian

In this case, a Gaussian model is created, with values for two parameters set explicitly and not varied during the fit. The third parameter is varied during the fit, within a given range. Furthermore, using SimpleFit here without further parameters, a least-squares fit using the Levenberg-Marquardt method is carried out.

Note

Usually, you will have set another ASpecD-derived package as default package in your recipe for processing and analysing your data. Hence, you need to provide the package name (fitpy) in the kind property, as shown in the examples.

This seamless integration of FitPy into all packages derived from the ASpecD framework ensures full reproducibility and allows to easily pre- and postprocess the data accordingly. Particularly for analysing the results of fits, have a look at the dedicated plotters in the fitpy.plotting module and the reporters in the fitpy.report module.

Fitting strategies

Fitting models to data is generally a complex endeavour, and FitPy will not take any decisions for you. However, it provides powerful abstractions and a simple user interface, letting you automate as much as possible, while retaining full reproducibility. Thus, it is possible to create entire pipelines spanning a series of different fitting strategies, analyse the results, and making an informed decision for each individual question.

The following list provides an overview of the different fitting strategies supported by FitPy (currently, as of January 2022, only a subset of these strategies is implemented).

  • Simple fitting of single datasets

    Make informed guesses for the initial values of the variable parameters of a model and fit the model to the data. The most straight-forward strategy. Still, different optimisation algorithms can be chosen.

    If the fitness landscape is rough and contains local minima, the fit may not converge or get stuck in local minima.

  • Robust fitting via sampling of initial conditions (LHS)

    Instead of informed guesses for the initial values of the variable parameters of a model, these initial values are randomly chosen using a Latin Hypercube. For each of the resulting grid points, an optimisation is performed, analogous to what has been described above.

    Generally, this approach will take much longer, with the computing time scaling with the number of grid points, but it is much more robust, particularly with complicated fitness landscapes containing many local minima.

  • Fitting multiple species to one dataset

    Basically the same as fitting a simple model to the data of a dataset, but this time providing a aspecd.model.CompositeModel.

    Given the usually larger number of variable parameters, robust fitting strategies (LHS) should be used.

  • Global fitting of several datasets at once

    Fit models with a joint set of parameters to a series of independent datasets. Can become arbitrarily complex given that some parameters may be allowed to independently vary for each dataset, while others are constrained, while still others (typically the majority) will be identical for each dataset.

Common to all these different fitting strategies is the need to sometimes omit parts of a dataset from fitting.

Concrete fitting tasks implemented

Currently (as of January 2022), only fitting tasks are implemented that operate on single datasets.

  • SimpleFit

    Perform basic fit of a model to data of a dataset.

    The result is stored as calculated dataset and can be investigated graphically using dedicated plotters from the fitpy.plotting module as well as reporters from the fitpy.report module.

    With default settings, a least-Squares minimization using the Levenberg-Marquardt method is carried out. Initial values and ranges for each variable parameter of the model can be specified, as well as details for the algorithm.

  • LHSFit

    Fit of a model to data of a dataset using LHS of starting conditions.

    In case of more complicated fits, e.g. many variable parameters or a rough fitness landscape of the optimisation including several local minima, obtaining a robust fit and finding the global minimum requires to sample initial conditions and to perform fits for all these conditions.

    Here, a Latin Hypercube gets used to sample the initial conditions. For each of these, a fit is performed in the same way as in SimpleFit. The best fit is stored in the result as usual, and additionally, the sample grid, the discrepancy as measure for the quality of the grid, as well as all results from the individual fits are stored in the lhs property of the metadata of the resulting dataset. This allows for both, handling this resulting dataset as usual and evaluating the robustness of the fit.

Helper classes

Additionally to the fitting tasks described above, helper classes exist for specific tasks.

  • ExtractLHSStatistics

    Extract statistical criterion from LHS results for evaluating robustness.

    When performing a robust fitting, e.g. by employing LHSFit, evaluating the robustness of the obtained results is a crucial step. Therefore, the results from each individual fit starting with a grid point of the Latin Hypercube are contained in the resulting dataset. This analysis step extracts the given criterion from the calculated dataset and returns itself a calculated dataset with the values of the criterion sorted in ascending order as its data. The result can be graphically represented using a aspecd.plotting.SinglePlotter1D.

Module documentation

class fitpy.analysis.SimpleFit[source]

Bases: SingleAnalysisStep

Perform basic fit of a model to data of a dataset.

The result is stored as calculated dataset and can be investigated graphically using dedicated plotters from the fitpy.plotting module as well as reporters from the fitpy.report module.

With default settings, a least-Squares minimization using the Levenberg-Marquardt method is carried out. Initial values and ranges for each variable parameter of the model can be specified, as well as details for the algorithm.

result

Calculated dataset containing the result of the fit.

Type:

fitpy.dataset.CalculatedDataset

model

Model to fit to the data of a dataset

Type:

aspecd.model.Model

parameters

All parameters necessary to perform the fit.

These parameters will be available from the calculation metadata of the resulting fitpy.dataset.CalculatedDataset.

fitdict

All model parameters that should be fitted.

The keys of the dictionary need to correspond to the parameter names of the model that should be fitted. The values are dicts themselves, at least with the key start for the initial parameter value. Additionally, you may supply a range with a list as value defining the interval within the the parameter is allowed to vary during fitting.

algorithmdict

Settings of the algorithm used to fit the model to the data.

The key method needs to correspond to the methods supported by lmfit.minimizer.Minimizer.

To provide more information independent on the naming of the respective methods in lmfit.minimizer and the corresponding scipy.optimize module, the key description contains a short description of the respective method.

To pass additional parameters to the solver, use the parameters dict. Which parameters can be set depends on the actual solver. For details, see the scipy.optimize documentation.

Type:

dict

Raises:

ValueError – Raised if the method provided in parameters['algorithm'][ 'method'] is not supported or invalid.

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Fitting is always a two-step process: (i) define the model, and (ii) define the fitting task. Here and in the following examples we assume a dataset to be imported as dataset, and the model is initially evaluated for this dataset (to get the same data dimensions and alike, see aspecd.model for details).

Note

Usually, you will have set another ASpecD-derived package as default package in your recipe for processing and analysing your data. Hence, you need to provide the package name (fitpy) in the kind property, as shown in the examples.

Suppose you have a dataset and want to fit a Gaussian to its data, in this case only varying the amplitude, but keeping position and width fixed to the values specified in the model:

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: SimpleFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          start: 5
  result: fitted_gaussian

In this particular case, you define your model specifying position and width, and fit this to the data allowing only the parameter amplitude to vary, keeping position and width fixed at the given values. Furthermore, no range is provided for the values the amplitude can be varied.

To provide a range (boundaries, interval) for the allowed values of a fit parameter, simply add the key range:

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: SimpleFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          start: 5
          range: [3, 7]
  result: fitted_gaussian

Note that models usually will have standard values for all parameters. Therefore, you only need to define those parameters in the model task that shall not change during fitting and should have values different from the standard.

If you were to fit multiple parameters of a model (as is usually the case), provide all these parameters in the fit section of the parameters of the fitting task:

- kind: model
  type: Gaussian
  properties:
    parameters:
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: SimpleFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          start: 5
          range: [3, 7]
        position:
          start: 2
          range: [0, 4]
  result: fitted_gaussian

While the default algorithm settings are quite sensible as a starting point, you can explicitly set the method and its parameters. Which parameters can be set depends on the method chosen, for details refer to the documentation of the underlying scipy.optimize module. The following example shows how to change the algorithm to least_squares (using a Trust Region Reflective method) and to set the tolerance for termination by the change of the independent variables (xtol parameter):

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: SimpleFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          start: 5
      algorithm:
        method: least_squares
        parameters:
          xtol: 1e-6
  result: fitted_gaussian
class fitpy.analysis.LHSFit[source]

Bases: SingleAnalysisStep

Fit of a model to data of a dataset using LHS of starting conditions.

In case of more complicated fits, e.g. many variable parameters or a rough fitness landscape of the optimisation including several local minima, obtaining a robust fit and finding the global minimum requires to sample initial conditions and to perform fits for all these conditions.

Here, a Latin Hypercube gets used to sample the initial conditions. For each of these, a fit is performed in the same way as in SimpleFit. The best fit is stored in the result as usual, and additionally, the sample grid, the discrepancy as measure for the quality of the grid, as well as all results from the individual fits are stored in the lhs property of the metadata of the resulting dataset. This allows for both, handling this resulting dataset as usual and evaluating the robustness of the fit.

result

Calculated dataset containing the result of the fit.

Type:

fitpy.dataset.CalculatedDatasetLHS

model

Model to fit to the data of a dataset

Type:

aspecd.model.Model

parameters

All parameters necessary to perform the fit.

These parameters will be available from the calculation metadata of the resulting fitpy.dataset.CalculatedDatasetLHS.

fitdict

All model parameters that should be fitted.

The keys of the dictionary need to correspond to the parameter names of the model that should be fitted. The values are dicts themselves, at least with the key start for the initial parameter value. Additionally, you may supply a range with a list as value defining the interval within the the parameter is allowed to vary during fitting.

algorithmdict

Settings of the algorithm used to fit the model to the data.

The key method needs to correspond to the methods supported by lmfit.minimizer.Minimizer.

To provide more information independent on the naming of the respective methods in lmfit.minimizer and the corresponding scipy.optimize module, the key description contains a short description of the respective method.

To pass additional parameters to the solver, use the parameters dict. Which parameters can be set depends on the actual solver. For details, see the scipy.optimize documentation.

lhsdict

Settings for the Latin Hypercube used to sample initial conditions.

The most important parameter is points, defining the points in each direction of the Latin Hypercube.

Additionally, all attributes of scipy.stats.qmc.LatinHypercube can be set. Currently, the relevant parameters are centered (to center the point within the multi-dimensional grid) and rng_seed to allow for reproducible results.

In case rng_seed is provided, the random number generator is reset and seeded with this value, ensuring reproducible creation of the grid.

Type:

dict

Raises:

ValueError – Raised if the method provided in parameters['algorithm'][ 'method'] is not supported or invalid.

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Fitting is always a two-step process: (i) define the model, and (ii) define the fitting task. Here and in the following examples we assume a dataset to be imported as dataset, and the model is initially evaluated for this dataset (to get the same data dimensions and alike, see aspecd.model for details).

Note

Usually, you will have set another ASpecD-derived package as default package in your recipe for processing and analysing your data. Hence, you need to provide the package name (fitpy) in the kind property, as shown in the examples.

Suppose you have a dataset and want to fit a Gaussian to its data, in this case only varying the amplitude, but keeping position and width fixed to the values specified in the model:

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: LHSFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          lhs_range: [2, 8]
      lhs:
        points: 7
  result: fitted_gaussian

In this particular case, you define your model specifying position and width, and fit this to the data allowing only the parameter amplitude to vary, keeping position and width fixed at the given values. Furthermore, a range for the LHS for this parameter is provided, as well as the number of points sampled per dimension of the Latin Hypercube.

Only those fitting parameters having set the lhs_range parameter will be used for sampling. All other parameters will be used with their starting values as defined:

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: LHSFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          lhs_range: [2, 8]
        position:
          start: 2
          range: [0, 4]
      lhs:
        points: 7
  result: fitted_gaussian

Here, only the amplitude parameter will be sampled (in this particular case resulting in a 1D Latin Hypercube), while for each of the grid points, the position parameter is set as given.

Sometimes the grid created by the LHS should be reproducible. In this case, provide a seed for the random number generator used internally:

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: LHSFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          lhs_range: [2, 8]
      lhs:
        points: 7
        rng_seed: 42
  result: fitted_gaussian

Similarly, if the points should be centred within the multi-dimensional grid, set the centered property accordingly:

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: LHSFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          lhs_range: [2, 8]
      lhs:
        points: 7
        centered: true
  result: fitted_gaussian

While the default algorithm settings are quite sensible as a starting point, you can explicitly set the method and its parameters. Which parameters can be set depends on the method chosen, for details refer to the documentation of the underlying scipy.optimize module. The following example shows how to change the algorithm to least_squares (using a Trust Region Reflective method) and to set the tolerance for termination by the change of the independent variables (xtol parameter):

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: LHSFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          lhs_range: [2, 8]
      lhs:
        points: 7
      algorithm:
        method: least_squares
        parameters:
          xtol: 1e-6
  result: fitted_gaussian
class fitpy.analysis.ExtractLHSStatistics[source]

Bases: SingleAnalysisStep

Extract statistical criterion from LHS results for evaluating robustness.

When performing a robust fitting, e.g. by employing LHSFit, evaluating the robustness of the obtained results is a crucial step. Therefore, the results from each individual fit starting with a grid point of the Latin Hypercube are contained in the resulting dataset. This analysis step extracts the given criterion from the calculated dataset and returns itself a calculated dataset with the values of the criterion sorted in ascending order as its data. The result can be graphically represented using a aspecd.plotting.SinglePlotter1D.

result

Calculated dataset containing the extracted statistical criterion.

Type:

aspecd.dataset.CalculatedDataset

parameters

All parameters necessary to perform the fit.

These parameters will be available from the calculation metadata of the resulting fitpy.dataset.CalculatedDatasetLHS.

criterionstr

Statistical criterion extracted from the LHS results

Type:

dict

Examples

For convenience, a series of examples in recipe style (for details of the recipe-driven data analysis, see aspecd.tasks) is given below for how to make use of this class. The examples focus each on a single aspect.

Suppose you have fitted a Gaussian to the data of a dataset, as shown in the example section of the LHSFit class. If you now want to extract the reduced chi square value and plot it, the whole procedure could look like this:

- kind: model
  type: Gaussian
  properties:
    parameters:
      position: 1.5
      width: 0.5
  from_dataset: dataset
  output: model
  result: gaussian_model

- kind: fitpy.singleanalysis
  type: LHSFit
  properties:
    model: gaussian_model
    parameters:
      fit:
        amplitude:
          lhs_range: [2, 8]
      lhs:
        points: 7
  result: fitted_gaussian

- kind: fitpy.singleanalysis
  type: ExtractLHSStatistics
  properties:
    parameters:
      criterion: reduced_chi_square
  result: reduced_chi_squares
  apply_to: fitted_gaussian

- kind: singleplot
  type: SinglePlotter1D
  properties:
    properties:
      drawing:
        marker: 'o'
        linestyle: 'none'
    filename: 'reduced_chi_squares.pdf'
  apply_to: reduced_chi_squares

This would plot the reduced chi square values in ascending order, showing the individual values as not connected dots.

static applicable(dataset)[source]

Check whether analysis step is applicable to the given dataset.

Returns:

applicableTrue if successful, False otherwise.

Return type:

bool