{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Jupyter notebook tutorial:\n", "# Advanced Fitting\n", " \n", " Günter Quast, April 2021 \n", "---\n", "## Jupyter Notebook Fundamentals\n", "\n", "This file of type `.ipynb` contains a tutorial as a `Jupyter notebook`.\n", "`Jupyter` provides a browser interface with a (simple) development environment for *Python* code\n", "and explanatory texts in intuitive *Markdown* format.\n", "The input of formulas in *LaTeX* format is also supported.\n", "\n", "A summary of the most important commands for using *Jupyter* as a working environment can be\n", "found in the notebook\n", "[*JupyterCheatsheet.ipynb*](https://git.scc.kit.edu/yh5078/datenanalyse/-/blob/master/jupyter/JupyterCheatsheet.ipynb)\n", "(German).\n", "Basics for statistical data analysis can be found in the notebooks\n", "[*IntroStatistik.ipynb*](https://git.scc.kit.edu/yh5078/datenanalyse/-/blob/master/jupyter/IntroStatistik.ipynb)\n", "(German) and\n", "[*Fehlerrechnung.ipynb*](https://git.scc.kit.edu/yh5078/datenanalyse/-/blob/master/jupyter/Fehlerrechnung.ipynb) (German).\n", "\n", "In *Jupyter*, code and text are entered in individual cells.\n", "Active cells are indicated by a blue bar in the margin.\n", "They can be in two states: in edit mode the input field is white, in command mode it is grayed out.\n", "Clicking in the border area selects the command mode, clicking in the text field of a code cell\n", "switches to edit mode.\n", "The `esc` key can also be used to leave the edit mode.\n", "\n", "Pressing `a` in command mode creates a new empty cell above the active cell, `b` creates one below.\n", "Entering `dd` deletes the corresponding cell.\n", "\n", "Cells can be either of the type `Markdown` or `Code`.\n", "Entering `m` in command mode sets the type Markdown, entering `y` selects the type Code.\n", "\n", "The cell content is processed - i.e. setting text or executing code - by entering `shift+return`,\n", "or `alt+return` if a new, empty cell should also be created.\n", "\n", "The settings mentioned here as well as the insertion, deletion or execution of cells can also be\n", "executed via the pull-down menu at the top.\n", "\n", "---\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Section 1: Introduction\n", "***\n", "\n", "**Fitting data to models** is a general task in physics. Typically, parameter-dependent models\n", "represent theoretical knowledge, which is confronted with empirical data, i.e. measurements.\n", "Measured data are affected by unavoidable experimental uncertainties, and the question\n", "then is whether there exists a set of model parameters for which the agreement with the\n", "data is acceptable. Speaking in term of statistics, the step of comparing data with\n", "uncertainties to a parameter-dependent model is a hypothesis test. Usually, a quantity\n", "called goodness-of-fit is used to quantify the level of agreement between the data and\n", "the theoretical model, expressed as a p-value, that is the probability to observe a\n", "goodness-of-fit which is worse than the one actually observed.\n", "\n", "### Recap: least-squares method \n", "\n", "An example is the value of the sum of least squares, S, at the values of the optimal \n", "parameters, known as $\\chi^2$. 
The corresponding p-value is the so-called $\chi^2$-probability\n", "\n", "> $ p_{\chi^2} = \displaystyle \int_{S_{min}}^\infty \chi^2(x, n_{dof}) {\rm d}x$,\n", "\n", "where $S$ is given by the deviation of the model function $f(x, \vec{p})$, depending on \n", "parameter values $\vec{p}$ evaluated at positions $x_i$, from the measurements $y_i$ normalised\n", "to the measurement uncertainties $\sigma_i$, \n", "\n", "> $ S = \displaystyle \sum_{i=1}^N \left( \frac{ f(x_i; \vec{p}) - y_i} {\sigma_i} \right)^2\,$ \n", " (Equ. 1.1)\n", "\n", "$S$ is minimized with respect to the parameter values $\vec{p}$, resulting in an \n", "optimal parameter set $\vec{p}_0$ and the value $S_{min}(x_i, y_i, \sigma_i, \vec{p}_0)$. \n", "\n", "If $p_{\chi^2}$ is below a pre-defined level, the hypothesis that the model describes the\n", "data is discarded. If the model is not rejected, the parameter set $\vec{p}_0$ is\n", "accepted as the best-fit set of model parameters. The procedure sketched here is known as\n", "the least-squares method originally proposed by Carl Friedrich Gauß. It is the basis of\n", "many fits in fundamental laboratory courses, and many implementations of this simple\n", "method exist in a variety of tools and libraries. \n", "\n", "The least-squares method can easily be extended to cope with correlated uncertainties by\n", "introducing the covariance matrix $V$ of the measurement uncertainties:\n", "\n", "> $S\left( \vec{y}, V, \vec{f}(\vec x, \vec p) \right) = \n", "\left( \vec{y} - \vec{f}(\vec{x},\vec{p} )\right)^T \n", "\, {\bf V^{-1}} \n", "\left( \vec{y} - \vec{f}(\vec{x},\vec{p} )\right)\, $ \n", " (Equ. 1.2) \n", "\n", "The covariance matrix of the parameter uncertainties can be determined by\n", "evaluating the second derivatives of $S$ at the minimum:\n", "\n", "> $\displaystyle{\left( {{\bf V}_\hat{p}}^{-1} \right)_{ij} \n", " = \frac{1}{2}\,\left.\frac{\partial^2 S(\vec{p})}{\partial p_i\partial p_j}\n", " \right|_{\hat{p_i}\hat{p_j}}}$\n", " (Equ. 1.3)\n", "\n", "Many of the wide-spread tools provide an implementation, e.g. the easy-to-use function \n", "`curve_fit` in the `scipy.optimize` package.\n", "\n", "\n", "### 1.1 More complex use cases \n", "\n", "However, there are more complex use cases. Uncertainties on the abscissa values in \n", "addition to uncertainties on the ordinate can be implemented by transforming \n", "variations on the abscissa to variations on the ordinate by means of a first-order \n", "Taylor expansion.\n", "There are not many tools providing such functionality, among them the package ODR\n", "(orthogonal distance regression) in `scipy` and the class `TGraphErrors` in the\n", "CERN ROOT package. Correlated uncertainties on the abscissa are, however, not treated\n", "by these packages. \n", "\n", "Another complication consists in relative uncertainties, which are quite common in \n", "physics. Prominent examples are calibration uncertainties of measurement devices,\n", "which lead to correlated, relative uncertainties of all measurements taken with\n", "the same device. Another common example is the approximation of Poisson uncertainties\n", "by a Gaussian distribution, taking the square root of the mean as the uncertainty,\n", "$\sigma = \sqrt{\mu}$. Such relative uncertainties should be related to the true and \n", "not to the measured values in order to avoid biases. 
In doing so, the\n", "standard version of the least-squares method is no longer strictly valid, as the\n", "normalisation term of the Gaussian distribution becomes dependent on the parameters\n", "and can no longer be neglected. In this case, a fit based on the full likelihood \n", "of all involved uncertainties must be used, as will be explained below.\n", "\n", "To the author's knowledge, presently only the tool \n", "[*kafe2*](https://github.com/dsavoiu/kafe2) developed at KIT provides\n", "the means to simultaneously handle all of the use cases and error types\n", "mentioned above. \n", "The package [*PhyPraKit.phyFit*](https://phyprakit.readthedocs.io/en/latest/),\n", "also provides a light-weight, transparent layer implementing the required \n", "precedures on top of the function minimization and uncertainty analyis tool\n", "[*iminuit*](https://iminuit.readthedocs.io/en/stable/).\n", "Many of the examples described below will therefore use the \n", "*kafe2* or *phyFit* packages.\n", "\n", "### 1.2 Prerequisites and other tutorials\n", "\n", "The following material assumes familiarity with the basic principles of statistical\n", "data analysis and applications of the least-squares method, as explained in other\n", "jupyter tutorial of this series:\n", "\n", "> 1. Introduction to Jupyter notebooks: JupyterCheatsheet.ipynb\n", " 2. Introduction to Python : PythonIntro.ipynb\n", " 3. Cheat-sheet for Python : PythonCheatsheet.ipynb\n", " 4. Basics of matplotlib : matplotlibTutorial.ipynb\n", " 5. Basics of Statistics: IntroStatistik.ipynb\n", " 6. Error analysis in laboratory courses: Fehlerrechnung.ipynb\n", " 7. Introduction to kafe2: kafe2Tutorial.ipynb\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.3 General settings for `kafe2`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "from __future__ import division, print_function # python2-compatibility\n", "import sys, os\n", "\n", "# Lines with % or %% at the beginning are so-called \"magic commands\",\n", "# that specify the cell type or options for displaying graphics.\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Imports and Presets for *kafe2*:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from kafe2 import config, XYContainer, Fit, Plot\n", "\n", "import numpy as np, matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.4 A simple fit example\n", "\n", "To start, a simple, very general example may serve as an introduction to the usage of *kafe2*.\n", "The following, well-documented piece of code is a complete python script performing a fit of \n", "a quadratic function to data with errors in y-direction. \n", "\n", "Run the following cell by activating it via clicking, then type `shift+return`: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\" general example for fitting with kafe2\n", " - set-up arrays for data\n", " - perform fit (2nd order polynomial)\n", " - show and save output\n", "\"\"\"\n", "# Imports #\n", "from kafe2 import XYContainer, Fit, Plot\n", "import numpy as np, matplotlib.pyplot as plt\n", "\n", "\n", "### define the model function\n", "def poly2(x, a=1.0, b=0.0, c=0.0):\n", " return a * x**2 + b * x + c\n", "\n", "\n", "### Workflow #\n", "# 1. 
the data\n", "x, y, e = (\n", " [0.05, 0.36, 0.68, 0.80, 1.09, 1.46, 1.71, 1.83, 2.44, 2.09, 3.72, 4.36, 4.60],\n", " [0.35, 0.26, 0.52, 0.44, 0.48, 0.55, 0.66, 0.48, 0.75, 0.70, 0.75, 0.80, 0.90],\n", " [0.06, 0.07, 0.05, 0.05, 0.07, 0.07, 0.09, 0.10, 0.11, 0.10, 0.11, 0.12, 0.10],\n", ")\n", "\n", "# 2. convert to kafe2 data structure and add uncertainties\n", "xy_data = XYContainer(x, y)\n", "xy_data.add_error(\"y\", e) # independent erros y\n", "# -- set meaningful names\n", "xy_data.label = \"Beispieldaten\"\n", "xy_data.axis_labels = [\"x\", \"data & f(x)\"]\n", "\n", "# 3. create the Fit object\n", "my_fit = Fit(xy_data, poly2)\n", "# set meaningful names for model\n", "my_fit.model_label = \"Parabel-Modell\"\n", "\n", "# 4. perform the fit\n", "my_fit.do_fit()\n", "\n", "# 5. report fit results\n", "my_fit.report()\n", "\n", "# 6. create and draw plots\n", "my_plot = Plot(my_fit)\n", "my_plot.plot()\n", "\n", "# 7. show or save plots #\n", "## plt.savefig('kafe_fitexample.pdf')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default output shows the data, the best-fit function and the uncertainty band of the model in\n", "graphical form, and text output with names, values and uncertainties of the best-fit values of the parameters as well as the value of $\\chi^2$ per degree of freedom and the $\\chi^2$-probability of\n", "the fit. \n", "Customisations of the output, like colors, marker type and size, axis labels are easily possible - \n", "see the kafe2 documentation and the tutorial *kafe2Tutorial.ipynb*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.5 A more complex *kafe2* example: comparing two models\n", "\n", "One of the strengths of *kafe2* is its ability to handle various kinds of errors and multiple fits\n", "to the same data set, which are ideally suited to compare different model assumptions. An example, \n", "taken from the *kafe2* jupyter tutorial, is shown in the code cell below. Compared to the previous\n", "example, errors in the x-direction are added, and two models are defined and fit to the data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Comparison of two models with kafe2\n", "# -> insert code here\n", "\n", "\n", "# Our first model is a simple linear function:\n", "def linear_model(x, a, b):\n", " return a * x + b\n", "\n", "\n", "# Our second model is a simple exponential function.\n", "# The kwargs in the function header specify parameter defaults.\n", "def exponential_model(x, A0=1.0, x0=5.0):\n", " return A0 * np.exp(x / x0)\n", "\n", "\n", "# The data for this exercise:\n", "x = [19.8, 3.0, 5.1, 16.1, 8.2, 11.7, 6.2, 10.1]\n", "y = [23.2, 3.2, 4.5, 19.9, 7.1, 12.5, 4.5, 7.2]\n", "data2 = XYContainer(x_data=x, y_data=y)\n", "data2.add_error(axis=\"x\", err_val=0.3)\n", "data2.add_error(axis=\"y\", err_val=0.15, relative=True)\n", "\n", "data2.label = \"my data points\"\n", "data2.axis_labels = [\"my x values\", \"my y values\"]\n", "\n", "# Create 2 Fit objects with the same data but with different model functions:\n", "linear_fit = Fit(data2, model_function=linear_model)\n", "exponential_fit = Fit(data2, model_function=exponential_model)\n", "\n", "# Optional: Assign LaTeX strings to parameters and model functions.\n", "linear_fit.assign_parameter_latex_names(a=\"a\", b=\"b\")\n", "linear_fit.assign_model_function_latex_expression(\"{a}{x} + {b}\")\n", "linear_fit.model_label = \"my linear model\"\n", "exponential_fit.assign_parameter_latex_names(A0=\"A_0\", x0=\"x_0\")\n", "exponential_fit.assign_model_function_latex_expression(\"{A0} e^{{{x}/{x0}}}\")\n", "exponential_fit.model_label = \"my exponential model\"\n", "\n", "# Perform the fits:\n", "linear_fit.do_fit()\n", "exponential_fit.do_fit()\n", "\n", "# Optional: Print out a report on the result of each fit.\n", "linear_fit.report()\n", "exponential_fit.report()\n", "\n", "# Optional: Create a plot of the fit results using Plot.\n", "p = Plot(fit_objects=[linear_fit, exponential_fit], separate_figures=False)\n", "p.plot(fit_info=True)\n", "\n", "# Show the fit results:\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "Examining the graphical output, ist is clear that the linear model only marginally describes the data.\n", "The inspection of the $\\chi^2$-probability shows a value of only 5.3%, slightly above the usually chosen\n", "critical value of 5% for rejection of a model. On the other hand, the exponential model fits perfectly.\n", "\n", "**Proposed Exercise**: \n", "You might try to add a quadratic model - you will see that this would also perfectly match the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1.6 Examining non-liner models\n", "\n", "A further, very useful feature of *kafe2* consists in producing profile scans of $S$ around the minimum and\n", "confidence contours for pairs of parameters. Our second model is non-linear in the sense that one of the\n", "parameters, $x_0$, enters in a non-linear way. Such problems typically lead to a non-parabolic behaviour\n", "of $S$ as a function of the parameters, and to deviations of the contours corresponding to the one- and \n", "two- sigma levels from the elliptic form. \n", "\n", "Execute the code in the cell below and study the output. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Checking the nonlinear fits\n", "# -> enter code here\n", "\n", "from kafe2 import ContoursProfiler\n", "\n", "# Create contour plot and profiles for the exponential fit\n", "cpf = ContoursProfiler(exponential_fit)\n", "cpf.plot_profiles_contours_matrix(show_grid_for=\"contours\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, the profiles of $\\Delta\\chi^2 = S(x, y, \\vec{p}) - S(x, y, \\vec{p_0}$ are \n", "approximated acceptably well up to $\\Delta\\chi^2$=1, but show stronger deviations for\n", "larger values. In practice, this means that the values of the parameter uncertainties are\n", "trustworthy, but should not be extrapolated to larger values - i.e. two-times the given\n", "uncertainty does not correspond to the two-sigma level. In case of stronger deviations\n", "another feature of *kafe2* should be used to determine confidence regions for the\n", "values of the fit parameters that are asymmetric around the best-fit values:\n", "\n", "```\n", "exponential_fit.report(asymmetric_parameter_errors=True)\n", "```\n", "\n", "The method used to derive such confidence regions relies on the correspondence between $S$ and \n", "the negative logarithm of the likelihood function for Gaussian uncertainties on the data points\n", "(see explanation below). \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Section 2: Least Squares and Likelihood methods\n", "---\n", "\n", "The least-squares method, often also called \"$\\chi^2$-method\", is directly linked to the very general likelihood principle. Like $S$ in the section above, the likelihood $\\cal{L}$ is a function of the data, $\\vec{x},\\vec{y}$, and a model function $f(\\vec{x}; \\vec{p})$ characterised by a set of parameters $\\vec{p}$, i.e. \n", "$\\cal{L}\\left(\\vec{x}, \\vec{y}, f(\\vec{x}, \\vec{p})\\right)$. If the measurements are independent, the likelihood is constructed from the product of the values of the probability density function \n", "$\\cal{D}(\\vec{y}, f(\\vec{x}, \\vec{p}))$ describing the fluctuations of the data $\\vec{y}$ around \n", "the model prediction $f(\\vec{x}, \\vec{p})$. Very often, owing to the Central Limit Theorem of probability theory, $\\cal{D}$ is the Gaussian distribution. In the most general case, the\n", "multivariate Gaussian (or normal) distribution in $d$ dimensions is the appropriate one to use:\n", "\n", "> ${\\cal{D}}={\\cal{N}}\\left(\\vec{y}, V, f(\\vec{x}, \\vec{p}) \\right) = \n", "\\frac{1} {\\sqrt{(2\\pi)^{d} \\det(V)} } \\cdot \\exp\\left( -\\frac{1}{2} (\\vec{y}-\\vec{\\mu})^T V^{-1} (\\vec{y}-\\vec{\\mu}) \\right)$\n", "with mean values $\\vec{\\mu}=\\vec{f}\\left( \\vec{x}, \\vec{p} \\right)$. (Equ. 2.1)\n", "\n", "Instead of the product of likelihood factors, one uses the negative sum of the logarithm of the \n", "likelihood factors, and the \"negative log-likelihood\" or \"$nl\\cal L$\" for Gaussian distributed \n", "data becomes:\n", "\n", ">$ -2\\, \\ln{\\cal{L_{\\rm Gauß}}}\\left( \\vec y, V, \\vec{f}(\\vec x, \\vec {p})\\right) \\,=\\,\n", " \\left(\\vec y - \\vec f(\\vec x; \\vec p ) \\right)^T V^{-1}\n", " \\left(\\vec y - \\vec f(\\vec x; \\vec p ) \\right)\n", " + \\ln(\\det(V)) + n\\,\\ln(2\\pi) \\,. $ (Equ. 2.2) \n", "\n", "Minimisation of this function w.r.t. 
$\vec p$ results in the best estimate of the parameter values \n", "according to the likelihood principle. It can be shown that this estimation is asymptotically \n", "optimal and unbiased, i.e. the best among all possible estimators in the sense that it leads to the\n", "smallest variance and vanishing bias of the estimated parameter values in the limit of infinite\n", "sample sizes. These desirable properties already apply even for small samples for the exponential \n", "family of distributions. The covariance matrix of the parameters for an optimal and unbiased\n", "likelihood-estimator is given by\n", "\n", ">$\left({V}^{-1}\right)_{ij}=-{\left. \n", "\frac {{\partial}^2 \ln{\cal{L}}}\n", "{\partial p_i \, \partial p_j} \n", "\right|}_{\hat p_i \hat p_j}\,$ \n", "where $\hat p$ indicates the best-fit value of the parameter $p$ (Equ. 2.3).\n", "\n", "When minimizing $nl\cal L$, constant terms can be omitted. If the covariance matrix is independent \n", "of the parameters, i.e. in the absence of relative errors or of errors in the $x$-direction,\n", "$-2 \cdot \ln{\cal{L}}\left(\vec y, V, f(\vec x, \vec p)\right)$ becomes equal to \n", "$S(\vec{y}, V, \vec{f}(\vec x, \vec p))$, the squared sum of residuals in Equ. 1.2. \n", "In other cases, the full likelihood as given in Equ. 2.2 must be used.\n", "\n", "The evaluation of the determinant in every step of an iterative numerical optimization is,\n", "however, computationally expensive. On modern computers and thanks to very efficient numerical\n", "libraries this is not a problem at all any more, and therefore still existing simplified, \n", "less optimal methods should be abandoned in favour of the correct likelihood treatment. \n", "Usage of the full $nl\cal L$ is the default in the tools *kafe2* and *phyFit*. \n", "These packages effectively fall back to the traditional $\chi^2$-method in cases where\n", "it is equivalent, or on explicit user request.\n", "\n", "**Note** that sometimes $-2\cdot \ln {\cal{L}}$ (\"$n2l\cal L$\") is used instead of $nl\cal L$ \n", "to keep the similarity to the $\chi^2$ method. \n", "\n", "The **goodness-of-fit** is still correctly quantified by \n", "$\chi^2\left( S(\vec y, V, \vec f(\vec x, \vec{p}_0)), \, n_{dof} \right)$.\n", "This can be justified from the likelihood principle: the best-possible model would describe \n", "the data perfectly, i.e. all residuals are zero in this case, and the smallest possible value \n", "of $n2l\cal L$ is $\ln{\det(V)} + n\,\ln(2\pi)$. The difference between the observed best-fit value and this\n", "best possible value is therefore exactly $S_{min}$, which can be compared to the $\chi^2$\n", "distribution with $n_{dof}$ degrees of freedom. As we will see below, such\n", "differences in $n2l\cal L$ are used to determine confidence intervals, and even in the general case\n", "$\chi^2$ corresponds to such a $nl\cal L$ difference and therefore is a good measure of the \n", "goodness-of-fit that can be translated to a $p$-value.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 Gaussian likelihood in *kafe2*\n", "\n", "As long as the uncertainties are independent of the parameter values, the $nl\cal L$ and $\chi^2$\n", "methods are equivalent, and therefore fits with *kafe2* yield results that are - apart from \n", "numerical precision - identical to those obtained with other tools. \n", "The example in the code cell below illustrates the effect by re-using the data of the previous example, but this time adding a relative uncertainty. 
The two fit results refer to a fit where the relative uncertainties are defined w.r.t. the measured values, or in the other case to the model values. To make the difference in the uncertainties visible, the x-data are shifted by 0.02 in the second fit. The example is a slightly modified version of the one contained in the kafe2 jupyter\n", "tutorial." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# same linear fit as above, but with relative errors w.r.t. the model values.\n", "\n", "x_1 = np.array(x) - 0.02\n", "data2_1 = XYContainer(x_data=x_1, y_data=y)\n", "data2_1.add_error(axis=\"x\", err_val=0.3)\n", "## relative y-errors w.r.t. model are added below, via method of fit class\n", "\n", "data2_1.label = \"same data, rel. uncertainties w.r.t. model\"\n", "\n", "# Create Fit:\n", "linear_fit2_1 = Fit(data2_1, model_function=linear_model)\n", "# --> relative uncertainties with reference to model specified here:\n", "linear_fit2_1.add_error(axis=\"y\", err_val=0.15, relative=True, reference=\"model\")\n", "\n", "# Optional: Assign LaTeX strings to parameters and model functions.\n", "linear_fit2_1.assign_parameter_latex_names(x=\"x\", a=\"a\", b=\"b\")\n", "linear_fit2_1.assign_model_function_latex_expression(\"{a}{x} + {b}\")\n", "\n", "# Optional: Assign LaTeX strings to parameters and model functions.\n", "linear_fit2_1.assign_parameter_latex_names(a=\"a\", b=\"b\")\n", "linear_fit2_1.assign_model_function_latex_expression(\"{a}{x} + {b}\")\n", "linear_fit2_1.model_label = \"same model, rel. uncertainties w.r.t. model\"\n", "\n", "# Perform the fit:\n", "linear_fit2_1.do_fit()\n", "\n", "# Optional: print report.\n", "# linear_fit2_1.report(asymmetric_parameter_errors=True)\n", "\n", "# Optional: Create a plot of the fit results using Plot.\n", "p2_1 = Plot([linear_fit, linear_fit2_1])\n", "# Assign colors to data ...\n", "p2_1.customize(\"data\", \"marker\", [(0, \"o\"), (1, \"o\")])\n", "p2_1.customize(\"data\", \"markersize\", [(0, 5), (1, 5)])\n", "p2_1.customize(\n", " \"data\", \"color\", [(0, \"grey\"), (1, \"red\")]\n", ") # note: although 2nd label is suppressed\n", "p2_1.customize(\n", " \"data\", \"ecolor\", [(0, \"grey\"), (1, \"red\")]\n", ") # note: although 2nd label is suppressed\n", "# ... and model.\n", "p2_1.customize(\"model_line\", \"color\", [(0, \"mistyrose\"), (1, \"orange\")])\n", "p2_1.customize(\n", " \"model_error_band\", \"label\", [(0, r\"$\\pm 1 \\sigma$\"), (1, r\"$\\pm 1 \\sigma$\")]\n", ")\n", "p2_1.customize(\"model_error_band\", \"color\", [(0, \"mistyrose\"), (1, \"orange\")])\n", "\n", "p2_1.plot(asymmetric_parameter_errors=True)\n", "\n", "# Create contour plot and profiles for the linear fit:\n", "cpf2_1 = ContoursProfiler(linear_fit2_1)\n", "cpf2_1.plot_profiles_contours_matrix(show_grid_for=\"contours\")\n", "\n", "# Show the fit results.\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is a significant shift compared to the parameter uncertainties when taking relative uncertainties \n", "w.r.t. the model values. The reason is obvious, because clearly visibly different uncertainties are assigned \n", "to the data in the two cases. The profile-likelihood and the contour show that model-dependent uncertainties\n", "introduce some non-parabolic behaviour; even the fit of a linear function is no longer a \"linear\" problem and\n", "usually numerical methods are needed to find the optimum of $nl\\cal L$." 
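, "\n", "\n", "To compare the two results quantitatively, the best-fit parameter values and uncertainties can also be\n", "retrieved programmatically from the two fit objects. The following snippet is only a sketch; it assumes\n", "that the `Fit` objects expose the properties `parameter_names`, `parameter_values` and `parameter_errors`\n", "(available in recent *kafe2* versions):\n", "\n", "```\n", "# sketch: print the parameter estimates of both fits side by side (assumed kafe2 properties)\n", "for f, tag in [(linear_fit, 'rel. errors w.r.t. data'), (linear_fit2_1, 'rel. errors w.r.t. model')]:\n", "    print(tag)\n", "    for name, val, err in zip(f.parameter_names, f.parameter_values, f.parameter_errors):\n", "        print(f'  {name} = {val:.3g} +/- {err:.3g}')\n", "```"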
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Confidence regions for parameters\n", "\n", "All fitting tools implement Equ. 1.3 to determine the covariance matrix describing the parameter\n", "uncertainties. It should be noted that correlations of parameter uncertainties are quite common \n", "and hard to avoid - and therefore the covariance or correlation matrix of the fit results must\n", "be examined and quoted as part of the results if correlations exceed ~10%.\n", "However, in non-linear cases the description of the conficence regions of the parameter values\n", "by the covariance matrix alone is not sufficient. An example was shown above \n", "for the exponential model - the scan of the likelihood contour deviated from a parabola, and the\n", "contour plot revealed a non-elliptical behaviour. In such cases Equ. 1.3 does not provide the \n", "a fully satisfying descritption of the confidence regions of the best-fit parameter values.\n", "In such cases, more advanced methods must be applied. \n", "\n", "The challenge is to estimate valid confidence regions for the fit parameters, taking into \n", "account parameter inter-dependencies, i.e. general correlations. Reporting confidence regions \n", "instead of point estimates of parameters with symmetric errors has now become standard in \n", "science. However, there are not many tools implementing this feature. \n", "\n", "One way out is to use simulated data with fluctuations as given by the uncertainty model; \n", "this requires running a large number of fits and hence is computationally expensive, but exact. \n", "Another approach consists in marginalisation of the multi-dimensional likelihood function by \n", "integrating out all parameters except one and analysing its marginal distribution. This method \n", "also is computationally expensive. \n", "\n", "A more favourable method relies on the so-called profile likelihood, which is determined for each \n", "parameter in turn by varying it around the best-fit value and minimizing the likelihood w.r.t. all\n", "other parameters. The resulting curves are exactly the profile curves that where already shown\n", "in the *kafe2* examples above.\n", "A change in $nl\\cal L$ by $+\\frac{1}{2}$ corresponds to a range of $\\pm 1 \\sigma$ around the mean\n", "of a Gaussian distribution, i.e. a confidence region of 68.3%. Since the change in the value of the\n", "likelihood is invariant under parameter transformations, this procedure also works for non-parabolic\n", "(i.e. non-Gaussian) cases. The same method works, with a significantly larger numerical effort, \n", "for contours in two dimensions, which thus become \"confidence contours\". \n", "\n", "An efficient implementation of the profile-liklelihood method exists in the minimizer and \n", "uncertainty-examination tool MINUIT developed at CERN, which is used by *phyFit* and is the \n", "default option in *kafe2*. \n", "In *phyFit*, confidence ranges determined by profiling of the likelihood are the default, \n", "in kafe2 a special parameter is needed in the report function:\n", "> Fit.report(asymmetric_parameter_errors=True)\n", "\n", "### Example: non-linear fit\n", "\n", "The code cell below show an extremely non-linear example. In such a case, asymmetric \n", "uncertainties must be reported as the fit result, and the confidence contours should be \n", "shown as well. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def exponential(x, A0=1, tau=1):\n", " return A0 * np.exp(-x / tau)\n", "\n", "\n", "# define the data as simple Python lists\n", "x = [8.0e-01, 2.34, 1.44, 1.28, 2.84,3.49, 3.78, 4.56, 4.48, 5.38]\n", "xerr = 3.0e-01\n", "y = [4.35e-01, 1.51e-01, 8.1e-02, 1.7e-01, 5.3e-02, 1.98e-02, 2.07e-02, 1.230e-02, 9.7e-03, 2.41e-03]\n", "yerr = [9.3e-02, 4.1e-02, 3.8e-02, 3.9e-02, 3.1e-02, 1.04e-02, 0.95e-02, 0.89e-02, 0.79e-02, 0.20e-02]\n", "\n", "# create a fit object from the data arrays\n", "fit = Fit(data=[x, y], model_function=exponential)\n", "fit.add_error(axis=\"x\", err_val=xerr) # add the x-error to the fit\n", "fit.add_error(axis=\"y\", err_val=yerr) # add the y-errors to the fit\n", "\n", "fit.do_fit() # perform the fit\n", "fit.report(\n", " asymmetric_parameter_errors=True\n", ") # print a report with asymmetric uncertainties\n", "\n", "# Optional: create a plot\n", "plot = Plot(fit)\n", "plot.plot(\n", " asymmetric_parameter_errors=True, ratio=True\n", ") # add the ratio data/function and asymmetric errors\n", "\n", "# Optional: create the contours profiler\n", "cpf = ContoursProfiler(fit)\n", "cpf.plot_profiles_contours_matrix() # plot the contour profile matrix for all parameters\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.3 Practical example with non-trivial covariance matrix: Characteristics of a diode\n", "\n", "In many cases the set of uncertainties needed to describe the behaviour of measruements\n", "is not trivial. There may be independent and/or correlated absolute and/or relative\n", "uncertainties in the x- and/or y-direction. Typically, more than one of these eight kinds\n", "of uncertainties are relevant for a given problem, and getting the covariance matrix\n", "right is often not an easy task. \n", "*kafe2* supports constructing the full covariance matrix with the method \n", "\n", "> ``add_error(err_val, axis=?, name=None, correlation=0, relative=False, reference='data')``,\n", "\n", "which allows to specify various kinds of uncertainties for all or for sub-sets of the data\n", "one after the other. The resulting individual covariance matrices are all added to form the \n", "full covariance matrix used in the fit. This covariance matrix is re-calculated from the\n", "uncertainty components in each minimization step, if needed. \n", "\n", "As an example, consider a typical digital amperemeter or voltmeter. Decive characteristics\n", "are specified as 4000 Counts, +/-(0.5% + 3 digits), where the first, the relative calibration\n", "uncertainty, refers to the true value of the current or voltage and is fully correlated among \n", "all measurements, while sedond one, the digitisation uncertainty, is independent for each\n", "of the measurements. There often is an additional noise component, which also constitutes a set\n", "of independent uncertainties.\n", "\n", "The code in the cell below shows how these uncertainty components for a set of voltage\n", "and current measurements with a diode can be specified for a fit of the Shockley diode\n", "equation. 
Most of the code is indeed needed to specify the uncertainties, the fit itself\n", "and the ouput of the results are very similar to the examples discussed already.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"Fit of Shockley equation to I-U characteristic of a diode\n", "\"\"\"\n", "\n", "\n", "# model function to fit\n", "def Shockley(U, I_s=0.5, U0=0.03):\n", " \"\"\"Parametrisation of a diode characteristic\n", "\n", " U0 should be limited such that U/U0<150 to avoid\n", " exceeding the 64 bit floating point range\n", "\n", " Args:\n", " - U: Voltage (V)\n", " - I_s: reverse saturation current (nA)\n", " - U0: thermal voltage (V) * emission coefficient\n", "\n", " Returns:\n", " - float I: diode current (mA)\n", " \"\"\"\n", " return 1e-6 * I_s * np.exp((U / U0) - 1.0)\n", "\n", "\n", "# measurements:\n", "# voltmeter characteristics:\n", "# - voltage, measurement range 2V\n", "voltages = [0.450, 0.470, 0.490, 0.510, 0.530, 0.550, 0.560, 0.570, 0.580, 0.590,\n", " 0.600, 0.610, 0.620, 0.630, 0.640, 0.645, 0.650, 0.655, 0.660, 0.665]\n", "# - current: 2 measurements in range 200µA, 12 in range 20mA and 6 in range 200mA\n", "currents = [0.056, 0.198, 0.284, 0.404, 0.739, 1.739, 1.962, 2.849, 3.265, 5.706,\n", " 6.474, 7.866, 11.44, 18.98, 23.35, 27.96, 38.61, 46.73, 49.78, 57.75]\n", "\n", "# create a data container\n", "diode_data = XYContainer(x_data=voltages, y_data=currents)\n", "diode_data.label = \"I vs. U \"\n", "diode_data.axis_labels = [\"Voltage (V)\", \"Current (mA)\"]\n", "\n", "# --- define uncertainties\n", "\n", "# - precision voltmeter: 4000 Counts, +/-(0.5% + 3 digits)\n", "# - range 2 V\n", "crel_U = 0.005\n", "Udigits = 3\n", "Urange = 2\n", "Ucounts = 4000\n", "deltaU = Udigits * Urange / Ucounts\n", "# - noise contribution delta U = 0.005 V\n", "deltaU_noise = 0.005\n", "# add voltage unertainties to data object\n", "diode_data.add_error(axis=\"x\", err_val=deltaU)\n", "diode_data.add_error(axis=\"x\", err_val=deltaU_noise)\n", "# note: relative uncertainties w.r.t. model to be added to fit object later\n", "\n", "# - precision amperemeter: 2000 Counts, +/-(1.0% + 3 digits)\n", "# - measurement ranges 200µA, 20mA und 200mA\n", "crel_I = 0.010\n", "Idigits = 3\n", "Icounts = 2000\n", "Irange1 = 0.2\n", "Irange2 = 20\n", "Irange3 = 200\n", "deltaI = np.asarray(\n", " 2 * [Idigits * Irange1 / Icounts]\n", " + 12 * [Idigits * Irange2 / Icounts]\n", " + 6 * [Idigits * Irange3 / Icounts]\n", ")\n", "# noise contribution delta I = 0.050 mA\n", "deltaI_noise = 0.050\n", "# add current unertainties to data object\n", "diode_data.add_error(axis=\"y\", err_val=deltaI)\n", "diode_data.add_error(axis=\"y\", err_val=deltaI_noise)\n", "# note: relative uncertainties w.r.t. 
model to be added to fit object\n", "\n", "# --- start of fit\n", "\n", "# create Fit object\n", "ShockleyFit = Fit(diode_data, model_function=Shockley)\n", "ShockleyFit.model_label = \"Shockley equation\"\n", "\n", "# add relative errors with reference to model\n", "ShockleyFit.add_error(\n", " axis=\"x\", err_val=crel_U, correlation=1.0, relative=True, reference=\"model\"\n", ")\n", "ShockleyFit.add_error(\n", " axis=\"y\", err_val=crel_I, correlation=1.0, relative=True, reference=\"model\"\n", ")\n", "# to avoid overflow of 64 bit floats, limit U0\n", "ShockleyFit.limit_parameter(\"U0\", lower=0.005)\n", "\n", "ShockleyFit.do_fit()\n", "\n", "# create plot objec\n", "plotShockleyFit = Plot(ShockleyFit)\n", "plotShockleyFit.plot(asymmetric_parameter_errors=True)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Section 3: Fitting Histograms\n", "---\n", "\n", "In quantum-, nuclear- and particle-physics measurements typically are rates of detected particles, classified according to energy, scattering angle, life time or others. The resulting data structure\n", "is a freqency spectrum, or a histogram, which shows the number of occurences per interval in the\n", "classifing variable, called a \"bin\". \n", "\n", "A graphical representation of a typical histogram is created by the code in the cell below with\n", "the help of *maplotlib*. Also indicated are mean and standard deviation by the marker with error bar,\n", "and quantiles of the distribution by means of a boxplot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# histogram with matplotlib\n", "np.random.seed(1963539301) # fix random seed (=> same date each time)\n", "x = np.random.randn(30)\n", "\n", "mu_x = x.mean()\n", "sig_x = x.std()\n", "bc, be, _ = plt.hist(x, 20, color=\"lightgrey\")\n", "print(\"entries per bin\", bc)\n", "print(\"bin edges\", be)\n", "plt.xlabel(\"x\")\n", "plt.ylabel(\"counts\")\n", "\n", "# show an error bar\n", "plt.errorbar(mu_x, 1.5, xerr=sig_x, fmt=\"o\")\n", "\n", "# show a boxplot horizontally with frequence distribution\n", "plt.boxplot(x, vert=False)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Characteristics of histograms and naive fits\n", "\n", "The frequencies per bin, or the number of counts per bin, $n_i$ follow a Poisson distribution with expectation value $\\lambda_i$ and are (typically) independent of one another:\n", "\n", "> $P(n_i;\\lambda_i)=\\frac{{\\lambda_i}^{n_i}} {n_i\\,!} \\, {\\rm e}^{-\\lambda_i}$ (Equ. 3.1)\n", "\n", "The variances of the Poisson distributions are also given by the parameters $\\lambda_i$,\n", "${\\sigma_i}^2=\\lambda_i$.\n", "\n", "In cases of large sample sizes and no bins without entries, very often a Gaussian \n", "approximation is used, replacing the Poisson distributions by Gaussian distributions\n", "with the same expectation values $\\mu_i=\\lambda_i$and standard deviations \n", "$\\sigma_i = \\sqrt{\\lambda_i}$:\n", "\n", "> $P(n;\\lambda)=\\frac{{\\lambda}^n}{n\\,!}{\\rm e}^{-\\lambda}$ $\\to$ \n", "$\\frac{1}{\\sqrt{2\\pi\\,\\mu}} \\cdot \\exp\\left(-\\frac{(n-\\mu)^2)} {2\\mu}\\right)$\n", "(Equ. 
3.2).\n", "\n", "With this substitution, the problem of fitting a probability density to a histogram\n", "becomes identical to the problem of fitting a function to data with Gaussian uncertainties.\n", "As a further simplification, the number of bin entries is taken\n", "as the value of the fitted function evaluated at the bin centre. With a small\n", "number of entries per bin - or, much more severe, for bins with zero entries -\n", "this approach does not work if the uncertainty is taken from the observed number\n", "of entries, $n_i$. Downward fluctuations lead to a smaller assigned uncertainty, \n", "and hence an increased weight in the fit. The result is a bias towards bins with\n", "downward fluctuations of the observation. Bins with zero entries have to be\n", "omitted in this approach, because data points with zero uncertainty would receive\n", "an infinite weight. A remedy is possible by iterating the fitting\n", "process - in a second fit, the uncertainties taken from the observation can\n", "be replaced by the uncertainties from the model values obtained in the pre-fit.\n", "As you can see, this is not simple at all, and full of risks! Nonetheless, \n", "lacking other simple tools, this method is quite often used.\n", "\n", "\n", "The **correct approach** is a fitting procedure using the Poisson likelihood\n", "with the parameter-dependent model prediction $\lambda(\vec p)$ as expectation values:\n", "\n", "> ${\cal{L}}_{Poisson}=\displaystyle\prod_{i=1}^N{\rm P}\left(n_i;\lambda_i(\vec p)\right)$ (Equ. 3.3). \n", "\n", "In practice, one again uses the negative logarithm of the likelihood,\n", "> $nl{\cal{L}}_{Poisson}= -\ln {\cal{L}}_{Poisson} =\n", " \displaystyle\sum_{i=1}^N \left( - n_i \cdot \ln(\lambda_i(\vec p))\,+\,\lambda_i(\vec p) \right)$\n", " (Equ. 3.4),\n", "\n", "which is minimized w.r.t. the parameters.\n", "\n", "Unfortunately, there are not many implementations of this method. An example\n", "is the histogram class *TH1* of the *ROOT* data analysis framework developed at CERN.\n", "\n", "*kafe2* and *phyFit* also support fitting a probability distribution function to histogram data,\n", "which, in most standard cases, is as easy as performing a $\chi^2$ fit of a model function \n", "to data.\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Example: Histogram fit with kafe2\n", "\n", "*kafe2* contains a dedicated class, *HistFit*, and a data container, *HistContainer*,\n", "to support fits of probability distribution functions to histogram data using \n", "${\cal{L}}_{Poisson}$ as the cost function in the fit. This method is called\n", "a \"binned log-likelihood fit\". \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.1 Generating histogram data\n", "\n", "Before starting, we need some data. The code fragment in the cell below generates \n", "simulated data for a histogram, in this case a typical example of a (Gaussian-shaped) peak on top of a flat background."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# parameters of data sample, signal and background parameters\n", "N = 200 # number of entries\n", "min = 0.0 # range of data, mimimum\n", "max = 10.0 # maximum\n", "s = 0.8 # signal fraction\n", "pos = 6.66 # signal position\n", "width = 0.33 # signal width\n", "\n", "\n", "def generate_data(N=100, min=0, max=1.0, pos=0.0, width=0.25, signal_fraction=0.1):\n", " \"\"\"generate a random dataset:\n", " Gaussian signal at position p with width w and signal fraction s\n", " on top of a flat background between min and max\n", " \"\"\"\n", " # signal sample\n", " data_s = np.random.normal(loc=pos, scale=width, size=int(signal_fraction * N))\n", " # background sample\n", " data_b = np.random.uniform(low=min, high=max, size=int((1 - signal_fraction) * N))\n", " return np.concatenate((data_s, data_b))\n", "\n", "\n", "# generate a histogram data sample\n", "SplusB_data = generate_data(N, min, max, pos, width, s)\n", "\n", "# and show the histogram\n", "bc, be, _ = plt.hist(SplusB_data, bins=35, rwidth=0.9)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.2 Definition of a probability density function (pdf)\n", "\n", "The histogram is a finite sample following a probability density function (pdf). \n", "The pdf for the data generated above is shown in the cell below. \n", "Note that the pdf must be normalized, e.g. to a total area of one, for all posible \n", "values of the parameters!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def signal_plus_background(x, mu=3.0, sigma=2.0, s=0.5):\n", " \"\"\"(normalized) pdf of a Gaussian signal on top of flat background\"\"\"\n", " normal = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2.0 * np.pi * sigma**2)\n", " flat = 1.0 / (max - min)\n", " return s * normal + (1 - s) * flat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.3 Performing the histogram fit with kafe2 using a binned likelihood\n", "\n", "Performing the fit with kafe2 is now very easy and follows the scheme already used for \n", "function fitting. The container for histogram data, HistContainer, produces\n", "a histogram from the input data. This object is passed to an object of the Fit class, \n", "which needs, of course, the pdf to be fitted to the histogram. \n", "Performing the fit, reporting the results in text and graphical form work all\n", "as shown before. \n", "\n", "Execute the code in the cell below to see the result." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"Fitting a density distribution to a histogram \n", "\"\"\"\n", "from kafe2 import HistContainer\n", "\n", "# Create a histogram from the dataset\n", "SplusB_histogram = HistContainer(n_bins=35, bin_range=(min, max), fill_data=SplusB_data)\n", "# create the Fit object by specifying a density function\n", "hist_fit = Fit(\n", " data=SplusB_histogram, density=True, model_function=signal_plus_background\n", ")\n", "\n", "hist_fit.do_fit() # do the fit\n", "hist_fit.report() # Optional: print a report to the terminal\n", "\n", "# Optional: create a plot and show it\n", "hist_plot = Plot(hist_fit)\n", "hist_plot.plot(asymmetric_parameter_errors=True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Under the hood, *kafe2* does even more for you than was described above. 
\n", "*kafe2* by default uses the negative logarithm of the Poisson likelihood as the cost\n", "function. As stated avove, bins with small nubers of entries or no entries at all do\n", "not pose a problem in this case. \n", "There are, however, options to change the cost function, which allows you to\n", "define your own cost function. \n", "By default, the model prediction for the number of entries in each bin is calculated\n", "as the integral of the pdf over the bin. \n", "The only loss in information compared to an unbinned fit (which we will discuss below),\n", "results from the binning process itself, which may lead to biases if the pdf is varying strongly\n", "over the bin range. Again, options exist to speed-up the fitting procedure by using the value \n", "of the cost function at the bin centre multiplied by the bin width. \n", "On modern computers, this should, however, rarely be necessary.\n", "\n", "Using a cost function with a Gaussian likelihood or even falling back to a\n", "simple $chi^2$ may be necessary in cases where additional uncertainties, like\n", "uncertainties in the model prediction, need to be added to the Poisson uncertainties.\n", "Often, histogram representations are also used to display quantities other than pure\n", "counting rates, e.g. rates corrected for detector acceptance of inefficiencies,\n", "or normalised rates. In such cases, the default $nl{\\cal L}_{Poisson}$ is \n", "inappropriate as a cost function and must be replaced by a customized one. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 3.2.3 Judging the fit quality\n", "\n", "Checking the level of agreement between histogram data and the model requires some\n", "thinking. By constructing a model that perfectly describes the data, given\n", "by a model perfectly agreeing with the observation, $n_i = \\lambda_i$ (a so-called\n", "\"fully saturated model\"), the best-possible value of the Likelihood, \n", "$nl{\\cal{L_0}}_{Poisson}$, can be calculated. The $nl\\cal L$-difference between\n", "the acutally obseved value of $nl{\\cal{L_0}}_{Poisson}$ can be used in order to define\n", "a reasonable measure for the quality of the fit. This is the same argument we used\n", "when justifying the use of $\\chi^2$ for fits with a full Gaussian likelihood. And \n", "indeed, the measure of goodness-of-fit defined in this way for binned likelihood fits\n", "to histogram data becomes equivalent to $\\chi^2$ for infinite sample sizes. In the \n", "text field of the *kafe2* graph the measure of goodness-of-fit is indicated by the label \n", "$-2\\ln{\\cal L}_R/$ndf, to be interpreted in analogy to $\\chi^2/$ndf used in case of \n", "Gaussian fits. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Section 4: Unbinned likelihood fits \n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When the sample size is small, representing the data as a histogram may not be appropriate, because the number of\n", "entries per bin could become too small, or the bin widths would have to be increased too much so that binning may \n", "introduce a bias. 
In such cases a fit to unbinned data can be performed using the most basic version of the \n", "Maximum Likelihood estimation.\n", "\n", "Given a probability distribution function ${\\rm pdf}(\\vec x; \\vec p)$ for $d$ independent data values $x_i$ \n", "and the array of parameters $\\vec p$, the likelihood is \n", "\n", "> ${\\cal L}(\\vec x, \\vec p) = \\displaystyle\\prod_{i=1}^d \\, {\\rm pdf}(x_i, \\vec p)$ (Equ. 4.1).\n", "\n", "The negative logarithm of this likelihood is\n", "\n", "> $nl{\\cal L} = -\\displaystyle\\sum_{i=1}^d \\, \\ln \\left({\\rm pdf}(x_i, \\vec p)\\right)$ (Equ. 4.2).\n", "\n", "$nl\\cal L$ can be directly used as the cost function and the best set $\\vec p_0$ of parameter values can be determined.\n", "Evaluation of parameter uncertainties, profiling of parameters and the extraction of confidence regions for the\n", "parameters all work as previously described. Note that the pdf must be normalised for all values of the parameters!\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1 Example of an unbinned likelihood fit\n", "\n", "We can use the data and pdf of example 3.2 and perform an unbinned fit this time.\n", "*kafe2* offers a special data container, *UnbinnedContainer*, for this purpose.\n", "Everything else in the fitting process is identical. \n", "\n", "Try the code in the cell below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"Fitting a density distribution to unbinned data\n", "\"\"\"\n", "from kafe2.fit import UnbinnedContainer\n", "\n", "unbinned_SplusB = UnbinnedContainer(SplusB_data) # create the kafe data object\n", "\n", "# create the fit object and set the pdf for the fit\n", "unbinned_fit = Fit(data=unbinned_SplusB, model_function=signal_plus_background)\n", "\n", "unbinned_fit.do_fit() # perform the fit\n", "\n", "unbinned_plot = Plot(unbinned_fit) # create a plot object\n", "unbinned_plot.plot(asymmetric_parameter_errors=True) # plot the data and the fit\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is very similar to the one obtained from the binned liklihood fit. \n", "The computational effort of an unbinned fit rises with increased number of data\n", "points, while it remains constant for the binned version. For very large data\n", "sets, the unbinned fit may become unmanagable and a binned fit must be used\n", "instead. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.2 A more complex example of an unbinned likelihood fit\n", "\n", "As another, very typical application we study the case of a life time measurement \n", "of an elementary particle. The data below stem from a real experiment: the numbers \n", "are time differences (in µs) between the registration of an incoming muon and \n", "a second, delayed pulse often caused by an electron from the decay of the stopped muon.\n", "Some delayed pulses are caused by another incoming muon or by noise in the detector." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\" the data for the myon life time example\"\"\"\n", "# real data from measurement with a Water Cherenkov detector (\"Kamiokanne\")\n", "# numbers represent time differences (in µs) between the passage of a cosmic and\n", "# the registration of a second pulse, often caused by an electron from the muon decay\n", "dT = [7.42, 3.773, 5.968, 4.924, 1.468, 4.664, 1.745, 2.144, 3.836, 3.132, 1.568, 2.352,\n", " 2.132, 9.381, 1.484, 1.181, 5.004, 3.06, 4.582, 2.076, 1.88, 1.337, 3.092, 2.265,\n", " 1.208, 2.753, 4.457, 3.499, 8.192, 5.101, 1.572, 5.152, 4.181, 3.52, 1.344, 10.29,\n", " 1.152, 2.348, 2.228, 2.172, 7.448, 1.108, 4.344, 2.042, 5.088, 1.02, 1.051, 1.987,\n", " 1.935, 3.773, 4.092, 1.628, 1.688, 4.502, 4.687, 6.755, 2.56, 1.208, 2.649, 1.012,\n", " 1.73, 2.164, 1.728, 4.646, 2.916, 1.101, 2.54, 1.02, 1.176, 4.716, 9.671, 1.692,\n", " 9.292, 10.72, 2.164, 2.084, 2.616, 1.584, 5.236, 3.663, 3.624, 1.051, 1.544, 1.496,\n", " 1.883, 1.92, 5.968, 5.89, 2.896, 2.76, 1.475, 2.644, 3.6, 5.324, 8.361, 3.052,\n", " 7.703, 3.83, 1.444, 1.343, 4.736, 8.7, 6.192, 5.796, 1.4, 3.392, 7.808, 6.344,\n", " 1.884, 2.332, 1.76, 4.344, 2.988, 7.44, 5.804, 9.5, 9.904, 3.196, 3.012, 6.056,\n", " 6.328, 9.064, 3.068, 9.352, 1.936, 1.08, 1.984, 1.792, 9.384, 10.15, 4.756, 1.52,\n", " 3.912, 1.712, 10.57, 5.304, 2.968, 9.632, 7.116, 1.212, 8.532, 3.000, 4.792, 2.512,\n", " 1.352, 2.168, 4.344, 1.316, 1.468, 1.152, 6.024, 3.272, 4.96, 10.16, 2.14, 2.856,\n", " 10.01, 1.232, 2.668, 9.176]" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "The pdf of the problem is shown in the code cell below. Modeled is an exponential \n", "decay with mean life time *tau* on a flat background. Because the exponential\n", "fuction is only fit to life time values in a restricted range, the required nomalisation\n", "makes the pdf look somewhat complicated." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def lt_pdf(t, tau=2.2, fbg=0.1, a=1.0, b=9.75):\n", " \"\"\"\n", " Probability density function for the decay time of a myon using the Kamiokanne-Experiment.\n", " The pdf is normalised for the interval (a, b).\n", "\n", " :param t: decay time\n", " :param fbg: background\n", " :param tau: expected mean of the decay time\n", " :param a: the minimum decay time which can be measured\n", " :param b: the maximum decay time which can be measured\n", " :return: probability for decay time x\n", " \"\"\"\n", " pdf1 = np.exp(-t / tau) / tau / (np.exp(-a / tau) - np.exp(-b / tau))\n", " pdf2 = 1.0 / (b - a)\n", " return (1 - fbg) * pdf1 + fbg * pdf2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fitting code itself is very similar to the one of example 3.1. Some additional lines\n", "of code illustrate how to set meaningful names to be displayed in the graphical output.\n", "In order to avoid unphysical regions to be explored during parameter variation \n", "in the fitting process, the background fraction *fbg* is limited between 0. and 1.\n", "The edges of the validity range, $a$ and $b$ are not subject to variations in the fit \n", "and must be fixed. \n", "The generation of the profile likelihood and contour curves work exactly in the same way as\n", "for all other fits that have been discussed in this tutorial." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from kafe2.fit import UnbinnedContainer\n", "\n", "lt_data = UnbinnedContainer(dT) # create the kafe data object\n", "lt_data.label = \"lifetime measurements\"\n", "lt_data.axis_labels = [\"life time $\\\\tau$ (µs)\", \"Probability Density\"]\n", "\n", "# create the fit object and set the pdf for the fit\n", "lt_fit = Fit(data=lt_data, model_function=lt_pdf)\n", "\n", "# Fix the parameters a and b.\n", "# Those are responsible for the normalization of the pdf for the range (a, b).\n", "lt_fit.fix_parameter(\"a\", 1)\n", "lt_fit.fix_parameter(\"b\", 11.5)\n", "# constrain parameter fbg to avoid unphysical region\n", "lt_fit.limit_parameter(\"fbg\", 0.0, 1.0)\n", "\n", "# assign latex names for the parameters for nicer display\n", "lt_fit.model_label = \"exponential decay law + flat background\"\n", "lt_fit.assign_parameter_latex_names(tau=r\"\\tau\", fbg=\"f\", a=\"a\", b=\"b\")\n", "# assign a latex expression for the fit function for nicer display\n", "lt_fit.assign_model_function_latex_expression(\n", " \"\\\\frac{{ (1-{fbg}) \\\\, e^{{-{t}/{tau}}} }}\"\n", " \"{{{tau} \\\\, (e^{{-{a}/{tau}}}-e^{{-{b}/{tau}}}) }}\"\n", " \"+ \\\\frac{{ {fbg}}} {{{b}-{a}}}\"\n", ")\n", "lt_fit.do_fit() # perform the fit\n", "\n", "# lt_fit.report(asymmetric_parameter_errors=True) # print a fit report to the terminal\n", "\n", "lt_plot = Plot(lt_fit) # create a plot object\n", "lt_plot.plot(\n", " fit_info=True, asymmetric_parameter_errors=True\n", ") # plot the data and the fit\n", "\n", "# Optional: create contours & profile\n", "lt_cpf = ContoursProfiler(lt_fit, profile_subtract_min=False)\n", "# Optional: plot the contour matrix for tau and fbg\n", "lt_cpf.plot_profiles_contours_matrix(parameters=[\"tau\", \"fbg\"])\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Section 5: Multiple fits with common parameters\n", "\n", "It is quite common that a parameter of interest depends on other parameters, which are \n", "determined in an auxiliary measurement. One possibility to deal with\n", "such situations is to perform a common fit - the main and the auxiliary one - at the same\n", "time, a \"MultiFit\", as they are called in *kafe2*. \n", "\n", "Another use case of multi-fits arises when the same signal is measured under varying \n", "conditions, e.g. in different detector regions with different resolutions and background\n", "conditions.\n", "\n", "The first case is shown in the *kafe2* jupyter totorial (Example 8), the second one\n", "is explained here.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.1 Multifit of a Gaussian signal from two measruements with different resloution and background\n", "\n", "We start from the earlier example, the distribution of a signal on top of a flat background.\n", "Additional smearing is added to the original data to mimic the finite detector resolution. \n", "A second, similar data-set at the same position and with the same intrinsic width is generated, albeit with a differing number of signal events, a smaller signal fraction and less resolution \n", "semaring. This is done by the code snippet in the cell below." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate two data sets with different numbes of events, resolution smearings and signal fractions\n", "\n", "# apply resolution smearing to data set SplusB_data\n", "r1 = 2 * width # smearing twice as large as natural width\n", "SplusB_data1 = SplusB_data + np.random.normal(loc=0.0, scale=r1, size=len(SplusB_data))\n", "\n", "# generate a second data set at the same position and width,\n", "# but with smaller signal fraction, better resolution and more events\n", "s2 = 0.25\n", "N2 = 500\n", "r2 = width / 3.0\n", "SplusB_raw2 = generate_data(N2, min, max, pos, width, s2)\n", "SplusB_data2 = SplusB_raw2 + np.random.normal(loc=0.0, scale=r2, size=len(SplusB_raw2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fit functions of the two fits are given in the cell below. Note that for parameters which\n", "are different in the two data sets different names must be chosen, here $res1$ and $sf1$ resp. \n", "$res2$ and $sf2$. This implies that we also have to use two model functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def SplusBmodel1(x, mu=5.0, width=0.3, res1=0.3, sf1=0.5):\n", " \"\"\"pdf of a Gaussian signal at position mu, with natural width width,\n", " resolution res1 and signal fraction sf1 on a flat background\n", " \"\"\"\n", " sigma2 = width * width + res1 * res1\n", " normal = np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)\n", " flat = 1.0 / (max - min)\n", " return sf1 * normal + (1 - sf1) * flat\n", "\n", "\n", "def SplusBmodel2(x, mu=5.0, width=0.3, res2=0.3, sf2=0.5):\n", " \"\"\"pdf of a Gaussian signal at position mu, with natural width width,\n", " resolution res2 and signal fraction sf2 on a flat background\n", " \"\"\"\n", " sigma2 = width * width + res2 * res2\n", " normal = np.exp(-0.5 * (x - mu) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)\n", " flat = 1.0 / (max - min)\n", " return sf2 * normal + (1 - sf2) * flat" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fitting these models will not work yet, because the two components forming the total width are\n", "ambiguous. Later in the fit we must therefore use some constaints, i.e. \"external\" knowledge on \n", "the values and uncertainties of the detector resolution in the two cases. 
\n", "\n", "Setting up the multifit and exectuting it works as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\"\"\"Perform a simultaneous fit to two binned distributions \n", " with common parameters with kafe2.MultiFit()\n", "\"\"\"\n", "from kafe2 import MultiFit\n", "\n", "# Create histogram containers from the two datasets\n", "SplusB_histogram1 = HistContainer(\n", " n_bins=30, bin_range=(min, max), fill_data=SplusB_data1\n", ")\n", "SplusB_histogram2 = HistContainer(\n", " n_bins=50, bin_range=(min, max), fill_data=SplusB_data2\n", ")\n", "\n", "# create Fit objects by specifying their density functions with corresponding parameters\n", "hist_fit1 = Fit(data=SplusB_histogram1, model_function=SplusBmodel1, density=True)\n", "hist_fit2 = Fit(data=SplusB_histogram2, model_function=SplusBmodel2, density=True)\n", "# to make the fit unambiguous, external knowledge on the resolutions must be applied\n", "hist_fit1.add_parameter_constraint(name=\"res1\", value=r1, uncertainty=r1 / 3.0)\n", "hist_fit2.add_parameter_constraint(name=\"res2\", value=r2, uncertainty=r2 / 2.0)\n", "\n", "# combine the two fits to a MultiFit\n", "multi_fit = MultiFit(fit_list=[hist_fit1, hist_fit2])\n", "\n", "multi_fit.do_fit() # do the fit\n", "# multi_fit.report() # Optional: print a report to the terminal\n", "\n", "# Optional: create ouput graphics\n", "multi_plot = Plot(multi_fit, separate_figures=True)\n", "multi_plot.plot(asymmetric_parameter_errors=True)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As can be seen, the values of the position of the peak and its natural widh are determined \n", "by a simultaneous fit to two very different distributions. It is worth mentioning that,\n", "as an alternative, two separate fits could have been performed and the results combined \n", "using \"error propagation by hand\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Section 6: Sgifnificane and limit from profile likelihood\n", "\n", "With the help of the profile likelihood, more detailed statements can be made about a parameter \n", "than just ist best-fit value and uncertainty. \n", "A frequently encountered example is the search for a signal peak above a background, as we had \n", "already discussed above in section 5. The size of the signal, i.e. the value of the signal \n", "fraction $𝑠_f$ in a distribution, was clearly different from zero. In the case of smaller signals, \n", "the question arises as to whether they are \"significant\" at all or whether they could also be the \n", "result of a fluctuation in the background. A frequently used measure of the significance $𝑆$ of a \n", "signal strength \"$s$\" is that of its uncertainty $\\sigma_s$, i.e. the normalized distance of the \n", "value of the signal parameter from zero; this corresponds to the statement: \n", "\"We have observed a signal with a significance of z sigma\". The greater this distance, the less \n", "likely a statistical fluctuation of the background is the cause of the local excess of histogram\n", "entries, which is misinterpreted as a signal.\n", "\n", "If the course of the likelihood deviates from the parabolic shape at the minimum, this, however, \n", "is no longer a good measure. 
{ "cell_type": "markdown", "metadata": {}, "source": [
 "# Section 6: Significance and limit from the profile likelihood\n",
 "\n",
 "With the help of the profile likelihood, more detailed statements can be made about a parameter\n",
 "than just its best-fit value and uncertainty.\n",
 "A frequently encountered example is the search for a signal peak above a background, as already\n",
 "discussed in Section 5. There, the size of the signal, i.e. the value of the signal fraction\n",
 "$s_f$ in the distribution, was clearly different from zero. In the case of smaller signals, the\n",
 "question arises as to whether they are \"significant\" at all or whether they could also be the\n",
 "result of a fluctuation of the background. A frequently used measure of the significance $S$ of\n",
 "a signal strength $s$ is its value divided by its uncertainty $\\sigma_s$, i.e. the normalised\n",
 "distance of the signal parameter from zero; this corresponds to the statement:\n",
 "\"We have observed a signal with a significance of $z$ sigma\". The greater this distance, the\n",
 "less likely it is that a statistical fluctuation of the background causes the local excess of\n",
 "histogram entries that would otherwise be misinterpreted as a signal.\n",
 "\n",
 "If the likelihood deviates from a parabolic shape around the minimum, however, this simple\n",
 "measure is no longer adequate. The distance from zero is then quantified by twice the logarithm\n",
 "of the ratio of the profile likelihoods at the optimum, $s = \\hat{s}$, and at signal size $s = 0$.\n",
 "The square root of this quantity is the so-called $z$ value introduced above, which expresses\n",
 "the equivalent number of standard deviations of a Gaussian distribution by which the fitted\n",
 "value deviates from zero, or, as a formula,\n",
 "\\begin{equation}\n",
 "z^2 = \\left( -2\\ln{\\cal L}_p(s=0) \\right)\n",
 "    - \\left( -2\\ln{\\cal L}_p(s=\\hat{s}) \\right)\n",
 "\\end{equation}\n",
 "\n",
 "$S = z$ is the significance in \"numbers of $\\sigma_s$\", as we had expressed it imprecisely\n",
 "above. It is better and more generally understandable to quote the corresponding $p$ value,\n",
 "which is obtained from the cumulative distribution function of the standard normal distribution\n",
 "with density ${\\cal N}(x) = 1/\\sqrt{2\\pi} \\cdot \\exp(-x^2/2)$, using the function\n",
 "$p_s$ = *scipy.stats.norm.cdf*($z$).\n",
 "Usually one quotes the $p$ value with respect to the null hypothesis,\n",
 "$p_0 = 1 - p_s$,\n",
 "which quantifies the agreement of the observation with the null hypothesis.\n",
 "The smaller the value of $p_0$, the worse the null hypothesis describes the observation and\n",
 "hence the more certain is the existence of the observed signal.\n",
 "\n",
 "If the signal is not significant enough, one may consider specifying an upper limit on the\n",
 "signal size, usually expressed at a 95% confidence level. Translated into a change of the\n",
 "profile likelihood relative to its value at the optimum, this limit is again determined through\n",
 "the corresponding quantile of the standard normal distribution, which is obtained via the\n",
 "function $z_{cl}$ = *scipy.stats.norm.ppf(cl)*\n",
 "(ppf = percent point function, the inverse of the cumulative distribution function cdf).\n",
 "The associated change of $-2\\ln{\\cal L}_p$ is $z_{cl}^2$, which corresponds to the value 2.71\n",
 "for cl = 95%.\n",
 "\n",
 "A short numerical cross-check of these numbers and the analysis of a typical profile-likelihood\n",
 "curve are shown in the code cells below."
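] }, { "cell_type": "markdown", "metadata": {}, "source": [
 "As a quick numerical cross-check of the numbers quoted above (using plain `scipy.stats`,\n",
 "nothing specific to *kafe2* or *phyFit*):"
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
 "# cross-check of the thresholds quoted in the text\n",
 "from scipy.stats import norm\n",
 "\n",
 "z_cl = norm.ppf(0.95)  # one-sided 95% quantile of the standard normal distribution\n",
 "print(\"z_95 = {:.4f},  z_95^2 = {:.3f}\".format(z_cl, z_cl**2))  # approx. 1.645 and 2.71\n",
 "\n",
 "z = 3.0  # an assumed significance of \"3 sigma\"\n",
 "p_0 = 1.0 - norm.cdf(z)  # corresponding p-value of the null hypothesis\n",
 "print(\"p_0 for z = 3: {:.2e}\".format(p_0))  # approx. 1.3e-3"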
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# a typical profile Likelihood\n", "# may be obtained from\n", "# - kafe2: method get_profile() of class ContoursProfiler\n", "# - phyFit: method getProfile() of class mnFit\n", "\n", "vals = np.array([0.0, 0.00459184, 0.00918367, 0.01377551, 0.01836735, 0.02295918, 0.02755102,\n", " 0.03214286, 0.03673469, 0.04132653, 0.04591837, 0.0505102, 0.05510204, 0.05969388,\n", " 0.06428571, 0.06887755, 0.07346939, 0.07806122, 0.08265306, 0.0872449, 0.09183673,\n", " 0.09642857, 0.10102041, 0.10561224, 0.11020408, 0.11479592, 0.11938776, 0.12397959,\n", " 0.12857143, 0.13316327, 0.1377551, 0.14234694, 0.14693878, 0.15153061, 0.15612245,\n", " 0.16071429, 0.16530612, 0.16989796, 0.1744898, 0.17908163, 0.18367347, 0.18826531,\n", " 0.19285714, 0.19744898, 0.20204082, 0.20663265, 0.21122449, 0.21581633, 0.22040816,\n", " 0.225])\n", "n2lLp = np.array([4.59945140, 4.07173212, 3.58924819, 3.14842505, 2.74614514, 2.37967154, 2.04658728,\n", " 1.74474652, 1.47223486, 1.22733686, 1.00850913, 8.14357873e-01, 6.43620175e-01,\n", " 4.95148053e-01, 3.67894990e-01, 2.60904401e-01, 1.73299730e-01, 1.04275900e-01,\n", " 5.30919061e-02, 1.90643612e-02, 1.56185396e-03, 0., 1.38370828e-02, 4.25702024e-02,\n", " 8.57318619e-02, 1.42886933e-01, 2.13629951e-01, 2.97582697e-01, 3.94392028e-01,\n", " 5.03727936e-01, 6.25281791e-01, 7.58764759e-01, 9.03906377e-01, 1.06045325,\n", " 1.22816788, 1.40682761, 1.59622361, 1.79616005, 2.00645321, 2.22693083, 2.45743135,\n", " 2.69780332, 2.94790483, 3.20760299, 3.47677341, 3.75529980, 4.04307351, 4.33999319,\n", " 4.64596441, 4.96089937])\n", "\n", "# analyze profile likelihood\n", "def analyze_signal(vals, n2lLp):\n", " from scipy.stats import norm\n", " from scipy.interpolate import CubicSpline\n", " from scipy.optimize import newton\n", "\n", " mx = np.amax(vals)\n", " pmin = vals[np.argmin(n2lLp)]\n", " n2lLp_spline = CubicSpline(vals[vals >= 0], n2lLp[vals >= 0])\n", "\n", " # Determine significance of parameter\n", " z = np.sqrt(n2lLp_spline(0.0))\n", " p_0 = 1.0 - norm.cdf(z)\n", "\n", " # Determin upper limit on parameter\n", " z_95 = norm.ppf(0.95) # z-value for 95%\n", "\n", " def f(x):\n", " return n2lLp_spline(x) - z_95**2\n", "\n", " s95 = newton(f, x0=pmin, x1=mx)\n", "\n", " # - plot profile likelihood\n", " plt.plot(vals, n2lLp)\n", " plt.xlabel(\"Signal fraction s\", size=\"x-large\")\n", " plt.ylabel(\"$-2 \\Delta \\ln L_p$\", size=\"x-large\")\n", "\n", " # - plot significance\n", " z2 = z * z\n", " plt.arrow(0, 0, 0, z2, head_width=0.01, color=\"orange\")\n", " plt.text(-mx / 15, z2, \" $z^2$\", size=\"large\", color=\"sienna\")\n", "\n", " # - plot limit\n", " plt.hlines(z_95**2, 0.66 * mx, mx, color=\"maroon\", linestyle=\"dotted\")\n", " plt.vlines(s95, 0, n2lLp_spline(mx), color=\"maroon\")\n", " plt.text(\n", " mx,\n", " z_95**2,\n", " \" ${{z_{{95}}}}^2=$\" + str(\"{:.3g}\".format(z_95**2)),\n", " size=\"large\",\n", " color=\"darkred\",\n", " )\n", " plt.text(s95, 0.0, \" 95% limit\", size=\"large\", color=\"darkblue\")\n", "\n", " return z, p_0, s95\n", "\n", "\n", "# perform analysis of signal properties using function\n", "z, p_0, s95 = analyze_signal(vals, n2lLp)\n", "print(\"\\n\")\n", "print(\" == Analysis of profile likelihood\")\n", "print(\" Significance, z-value: {:.3g}\".format(z))\n", "print(\" p-value of Null-hypothesis: {:.3g}\".format(p_0))\n", "print(\" 95% limit: < {:.3g}\".format(s95))\n", "print(\"\\n\")" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Concluding remarks\n", "\n", "I hope you found the advanced examples of fitting models to data with *kafe2*\n", "informative, useful and enlightening. The code may serve as the basis for your\n", "own work. \n", "\n", "In case you find any problems with *kafe2*, have ideas about missing features or\n", "want to contribute own developments, get in contact with the authors on the \n", "[*kaf2 home page on github*](https://github.com/dsavoiu/kafe2/) and open an issue.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 4 }