{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# Moderne Methoden der Datenanalyse SS2023\n",
    "# Practical Exercise 1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## A Short Introduction to Jupyter Notebooks\n",
    "\n",
    "You can find some detailed information on the jupyter notebook concept in the [documentation of the project](https://jupyter-notebook.readthedocs.io/en/stable/).\n",
    "\n",
    "**Here are some basic instructions:**\n",
    "- Each code block, a so called \"cell\", can be executed by pressing **shift + enter**.\n",
    "- You can run multiple cells by marking them and then pressing **shift + enter** or via the options in the \"Run\" menu in the top bar.\n",
    "- The order of execution matters! The order in which the cells have been executed is indicated by the integers in the brackets to the left of the cell. For instance `In [1]` was executed first. This means that code at the end of the notebook can affect the code at the beginning, if the cells at the beginning are executed after the cells at the end.\n",
    "- You can change between three cell types in Jupyter Lab: \"Code\", \"Markdown\" or \"Raw\".\n",
    "    * The \"Code\" cells will be interpreted by Python.\n",
    "    * The \"Markdown\" cells will be rendered with [Markdown](https://www.markdownguide.org/) and can be used for documentation and text, such as this cell. You can use them for your answers and also add LaTeX formulas such as $f(x) = \\frac{1}{x}$. If you double click **this** Markdown cell you can see the raw code of this LaTeX equation. By pressing **shift + enter** the cell will be rendered with Markdown again.\n",
    "    * The \"Raw\" cells won't be interpreted at all.\n",
    "- If you want to reset your notebook, to *\"forget\"* all the defined functions, classes and variables, go to `Kernel -> Restart Kernel` in the top bar. Your code and text will remain untouched when doing this.\n",
    "- If you write or read files and provide only the file name, the notebook will look for the file in the directory it is located itself. Use relative paths or absolute paths to read or write files from or to somewhere else.\n",
    "\n",
    "For some more information on the JupyterLab interface and some useful shortcuts you can check out:\n",
    "- [the JupyterLab interface documentation](https://jupyterlab.readthedocs.io/en/stable/user/interface.html)\n",
    "- [an overview of some shortcuts](https://yoursdata.net/jupyter-lab-shortcut-and-magic-functions-tips/)\n",
    "- [and another shortcut overview](https://blog.ja-ke.tech/2019/01/20/jupyterlab-shortcuts.html)\n",
    "\n",
    "If you have any issues with the notebook or additional questions, contact your tutors or try google."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## Exercise 1\n",
    "\n",
    "To complete the exercises, follow the steps described within this notebook and fill in the blank parts of the code.\n",
    "\n",
    "You can make use of common Python packages such as numpy or pandas for handling of data and matplotlib for plotting. Alternatively, you can use CERN's ROOT to solve the exercises. Hints for both approaches will be provided.\n",
    "\n",
    "**Some of the cells in this template will throw errors, as some code is missing!**\n",
    "**It is your job to add the code!**\n",
    "\n",
    "**You do not have to implement both the Python and the ROOT approach to the exercises! Simply delete the cells containing the templates and hints containing the approach you choose not to use.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Exercise 1.1\n",
    "\n",
    "Write a code snippet (function, class, etc.) that\n",
    "- creates `N` Gaussian distributed random numbers with mean `m=0` and a standard deviation of sigma `s=1` and\n",
    "- plots these numbers as a histogram.\n",
    "\n",
    "The parameter `N` should be an argument of the code snippet."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Python Approach:\n",
    "If you want to use the Python approach, you should have a look at the package `numpy` and the therein provided method [`numpy.random.normal`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html) in particular. This method can create a numpy array of random numbers.\n",
    "\n",
    "A simple way to visualize these numbers is the [`hist`](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.hist.html) method provided by the matplotlib sub-package pyplot."
   ]
  },
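  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the call signature, not a full solution: `numpy.random.normal` takes the mean as `loc`, the standard deviation as `scale` and the number of values as `size`. The seed, the sample size and the variable name `sample` are arbitrary choices for illustration.\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "np.random.seed(42)  # fixed seed (arbitrary choice) to make the example reproducible\n",
    "sample = np.random.normal(loc=0., scale=1., size=10000)\n",
    "\n",
    "print(sample.shape)                 # one-dimensional array with 10000 entries\n",
    "print(sample.mean(), sample.std())  # close to 0 and 1, up to statistical fluctuations\n",
    "```\n",
    "\n",
    "An array like `sample` is what you would pass on to `plt.hist`."
   ]
  },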
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def create_gaussian_histogram(N, mean=0., sigma=1.):\n",
    "    # Create a numpy array with gaussian distributed numbers\n",
    "    random_numbers = 1 # TODO: This should be an array of N gaussian distributed random numbers\n",
    "    \n",
    "    # Visualize the content of the array as histogram with the help of matplotlib.\n",
    "    # TODO\n",
    "    \n",
    "    # Return the numpy array\n",
    "    return random_numbers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "gaussian_numbers = create_gaussian_histogram(N=100000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### ROOT Approach:\n",
    "If you want to use `ROOT`, you should check out [`gRandom->Gaus()`](https://root.cern.ch/doc/master/classTRandom.html#a0e445e213eae1343b3d22086ecb87314). You can use this method to [fill](https://root.cern.ch/doc/master/classTH1.html#a77e71290a82517d317ea8d05e96b6c4a) a `ROOT` [`TH1F` histogram](https://root.cern.ch/doc/master/classTH1F.html) one by one with the random numbers.\n",
    "\n",
    "Alternatively you can use the [`FillRandom`](https://root.cern.ch/doc/master/classTH1.html#a1e9d6258ae798a0eb52aef58a72758a5) method to fill the histogram directly with Gaussian distributed numbers, e.g. `my_hist.FillRandom(\"gaus\", 1000)`. If you want to use a ROOT's gauss function with other values for mean and sigma than the default values, you have to define a [one dimensional function `TF1`](https://root.cern.ch/doc/master/classTF1.html) with the respective parameters first:\n",
    "```python\n",
    "gaussian = TF1(\"gaussian\",\"gaus\",-3,3)\n",
    "gaussian.SetParameters(1,0,1)  # last parameter is sigma, second to last is the mean of the gaussian.\n",
    "```\n",
    "\n",
    "Plotting with ROOT in jupyter notebooks is a bit tricky, which is why we will give you some detailed hints on how to do this.\n",
    "First of all, the plot will not be shown if it is created in a function. Thus, your function should return the created `TH1F` object and you can then draw it on a canvas:\n",
    "```python\n",
    "canvas = TCanvas(\"c1\", \"c1\")\n",
    "\n",
    "root_histogram = create_gaussian_histogram_with_root(N=1000)\n",
    "\n",
    "root_histogram.Draw()\n",
    "canvas.Modified()\n",
    "canvas.Update()\n",
    "canvas.Draw()\n",
    "```\n",
    "\n",
    "A second problem arises if you try to run the code a second time, because the created ROOT objects are still present in the notebook and jupyter will not be able to create them again unless you delete them first, e.g. with a code snippet such as:\n",
    "```python\n",
    "try:\n",
    "    del canvas\n",
    "except NameError:\n",
    "    pass\n",
    "```\n",
    "This will delete the ROOT object `canvas` if it was already created. If it has not been created, yet, the `try`-`except` approach will catch the `NameError` that will be thrown, as the object `canvas` does not exist, yet. In this case nothing will be done (look up what the python build-in keyword `pass` does). You can also restart the jupyter kernel to achieve this, but this is rather inconvenient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from ROOT import TH1F, gRandom, TFile, TCanvas, TF1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def create_gaussian_histogram_with_root(N, mean=0., sigma=1.):\n",
    "    # Create an histogram with 20 bins from -3 to 3\n",
    "    root_histogram = TH1F(\"myHisto\", \"Histogram containing random numbers\", 20, -3, 3)\n",
    "  \n",
    "    # Initialize the random numbers generator\n",
    "    gRandom.SetSeed(1234)\n",
    "  \n",
    "    # Generate N random numbers following a gaussian distribution and fill the histogram with them\n",
    "    # TODO: Fill the histogram\n",
    "    \n",
    "    return root_histogram"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "try:\n",
    "    del c1\n",
    "except NameError:\n",
    "    pass\n",
    "\n",
    "try:\n",
    "    del root_histogram\n",
    "except NameError:\n",
    "    pass\n",
    "\n",
    "c1 = TCanvas(\"c1\", \"c1\")\n",
    "\n",
    "root_histogram = create_gaussian_histogram_with_root(N=100000)\n",
    "\n",
    "# TODO: Draw the histogram onto the canvas c1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Exercise 1.2\n",
    "\n",
    "Extend your code from Exercise 1.1 so that the histogram data is written to a file.\n",
    "\n",
    "This step is a bit more tricky if you are following the python approach, so please take a look at the hints. If you are using ROOT, you can write your `TH1F` histogram directly to a ROOT file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Python Approach:\n",
    "\n",
    "Numpy does not provide a dedicated class for histograms as ROOT does. In Exercise 1.1 we created an array containing the random numbers and visualized these numbers as histogram.\n",
    "\n",
    "You can use numpy's [`numpy.histogram`](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html) method to interpret the data as histogram. However, this method will not create or return a \"histogram\" object, it will simply give you the bin counts and the bin edges of the histogram in form of two numpy arrays. You will have to decide yourself how to handle these return values.\n",
    "\n",
    "**Option 1) Store the data, not the histogram.**\n",
    "    \n",
    "You can use [`numpy.save`](https://numpy.org/doc/stable/reference/generated/numpy.save.html) or [`pandas.to_hdf`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_hdf.html) (you have to convert your data into the [`pandas.DataFrame`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) or the [`pandas.Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) format for this) to save the data you created directly. You can also write the data to a csv file or any other format you might be familiar with.\n",
    "Using this option, you have to recreate the histogram again, because you stored the raw data and not the histogram interpretation. This has the advantage of being able to reinterpret the data.\n",
    "    \n",
    "**Option 2) Define a histogram object that can be stored to a file.**\n",
    "\n",
    "Storing the data in form of a histogram has two advantages: Firstly, it is clear how the data should be interpreted, and secondly, when handling large amounts of data, the histogram is a storage efficient way to save the data. Instead of saving each individual data point, only the bin count and the bin edges are stored. This means, if you consider a histogram with `n` bins, `2*n+1` (1 bin count per bin -> `n` bin counts; and `n+1` bin edge positions) numbers have to be stored. Keep in mind, however, that the amount of information stored in the histogram is also reduced compared to the full raw data set, as you only store one interpretation of the data.\n",
    "\n",
    "You can also use Python's [`pickle`](https://docs.python.org/3/library/pickle.html) to directly dump the tuple of bin counts and bin edges returned by the numpy histogram method to store these python objects in a pickle file. See also the [example given on the python website](https://docs.python.org/3/library/pickle.html#examples) which shows how a python dictionary is written and loaded again.\n",
    "\n",
    "The downside of these methods is, that you have to put a bit more effort into defining the object that is stored.\n",
    "\n",
    "You can try to implement both options, but focus on the **Option 2)**."
   ]
  },
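  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pickle variant described above could look roughly like the following sketch. It only illustrates the return values of `numpy.histogram` and a round trip through a pickle file; the file name `histogram.pkl`, the temporary directory and the binning are assumptions made for this example.\n",
    "```python\n",
    "import os\n",
    "import pickle\n",
    "import tempfile\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "data = np.random.normal(loc=0., scale=1., size=1000)\n",
    "\n",
    "# np.histogram returns two arrays: n bin counts and n + 1 bin edges\n",
    "bin_counts, bin_edges = np.histogram(data, bins=20, range=(-3., 3.))\n",
    "print(len(bin_counts), len(bin_edges))\n",
    "\n",
    "# Dump the tuple to a pickle file (hypothetical file name) and load it again\n",
    "file_path = os.path.join(tempfile.mkdtemp(), 'histogram.pkl')\n",
    "with open(file_path, 'wb') as f:\n",
    "    pickle.dump((bin_counts, bin_edges), f)\n",
    "\n",
    "with open(file_path, 'rb') as f:\n",
    "    loaded_counts, loaded_edges = pickle.load(f)\n",
    "\n",
    "# Both comparisons should be True if the round trip preserved the arrays\n",
    "print(np.array_equal(bin_counts, loaded_counts), np.array_equal(bin_edges, loaded_edges))\n",
    "```\n",
    "\n",
    "Note that values outside the chosen `range` are not counted by `numpy.histogram`, so the sum of the bin counts can be smaller than the number of data points."
   ]
  },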
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# First save the data via one of the methods described in Option 1:\n",
    "\n",
    "# Save the data..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "**Have a look at the custom histogram class `HistogramClass` below and try to understand how it works and which features it provides!**\n",
    "\n",
    "You can use this class or try your own method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Define a histogram object\n",
    "import pandas as pd\n",
    "\n",
    "class HistogramClass:\n",
    "    def __init__(self, data, bins=20, bin_range=None):\n",
    "        if isinstance(data, tuple) and all(isinstance(e, pd.Series) for e in data):\n",
    "            self._bins = len(data[0].index)\n",
    "            self._bin_counts = data[0].values\n",
    "            self._bin_edges = data[1].values\n",
    "            self._mean = data[2].values[0]\n",
    "            self._std = data[2].values[1]\n",
    "            self._entries = data[2].values[2]\n",
    "            self._underflow = data[2].values[3]\n",
    "            self._overflow = data[2].values[4]\n",
    "        elif isinstance(data, np.ndarray) and isinstance(bins, int):\n",
    "            self._bins = bins\n",
    "            bin_counts, bin_edges = np.histogram(data, bins=bins, range=bin_range)\n",
    "            self._bin_counts = bin_counts\n",
    "            self._bin_edges = bin_edges\n",
    "            \n",
    "            if bin_range is not None:\n",
    "                bounds = (data >= bin_range[0]) & (data <= bin_range[1])\n",
    "            else:\n",
    "                bounds = np.full(shape=data.shape, fill_value=True)\n",
    "            self._mean = np.mean(data[bounds])\n",
    "            self._std = np.std(data[bounds])\n",
    "            self._entries = len(data[bounds])\n",
    "            \n",
    "            self._underflow = 0 if bin_range is None else len(data[data < bin_range[0]])\n",
    "            self._overflow = 0 if bin_range is None else len(data[data > bin_range[1]])\n",
    "        else:\n",
    "            raise ValueError(\"The parameter 'data' must be a 1 dimensional numpy array and the parameter 'bins' an integer!\")\n",
    "    \n",
    "    @property\n",
    "    def bins(self):\n",
    "        return self._bins\n",
    "    \n",
    "    @property\n",
    "    def bin_edges(self):\n",
    "        return self._bin_edges\n",
    "    \n",
    "    @property\n",
    "    def bin_mids(self):\n",
    "        return (self._bin_edges[1:] + self._bin_edges[:-1]) / 2.\n",
    "    \n",
    "    @property\n",
    "    def bin_counts(self):\n",
    "        return self._bin_counts\n",
    "    \n",
    "    @property\n",
    "    def mean(self):\n",
    "        return self._mean\n",
    "    \n",
    "    @property\n",
    "    def std(self):\n",
    "        return self._std\n",
    "    \n",
    "    @property\n",
    "    def entries(self):\n",
    "        return self._entries\n",
    "    \n",
    "    @property\n",
    "    def underflow(self):\n",
    "        return self._underflow\n",
    "    \n",
    "    @property\n",
    "    def overflow(self):\n",
    "        return self._overflow\n",
    "    \n",
    "    def draw(self, *args, **kwargs):\n",
    "        plt.hist(x=self.bin_mids, bins=self.bin_edges, weights=self.bin_counts, *args, **kwargs)\n",
    "        \n",
    "    def save(self, file_path):\n",
    "        with pd.HDFStore(path=file_path, mode=\"w\") as hdf5store:\n",
    "            hdf5store.append(key=\"bin_edges\", value=pd.Series(self.bin_edges))\n",
    "            hdf5store.append(key=\"bin_counts\", value=pd.Series(self.bin_counts))\n",
    "            meta_info = pd.Series([self.mean, self.std, self.entries, self.underflow, self.overflow])\n",
    "            hdf5store.append(key=\"meta_info\", value=meta_info)\n",
    "        \n",
    "    @classmethod\n",
    "    def load(cls, file_path):\n",
    "        with pd.HDFStore(path=file_path, mode=\"r\") as hdf5store:\n",
    "            bin_edges = hdf5store.get(key=\"bin_edges\")\n",
    "            bin_counts = hdf5store.get(key=\"bin_counts\")\n",
    "            meta_info = hdf5store.get(key=\"meta_info\")\n",
    "        assert len(bin_counts.index) + 1 == len(bin_edges.index)\n",
    "        \n",
    "        instance = cls(data=(bin_counts, bin_edges, meta_info))\n",
    "        return instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# TODO: Initialize a HistogramClass filled with your data\n",
    "# TODO: Draw and save the histogram."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### ROOT Approach:\n",
    "\n",
    "With ROOT you can just store the `TH1F` object to a ROOT file using the build-in methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "root_file = TFile(\"root_histogram_with_gaussian_random_numbers.root\", \"recreate\")\n",
    "\n",
    "# TODO: Write the ROOT histogram to the root_file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Exercise 1.3\n",
    "\n",
    "Load the histogram you wrote to disk in Exercise 1.2 again and plot it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Python Approach:\n",
    "\n",
    "Using the interface of the `HistogramClass` this is now easy. You can initialize a `HistogramClass` instance from a given file by using the class method `HistogramClass.load`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# TODO: Load and draw your histogram again"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### ROOT Approach:\n",
    "\n",
    "Create the `TFile` object for the file you saved earlier and get your ROOT `TH1F` histogram from it. Use a canvas as described in Exercise 1.1 to draw the loaded histogram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# TODO: Load and your ROOT histogram again and draw it to a new canvas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Compare the sizes of the different files, if you made the effort to implement more than one approach to store a histogram and/or the raw data\n",
    "\n",
    "You can use for instance [`os.path.getsize`](https://docs.python.org/3/library/os.path.html#os.path.getsize) or `!ls -lh` to do this. Evaluate the file sizes for different amounts of gaussian random numbers `N`.\n"
   ]
  },
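  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of such a comparison, the following snippet stores the raw data with `numpy.save` and the histogram arrays with `numpy.savez` and prints both file sizes for two values of `N`. The use of `numpy.savez` for the histogram, the file names and the temporary directory are choices made for this illustration only; your own files will differ.\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "import numpy as np\n",
    "\n",
    "tmp_dir = tempfile.mkdtemp()  # temporary directory, only used to keep the example self-contained\n",
    "\n",
    "for n in (1000, 100000):\n",
    "    data = np.random.normal(size=n)\n",
    "    bin_counts, bin_edges = np.histogram(data, bins=20, range=(-3., 3.))\n",
    "\n",
    "    raw_path = os.path.join(tmp_dir, 'raw_{}.npy'.format(n))\n",
    "    hist_path = os.path.join(tmp_dir, 'hist_{}.npz'.format(n))\n",
    "    np.save(raw_path, data)\n",
    "    np.savez(hist_path, bin_counts=bin_counts, bin_edges=bin_edges)\n",
    "\n",
    "    # os.path.getsize returns the file size in bytes\n",
    "    print(n, os.path.getsize(raw_path), os.path.getsize(hist_path))\n",
    "```\n",
    "\n",
    "The size of the raw data file grows with `N`, while the histogram file only depends on the number of bins."
   ]
  },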
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# TODO: Try any of the methods to check the file sizes of your histograms or raw data,\n",
    "#       if you saved them in different formats"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Exercise 1.4\n",
    "\n",
    "Fit a Gaussian function to the histogram(s) you created in the previous exercises."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Python Approach:\n",
    "\n",
    "To perform a fit to your numpy histogram, you can use the fitting tools provided in the [`scipy.optimize` package](https://docs.scipy.org/doc/scipy/reference/optimize.html), for instance the [`curve_fit` method](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html). You can find an example on how to use it [here](https://riptutorial.com/scipy/example/31081/fitting-a-function-to-data-from-a-histogram).\n",
    "\n",
    "You will have to define the function describing the gaussian distribution you want to fit to the histogram. To plot this function you can use the [`matplotlib.pyplot.plot` plot function](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.plot.html)."
   ]
  },
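  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The following is one possible sketch of such a fit, under a few assumptions: the fit function is an unnormalized Gaussian with a free amplitude, the bin centers are used as x values, and the fit is an unweighted least-squares fit (no bin uncertainties are passed to `curve_fit`). The function name `gaussian`, the seed and the binning are chosen for illustration.\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.optimize import curve_fit\n",
    "\n",
    "\n",
    "def gaussian(x, amplitude, mean, sigma):\n",
    "    # Unnormalized Gaussian; the amplitude absorbs the number of entries and the bin width\n",
    "    return amplitude * np.exp(-0.5 * ((x - mean) / sigma) ** 2)\n",
    "\n",
    "\n",
    "np.random.seed(1)  # arbitrary seed for reproducibility\n",
    "data = np.random.normal(loc=0., scale=1., size=100000)\n",
    "bin_counts, bin_edges = np.histogram(data, bins=20, range=(-3., 3.))\n",
    "bin_mids = (bin_edges[1:] + bin_edges[:-1]) / 2.\n",
    "\n",
    "# p0 holds rough starting values for amplitude, mean and sigma\n",
    "popt, pcov = curve_fit(gaussian, bin_mids, bin_counts, p0=[bin_counts.max(), 0., 1.])\n",
    "perr = np.sqrt(np.diag(pcov))  # parameter uncertainties estimated from the covariance matrix\n",
    "\n",
    "print(popt)\n",
    "print(perr)\n",
    "```\n",
    "\n",
    "Evaluating `gaussian(x, *popt)` on a fine grid of x values gives the curve that can be drawn on top of the histogram."
   ]
  },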
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from scipy.optimize import curve_fit\n",
    "\n",
    "# TODO: Define your Gaussian fit function\n",
    "\n",
    "# TODO: Implement the fit\n",
    "\n",
    "# Plot the histogram and the fitted function.\n",
    "# TODO: Draw the histogram again\n",
    "# TODO: Plot the fitted gaussian into the plot of the histogram"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### ROOT Approach:\n",
    "\n",
    "Define a ROOT `TF1` `gaus` function to be fitted to the ROOT histogram. To perform the fit, use the predefined [`Fit` method of the ROOT histogram](https://root.cern.ch/doc/master/classTH1.html#a63eb028df86bc86c8e20c989eb23fb2a).\n",
    "\n",
    "Draw the histogram again onto a new canvas using the same approach as above to visualize the result of the fit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Define a gaussian function\n",
    "# TODO: Define the TF1 gaussian function to be fitted to the histogram\n",
    "\n",
    "# Fit the histogram with this function\n",
    "# TODO: Perform the fit of the gaussian to the ROOT root_histogram\n",
    "\n",
    "c3 = TCanvas(\"c3\", \"c3\")\n",
    "# TODO: Draw the histogram to the canvas c3 to show the fitted function and the histogram itself"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Exercise 1.5\n",
    "\n",
    "Make the plot nicer and save it as vector graphic, e.g. eps or pdf. The latter can be displayed within jupyter lab by clicking on it in your file browser on the left.\n",
    "\n",
    "The plot should:\n",
    "- use blue filled boxes for the histogram with horizontal error bars to indicate the bin width\n",
    "- show the fitted gaussian function as red line with a thickness/width of 3\n",
    "- label the `x` and `y` axes with \"x\" and \"Entries\", respectively\n",
    "- display the mean and standard deviation of the histogram in the legend\n",
    "- display the fitted parameters with uncertainties as well as the fit probability in the legend."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Python Approach\n",
    "\n",
    "Matplotlib provides a lot of information about the available plot style options in the documentation of the respective plot functions. You will also find a lot of matplotlib examples when googling for certain key words.\n",
    "To get change the style of your plot more drastically, you might have to change the plot function you are using. Have a look at `matplotlib.pyplot.errorbar` instead of `matplotlib.pyplot.hist`, for instance.\n",
    "\n",
    "Obtaining the fit probability in python is not as simple as with ROOT, so you can skip it."
   ]
  },
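  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to try the fit probability anyway, one common approach is a chi-square test via `scipy.stats.chi2`. The sketch below assumes Poisson uncertainties of `sqrt(expected)` per bin and that no expected count is zero; `fit_probability` is a hypothetical helper and the numbers are made-up toy values, so the result need not agree exactly with what ROOT reports.\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.stats import chi2\n",
    "\n",
    "\n",
    "def fit_probability(observed, expected, n_fit_params):\n",
    "    # Hypothetical helper: chi-square probability for binned counts,\n",
    "    # assuming Poisson uncertainties sqrt(expected) on each bin\n",
    "    observed = np.asarray(observed, dtype=float)\n",
    "    expected = np.asarray(expected, dtype=float)\n",
    "    chi2_value = np.sum((observed - expected) ** 2 / expected)\n",
    "    ndf = len(observed) - n_fit_params\n",
    "    # Survival function: probability to find a chi-square at least this large\n",
    "    return chi2_value, ndf, chi2.sf(chi2_value, ndf)\n",
    "\n",
    "\n",
    "observed = np.array([12., 25., 31., 22., 10.])  # toy bin counts\n",
    "expected = np.array([10., 24., 32., 24., 10.])  # toy values of a fitted function at the bin centers\n",
    "print(fit_probability(observed, expected, n_fit_params=3))\n",
    "```\n",
    "\n",
    "Here `n_fit_params` is the number of free parameters of the fit function, which reduces the number of degrees of freedom."
   ]
  },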
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(10,8))\n",
    "# TODO: Plot the histogram with the style improvements\n",
    "\n",
    "# TODO: Plot the fitted gaussian with the style improvements\n",
    "\n",
    "# TODO: Add a legend and axis labels.\n",
    "\n",
    "# TODO: Save to plot as vector graphic (pdf files can be viewed with JupyterLab)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### ROOT Approach\n",
    "\n",
    "To improve the histogram plot, you can for instance have a look at the options described in [the overview of ROOT's `TStyle` Class](https://root.cern.ch/doc/master/classTStyle.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "from ROOT import TH1F, TFile, TF1, gStyle\n",
    "\n",
    "# TODO: Change the properties of gStyle, the ROOT histogram and the gaussian fit function\n",
    "#       to improve the style of the plot\n",
    "\n",
    "c4 = TCanvas(\"c4\", \"c4\")\n",
    "\n",
    "# TODO: Draw the histogram to the canvas c4 and save it as vector graphic (pdf files can be viewed with JupyterLab)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### **Exercise 1.6 (Obligatory)**\n",
    "\n",
    "Fill a histogram with the quotient $f(x_1,x_2) = x_1/x_2$ of two Gaussian distributed random numbers $x_1$ and $x_2$ with the mean $m_1 = 2$ and standard deviation $\\sigma_1 = 1.5$ and $m_2 = 3$, $\\sigma_2 = 2.2$, respectively.\n",
    "\n",
    "Assuming standard error propagation without correlations\n",
    "$$ \\sigma_f^2 = \\sum_i \\left( \\frac{\\partial f}{\\partial x_i} \\right)^2 \\sigma^2_i $$\n",
    "calculate the propagated uncertainty for this function $f(x_1,x_2)$ (using the mean values of $x_1$ and $x_2$).\n",
    "\n",
    "How does the result compare with the properties of the created histogram?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "### Theoretical Calculation\n",
    "\n",
    "**TODO: Make your calculation here using the LaTeX syntax in a Markdown cell!**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Python Approach\n",
    "\n",
    "Use the methods you learned in the previous exercises to create three numpy arrays containing the random numbers with the properties of $x_1$, $x_2$ and $f(x_1, x_2)$ where the last is simply the quotient of the first two arrays. Plot and evaluate the histograms of these random numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def create_histograms(N, mean1=2., sigma1=1.5, mean2=3., sigma2=2.2, bins=100):\n",
    "    # Create numpy arrays with gaussian distributed numbers for x_1 and x_2\n",
    "    # TODO: Create the arrays gauss1 and gauss2 with the random numbers\n",
    "    \n",
    "    # Calculate the array containing the quotient of x_1 and x_2\n",
    "    # TODO: Calculate the array f\n",
    "    \n",
    "    # Visualize the content of the arrays as histogram with the help of matplotlib.\n",
    "    # TODO: Plot the histograms of the distributions of the three random numbers\n",
    "    #       Plot them into three different plots and add titles to be able to identify them.\n",
    "    \n",
    "    # Return the numpy arrays\n",
    "    return gauss1, gauss2, f"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "x1, x2, quotient = create_histograms(N=1000000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# TODO: Create three HistogramClass instances from the three arrays you created.\n",
    "#       Use 100 bins and a range from -10 to 10.\n",
    "\n",
    "# TODO: Plot the histograms and determine their mean and standard deviation to be\n",
    "#       able to compare them with the original values and the theoretical calculation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "#### Root Approach\n",
    "\n",
    "Similar to the methods used in Exercise 1.1, use the ROOT method `gRandom.Gaus` to generate the random numbers $x_1$ and $x_2$ and fill `TH1F` histograms with them. Fill also a histogram with the quotient `f = x_1/x_2`.\n",
    "Draw and evaluate the three histograms with the methods you learned in the previous exercises."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def create_root_histograms(N, bins=100, bin_range=(-10., 10.)):\n",
    "    # Create histogram for the distribution of x, y and f(x,y) = x/y with 100 bins from -10 to 10\n",
    "    min_bin, max_bin = bin_range\n",
    "    # Initialize three TH1F ROOT histograms for x_1, x_2 and f\n",
    "\n",
    "    # Initialize the random numbers generator\n",
    "    gRandom.SetSeed()\n",
    "\n",
    "    # Generate 2 x N random numbers following a gaussian distribution\n",
    "    # one with mean = 2 and sigma = 1.5 and\n",
    "    # one with mean = 3 and sigma = 2.2 and\n",
    "\n",
    "    for i in range(N):\n",
    "        pass # TODO: Fill the three histograms with the TH1F.Fill method\n",
    "    \n",
    "    # TODO: Return the three ROOT histograms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "x1_hist, x2_hist, f_hist = create_root_histograms(N=100000)\n",
    "\n",
    "# Draw the three histograms onto the canvases c5, c6 and c7\n",
    "\n",
    "c5 = TCanvas(\"c5\", \"c5\")\n",
    "\n",
    "c6 = TCanvas(\"c6\", \"c6\")\n",
    "\n",
    "c7 = TCanvas(\"c7\", \"c7\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "**Write down what you observe when you compare the result of the theoretical calculation with what you obtained using the random numbers.**\n",
    "\n",
    "**TODO: Write down Observation in this Markdown cell!**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "hide_input": false,
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}