{ "cells": [ { "cell_type": "markdown", "id": "b7768702-cdfe-4a59-b678-1dd2a66e7a1e", "metadata": {}, "source": [ "#
Particle Physics II - Physics Beyond the Standard Model
\n", "\n", "##
Exercise sheet 2
\n" ] }, { "cell_type": "code", "execution_count": null, "id": "0fe1bc4f-f3cf-48c9-b5f0-a1f999db2d1f", "metadata": {}, "outputs": [], "source": [ "# some python imports\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import vector as vec\n", "import scipy\n", "from itertools import *\n", "from scipy.optimize import curve_fit" ] }, { "cell_type": "code", "execution_count": null, "id": "0d487e90-e115-4f8d-a2e9-e8abd057685e", "metadata": {}, "outputs": [], "source": [ "# KIT packages\n", "import kafe2 \n", "import PhyPraKit as ppk\n", "from utils import * # package found in utils.py. Contains the vector class, which might be known from TP1\n", "# suppressing some warnings\n", "pd.options.mode.chained_assignment = None" ] }, { "cell_type": "markdown", "id": "ed913036-1912-4b31-9ce0-3514ddc22d8c", "metadata": {}, "source": [ "## Effective field theory\n", "\n", "Although physics beyond the SM (BSM) might occur at an not directly experimentally accessible energy scale $\\Lambda$, its\n", "low-energy effects can be described by an effective field theory (EFT).\n", "\n", "A well-known example of an EFT is the contact interaction model of charged currents as proposed by Fermi. It describes nuclear beta decay by introduction of a contact interaction of four fermions, which is accurate below the energy scale provided by the mass of the electroweak gauge bosons, $m_\\mathrm{W} \\approx 80 \\, \\mathrm{GeV}$. The degrees of freedom associated with the heavy bosons are accounted for by the coupling constant $G_\\mathrm{F}$.\n", "\n", "\n", "\n", "\n", "
\"Drawing\" \"Drawing\"
\n", "
\n", "
Fig.1: Feynman diagrams illustrating the four-fermion interaction of strength $\\mathrm{G_F}$ valid up to a scale $\\Lambda \\approx \\mathrm{m_W}.$
\n", "
" ] }, { "cell_type": "markdown", "id": "29603050-4ee9-4737-8412-bca8f651ebea", "metadata": {}, "source": [ "**Exercise 1: Effective theories in physics**" ] }, { "cell_type": "markdown", "id": "7d4101eb-4769-4aae-8062-8f1597f51a24", "metadata": {}, "source": [ "
\n", "Which other effective theories did you encounter in physics? Think about your studies and list a few examples." ] }, { "cell_type": "markdown", "id": "6fcd691a-0501-4615-9cfd-15873e501350", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "markdown", "id": "59f92bb5-4226-4e97-9970-c4c2562eb6dc", "metadata": {}, "source": [ "**Exercise 2: SMEFT**\n", "\n", "In the case of the SM as an EFT (SMEFT), additional terms are included in the Lagrangian with higher mass dimensions, $d > 4$,\n", "\n", "$$\\mathcal{L}_\\mathrm{EFT} = \\mathcal{L}_\\mathrm{SM} + \\sum _{i} \\frac{C_i^{(5)}}{\\Lambda} \\mathcal{O}_i^{(5)}\n", "+ \\sum _{i} \\frac{C_i^{(6)}}{\\Lambda ^2} \\mathcal{O}_i^{(6)}\n", "+ \\sum _{i} \\frac{C_i^{(7)}}{\\Lambda ^3} \\mathcal{O}_i^{(7)}\n", "+ \\sum _{i} \\frac{C_i^{(8)}}{\\Lambda ^4} \\mathcal{O}_i^{(8)}\n", "+ \\, ...\n", "$$\n", "\n", "\n", "Contributions with $d > 4$ are suppressed by powers of $1 / \\Lambda$ with $\\Lambda$ being the scale of new physics that's not accessible at current energies. Furthermore, odd-dimensional operators are usually not considered (see Exercise 2c), such that $d = 6$ or $d = 8$ are investigated. $\\mathcal{O}_i^{(d)}$ are all possible operators with mass-dimension $d$ that can be built with the SM fields, leading to very a large number. Certain assumptions can be made, which reduces this number drastically. $C_i$ are the Willson coeffients, which are dimensionless free parameters. The goal of an EFT analysis is to set limits on $C^{(d)}_i / \\Lambda^{d-4}$, where the SM is recovered when $C_i = 0$ for all $i$.\n", "\n" ] }, { "cell_type": "markdown", "id": "fe648659-b66b-45e5-a64d-2d9707da56a4", "metadata": {}, "source": [ "
\n", " \n", "a) How many dim-5 operators are there and what do they describe?" ] }, { "cell_type": "markdown", "id": "a5e0a660-c5ca-4dc2-a9b3-dcb91ce71784", "metadata": {}, "source": [ "
\n", "\n", " Answer:" ] }, { "cell_type": "markdown", "id": "3e3a4068-b4d7-48ca-96fa-4212db0cc592", "metadata": {}, "source": [ "
\n", " \n", "b) Why are odd-dimensional operators typically not considered at the LHC?" ] }, { "cell_type": "markdown", "id": "65abbfc1-4c4a-40b2-92cf-d2ff11ee85db", "metadata": {}, "source": [ "
\n", "\n", " Answer:" ] }, { "cell_type": "markdown", "id": "bb72b647-3ff5-4d0f-92a3-047417bd091a", "metadata": {}, "source": [ "
\n", " \n", "c) How many dim-6 and dim-8 operators are there?" ] }, { "cell_type": "markdown", "id": "15bd56b7-d87d-4299-8942-d2d474a81f7a", "metadata": {}, "source": [ "
\n", "\n", "\n", " Answer:" ] }, { "cell_type": "markdown", "id": "1b3a87a5-94d9-4e53-8591-d75f722a5441", "metadata": {}, "source": [ "In this exercise, we are looking at the so-called Warsaw basis of dim-6 operators as implemented in SMEFTsim ([github](https://smeftsim.github.io/)) with maximum flavor symmetry $U(3)^5$. Out of the remaining parameters, we are looking at only 3 of them. This is because of simplicity, practicality, and because they are the operators, which only affect the coupling of vector bosons.\n", "\n", "
\n", " \n", "\n", "\n", " \n", "\n", " \n", "\n", " \n", "
\n", " $Q_{W}= \\varepsilon^{IJK} W_\\mu^{I\\nu} W_\\nu^{J\\rho} W_\\rho^{K\\mu}$ \n", "
\n", " $Q_{\\varphi W} = \\varphi^\\dagger \\varphi\\, W^I_{\\mu\\nu} W^{I\\mu\\nu}$ \n", "
\n", " $Q_{\\varphi\\Box} = (\\varphi^\\dagger \\varphi)\\Box(\\varphi^\\dagger \\varphi)$ \n", "
\n", "
\n" ] }, { "cell_type": "markdown", "id": "3b86ff8d-21e4-4371-95ab-6c1fd0d4134d", "metadata": {}, "source": [ "## Vector boson scattering\n", "\n", "Vector boson scattering (VBS) is a process where at the LHC two incoming quarks each radiate off a vector boson (V=W,Z) which then scatter and subsequently decay into leptons or quarks. This process contains contributions to triple and quartic vector boson self-couplings, as well as the coupling of vector bosons to the Higgs boson.\n", "As such, it is a key process at the LHC to probe the electroweak symmetry breaking (EWSB). It is even historically interesting, since the exchange of the Higgs boson is required that the cross section does not diverge at large energies.\n", "\n", "
\"Drawing\"
\n", "
Fig.2: Schematic representation of VBS and the interactions it contains.
" ] }, { "cell_type": "markdown", "id": "5d73b2fa-4470-442b-a779-1924179e72da", "metadata": {}, "source": [ "The event signature consists of two forward jets also called \"tagging jets\", in Fig.2 noted with the letter $j$ and two vector bosons that decay either into leptons or quarks. We will investigate this topology in the next part of the exercise. \\\n", "Depending on the decay of the W- or Z-Boson, it can be categorized as a leptonic, semi-leptonic, or hadronic decay.\\\n", "In this exercise, we are looking at the hadronic decay channel, which is powerful for EFT constraints due to the large branching ratio. However, the QCD multijets background is very large. Towards the end of the exercise, we will see how this affects the limit setting procedure." ] }, { "cell_type": "markdown", "id": "e75fd3d2-52d3-4cd6-88e6-b085984daec9", "metadata": {}, "source": [ "**Exercise 3: VBS channels**" ] }, { "cell_type": "markdown", "id": "d0ae4ffb-9fe3-4ab5-95b1-a0ff600f4aa2", "metadata": {}, "source": [ "
\n", "a) Which are the four diboson compositions in which VBS is typically classified?" ] }, { "cell_type": "markdown", "id": "732c41a8-bd42-4a39-a08b-7a80c698ad18", "metadata": {}, "source": [ "
\n", "\n", " Answer:" ] }, { "cell_type": "markdown", "id": "8a375391-9449-439c-bb7b-88acfd7f2cd8", "metadata": {}, "source": [ "
\n", "b) Now consider the decay of the vector bosons. What are the benefits of the leptonic, semi-leptonic and hadronic decay channel?" ] }, { "cell_type": "markdown", "id": "f6782e11-5d72-4067-b4eb-d956012fd67f", "metadata": {}, "source": [ "
\n", "\n", " Answer:" ] }, { "cell_type": "markdown", "id": "3d7371b7-2602-4131-a217-aa3c5e4bb720", "metadata": {}, "source": [ "In VBS, effects modifying the trilinear (TGC) and quartic (QGC) gauge boson couplings are of interest. They are subject to dim-6 (aTGC) or dim-8 (aQGC) operators respectively. There are also other processes at the LHC, which allow to investigate TGC or QGC and are thus sensitive to the same EFT operators. " ] }, { "cell_type": "markdown", "id": "74af1670-8029-44f2-985c-f1c3c09d4a4c", "metadata": {}, "source": [ "**Exercise 4: related processes**" ] }, { "cell_type": "markdown", "id": "64a96433-033b-4e16-844d-66455e7e985b", "metadata": {}, "source": [ "
\n", "\n", "Can you think of other processes featuring triple or quartic vector boson self-interactions?" ] }, { "cell_type": "markdown", "id": "697a78d0-df09-4857-a344-75b6079a8373", "metadata": {}, "source": [ "
\n", "\n", " Answer:" ] }, { "cell_type": "markdown", "id": "b05abcdd-32dd-4c03-8311-a6550f366468", "metadata": {}, "source": [ "## Data sets\n", "\n", "This exercise uses multiple data sets for signal and background. The data is taken from simulations according to the CMS experiment during the 2016 data taking period at $13\\,\\mathrm{TeV}$. \n", "The goal of this exercise is to derive expected limits on the EFT coefficients of the three dimension-6 operators mentioned above. This is done one at a time, so setting one operator to a non-zero value with the other two fixed at zero.\n", "\n", "In this sections, we will focus on the signal and the main background: events from QCD multijets production. Futher backgrounds will be included in the final section for the limit extraction.\n", "\n", "All relevant data for this exercise can be found at `/data_share/tp2_bms/Exercise02`." ] }, { "cell_type": "code", "execution_count": null, "id": "ef5b2299-d182-47be-ad43-850c19977beb", "metadata": {}, "outputs": [], "source": [ "!ls /data_share/tp2_bsm/Exercise02" ] }, { "cell_type": "markdown", "id": "1e492977-6aa3-4bad-94b3-7966bc00450e", "metadata": {}, "source": [ "### Signal\n", "\n", "First, let's have a look at the signal sample. The technical implementation for this exercise is based on pandas dataframes and the [Scikit-HEP Vector](https://github.com/scikit-hep/vector) library, which you might already know from TP1. If you are not familiar with pandas dataframes, you can look up the first exercise of TP1 from last semester [TP1 Ex1](https://gitlab.etp.kit.edu/Lehre/tp1_forstudents/-/tree/master/Exercise01) and the vector class found in `utils.py` was introduced in the second exercise [TP1 Ex2](https://gitlab.etp.kit.edu/Lehre/tp1_forstudents/-/tree/master/Exercise02)." ] }, { "cell_type": "code", "execution_count": null, "id": "edf300f0-a789-41d3-a7ee-d008dba78e6b", "metadata": {}, "outputs": [], "source": [ "# reading the data files\n", "signal = pd.read_pickle('/data_share/tp2_bsm/Exercise02/vbs_eft.pkl.gz')\n", "signal.head(2)" ] }, { "cell_type": "markdown", "id": "6742d88d-2388-4019-ae6f-304e3786a107", "metadata": {}, "source": [ "As you can see, each event in the list contains two AK8 jets (`ak8_j1` and `ak8_j2`), two AK4 jets (`ak4_j1` and `ak4_j2`), as well as a bunch of event weights (`EFT_weight_1st`, `EFT_weight_2nd`, `EFT_weight_3rd`), which describe the effect of the respective EFT operator on an event by event basis:" ] }, { "cell_type": "markdown", "id": "12f506f3-5d0b-4ce9-94ca-935b27e70de7", "metadata": {}, "source": [ "* EFT_weight_1st: cHBox\n", "* EFT_weight_2nd: cHW\n", "* EFT_weight_3rd: cW" ] }, { "cell_type": "markdown", "id": "750fd575-1854-4a24-8235-1c4b4d24034c", "metadata": {}, "source": [ "The AK4 jets are selected to correspond to the tagging jets mentioned above. The AK8 jets are clustered with a larger radius and are selected to correspond to the hadronically decaying vector bosons (V = W,Z). For large values of $p_\\mathrm{T}$, the angle between the two quarks from the vector bosons is small enough, such that they are clustered as a single large-radius jet." 
] }, { "cell_type": "markdown", "id": "4afe92c3-3549-47bf-a255-52e61b945fe7", "metadata": {}, "source": [ "**Excurse 1: Jet tagging**\n", "\n", "Since the simulated Monte Carlo (MC) events have gone through the whole chain for event generation, including parton shower, detector simulation and event reconstruction, one cannot say with certainty that the AK8 jets originate from a hadronically decaying W- or Z-Boson. However, there are ways to describe that such a jet originates from a W- or Z-Boson by exploiting its substructure. \n", "For this exercise, we are using the $N$-subjettiness $\\tau_{21} = \\tau_2/\\tau_1$ ([arxiv:1108.2701](https://arxiv.org/abs/1108.2701)), which exploits the fact that AK8 jets from hadronically decaying W- or Z-Bosons have a 2-prong structure due to the two quarks inside the AK8 jet. There are also more advanced machine learning based algorithms (e.g. [DeepAK8](https://cds.cern.ch/record/2683870?ln=de) or [ParticleNet](https://arxiv.org/abs/1902.08570))\\\n", "In the dataset above, you can find values for $\\tau_{21}$ and applying DeepAK8. Looking at the signal process, you cannot see the impact of this shaping, since there is a peak at the W- or Z-Boson mass anyway. When looking at the QCD background in Exercise 7, we will investigate this further.\n" ] }, { "cell_type": "markdown", "id": "59b47092-90a1-484a-ad55-7f9757e9148a", "metadata": {}, "source": [ "**Excurse 2: EFT weights**\n", "\n", "When generating events in [MadGraph](http://madgraph.phys.ucl.ac.be), each event comes with a MC weight such that the sum of these weights is the computed cross-section. To study the impact of EFT, the reweighting module was used, which changes these weights according to the matrix element.\\\n", "In practice, this means that the effect of EFT operators is described by event weights in our Monte Carlo sample. Each event has 81 weights per EFT operator symmetrically distributed around the SM value ($C_i = 0$) such that, e.g., `signal_mc[\"EFT_weight_1st\"][\"0\"]` and `signal_mc[\"EFT_weight_2nd\"][\"0\"]` are both the event weight for the SM.\\\n", "**Be careful: the index is a string and not an integer!**\n" ] }, { "cell_type": "markdown", "id": "876ae5e3-9ecb-4690-8008-f1c877bd1a94", "metadata": {}, "source": [ "**Exercise 5: VBS topology**\n", "\n", "Plot some variables for the SM scenario to get familiar with the event topology. An example on how to access the transverse momentum $p_T$ of the first AK4 jet is given below." ] }, { "cell_type": "markdown", "id": "7847d8ed-2663-4d0d-ac07-210921f2f8bf", "metadata": {}, "source": [ "
\n", "\n", "a) Why are the AK4 jets called \"forward jets\"? Make a plot." ] }, { "cell_type": "markdown", "id": "dd923452-ffad-4440-98c9-595add02dd86", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "a1c3b104-5530-43b3-b0e5-24fca132823e", "metadata": {}, "outputs": [], "source": [ "# example on how to access variables\n", "signal['ak4_j1'].v4.pt" ] }, { "cell_type": "code", "execution_count": null, "id": "b074e678-005e-463d-a971-04b6a8556fc3", "metadata": {}, "outputs": [], "source": [ "# example on how to access the weights, here for the SM\n", "signal['EFT_weight_1st','0']" ] }, { "cell_type": "code", "execution_count": null, "id": "dbf5ade1-b868-40b9-8aa8-eebba88e2319", "metadata": {}, "outputs": [], "source": [ "# Your code goes here" ] }, { "cell_type": "markdown", "id": "5e9fd759-21e1-4ddf-83a2-92ea3280ce4e", "metadata": {}, "source": [ "
\n", "\n", "b) Where in the detector are the vector bosons located? Make a plot.\n", " " ] }, { "cell_type": "markdown", "id": "50536953-420b-4763-9913-ef0389daa9d4", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "74dce8f6-ad86-4bb5-84b8-b49f33861f17", "metadata": {}, "outputs": [], "source": [ "# Your code goes here" ] }, { "cell_type": "markdown", "id": "ef1b1312-b983-496e-aed0-169dc98fcb43", "metadata": {}, "source": [ "
\n", "\n", "c) It is instructive to look at the invariant mass of the pair of AK8 jets and the pair of AK4 jets. Complete the function \"calc_M2\" below such that it calculates and adds the combined 4-vectors of two jets to the dataframe. Then plot $m_{V1}$, $m_{V2}$, $m_{VV}$, and $m_{jj}$." ] }, { "cell_type": "markdown", "id": "d46c5226-4896-420b-b31f-ebb8be1045ef", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "422c712e-fbed-4555-b8ed-1ee19730437f", "metadata": {}, "outputs": [], "source": [ "# this function should calculate the combined four vector of two jets.\n", "def calc_M2(df, j1 = 'ak8_j1', j2 = 'ak8_j2', jj='ak8_j1_j2'):\n", " \"\"\"\n", " calculate the combined 4-vector of 2 jets and add it to the df\n", " \n", " @param df: pd.DataFrame, jet1 name, jet1 name, name of combined 4-vector\n", " @return: pd.DataFrame\n", " \"\"\"\n", " \n", " # define new columns for the sum of both jets\n", " # you can access, e.g., the energy of jet 1 with 'getattr(df[j1].v4,\"E\")' or 'df[j1].v4.E'\n", " \n", " # Your code goes here:\n", " \n", " return df" ] }, { "cell_type": "code", "execution_count": null, "id": "4bcf2f40-c0f9-4bdc-b78b-0e5706059f4d", "metadata": {}, "outputs": [], "source": [ "# now apply your function to calculate the 4-vectors\n", "signal = calc_M2(signal, 'ak8_j1', 'ak8_j2', 'ak8_j1_j2')\n", "signal = calc_M2(signal, 'ak4_j1', 'ak4_j2', 'ak4_j1_j2')\n", "signal.head(2)" ] }, { "cell_type": "code", "execution_count": null, "id": "603f137c-8dbe-47a6-acdc-a6282405ceab", "metadata": {}, "outputs": [], "source": [ "# Now make the plots\n", "# Your code goes here:" ] }, { "cell_type": "markdown", "id": "1913f1c1-dfae-4f2c-b65b-149dd9e71ac9", "metadata": {}, "source": [ "**Exercise 6: EFT**" ] }, { "cell_type": "markdown", "id": "b5b77607-9fb3-4cdc-8a18-0409d762f8ff", "metadata": {}, "source": [ "
\n", "a) Repeat the plots from Exercise 5 now with weights not corresponding to the SM. How do they change? Scan one EFT operator to see its impact. Is every variable affected in the same way? Do you have an intuitive explanation for your observation?" ] }, { "cell_type": "markdown", "id": "cfef9aba-fa19-4b20-a395-5af70db62aa3", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "704432c4-d4ab-4650-af9c-3eb1e74fe9a8", "metadata": {}, "outputs": [], "source": [ "# Your code goes here:" ] }, { "cell_type": "markdown", "id": "3a554973-5f0a-4377-ab8e-70555dcdbd36", "metadata": {}, "source": [ "
\n", "b) Calculate for each of the 81 values of the first EFT operator the sum of all event weights. Normalize the sum of weights to the SM value (index \"0\"), such that the entry for the SM is 1 per definition. Finally, plot them against the corresponding value of the first EFT operator. How does this correspond to the equation on sl.34 of the 3rd lecture?" ] }, { "cell_type": "markdown", "id": "5f9c8525-467c-43f3-be44-beef04d7f10a", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "5702a67c-6779-41b3-9820-34addd822eee", "metadata": {}, "outputs": [], "source": [ "# this function should calculate the normalized sum of events for all 81 values of the first EFT operator\n", "def weights_in_slice(df, col = \"ak8_j1_j2\", var = \"mass\", mini = 0, maxi = 300):\n", " \"\"\"\n", " return the normalized weights given a range from \"mini\" to \"maxi\" in a specific variable \"col.v4.var\" (var has to be defined in vector class of utils.py)\n", " normalization is such, that the SM weight is 1.\n", " The cut is only necessary for Ex.6c) and can be left out for Ex.6b).\n", " \n", " @param df: pd.DataFrame, col: str, var: str, mini: int or float, maxi: int or float\n", " @return: list\n", " \n", " \"\"\"\n", " # apply the given cut\n", " # Your code:\n", " df_cut = \n", "\n", " # then extract the weights and normalize to SM value\n", " # Your code:\n", " tmp_sm_sum = df_cut['EFT_weight_1st','0'].sum()\n", " weight_list = \n", " \n", " return weight_list" ] }, { "cell_type": "code", "execution_count": null, "id": "96002e2c-5942-4eca-ba9b-a93f81f9668c", "metadata": {}, "outputs": [], "source": [ "# now make the plots.\n", "mvv_bins=[0,20000]\n", "for i in range(len(mvv_bins) - 1):\n", " plt.plot(range(-40,41),weights_in_slice(signal, \"ak8_j1_j2\", \"mass\", mvv_bins[i],mvv_bins[i+1]), 'r+', label='Simulation')\n", " \n", "plt.xlabel(\"$\\mathrm{c_{WWW}}$ / $\\mathrm{\\Lambda}^{-2}$\")\n", "plt.ylabel(\"yield / SM\")\n", "plt.legend(title = \"$m_\\mathrm{VV}$ [GeV]\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "b28a2fe1-e867-42c4-8549-78583f3d6ff9", "metadata": {}, "source": [ "
\n", "c) Repeat the plot from Exercise 6b for $M_{VV}$ in the following ranges: [0,300], [300,500], [500,800], [800,1000], [1000,1200], [1200,1400], [1400,2000]. Which range is affected the most?" ] }, { "cell_type": "markdown", "id": "53c4f196-1995-4c41-889c-67e50e8396fe", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "2f6b3e2d-4e14-4066-bd0b-35268760a742", "metadata": {}, "outputs": [], "source": [ "# Your code:" ] }, { "cell_type": "markdown", "id": "263c271e-a32e-4f42-acce-cb27e40135cc", "metadata": {}, "source": [ "### QCD Background\n", "\n", "Now we are looking at the major background from QCD multijets. What we are interested in, is getting familiar with the general topology and follow up with the above Excurse 1." ] }, { "cell_type": "code", "execution_count": null, "id": "e3f89d95-183a-4dd0-9e04-78052fae8e10", "metadata": {}, "outputs": [], "source": [ "# read in QCD MC\n", "import glob\n", "qcd_files=glob.glob(\"/data_share/tp2_bsm/Exercise02/QCD_Pt_*.pkl.gz\")\n", "qcd_binned = [pd.read_pickle(b) for b in qcd_files]\n", "qcd = pd.concat(qcd_binned, axis=0, ignore_index=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "24091b92-2ae6-4091-9810-a3656a24c21a", "metadata": {}, "outputs": [], "source": [ "qcd.head(5)" ] }, { "cell_type": "markdown", "id": "87e472de-b51c-4c3a-9f68-0080701145c6", "metadata": {}, "source": [ "**Caution:** Since these MC events were produced in multiple bins of $p_\\mathrm{T}$ on the level of hard scattering, the correct weights have to be applied when adding them. These weights (['event','weight']) are calculated from the generated number of events and the cross-section per bin in $p_\\mathrm{T}$." ] }, { "cell_type": "markdown", "id": "7a361905-239e-4db6-b52f-7d39f7abbfc6", "metadata": {}, "source": [ "**Exercise 7: QCD background**" ] }, { "cell_type": "markdown", "id": "911822e4-db0c-4c70-baa4-366bd1039041", "metadata": {}, "source": [ "
\n", "a) Plot some basic variables, but at least $m_{VV}$ and $m_{V1}$." ] }, { "cell_type": "markdown", "id": "05c40f72-086b-4e19-90b5-294192ed14a5", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "88298c6e-2428-4a8f-8f97-7231a39491cc", "metadata": {}, "outputs": [], "source": [ "# calculating the combined 4-vector of both AK4 and of both AK8 jets in the same way as above\n", "# Your code:" ] }, { "cell_type": "code", "execution_count": null, "id": "bd38d053-11fd-472a-b9a1-76fb19e561bf", "metadata": {}, "outputs": [], "source": [ "# now make the plots\n", "# Your code:" ] }, { "cell_type": "markdown", "id": "16aad55d-915f-4b52-9862-4d7270b52e96", "metadata": {}, "source": [ "
\n", "b) Apply a cut on the N-subjettiness, $\\tau_{21} < 0.79$ for both AK8 jets and repeat the two plots. How does the shape of $m_{V1}$ ($m_{V2}$) change?" ] }, { "cell_type": "markdown", "id": "61675065-1d96-42ea-b93b-67cb6f92283a", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "0957fbb4-bff4-41a6-aa7e-de375c37094b", "metadata": {}, "outputs": [], "source": [ "# apply the cut on tau21 < 0.79\n", "# Your code:" ] }, { "cell_type": "code", "execution_count": null, "id": "41b2279c-0d95-4c58-82d9-8f01a239d6b0", "metadata": {}, "outputs": [], "source": [ "# now make the plots\n", "# Your code:" ] }, { "cell_type": "markdown", "id": "ccd02c85-5a6e-45d4-acbc-40e381835ed6", "metadata": {}, "source": [ "
\n", "c) Now instead cut on the DeepAK8 score, $\\mathrm{DeepAK8} > 0.5$, and repeat the two plots. How does the shape of $m_{V1}$ change?" ] }, { "cell_type": "markdown", "id": "7c66ed74-effa-4d41-8711-e2ffb6699836", "metadata": {}, "source": [ "
\n", " Answer: " ] }, { "cell_type": "code", "execution_count": null, "id": "a604c09d-84e4-455e-a7cb-a3c502a2fb75", "metadata": {}, "outputs": [], "source": [ "# apply the cut on DeepAK8 > 0.5\n", "# Your code:" ] }, { "cell_type": "code", "execution_count": null, "id": "6eceac2f-b337-4434-9218-a486d9eb6d41", "metadata": {}, "outputs": [], "source": [ "# now make the plots\n", "# Your code:" ] }, { "cell_type": "markdown", "id": "3cbfa205-e562-4fd5-afaa-b21e4c85f530", "metadata": {}, "source": [ "
\n", "d) Which of the two algorithms is better suited for this analysis and why?" ] }, { "cell_type": "markdown", "id": "1b66263b-ce84-45cb-9c70-f6f232269ff3", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "markdown", "id": "4bbeea05-886e-4c6d-bf8f-64b2b66412a6", "metadata": {}, "source": [ "## Statistical analysis\n", "\n", "Since we are looking at the hadronic decay channel of VBS, the overwhelming QCD background requires special treatment. That's why this analysis features a 3D Fit exploiting the different shape of QCD background and our signal process.\\\n", "The ultimate fit for limit extraction is done in a 3D plane of ($m_\\mathrm{VV}, m_\\mathrm{V1}, m_\\mathrm{V2}$). This is because, as we have seen above, our signal process is resonant in $m_\\mathrm{V1}$ and $m_\\mathrm{V2}$, whereas the QCD background is exponentially falling. The 3rd axis, $m_\\mathrm{VV}$ is sensitive to the change of our EFT parameter. Furthermore, using $m_\\mathrm{VV}$ as a variable gives the intuitive interpretation of the low energy tail of a resonance, as well as a way to tackle a theoretical problem of EFT, namely unitarity restoration (not in this exercise!).\n", "\n", "In the following, we will derive the template for our signal process and QCD background. Other backgrounds follow a similar or easier treatment than QCD multijet production and are not explicitely included here.\n", "\n", "**Important:** From now on, please always work with the cut on the N-subjettiness $\\tau_{21} > 0.79$." ] }, { "cell_type": "markdown", "id": "045369c1-c7cf-4390-948d-ad2d01e02992", "metadata": {}, "source": [ "
\"Drawing\"
\n", "
Fig.3: Schematic overview of the fitting strategy in 3 dimensions.
" ] }, { "cell_type": "markdown", "id": "de58fac3-8bcd-4bd0-b288-d79c2b363372", "metadata": {}, "source": [ "### Deriving signal templates\n", "\n", "Now, we will derive parametric templates of the signal process in 3D, i.e., a 3D probability density function to describe the signal process for a given value of an EFT parameter. For this, we assume the shape of $m_\\mathrm{V1}$, $m_\\mathrm{V2}$, and $m_\\mathrm{VV}$ to be uncorrelated, such that it falls into three parts:\n", "\n", "$$ P^\\mathrm{EFT}(m_\\mathrm{VV}, m_\\mathrm{V1}, m_\\mathrm{V2}) = P(m_\\mathrm{VV}) \\times P(m_\\mathrm{V1}) \\times P(m_\\mathrm{V2})$$\n", "\n", "The overall normalization follows from the scaling derived in Exercise 6b) and the cross-section of the SM process such that we have to focus now on $P(m_\\mathrm{V1})$, $P(m_\\mathrm{V2})$ and $P(m_\\mathrm{VV})$.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9d949ed2-e4a0-402a-8a5e-e015b09de3a1", "metadata": { "tags": [] }, "outputs": [], "source": [ "# apply the cut on tau21 > 0.79 for both AK8 jets also to the signal\n", "signal_tau21 = signal[(signal['ak8_j1','tau21'] < 0.79) & (signal['ak8_j2','tau21'] < 0.79)]" ] }, { "cell_type": "code", "execution_count": null, "id": "f032f13c-75bf-4a38-85d8-fd42ffa9bb86", "metadata": {}, "outputs": [], "source": [ "# We now apply some further analysis cuts:\n", "# m_{VV} > 800 GeV and m_{VV} < 5000 GeV\n", "signal_tau21 = signal_tau21[(signal_tau21['ak8_j1_j2'].v4.mass > 800) & (signal_tau21['ak8_j1_j2'].v4.mass < 5500)]" ] }, { "cell_type": "markdown", "id": "61441bcb-16eb-4ce7-99f5-d7775382435d", "metadata": {}, "source": [ "#### 1D template for $m_\\mathrm{V1}$ and $m_\\mathrm{V2}$" ] }, { "cell_type": "markdown", "id": "67916e85-a613-486e-9fb9-cba58c311023", "metadata": {}, "source": [ "The $m_\\mathrm{V1}$ - and $m_\\mathrm{V2}$ - distributions of the signal process show a resonance at the W- or Z-Boson peak. This can be modelled by a double-sided Crystal Ball function: a Gaussian distribution in the middle with two powe-law tails. This function has 6 parameters: the center and width of the Gaussian core, and four values that describe where the tails start ( $\\alpha_{i}$ ) and how they fall off ( $\\mathrm{N}_i$ ). " ] }, { "cell_type": "markdown", "id": "6faf4590-966d-44e7-adc7-8cd9fa9b5660", "metadata": {}, "source": [ "\n", "$$t = \\frac{x - mean}{width} $$\n", "\n", "$$\\mathrm{DS-CrystalBall} (x; mean, width, \\alpha_1, N_1, \\alpha_2, N_2) =\n", "\\left\\{\n", "\t\\begin{array}{ll}\n", "\t\t[1 - \\frac{\\alpha _1}{N _1} (\\alpha _1 + t)]^{-\\alpha _1} \\exp{(-\\frac{1}{2}\\alpha _1^2)} & \\mathrm{if} \\ t \\leq - \\alpha _1 \\\\\n", "\t\t\\hspace{3.5cm} \\exp{(-\\frac{1}{2}t^2)} & \\mathrm{if} \\ - \\alpha _1 < t < \\alpha _2 \\\\\n", "\t\t[1 - \\frac{\\alpha _2}{N _2} (\\alpha _2 - t)]^{-\\alpha _2} \\exp{(-\\frac{1}{2}\\alpha _2^2)} & \\mathrm{if} \\ t \\geq \\alpha _2\n", "\t\\end{array}\n", "\\right.\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "9c534e0d-f128-45d5-bbc7-e44dfb796fc7", "metadata": {}, "outputs": [], "source": [ "# implementation of double-sided Crystal Ball function\n", "def DSCB(x, mean = 80.0, width = 1.0, a1 = 1.0, N1=1.0, a2=1.0, N2=1.0, scale=1000):\n", " \n", " lower_bound = (-1. * np.abs(a1 * width)) + mean\n", " upper_bound = (+1. 
* np.abs(a2 * width)) + mean\n", " \n", " condlist=[ (x <= lower_bound), (x > lower_bound) & (x < upper_bound), (x >= upper_bound)]\n", " \n", " funclist=[ lambda x: scale * (1 - (a1 / N1)*(a1+((x - mean) / width)))**(-1. * a1) * np.exp(-0.5 * a1 * a1),\\\n", " lambda x: scale * np.exp(-0.5 * ((x - mean) / width) * ((x - mean) / width)),\\\n", " lambda x: scale * ( (1 - (a2 / N2)*(a2-((x - mean) / width)))**(-1. * a2) ) * np.exp(-0.5 * a2 * a2)]\n", " \n", " return np.piecewise(x, condlist, funclist)\n", "\n", "# this function can be used to fix parameters and enforce a one-sided CB function or a Gaussian\n", "def DSCB_fix(x, mean, width, a2, N2, scale):\n", " ret = DSCB(x, mean, width, 1., 0.5, a2, N2, scale)\n", "\n", " return ret" ] }, { "cell_type": "code", "execution_count": null, "id": "19bae5d4-431f-4f71-b444-d4bfcafe8a29", "metadata": { "tags": [] }, "outputs": [], "source": [ "%%capture --no-stderr\n", "# make a histogram of m_v1 \n", "eft_values=[5,10,15,20]\n", "histos_mv1, bin_centers, fit_results = [], [], []\n", "for eftv in eft_values:\n", " histos_mv1.append( np.histogram(signal_tau21['ak8_j1'].v4.mass, bins=32, weights=signal_tau21['EFT_weight_1st',str(eftv)]) )\n", " bin_centers.append( (histos_mv1[-1])[1][:-1] + np.diff( (histos_mv1[-1])[1]) / 2 )\n", "\n", " # now run the fit\n", " #fit_results.append( curve_fit(DSCB_fix, bin_centers[-1], (histo_mv1[-1])[0], p0=[90.0, 11.0, 1.6, 1.0, 10. ]) )\n", " fit_results.append( curve_fit(DSCB, bin_centers[-1], (histos_mv1[-1])[0], p0=[90.0, 11.0, 1.6, 1., 1.6, 1.0, 10. ]) )" ] }, { "cell_type": "code", "execution_count": null, "id": "edc7e84d-e92c-4084-88ce-a5db3c679a9c", "metadata": { "tags": [] }, "outputs": [], "source": [ "x_line = np.arange(55, 215, 0.25)\n", "cool_colors=['crimson','rebeccapurple','darkturquoise','orange']\n", "for i in range(len(eft_values)):\n", " plt.plot(bin_centers[i], (histos_mv1[i])[0], marker='+', linestyle='None', color=cool_colors[i])\n", " plt.plot(x_line, DSCB(x_line,*(fit_results[i][0])) , label=str(eft_values[i]) + ' $\\mathrm{TeV}^{-2}$', color=cool_colors[i])\n", " \n", "plt.xlabel(\"$\\mathrm{m_{V1}}$ [$\\mathrm{GeV}$]\")\n", "plt.ylabel(\"a.u.\")\n", "plt.legend(title = \"$\\mathrm{c_{WWW}}$ / $\\mathrm{\\Lambda}^{-2} = $\")\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "cdb868d3-7019-419c-8b41-448943067e25", "metadata": {}, "source": [ "**Exercise 8: signal $m_{V}$ template**" ] }, { "cell_type": "markdown", "id": "4f26c829-b7c0-4709-917a-095b42954fe3", "metadata": {}, "source": [ "Read through and understand the above implementation. Then do the following tasks:" ] }, { "cell_type": "markdown", "id": "c69b3c97-6a6d-4989-ae2f-c6b2268e04cf", "metadata": {}, "source": [ "
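\n", "To get a feeling for the parameters before fitting anything, you can simply evaluate the DSCB function defined above for a few hand-picked values (an optional illustration; the numbers are arbitrary):\n", "\n", "```python\n", "x = np.linspace(40., 160., 400)\n", "plt.plot(x, DSCB(x, mean=85., width=8., a1=1.5, N1=2., a2=1.5, N2=2., scale=1.), label='tails start at 1.5 sigma')\n", "plt.plot(x, DSCB(x, mean=85., width=8., a1=0.8, N1=2., a2=0.8, N2=2., scale=1.), label='tails start at 0.8 sigma')\n", "plt.xlabel('x')\n", "plt.ylabel('DSCB(x)')\n", "plt.legend()\n", "plt.show()\n", "```\n", "\n", "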
\n", "a) How does this distribution look for $m_{V2}$ ?" ] }, { "cell_type": "markdown", "id": "a572788d-5dd6-42c9-a039-74ef1105f830", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "514fa231-218c-4265-b669-5742810b25d2", "metadata": {}, "outputs": [], "source": [ "# Your code goes here:" ] }, { "cell_type": "markdown", "id": "e48802ad-116e-4f32-bb05-81353e4ba6dd", "metadata": {}, "source": [ "
\n", "b) Instead of a double-sided Crystal Ball function, try using a single-sided Crystal Ball function or a Gaussian. You can use the above function \"DSCB_fix\" and fix the correct parameter(s)." ] }, { "cell_type": "markdown", "id": "6444f599-e6d5-495d-9a55-cceb7b4a7069", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "b90c5717-9a03-4143-846f-1d61f1a02ec7", "metadata": {}, "outputs": [], "source": [ "# Your code goes here:" ] }, { "cell_type": "markdown", "id": "e5fe8889-2f4f-40b1-9cac-f303ee984416", "metadata": {}, "source": [ "#### 1D template for $m_\\mathrm{VV}$" ] }, { "cell_type": "markdown", "id": "05c5cdea-d27e-4040-a51b-a94fb1064ede", "metadata": {}, "source": [ "Now that we looked at $P_\\mathrm{V1}$ and $P_\\mathrm{V2}$, it is now time to have a look at the third axis in the final fit: $P_\\mathrm{VV}$. Since the EFT dependency is prominent in this variable and does not only affect the normalization but also noticably the shape, the parametrization also includes the EFT parameter itself.\\\n", "The functional form for $P_\\mathrm{VV}$ is the following:\n", "\n", "$$ P^\\mathrm{EFT}(m_\\mathrm{VV}) = \\mathrm{N_{SM}} \\cdot \\mathrm{e}^{ \\mathrm{a_0} \\mathrm{M_{VV}} }\n", "+ \\mathrm{N_{quadr}} \\cdot \\mathrm{c_i}^2 \\cdot \\mathrm{e}^{ \\mathrm{a_1} \\mathrm{M_{VV}} } \\cdot \\frac{1 + \\mathrm{Erf( (\\mathrm{M_{VV} - \\mathrm{a_2}) / \\mathrm{a_3} }) }} {2} $$\n", "\n", "\n", "$$ %P(m_\\mathrm{VV}) = \\mathrm{N_{SM}} \\cdot \\mathrm{e}^{ \\mathrm{a_0} \\mathrm{M_{VV}} }\n", "%+ \\mathrm{N_{intf}} \\cdot \\mathrm{c_i} \\cdot \\mathrm{e}^{ \\mathrm{a_1} \\mathrm{M_{VV}} } \n", "%+ \\mathrm{N_{quadr}} \\cdot \\mathrm{c_i}^2 \\cdot \\frac{1 + \\mathrm{Erf( (\\mathrm{M_{VV} - \\mathrm{a_2}) / \\mathrm{a_3} }) }} {2} $$\n", "\n", "where the interference term has be omitted for simplicity. The PDF then falls into two parts: an exponential falling part for the SM which does not change with the Willson coefficient, $\\mathrm{c_i}$, and a term that scales quadratically with $\\mathrm{c_i}$. 
The functional form has been chosen to describe the turn on with higher energies and the effect of the proton PDF, which ultimately forces the distribution to zero.\\\n", "The fitting procedure is similar to before:" ] }, { "cell_type": "code", "execution_count": null, "id": "73a4c8b7-0f73-4833-be3a-d54db0e7cc71", "metadata": {}, "outputs": [], "source": [ "def eft_sm(x, a0, scale):\n", " \n", " exp_part = np.exp(a0 * x)\n", " return scale * exp_part\n", " \n", "# a2 = offset, a3 = width\n", "def eft_quadr(x, a1, offset, width, scale):\n", " \n", " erf_part = (1 + scipy.special.erf((x - offset) / width)) / 2\n", " exp_part = np.exp(a1 * x)\n", " return scale * exp_part * erf_part" ] }, { "cell_type": "code", "execution_count": null, "id": "5e256574-9130-4b00-b9c8-93f04dd317b4", "metadata": { "tags": [] }, "outputs": [], "source": [ "# First the histograms\n", "histo_sm = np.histogram(signal_tau21['ak8_j1_j2'].v4.mass, bins=15, weights=(signal_tau21['EFT_weight_1st',\"0\"])) \n", "bin_centers_sm = histo_sm[1][:-1] + np.diff( histo_sm[1] ) / 2\n", "\n", "histo_quad = np.histogram(signal_tau21['ak8_j1_j2'].v4.mass, bins=15, weights=((signal_tau21['EFT_weight_1st',\"15\"] + signal_tau21['EFT_weight_1st',\"-15\"] -signal_tau21['EFT_weight_1st',\"0\"])/2.)) \n", "bin_centers_quad = histo_quad[1][:-1] + np.diff( histo_quad[1] ) / 2\n", "\n", "# Then the fits -> needs good starting values!\n", "fit_result_sm = curve_fit(eft_sm, bin_centers_sm, histo_sm[0], p0=[-0.003, 2800.])\n", "fit_result_quad = curve_fit(eft_quadr, bin_centers_quad, histo_quad[0], p0=[-0.001, 2800., 1400., 9300.])\n", "\n", "# And finally plot it\n", "x_line_mvv = np.arange(800, 8000, 1)\n", "plt.figure()\n", "\n", "fig, ax = plt.subplots(1, 2, figsize=(9,4.5))\n", "ax[0].plot(x_line_mvv, eft_sm(x_line_mvv,*fit_result_sm[0]), color=cool_colors[2], label='Fit')\n", "ax[0].plot(bin_centers_sm, histo_sm[0], marker='+', linestyle='None', color=cool_colors[0], label='Simulation')\n", "ax[0].set_xlabel(\"$\\mathrm{m_{VV}}$ [$\\mathrm{GeV}$]\")\n", "ax[0].set_ylabel(\"a.u.\")\n", "ax[0].set_title(\"SM contribution\")\n", "ax[0].legend(title = \"$\\mathrm{c_{WWW}}$ / $\\mathrm{\\Lambda}^{-2} = 0 \\, \\mathrm{TeV}^{-2}$\")\n", "\n", "\n", "ax[1].plot(x_line_mvv, eft_quadr(x_line_mvv,*fit_result_quad[0]), color=cool_colors[2], label='Fit')\n", "ax[1].plot(bin_centers_quad, histo_quad[0], marker='+', linestyle='None', color=cool_colors[0], label='Simulation')\n", "ax[1].set_xlabel(\"$\\mathrm{m_{VV}}$ [$\\mathrm{GeV}$]\")\n", "ax[1].set_ylabel(\"a.u.\")\n", "ax[1].set_title(\"quadratic EFT contribution\")\n", "ax[1].legend(title = \"$\\mathrm{c_{WWW}}$ / $\\mathrm{\\Lambda}^{-2} = 15 \\, \\mathrm{TeV}^{-2}$\")\n", "\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "3105793d-12a1-4429-8a51-d797ef3853f1", "metadata": {}, "source": [ "Now the two parts of our $\\mathrm{m_{VV}}$ parametrization are fitted. Your task will be to combine them and make a quick cross-check. The following exercise will guide you through that:" ] }, { "cell_type": "markdown", "id": "fa6661f9-04a9-43b7-88e6-c6e81dce4710", "metadata": {}, "source": [ "**Exercise 9: signal $m_{VV}$ template**" ] }, { "cell_type": "markdown", "id": "0a19495e-89b8-4517-a225-0249bb588e14", "metadata": {}, "source": [ "
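\n", "As a reminder (a generic scipy pattern, not a solution specific to this sheet): curve_fit returns the best-fit parameters together with their covariance matrix, so the fitted values and their uncertainties can be read off like this; the same works for fit_result_quad:\n", "\n", "```python\n", "popt_sm, pcov_sm = fit_result_sm\n", "print('SM part:', popt_sm, '+/-', np.sqrt(np.diag(pcov_sm)))\n", "```\n", "\n", "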
\n", "a) What are the results of the two fits for $m_{VV}$ ?" ] }, { "cell_type": "markdown", "id": "b127c0d0-7f13-4c24-9208-02afd1608508", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "code", "execution_count": null, "id": "421f84d8-ab17-4858-957f-60306dfbf7e8", "metadata": {}, "outputs": [], "source": [ "# Your code:" ] }, { "cell_type": "markdown", "id": "d79b8094-ffe5-4e73-83c2-50a0ca1fb726", "metadata": {}, "source": [ "
\n", "b) Which value of $c_\\mathrm{HBox}$ has been used to derive the quadratic contribution?" ] }, { "cell_type": "markdown", "id": "2ec59421-756d-41ec-ad87-2f70a40f5ced", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "markdown", "id": "a4c7a494-17ac-4aa1-aba3-c67333b77a36", "metadata": {}, "source": [ "
\n", "c) Complete the code below to sum both contributions. Be careful that the first term does not scale with the Willson coefficient, while the second does.\n", " " ] }, { "cell_type": "markdown", "id": "9b66ddfa-53a8-4313-9ced-0e8a619b2676", "metadata": {}, "source": [ "
\n", " Please complete the code\n", " " ] }, { "cell_type": "code", "execution_count": null, "id": "ce042b9f-deef-44a8-94ec-cd6b76b1c03d", "metadata": {}, "outputs": [], "source": [ "# function for the combination of both contributions\n", "def p_mvv(x, a0, scaleSM, a1, offset, width, scaleQ, eft_value):\n", " \n", " # here we can conveniently reuse the functions from above. \n", " # Be careful to scale the normalization of the quadratic contribution with \"eft_value ** 2\"!\n", " # Your code goes here:\n", " sm_part =\n", " quad_part =\n", " \n", " return sm_part + quad_part" ] }, { "cell_type": "markdown", "id": "3dfa6b44-9db1-449a-b2d4-507f5e00a030", "metadata": {}, "source": [ "
\n", "d) Finally, plot the complete template together with the Monte Carlo Simulation for $c_\\mathrm{HBox} = 0,5,10,15$ in one plot." ] }, { "cell_type": "markdown", "id": "c284484d-85d1-485c-a764-1f8a7a99120f", "metadata": {}, "source": [ "
\n", " Please complete the code" ] }, { "cell_type": "code", "execution_count": null, "id": "32cb4aac-d8b4-4c7e-9b31-493e9e2f2f77", "metadata": {}, "outputs": [], "source": [ "# the combined fit results. Be careful to rescale the normalization of the quadratic contribution!\n", "combined_fit_results=\n", "\n", "cwww_values=[0,5,10,15]\n", "hs,bs = [],[]\n", "for i,c in enumerate(cwww_values):\n", "\n", " # Your code goes here:\n", "\n", " \n", "# some cosmetics in case you are using matplotlib\n", "plt.xlabel(\"$\\mathrm{m_{VV}}$ [$\\mathrm{GeV}$]\")\n", "plt.ylabel(\"a.u.\")\n", "plt.legend(title = \"$\\mathrm{c_{WWW}}$ / $\\mathrm{\\Lambda}^{-2} = $\")\n", "plt.title(\"full $m_\\mathrm{VV}$ contribution\")\n", "plt.show()\n" ] }, { "cell_type": "markdown", "id": "d98ed6ed-c955-403e-9b10-7adb95a7f838", "metadata": {}, "source": [ "### QCD templates" ] }, { "cell_type": "markdown", "id": "6fc723aa-ec6a-4ace-a4c2-30016055ffb8", "metadata": {}, "source": [ "Now we will have a very short look at the QCD background. The derivation of parametric templates is not done explicitely in this exercise but we will have a look at the results.\\\n", "The three contributions of the PDF cannot be assumed to be uncorrelated. In this case, the conditional PDFs for $m_\\mathrm{V1}$ and $m_\\mathrm{V2}$, $P_\\mathrm{V1}(m_\\mathrm{V1} | m_\\mathrm{VV})$ and $P_\\mathrm{V1}(m_\\mathrm{V2} | m_\\mathrm{VV})$, are derived in the form of 2D histograms. \\\n", "The result is given below and shows that all three axes can be modeled (and in fact are) with a simple exponential distribution.\n", "\n", "$$ P^\\mathrm{QCD}(m_\\mathrm{VV}, m_\\mathrm{V1}, m_\\mathrm{V2}) = P(m_\\mathrm{VV}) \\times P_\\mathrm{cond,1}(m_\\mathrm{V1} \\vert m_\\mathrm{VV}) \\times P_\\mathrm{cond,2}(m_\\mathrm{V2} \\vert m_\\mathrm{VV})$$\n", "\n", "\n", "\n", "\n", "\n", "
\"Drawing\" \"Drawing\" \"Drawing\"
\n", "
Fig.4: $m_\\mathrm{VV}$, $m_\\mathrm{V1}$ and $m_\\mathrm{V2}$ contributions to the combined PDF, $P^\\mathrm{QCD}(m_\\mathrm{VV}, m_\\mathrm{V1}, m_\\mathrm{V2})$.
" ] }, { "cell_type": "markdown", "id": "90a84d2d-b762-4ae5-8225-198e7eb5ab28", "metadata": {}, "source": [ "**Exercise 10: QCD background**" ] }, { "cell_type": "markdown", "id": "a37c872a-4d34-44b7-b84a-af106fc4ab15", "metadata": {}, "source": [ "
\n", "a) Is the shape of $m_\\mathrm{V1}$, $m_\\mathrm{V1}$ and $m_\\mathrm{VV}$ expected? Do you know an explanation for it?" ] }, { "cell_type": "markdown", "id": "94e79b7f-8c4f-4200-839d-5aa30ecfb7ab", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "markdown", "id": "a4bde95f-704b-4c89-9039-1f01ba494026", "metadata": {}, "source": [ "
\n", "b) Apart from looking at the explicit plots, do you have an explanation why $m_\\mathrm{V1}$ and $m_\\mathrm{VV}$ are correlated?" ] }, { "cell_type": "markdown", "id": "a4401c3f-5616-462f-88fb-e1dd3d7f62f9", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "markdown", "id": "8236f953-e770-47dd-b236-61956ffc9cd3", "metadata": {}, "source": [ "## Limits" ] }, { "cell_type": "markdown", "id": "344d8dfc-eeee-42b3-945f-3c9e90bbe0d7", "metadata": {}, "source": [ "Unfortunately, we have to cut many corners to fit this analysis into the scope of one exercise sheet. One of such corners are for example systematic uncertainties or the final fit for limit extraction. Also backgrounds, which contain both, a resonant and a exponentially falling contribution to the AK8 jet-masses have not been investigated.\\\n", "The final fit itself runs within a CMS internal software framework based on ROOT and takes multiple hours.\n", "Instead, we give you the expected limits derived from simulation in form of a plot which you have to understand in the exercise below." ] }, { "cell_type": "markdown", "id": "8e912bf9-029f-4f09-a2d1-fac7a6fa9104", "metadata": {}, "source": [ "
\"Drawing\"
\n", "\n", "
Fig.5: Plot showing the final fit results and expected limits for $c_\\mathrm{HBox}$.
" ] }, { "cell_type": "markdown", "id": "bce1ef82-5501-450a-892d-fa527e400097", "metadata": {}, "source": [ "**Exercise 11: Limits**" ] }, { "cell_type": "markdown", "id": "beff3866-b1c3-4957-8091-08e9e177b876", "metadata": {}, "source": [ "
\n", "a) Explain what is shown on the x- and y- axis." ] }, { "cell_type": "markdown", "id": "0d4befcb-fb64-410d-b82e-1ff5140a6780", "metadata": {}, "source": [ "
\n", " Answer: " ] }, { "cell_type": "markdown", "id": "5e3ce681-d5d1-4991-86f4-6aff2d1e7a5c", "metadata": {}, "source": [ "
\n", "b) Which regions can be excluded at 95% CL and why?" ] }, { "cell_type": "markdown", "id": "74b14674-7c02-445b-99e5-e8ec1d702edd", "metadata": {}, "source": [ "
\n", " Answer: " ] }, { "cell_type": "markdown", "id": "6da84908-469f-4985-ad32-0767791a32ee", "metadata": {}, "source": [ "
\n", "c) Assuming symmetric errors, how do they compare to other public limits?" ] }, { "cell_type": "markdown", "id": "3653220a-b7bc-4a74-8721-d9c14d73c64e", "metadata": {}, "source": [ "
\n", " Answer:" ] }, { "cell_type": "markdown", "id": "80411522-66d0-4687-9fa2-3fc0b0fa83a3", "metadata": {}, "source": [ "
\n", " \n", "d) How does this plot change with increased luminosity?" ] }, { "cell_type": "markdown", "id": "5d52826c-52af-4577-bb09-ecfdf415065e", "metadata": {}, "source": [ "
\n", " Answer:" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 5 }