\n",
" \n",
"**Task 3:**\n",
" \n",
"Please implement the conditions $|\\eta| < 5$ and $p_{\\mathrm{T}} > 30 \\,\\mathrm{GeV}$"
]
},
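{
"cell_type": "markdown",
"metadata": {},
"source": [
"For orientation, a minimal sketch (toy data with hypothetical flat column names; the actual dataframes above use a MultiIndex layout) of how such cuts translate into pandas boolean masks:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# toy jet table with hypothetical flat columns\n",
"df = pd.DataFrame({'Jet1_Pt': [45.0, 12.0, 80.0],\n",
"                   'Jet1_Eta': [1.2, 4.1, -5.6]})\n",
"\n",
"pass_pt = df['Jet1_Pt'] > 30.0         # pT > 30 GeV\n",
"pass_eta = df['Jet1_Eta'].abs() < 5.0  # |eta| < 5\n",
"mask = pass_pt & pass_eta              # elementwise AND of both cuts\n",
"print(mask.tolist())  # [True, False, False]\n",
"```"
]
},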
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"Selection2 = [] # list of masks (boolean DataFrames), one per sample \n",
"\n",
"for ID in [0,1,2,3,4,5,6]: \n",
" \n",
" # get sample jets\n",
" sample = Files[ID]\n",
" sample_Jets = Sample.JetNames(sample) \n",
" \n",
"\n",
" mask = pd.DataFrame(np.ones((len(sample),len(sample.columns.levels[0]))), dtype = bool, columns = sample.columns.levels[0])\n",
" \n",
" # condition on Pt\n",
" ### YOUR CODE HERE\n",
" \n",
" # condition on Eta\n",
" ### YOUR CODE HERE\n",
" \n",
" mask = make_particle_filter_mask(sample, mask)\n",
" \n",
" # save the selection \n",
" Selection2.append(mask)\n",
" print(f'Mask {Sample.toString(ID)} Generated')\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Calculate $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$ and $ \\Delta \\phi (\\vec{H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}}, \\mathrm{jet}_{i}) $\n",
"\n",
"After applying the masks, one can compute the missing transverse energy $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$ and $ \\Delta \\phi (\\vec{H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}}, \\mathrm{jet}_{i}) $. The final distributions are visualized using the pre-defined function `plot_quantities`. If you are interested in how this function works, feel free to look at the code in `utils.py`: it essentially arranges histogram plots of the physics-object quantities using combinations of `zip`, `itertools.product` and a few other helpers. "
]
},
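{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a toy illustration of that kind of arrangement (not the actual code of `plot_quantities`): `zip` pairs each column with its plot settings, while `itertools.product` spans all panel positions of a grid.\n",
"\n",
"```python\n",
"import itertools\n",
"\n",
"columns = ['NJets', 'Ht']\n",
"yscales = ['linear', 'log']\n",
"\n",
"# pair each column with its axis scale\n",
"pairs = list(zip(columns, yscales))\n",
"print(pairs)   # [('NJets', 'linear'), ('Ht', 'log')]\n",
"\n",
"# span all (row, column) panel positions of a 2x2 grid\n",
"panels = list(itertools.product(range(2), range(2)))\n",
"print(panels)  # [(0, 0), (0, 1), (1, 0), (1, 1)]\n",
"```"
]
},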
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
" \n",
"**Task 4**\n",
" \n",
"Compute the missing transverse energy and the angle difference between the missing transverse energy and each of the three hardest jets $ | \\Delta \\phi (\\vec{H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}}, \\mathrm{jet}_{i})| $: \n",
"\n",
"1. Find the $x$- and $y$-components of the missing transverse energy using the $\\phi$ angle. \n",
"2. Initialize a vector object for $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$ using the $x$- and $y$-components.\n",
"3. Compute the missing transverse energy from the $x$- and $y$-components.\n",
"4. Find the angle difference $| \\Delta \\phi (\\vec{H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}}, \\mathrm{jet}_{i})| $ using a well-known function from the scikit-HEP vector package. \n",
" \n",
"For reference, please refer to the documentation of the [scikit-HEP vector package](https://github.com/scikit-hep/vector). You might find the `deltaphi` method of vector objects (`vec1.deltaphi(vec2)`) especially useful. "
]
},
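{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four steps can be sketched with plain numpy on toy numbers (a hedged illustration only; in the notebook itself the scikit-HEP vector package handles the $\\Delta \\phi$ wrap-around for you):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# toy input: (pT, phi) of two selected jets in one event\n",
"jet_pt = np.array([100.0, 60.0])\n",
"jet_phi = np.array([0.3, 2.5])\n",
"\n",
"# 1. x-/y-components: MHT is minus the vector sum of the jet pT's\n",
"MHtx = -np.sum(jet_pt * np.cos(jet_phi))\n",
"MHty = -np.sum(jet_pt * np.sin(jet_phi))\n",
"\n",
"# 2./3. magnitude and azimuth of the MHT vector\n",
"MHt = np.hypot(MHtx, MHty)\n",
"MHtPhi = np.arctan2(MHty, MHtx)\n",
"\n",
"# 4. |delta phi| to each jet, wrapped into [0, pi]\n",
"dphi = np.abs(MHtPhi - jet_phi)\n",
"dphi = np.where(dphi > np.pi, 2 * np.pi - dphi, dphi)\n",
"print(MHt, dphi)\n",
"```"
]
},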
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for ID in [0,1,2,3,4,5,6]: \n",
" \n",
" #apply mask to the sample\n",
" masked_df = Files[ID].mask(~Selection2[ID])\n",
" #get a list of jets as strings \n",
" jets = Sample.JetNames(Files[ID])\n",
" \n",
" \n",
" #Calculate MHt components with selection \n",
" \n",
" ### YOUR CODE HERE\n",
" \n",
" # save MHT\n",
" Sensitive_Variables[ID]['MHt'] = MHt\n",
" \n",
" # Save MHt as a vector object using the x- and y-components \n",
" MHt_vec = vec.arr(dict(x=MHtx, y=MHty)) \n",
" \n",
" #Calculate Delta Phi \n",
" \n",
" # Calculate the angle of missing transverse MHt (MHtPhi) using the vector library \n",
" MHtPhi = MHt_vec.phi\n",
" \n",
" ### YOUR CODE HERE\n",
" \n",
" # save the results \n",
" Sensitive_Variables[ID]['DeltaPhi1'] = DeltaPhi1\n",
" Sensitive_Variables[ID]['DeltaPhi2'] = DeltaPhi2 \n",
" Sensitive_Variables[ID]['DeltaPhi3'] = DeltaPhi3 \n",
"\n",
"    # For QCD we need weights to properly combine events generated in different HT bins\n",
"    if ID == 4:\n",
"        Weights = Files[ID]['Weight'].values\n",
"        Weights = np.squeeze(Weights) # change the shape from (N,1) to (N,); does not affect the data, only the shape\n",
" else:\n",
" Weights = None\n",
" \n",
" utils.plot_quantities(df=Sensitive_Variables[ID]\n",
" ,column=['NJets','Ht','MHt','DeltaPhi1','DeltaPhi2','DeltaPhi3']\n",
" ,quantity=None\n",
" ,bins=[10,30,24,24,24,24]\n",
" ,hist_range = [(-0.5,9.5),(0.1, 3000),(0.1, 1000),(0, 3.2),(0, 3.2),(0, 3.2)]\n",
" ,density=False\n",
" ,weights=Weights\n",
" ,label=Sample.toString(ID)\n",
" ,unit=['', 'GeV', 'GeV','Radian', 'Radian', 'Radian']\n",
" ,yscale = ['linear','log','log','linear','linear','linear']\n",
" ,suptitle=Sample.toString(ID)\n",
" ,color=Sample.color(ID))\n",
" print('\\n'*3)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comparison of distributions \n",
"After computing the sensitive variables for **all samples**, we will now compare the shapes of the different kinematic distributions of the different processes. In the following cell the $N_{\\mathrm{jets}}$ and $H_{\\mathrm{T}}$ distributions are plotted. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"# plot sample comparison \n",
"Labels = [Sample.toString(ID) for ID in range(7)]\n",
"Colors = [Sample.color(ID) for ID in range(7)]\n",
"\n",
"# Weights from the previous loop is None here; per-sample weights would be needed for a weighted comparison\n",
"utils.plot_quantities(df=Sensitive_Variables, column=['NJets','Ht','MHt'], quantity=None,\n",
"                      bins=[10,30,24], hist_range = [(-0.5,9.5),(0.1, 3000),(0.1, 1000)], density=True, weights=None,\n",
"                      label=Labels, unit=['','GeV','GeV'], yscale = ['linear','log','log'], suptitle='Sample comparison', color=Colors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"**Question 4**\n",
" \n",
"How do the distributions for the different processes differ? In particular:\n",
"* Explain the differences of the $N_{\\mathrm{jets}}$ and $H_{\\mathrm{T}}$ distributions of QCD and Z($\\nu \\nu $) + jets events.\n",
"* Explain the different $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$ distributions of the QCD and the other processes. Hint: what is the primary source of $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$ in the Z($\\nu \\nu $) + jets, W($ l \\nu$) + jets and $t\\bar{t}$ + jets events? Where does $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$ stem from in QCD events?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Expected event yields\n",
"We will now use the simulated events to estimate the number of SM-background events after the full baseline event selection. \n",
"The yields dataframe initialized below stores the number of simulated events passing the selection. The first column contains the number\n",
"of events after the baseline selection. The remaining columns store the number of events after tighter requirements (on top of the baseline selection) on $H_{\\mathrm{T}}$, $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$, and $N_{\\mathrm{jets}} $. Have a look at the code to identify the additional requirements.\n",
"\n",
"\n",
"By comparing the number of events after the baseline selection with the total number of events given in Table 1, we can compute the total selection efficiency:\n",
"\n",
"$\\epsilon = \\dfrac{\\text{number of MC events after selection requirements}}{\\text{total number of MC events}} $\n",
" \n",
"Together with the cross section and the integrated luminosity, we can then compute the expected number of events in a given data sample for a process $i$: \n",
" $N_{\\mathrm{exp},i} = \\epsilon_i \\cdot \\sigma_i \\cdot L $ \n",
" \n",
"We do not have to compute the cross section normalisation ourselves every time. Instead, we can use the weights already stored in the dataframes, which include the cross section normalisation. In addition, the weights contain a correction to properly describe the impact of additional proton-proton collisions in the same event (pile-up)."
]
},
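{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numerical illustration of $N_{\\mathrm{exp},i} = \\epsilon_i \\cdot \\sigma_i \\cdot L$ (made-up numbers, not values from this exercise):\n",
"\n",
"```python\n",
"eff = 0.02       # selection efficiency (made up)\n",
"sigma = 5.0      # cross section in pb (made up)\n",
"lumi = 20000.0   # integrated luminosity in pb^-1, i.e. 20 fb^-1\n",
"\n",
"n_exp = eff * sigma * lumi\n",
"print(n_exp)  # 2000.0\n",
"```"
]
},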
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# initialize Yields dataframe \n",
"\n",
"Yields = pd.DataFrame(columns=['baseline', 'Njets<7','Njets>=7','HT<1700','HT>1700','MHT<600','MHT>600'],\n",
" index=['Data', 'WJets', 'TTJets','ZJets','QCD','LM6','LM9']) \n",
"print(Yields) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**After** computing the $N_{\\mathrm{jets}}$, $H_{\\mathrm{T}}$ and $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$ variables in the previous cells, we can apply the full baseline selection and plot the final distributions. We will also determine the expected event yields for the background and signal processes. \n",
"We do not have to compute the cross section normalisation ourselves every time. Instead, we can use the weights already stored in the CSV files. They are saved in\n",
"the column 'Weight' and include the cross section normalisation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"**Task 5:**\n",
" \n",
"Complete the implementation of the baseline selection: $N_{\\mathrm{jets}} \\geq 3$, $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}} > 200 \\,\\mathrm{GeV}$ and $H_{\\mathrm{T}} > 500 \\,\\mathrm{GeV}$ in the cell below. \n",
"\n",
"**Task 6:**\n",
" \n",
"Determine the yields after applying the $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}} > 600 \\,\\mathrm{GeV}$ and $H_{\\mathrm{T}} > 1700 \\,\\mathrm{GeV}$ requirements. "
]
},
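{
"cell_type": "markdown",
"metadata": {},
"source": [
"A hedged sketch of the masking pattern used below, on a toy dataframe (hypothetical values; the real selection acts on `Sensitive_Variables[ID]`):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({'NJets': [2, 4, 5],\n",
"                    'Ht': [600.0, 450.0, 900.0],\n",
"                    'MHt': [250.0, 300.0, 250.0]})\n",
"\n",
"# combine the three baseline cuts with elementwise AND\n",
"baseline = (toy['NJets'] >= 3) & (toy['Ht'] > 500.0) & (toy['MHt'] > 200.0)\n",
"print(baseline.tolist())  # [False, False, True]\n",
"\n",
"# mask(~baseline) keeps passing rows and sets failing rows to NaN\n",
"print(toy.mask(~baseline))\n",
"```"
]
},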
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\n",
"Final_Variables = [] # List of Final_Variables as Dataframes after the baseline selection\n",
"\n",
"# loop over the samples \n",
"for ID in [0,1,2,3,4,5,6]: \n",
" \n",
" # add the weights to the sensitive variables dataframe \n",
" Weights = Files[ID]['Weight'].values\n",
"    Weights = np.squeeze(Weights) # change the shape from (N,1) to (N,); does not affect the data, only the shape\n",
" Sensitive_Variables[ID]['Weights'] = Weights \n",
" \n",
" # complete the Baseline Selection\n",
" Baseline_Selection = ((Sensitive_Variables[ID]['NJets']>=3)\n",
" ### YOUR CODE HERE\n",
" )\n",
" \n",
" # applying Baseline_Selection\n",
" Variables_after_Cut = Sensitive_Variables[ID].mask(~ Baseline_Selection) \n",
" Yields.iloc[ID,0] = np.nansum(Variables_after_Cut['Weights'].values) # np.nansum: sums entries, while ignoring nan values \n",
" Final_Variables.append(Variables_after_Cut)\n",
" \n",
" # applying selection criteria 1 \n",
" Variables_after_Cut = Sensitive_Variables[ID].mask(~(Baseline_Selection & (Sensitive_Variables[ID]['NJets']< 7)))\n",
" Yields.iloc[ID,1] = np.nansum(Variables_after_Cut['Weights'].values)\n",
" \n",
" # applying selection criteria 2 \n",
" Variables_after_Cut = Sensitive_Variables[ID].mask(~(Baseline_Selection & (Sensitive_Variables[ID]['NJets']>= 7)))\n",
" Yields.iloc[ID,2] = np.nansum(Variables_after_Cut['Weights'].values)\n",
" \n",
" # applying selection criteria 3 \n",
" Variables_after_Cut = Sensitive_Variables[ID].mask(~(Baseline_Selection & (Sensitive_Variables[ID]['Ht']< 1700)))\n",
" Yields.iloc[ID,3] = np.nansum(Variables_after_Cut['Weights'].values)\n",
" \n",
" # applying selection criteria 4 \n",
" ### YOUR CODE HERE\n",
" \n",
" # applying selection criteria 5 \n",
" Variables_after_Cut = Sensitive_Variables[ID].mask(~(Baseline_Selection & (Sensitive_Variables[ID]['MHt']< 600)))\n",
" Yields.iloc[ID,5] = np.nansum(Variables_after_Cut['Weights'].values)\n",
" \n",
" # applying selection criteria 6\n",
" ### YOUR CODE HERE\n",
" \n",
" # Plotting the results \n",
" utils.plot_quantities(df=Final_Variables[ID]\n",
" ,column=['NJets','Ht','MHt','DeltaPhi1','DeltaPhi2','DeltaPhi3']\n",
" ,quantity=None\n",
" ,bins=[9,25,16,24,24,24]\n",
" ,hist_range = [(2.5,11.5),(500, 3000),(200, 1000),(0,3.2),(0,3.2),(0,3.2)]\n",
" ,density=False\n",
" ,weights=Final_Variables[ID]['Weights']\n",
" ,label=Sample.toString(ID)\n",
" ,unit=['', 'GeV', 'GeV','Radian', 'Radian', 'Radian']\n",
" ,yscale = ['linear','log','log','linear','linear','linear']\n",
" ,suptitle=Sample.toString(ID)\n",
" ,color=Sample.color(ID))\n",
" \n",
" print('\\n'*3)\n",
" \n",
"print(Yields)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"**Question 5**\n",
" \n",
"Discuss the result. Which background dominates in which phase-space region? Does this match your initial expectation (Question 3)?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data\n",
"\n",
"We will now compare the $N_{\\mathrm{jets}}$, $H_{\\mathrm{T}}$, and $H\\kern-0.575em\\raise-0.375ex{\\large/}_{\\mathrm{T}}$ distributions in real proton-proton collision data with the sum of the background distributions obtained from simulation. Both potential signals are also shown, but not added to the stack of background processes. This is all done using the pre-defined compare plotting function. Please refer to `Sample.py` if you are interested in the exact implementation of the plotting function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Plot_Bins = [np.arange(2.5,12.5,1), range(500,3100,100), range(200,1050,50)] # bin edges; upper limits chosen so the last edge is included\n",
"Data = Final_Variables[0]\n",
"Bkg = [Final_Variables[i] for i in [1,2,3,4]]\n",
"Sgl = [Final_Variables[i] for i in [5,6]]\n",
"Labels = [Sample.toString(i) for i in range(len(Final_Variables))] # note that Labels = [Data_label, Background_labels, Signal_labels] \n",
"Colors = [Sample.color(i) for i in range(len(Final_Variables))] # note that Colors = [Data_color, Background_colors, Signal_colors] \n",
"\n",
"\n",
"Sample.Compare(Bkg, Sgl, Data, Bins=Plot_Bins, Labels=Labels, Colors=Colors)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"**Question 6**\n",
" \n",
"Do you observe any deviations of the data from the SM background expectation? What can you say about the existence of any new physics processes?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"**Question 7**\n",
" \n",
"Which uncertainty is represented by the uncertainty bars? Are there any further uncertainties to be considered?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"[1] The CMS Collaboration, “Search for new physics in the multijet and missing transverse momentum final state in proton-proton collisions at $\\sqrt{s}= 8$ TeV”, JHEP 06 (2014) 055, arXiv:1402.4770. doi:10.1007/JHEP06(2014)055.\n",
"\n",
"[2] H. Baer and X. Tata, “Weak Scale Supersymmetry: From Superfields to Scattering Events”. Cambridge University Press, 2006. ISBN 0-521-85786-4.\n",
"\n",
"[3] S. P. Martin, “A Supersymmetry primer”, arXiv:hep-ph/9709356.\n",
"\n",
"[4] The CMS Collaboration, “CMS technical design report, volume II: Physics performance”, J. Phys. G34 (2007) 995. doi:10.1088/0954-3899/34/6/S01.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}