{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "84b2ed6d",
   "metadata": {},
   "source": [
    "# Exercise Sheet No. 2\n",
    "\n",
    "---\n",
    "\n",
    "> Machine Learning for Natural Sciences, Summer 2024, Jun.-Prof. Pascal Friederich, pascal.friederich@kit.edu\n",
    "\n",
    "> Instructor: Marlen Neubert (marlen.neubert@kit.edu)\n",
    "\n",
    "---\n",
    "**Deadline**: Monday, April 29th 8am \n",
    "\n",
    "**Topic**: This exercise deals with decision trees and random forests. We examine the parameters and properties of these two algorithms on a binary classification example using [`sklearn`](https://scikit-learn.org/stable/) methods."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3aac18d1",
   "metadata": {},
   "source": [
    "### Please put your name and your group members here: \n",
    "You are encouraged to work in groups of up to 3 people; however, **each of you** has to submit a solution.\n",
    "\n",
    "Nils Lennart Bruns, usxfs\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eee3c2a3",
   "metadata": {},
   "source": [
    "## Preliminaries\n",
    "If you are not familiar with Python, you may want to learn more about the language\n",
    "and its basic syntax. Since there are a lot of free and well-written tutorials\n",
    "online, we refer you to one of the following:\n",
    "\n",
    "* http://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook\n",
    "* https://www.learnpython.org/\n",
    "* https://automatetheboringstuff.com/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20c7a0a0",
   "metadata": {},
   "source": [
    "## 1.1 Data Preprocessing and Exploration\n",
    "\n",
    "The data we will be working with is the breast cancer dataset from the [University of Wisconsin](http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29) - a binary classification dataset for diagnosing breast cancer. \\\n",
    "It contains 30 features, which are derived from digitized images and describe characteristics of the cell nuclei. The corresponding labels give the diagnosis as either \\\n",
    "`B`: benign, the tumor doesn’t contain cancerous cells or \\\n",
    "`M`: malignant, the tumor contains cancerous cells. \n",
    "\n",
    "### Problem Description\n",
    "We want to predict whether a breast cancer tumor is benign or malignant. This is a binary classification problem since we have two output classes.\\\n",
    "Before we can start training our algorithms we have to get familiar with the data and prepare it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "84713313",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.preprocessing import LabelEncoder\n",
    "import requests\n",
    "from sklearn.tree import DecisionTreeClassifier\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay\n",
    "from sklearn.ensemble import RandomForestClassifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e982076d",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_url = \"https://bwsyncandshare.kit.edu/s/dCsEn6eK5S453Lq/download\"\n",
    "data_file = \"breast_cancer_data.csv\"\n",
    "if not os.path.exists(data_file):\n",
    "    print(\"Downloading dataset ...\")\n",
    "    with open(data_file, \"wb\") as f:\n",
    "        f.write(requests.get(data_url).content)\n",
    "    print(\"Downloading dataset done.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b38f90bf",
   "metadata": {},
   "source": [
    "We load the dataset via the data library ``pandas``, which will return a ``DataFrame`` object. We can print the head of the table with ``.head()``:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "2c83c30e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>diagnosis</th>\n",
       "      <th>radius_mean</th>\n",
       "      <th>texture_mean</th>\n",
       "      <th>perimeter_mean</th>\n",
       "      <th>area_mean</th>\n",
       "      <th>smoothness_mean</th>\n",
       "      <th>compactness_mean</th>\n",
       "      <th>concavity_mean</th>\n",
       "      <th>concave points_mean</th>\n",
       "      <th>...</th>\n",
       "      <th>texture_worst</th>\n",
       "      <th>perimeter_worst</th>\n",
       "      <th>area_worst</th>\n",
       "      <th>smoothness_worst</th>\n",
       "      <th>compactness_worst</th>\n",
       "      <th>concavity_worst</th>\n",
       "      <th>concave points_worst</th>\n",
       "      <th>symmetry_worst</th>\n",
       "      <th>fractal_dimension_worst</th>\n",
       "      <th>Unnamed: 32</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>842302</td>\n",
       "      <td>M</td>\n",
       "      <td>17.99</td>\n",
       "      <td>10.38</td>\n",
       "      <td>122.80</td>\n",
       "      <td>1001.0</td>\n",
       "      <td>0.11840</td>\n",
       "      <td>0.27760</td>\n",
       "      <td>0.3001</td>\n",
       "      <td>0.14710</td>\n",
       "      <td>...</td>\n",
       "      <td>17.33</td>\n",
       "      <td>184.60</td>\n",
       "      <td>2019.0</td>\n",
       "      <td>0.1622</td>\n",
       "      <td>0.6656</td>\n",
       "      <td>0.7119</td>\n",
       "      <td>0.2654</td>\n",
       "      <td>0.4601</td>\n",
       "      <td>0.11890</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>842517</td>\n",
       "      <td>M</td>\n",
       "      <td>20.57</td>\n",
       "      <td>17.77</td>\n",
       "      <td>132.90</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>0.08474</td>\n",
       "      <td>0.07864</td>\n",
       "      <td>0.0869</td>\n",
       "      <td>0.07017</td>\n",
       "      <td>...</td>\n",
       "      <td>23.41</td>\n",
       "      <td>158.80</td>\n",
       "      <td>1956.0</td>\n",
       "      <td>0.1238</td>\n",
       "      <td>0.1866</td>\n",
       "      <td>0.2416</td>\n",
       "      <td>0.1860</td>\n",
       "      <td>0.2750</td>\n",
       "      <td>0.08902</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>84300903</td>\n",
       "      <td>M</td>\n",
       "      <td>19.69</td>\n",
       "      <td>21.25</td>\n",
       "      <td>130.00</td>\n",
       "      <td>1203.0</td>\n",
       "      <td>0.10960</td>\n",
       "      <td>0.15990</td>\n",
       "      <td>0.1974</td>\n",
       "      <td>0.12790</td>\n",
       "      <td>...</td>\n",
       "      <td>25.53</td>\n",
       "      <td>152.50</td>\n",
       "      <td>1709.0</td>\n",
       "      <td>0.1444</td>\n",
       "      <td>0.4245</td>\n",
       "      <td>0.4504</td>\n",
       "      <td>0.2430</td>\n",
       "      <td>0.3613</td>\n",
       "      <td>0.08758</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>84348301</td>\n",
       "      <td>M</td>\n",
       "      <td>11.42</td>\n",
       "      <td>20.38</td>\n",
       "      <td>77.58</td>\n",
       "      <td>386.1</td>\n",
       "      <td>0.14250</td>\n",
       "      <td>0.28390</td>\n",
       "      <td>0.2414</td>\n",
       "      <td>0.10520</td>\n",
       "      <td>...</td>\n",
       "      <td>26.50</td>\n",
       "      <td>98.87</td>\n",
       "      <td>567.7</td>\n",
       "      <td>0.2098</td>\n",
       "      <td>0.8663</td>\n",
       "      <td>0.6869</td>\n",
       "      <td>0.2575</td>\n",
       "      <td>0.6638</td>\n",
       "      <td>0.17300</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>84358402</td>\n",
       "      <td>M</td>\n",
       "      <td>20.29</td>\n",
       "      <td>14.34</td>\n",
       "      <td>135.10</td>\n",
       "      <td>1297.0</td>\n",
       "      <td>0.10030</td>\n",
       "      <td>0.13280</td>\n",
       "      <td>0.1980</td>\n",
       "      <td>0.10430</td>\n",
       "      <td>...</td>\n",
       "      <td>16.67</td>\n",
       "      <td>152.20</td>\n",
       "      <td>1575.0</td>\n",
       "      <td>0.1374</td>\n",
       "      <td>0.2050</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.1625</td>\n",
       "      <td>0.2364</td>\n",
       "      <td>0.07678</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 33 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         id diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  \\\n",
       "0    842302         M        17.99         10.38          122.80     1001.0   \n",
       "1    842517         M        20.57         17.77          132.90     1326.0   \n",
       "2  84300903         M        19.69         21.25          130.00     1203.0   \n",
       "3  84348301         M        11.42         20.38           77.58      386.1   \n",
       "4  84358402         M        20.29         14.34          135.10     1297.0   \n",
       "\n",
       "   smoothness_mean  compactness_mean  concavity_mean  concave points_mean  \\\n",
       "0          0.11840           0.27760          0.3001              0.14710   \n",
       "1          0.08474           0.07864          0.0869              0.07017   \n",
       "2          0.10960           0.15990          0.1974              0.12790   \n",
       "3          0.14250           0.28390          0.2414              0.10520   \n",
       "4          0.10030           0.13280          0.1980              0.10430   \n",
       "\n",
       "   ...  texture_worst  perimeter_worst  area_worst  smoothness_worst  \\\n",
       "0  ...          17.33           184.60      2019.0            0.1622   \n",
       "1  ...          23.41           158.80      1956.0            0.1238   \n",
       "2  ...          25.53           152.50      1709.0            0.1444   \n",
       "3  ...          26.50            98.87       567.7            0.2098   \n",
       "4  ...          16.67           152.20      1575.0            0.1374   \n",
       "\n",
       "   compactness_worst  concavity_worst  concave points_worst  symmetry_worst  \\\n",
       "0             0.6656           0.7119                0.2654          0.4601   \n",
       "1             0.1866           0.2416                0.1860          0.2750   \n",
       "2             0.4245           0.4504                0.2430          0.3613   \n",
       "3             0.8663           0.6869                0.2575          0.6638   \n",
       "4             0.2050           0.4000                0.1625          0.2364   \n",
       "\n",
       "   fractal_dimension_worst  Unnamed: 32  \n",
       "0                  0.11890          NaN  \n",
       "1                  0.08902          NaN  \n",
       "2                  0.08758          NaN  \n",
       "3                  0.17300          NaN  \n",
       "4                  0.07678          NaN  \n",
       "\n",
       "[5 rows x 33 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv(data_file)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14164182",
   "metadata": {},
   "source": [
    "We see that the data consists of 33 columns and 569 rows - corresponding to 569 samples.\\\n",
    "The first column is called `id`, followed by `diagnosis` which contains the labels.\\\n",
    "First, we want to check the distribution of classes. Use a pandas method to count the number of benign and malignant data samples. The values of your answer should be integers assigned to the variables `B` and `M`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "ab7e9856",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "0b966c5a0ea0e2782eca6e0fc8667893",
     "grade": false,
     "grade_id": "cell-22ce838295d464ea",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
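      "357 212\n"
     ]
    }
   ],
   "source": [
    "# look at distribution of classes\n",
    "B = None \n",
    "M = None\n",
    "\n",
    "# count rows per class: .sum() counts the True entries, whereas .count() counts all non-null rows\n",
    "B = int((data[\"diagnosis\"] == \"B\").sum())\n",
    "M = int((data[\"diagnosis\"] == \"M\").sum())\n",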
    "\n",
    "print(B, M)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "441ed263",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "2526799b1762effc7e038418ceb7a8f2",
     "grade": true,
     "grade_id": "class_distribution",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "# check results - 1 point\n",
    "\n",
    "assert B != None and M != None, \"Please assign values to B and M!\"\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "641daac8",
   "metadata": {},
   "source": [
    "We also see that there is a column `Unnamed: 32` which doesn't contain any information.\\\n",
    "In the next step we therefore want to clean the data by removing unnecessary columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "1c2e6b0d",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "822f8cdb840f4d7e1676f59806a37941",
     "grade": false,
     "grade_id": "cell-5d0c5d760be5c1b0",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>diagnosis</th>\n",
       "      <th>radius_mean</th>\n",
       "      <th>texture_mean</th>\n",
       "      <th>perimeter_mean</th>\n",
       "      <th>area_mean</th>\n",
       "      <th>smoothness_mean</th>\n",
       "      <th>compactness_mean</th>\n",
       "      <th>concavity_mean</th>\n",
       "      <th>concave points_mean</th>\n",
       "      <th>symmetry_mean</th>\n",
       "      <th>...</th>\n",
       "      <th>radius_worst</th>\n",
       "      <th>texture_worst</th>\n",
       "      <th>perimeter_worst</th>\n",
       "      <th>area_worst</th>\n",
       "      <th>smoothness_worst</th>\n",
       "      <th>compactness_worst</th>\n",
       "      <th>concavity_worst</th>\n",
       "      <th>concave points_worst</th>\n",
       "      <th>symmetry_worst</th>\n",
       "      <th>fractal_dimension_worst</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>M</td>\n",
       "      <td>17.99</td>\n",
       "      <td>10.38</td>\n",
       "      <td>122.80</td>\n",
       "      <td>1001.0</td>\n",
       "      <td>0.11840</td>\n",
       "      <td>0.27760</td>\n",
       "      <td>0.3001</td>\n",
       "      <td>0.14710</td>\n",
       "      <td>0.2419</td>\n",
       "      <td>...</td>\n",
       "      <td>25.38</td>\n",
       "      <td>17.33</td>\n",
       "      <td>184.60</td>\n",
       "      <td>2019.0</td>\n",
       "      <td>0.1622</td>\n",
       "      <td>0.6656</td>\n",
       "      <td>0.7119</td>\n",
       "      <td>0.2654</td>\n",
       "      <td>0.4601</td>\n",
       "      <td>0.11890</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>M</td>\n",
       "      <td>20.57</td>\n",
       "      <td>17.77</td>\n",
       "      <td>132.90</td>\n",
       "      <td>1326.0</td>\n",
       "      <td>0.08474</td>\n",
       "      <td>0.07864</td>\n",
       "      <td>0.0869</td>\n",
       "      <td>0.07017</td>\n",
       "      <td>0.1812</td>\n",
       "      <td>...</td>\n",
       "      <td>24.99</td>\n",
       "      <td>23.41</td>\n",
       "      <td>158.80</td>\n",
       "      <td>1956.0</td>\n",
       "      <td>0.1238</td>\n",
       "      <td>0.1866</td>\n",
       "      <td>0.2416</td>\n",
       "      <td>0.1860</td>\n",
       "      <td>0.2750</td>\n",
       "      <td>0.08902</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>M</td>\n",
       "      <td>19.69</td>\n",
       "      <td>21.25</td>\n",
       "      <td>130.00</td>\n",
       "      <td>1203.0</td>\n",
       "      <td>0.10960</td>\n",
       "      <td>0.15990</td>\n",
       "      <td>0.1974</td>\n",
       "      <td>0.12790</td>\n",
       "      <td>0.2069</td>\n",
       "      <td>...</td>\n",
       "      <td>23.57</td>\n",
       "      <td>25.53</td>\n",
       "      <td>152.50</td>\n",
       "      <td>1709.0</td>\n",
       "      <td>0.1444</td>\n",
       "      <td>0.4245</td>\n",
       "      <td>0.4504</td>\n",
       "      <td>0.2430</td>\n",
       "      <td>0.3613</td>\n",
       "      <td>0.08758</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>M</td>\n",
       "      <td>11.42</td>\n",
       "      <td>20.38</td>\n",
       "      <td>77.58</td>\n",
       "      <td>386.1</td>\n",
       "      <td>0.14250</td>\n",
       "      <td>0.28390</td>\n",
       "      <td>0.2414</td>\n",
       "      <td>0.10520</td>\n",
       "      <td>0.2597</td>\n",
       "      <td>...</td>\n",
       "      <td>14.91</td>\n",
       "      <td>26.50</td>\n",
       "      <td>98.87</td>\n",
       "      <td>567.7</td>\n",
       "      <td>0.2098</td>\n",
       "      <td>0.8663</td>\n",
       "      <td>0.6869</td>\n",
       "      <td>0.2575</td>\n",
       "      <td>0.6638</td>\n",
       "      <td>0.17300</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>M</td>\n",
       "      <td>20.29</td>\n",
       "      <td>14.34</td>\n",
       "      <td>135.10</td>\n",
       "      <td>1297.0</td>\n",
       "      <td>0.10030</td>\n",
       "      <td>0.13280</td>\n",
       "      <td>0.1980</td>\n",
       "      <td>0.10430</td>\n",
       "      <td>0.1809</td>\n",
       "      <td>...</td>\n",
       "      <td>22.54</td>\n",
       "      <td>16.67</td>\n",
       "      <td>152.20</td>\n",
       "      <td>1575.0</td>\n",
       "      <td>0.1374</td>\n",
       "      <td>0.2050</td>\n",
       "      <td>0.4000</td>\n",
       "      <td>0.1625</td>\n",
       "      <td>0.2364</td>\n",
       "      <td>0.07678</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 31 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  diagnosis  radius_mean  texture_mean  perimeter_mean  area_mean  \\\n",
       "0         M        17.99         10.38          122.80     1001.0   \n",
       "1         M        20.57         17.77          132.90     1326.0   \n",
       "2         M        19.69         21.25          130.00     1203.0   \n",
       "3         M        11.42         20.38           77.58      386.1   \n",
       "4         M        20.29         14.34          135.10     1297.0   \n",
       "\n",
       "   smoothness_mean  compactness_mean  concavity_mean  concave points_mean  \\\n",
       "0          0.11840           0.27760          0.3001              0.14710   \n",
       "1          0.08474           0.07864          0.0869              0.07017   \n",
       "2          0.10960           0.15990          0.1974              0.12790   \n",
       "3          0.14250           0.28390          0.2414              0.10520   \n",
       "4          0.10030           0.13280          0.1980              0.10430   \n",
       "\n",
       "   symmetry_mean  ...  radius_worst  texture_worst  perimeter_worst  \\\n",
       "0         0.2419  ...         25.38          17.33           184.60   \n",
       "1         0.1812  ...         24.99          23.41           158.80   \n",
       "2         0.2069  ...         23.57          25.53           152.50   \n",
       "3         0.2597  ...         14.91          26.50            98.87   \n",
       "4         0.1809  ...         22.54          16.67           152.20   \n",
       "\n",
       "   area_worst  smoothness_worst  compactness_worst  concavity_worst  \\\n",
       "0      2019.0            0.1622             0.6656           0.7119   \n",
       "1      1956.0            0.1238             0.1866           0.2416   \n",
       "2      1709.0            0.1444             0.4245           0.4504   \n",
       "3       567.7            0.2098             0.8663           0.6869   \n",
       "4      1575.0            0.1374             0.2050           0.4000   \n",
       "\n",
       "   concave points_worst  symmetry_worst  fractal_dimension_worst  \n",
       "0                0.2654          0.4601                  0.11890  \n",
       "1                0.1860          0.2750                  0.08902  \n",
       "2                0.2430          0.3613                  0.08758  \n",
       "3                0.2575          0.6638                  0.17300  \n",
       "4                0.1625          0.2364                  0.07678  \n",
       "\n",
       "[5 rows x 31 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# clean the data by removing columns 'Unnamed: 32' and 'id'\n",
    "\n",
    "data = data.drop(['Unnamed: 32', 'id'], axis=1)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "aea25336",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "0c63eefea726df52338e686a1cb57161",
     "grade": true,
     "grade_id": "clean_data",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "#  1 point\n",
    "\n",
    "assert data.shape == (\n",
    "    569,\n",
    "    31,\n",
    "), \"Your data shape after removing the columns does not match!\"\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b89ca58a",
   "metadata": {},
   "source": [
    "The first column of the cleaned dataset should now correspond to the labels, the rest of the columns correspond to the features which we will assign to `X`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "ff1d12fd",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "379c26187eb932e5005313ac272192b4",
     "grade": false,
     "grade_id": "cell-60ccd97d76da9140",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "# Features\n",
    "X = data.drop(\"diagnosis\", axis=1)\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "758a6841",
   "metadata": {},
   "source": [
    "Next, we need to convert the categorical labels `B` and `M` into integers `0` and `1` as our model can only handle numeric data. \\\n",
    "We can do this easily by using the [`LabelEncoder()`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) from `sklearn`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "216fe5e0",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7c36d8e96a8df0d2841a53b3515bf1bb",
     "grade": false,
     "grade_id": "cell-4c39ba7e96a96a4b",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n",
      " 0 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 0 0 0 1 0 1 1\n",
      " 0 1 0 1 1 0 0 0 1 1 0 1 1 1 0 0 0 1 0 0 1 1 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0\n",
      " 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0 1 1 0 1 0 1 1 0 1 1 0 0 1 0 0 1 0 0 0 0 1 0\n",
      " 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 1 0 0 1 1 0 0 1 1 0 0 0 0 1 0 0 1 1 1 0 1\n",
      " 0 1 0 0 0 1 0 0 1 1 0 1 1 1 1 0 1 1 1 0 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 0 0\n",
      " 0 1 0 0 0 0 0 1 1 0 0 1 0 0 1 1 0 1 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 1 1 1 1\n",
      " 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0\n",
      " 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0\n",
      " 0 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1\n",
      " 1 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0\n",
      " 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 1 0 0\n",
      " 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0\n",
      " 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 1 0 1 0 1 0 0 0 0 0 1 0 0 1 0 1 0 1 1\n",
      " 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
      " 0 0 0 0 0 0 0 1 1 1 1 1 1 0]\n"
     ]
    }
   ],
   "source": [
    "# 2 points\n",
    "\n",
    "# categorical y values\n",
    "y_categorical = data[\"diagnosis\"].values\n",
    "\n",
    "# Assign a LabelEncoder object to labelencoder_y and obtain the encoded labels as y.\n",
    "labelencoder_y = None\n",
    "y = None\n",
    "\n",
    "labelencoder_y = LabelEncoder()\n",
    "labelencoder_y.fit(y_categorical)\n",
    "y = labelencoder_y.transform(y_categorical)\n",
    "print(y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "beab50b4",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d95bd19da8f9fa60c11dd9317dff27ce",
     "grade": true,
     "grade_id": "label_encoder",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "assert isinstance(\n",
    "    labelencoder_y, LabelEncoder\n",
    "), \"The labelencoder should be an instance of the sklearn LabelEncoder\"\n",
    "\n",
    "# hidden test label encoder - 1 point\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0121eed7",
   "metadata": {},
   "source": [
    "In our last preprocessing step we need to divide the data into a training set and a test set. We fit the classifier on the training set and hold back the test set to evaluate the trained classifier, which gives us an estimate of the generalization error.\n",
    "\n",
    "We use [`train_test_split()`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html?highlight=train_test_split#sklearn.model_selection.train_test_split) to use 80% of `X` and `y` as the training set and the remaining 20% as the test set:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "424576f4",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ba414b9656d7bf8e57eebd77cd445f88",
     "grade": false,
     "grade_id": "cell-4158f8d16c082fa1",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    X, y, test_size=0.2, random_state=42\n",
    ")\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a342c847",
   "metadata": {},
   "source": [
    "## 1.2 Decision Tree Classifier\n",
    "We are now ready to train a decision tree classifier. \\\n",
    "We will use the [`DecisionTreeClassifier()`](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) available in `sklearn`.\n",
    "\n",
    "### Entropy and Gini Index\n",
    "One parameter we have to choose is the `criterion`, i.e. the function that measures the quality of a split via the impurity of the resulting subsets.\\\n",
    "Possible criteria are `entropy` and `gini`, both of which you have seen in the lecture. Both quantify the uncertainty or disorder in a dataset's distribution of classes. A higher value implies greater disorder.\\\n",
    "In decision tree algorithms, the goal is therefore to reduce the entropy (or the gini index) by making splits that result in more homogeneous subsets of data.\n",
    "\n",
    "Consider a dataset with 100 samples belonging to two classes (class A and class B). Assume that each class has an equal probability of occurrence. Calculate both the entropy and the gini index of the dataset using the formulas given in the lecture.\\\n",
    "Assign your answers as floats to the variables below."
   ]
  },
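  {
   "cell_type": "markdown",
   "id": "d3f0a1b7",
   "metadata": {},
   "source": [
    "As a reminder, and assuming the lecture uses the standard definitions with the base-2 logarithm (this is an assumption, so please check it against your lecture notes), the two impurity measures for class probabilities $p_i$ can be sketched as:\n",
    "\n",
    "$$H = -\\sum_{i} p_i \\log_2 p_i, \\qquad G = 1 - \\sum_{i} p_i^2$$\n",
    "\n",
    "Both are computed from class probabilities rather than sample counts, so the dataset size only enters through $p_i = n_i / n$. For illustration, a two-class dataset with $p = (0.25, 0.75)$ would give $H \\approx 0.811$ and $G = 0.375$."
   ]
  },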
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "5fcdc4e3",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ef7e4111192e4fbcc6c40fb8a7eb8ff1",
     "grade": false,
     "grade_id": "cell-16abc80a7ce2e979",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
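      "1.0 0.5\n"
     ]
    }
   ],
   "source": [
    "# assign values as floats - 2 points\n",
    "entropy = None\n",
    "gini_index = None\n",
    "\n",
    "# both measures depend only on the class probabilities (p_A = p_B = 0.5), not on the 100 samples\n",
    "p = np.array([0.5, 0.5])\n",
    "entropy = float(-np.sum(p * np.log2(p)))  # 1 bit for two equally likely classes\n",
    "gini_index = float(1 - np.sum(p**2))\n",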
    "\n",
    "print(entropy, gini_index)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "4fc00fcb",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "f71adb6b074026032ce149e523dafa3a",
     "grade": true,
     "grade_id": "entropy_gini_index",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "assert (\n",
    "    entropy != None and gini_index != None\n",
    "), \"Please assign values to entropy and gini index!\"\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
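  {
   "cell_type": "markdown",
   "id": "c3a91e5f",
   "metadata": {},
   "source": [
    "For reference, the calculation above can be sketched as follows. This assumes that the lecture uses the standard definitions, i.e. the Shannon entropy with a base-2 logarithm and the Gini impurity; if the lecture uses a different logarithm base, the entropy value changes by a constant factor. With class probabilities $p_i$:\n",
    "\\begin{align}\n",
    "H &= -\\sum_i p_i \\log_2 p_i, & G &= 1 - \\sum_i p_i^2\n",
    "\\end{align}\n",
    "For two equally likely classes, $p_A = p_B = 0.5$, this gives\n",
    "\\begin{align}\n",
    "H &= -2 \\cdot 0.5 \\cdot \\log_2 0.5 = 1, & G &= 1 - 2 \\cdot 0.5^2 = 0.5\n",
    "\\end{align}\n",
    "Both values depend only on the class probabilities and not on the number of samples. They are also the maximum each measure can take for two classes."
   ]
  },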
  {
   "cell_type": "markdown",
   "id": "965039b7",
   "metadata": {},
   "source": [
    "We can now initialize the decision tree classifier using the Gini index as the splitting criterion and a fixed maximum tree depth. We set a specific random state to make the results reproducible:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "9ad7b8bf",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "e86e3efb9f75c3334390df99ab7c1387",
     "grade": false,
     "grade_id": "cell-4fa71eabfa742600",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "# Initialize DecisionTreeClassifier\n",
    "tree_classifier = DecisionTreeClassifier(criterion=\"gini\", max_depth=5, random_state=42)\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2cef27dc",
   "metadata": {},
   "source": [
    "Now, we can train the DecisionTreeClassifier on the training data using the fit() method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "6a74feb6",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "17a5a0f734417d7878ce1b95fa3ddb03",
     "grade": false,
     "grade_id": "cell-aaf56bb6680fbe00",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DecisionTreeClassifier(max_depth=5, random_state=42)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "tree_classifier.fit(X_train, y_train)\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0e4182c",
   "metadata": {},
   "source": [
    "Use the predict() method to predict the labels of the test set to check how well the model generalizes to unseen data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "af2b2cca",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "3f0434dcbb78de22f2063c8b41371c0f",
     "grade": false,
     "grade_id": "cell-93189e66e709ae9d",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "y_pred = None\n",
    "\n",
    "y_pred = tree_classifier.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "a4324177",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "fa9aa02db633f2043715e7eba0889683",
     "grade": true,
     "grade_id": "decision_tree_predict",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "#  1 point\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "280aad8e",
   "metadata": {},
   "source": [
    "### Accuracy\n",
    "\n",
    "Since we now have the predicted labels of the test set we can use them to evaluate the accuracy of the model by comparing them to the 'true' labels.\\\n",
    "The accuracy is defined as:\n",
    "\\begin{align}\n",
    "\\text{Accuracy} &= \\frac{\\text{Number of correct predictions}}{\\text{Total number of predictions}}\n",
    "\\end{align}\n",
    "\n",
    "Use [`accuracy_score`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) to get the accuracy of the trained decision tree:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "1cde2417",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "a5156c5788d87e0561436147ab227301",
     "grade": false,
     "grade_id": "cell-80843b60a7bf0f8b",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9473684210526315\n"
     ]
    }
   ],
   "source": [
    "# Evaluate the accuracy of the model - 1 point\n",
    "\n",
    "accuracy = None\n",
    "\n",
    "accuracy = (y_pred == y_test).sum() / len(y_pred)\n",
    "print(accuracy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "70bbcc22",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "42441dbf38b69d391f0263b24d74a3f9",
     "grade": true,
     "grade_id": "decision_tree_accuracy",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "assert accuracy != None\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9725ffc5",
   "metadata": {},
   "source": [
    "## 1.3 Random Forest Classifier\n",
    "\n",
    "A random forest classifier is an ensemble method consisting of multiple decision trees.\n",
    "By combining bagging with random feature selection at each split, random forests generally offer several advantages over single decision trees, such as improved generalization, robustness to noise and higher accuracy.\n",
    "Aggregating the predictions of many trees also reduces overfitting.\n",
    "Because feature importances are evaluated intrinsically, part of the interpretability of single decision trees is retained.\\\n",
    "In this part of the exercise we fit a random forest classifier to our data and compare its performance to the single tree classifier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "b6c5bc90",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ce74131157e9146ca14b14d6528e283f",
     "grade": false,
     "grade_id": "cell-6d77200d90a73fdd",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "rf_classifier = RandomForestClassifier(n_estimators=50, max_depth=2, random_state=42)\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d6b8326a",
   "metadata": {},
   "source": [
    "Again, fit the random forest classifier to the training data, predict the labels of the test set and evaluate the accuracy of the trained model:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "0706e73b",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "ea1ee1eb4d520d971b77e259ea350a04",
     "grade": false,
     "grade_id": "cell-e5d3fc5b559d208d",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.956140350877193\n"
     ]
    }
   ],
   "source": [
    "y_pred_rf = None\n",
    "accuracy_rf = None\n",
    "\n",
    "rf_classifier.fit(X_train, y_train)\n",
    "y_pred_rf = rf_classifier.predict(X_test)\n",
    "accuracy_rf = (y_pred_rf == y_test).sum() / len(y_pred_rf)\n",
    "print(accuracy_rf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "9c8baec7",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d70a1f33852ada1160360589115c9f06",
     "grade": true,
     "grade_id": "rf_implementation",
     "locked": true,
     "points": 2,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "# check results - 2 points\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffd3ddd1",
   "metadata": {},
   "source": [
    "### Confusion Matrix\n",
    "There are many other metrics besides accuracy to evaluate the performance of a classification model.\n",
    "Confusion matrices are a visual representation of how many samples were correctly and incorrectly classified.\n",
    "True labels are assigned to the y-axis, predicted labels to the x-axis, and each cell of the matrix contains the number of cases in which the specific combination occurred.\\\n",
    "For example, cell (0,0) contains the number of times the model predicted the label 0 (B, benign) correctly.\n",
    "\n",
    "See the sklearn documentation of [`confusion_matrix`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) for details."
   ]
  },
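  {
   "cell_type": "markdown",
   "id": "e47b02d9",
   "metadata": {},
   "source": [
    "As a short sketch of how the confusion matrix relates to the accuracy defined earlier: writing $C_{ij}$ for the number of samples with true label $i$ and predicted label $j$ (a notation introduced here for illustration, not taken from the lecture), the correct predictions lie on the diagonal, so for our two classes\n",
    "\\begin{align}\n",
    "\\text{Accuracy} &= \\frac{C_{00} + C_{11}}{C_{00} + C_{01} + C_{10} + C_{11}}\n",
    "\\end{align}\n",
    "The off-diagonal entries $C_{01}$ and $C_{10}$ count the two kinds of misclassification separately, which is information the accuracy alone does not provide."
   ]
  },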
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "02dd0735",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "7c0bc0c714bad914567dfc890e104feb",
     "grade": false,
     "grade_id": "cell-0417ea30bba567bc",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7f6771b06d50>"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfIAAAG2CAYAAACEWASqAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAAAwvklEQVR4nO3deXxU9b3/8ffJHkhmQlASAglL2ZVFo8XUDWw00opQ0lot3kZEe1VAIBcXfpbVJV69CtJGcEGQXim4AFew4sWo4AJYoni1hUgQTVgSVJqERLMwc35/IFOnAZzJzGSW83o+HufxcL5zlk/aPPjk8/l+zzmGaZqmAABAWIoKdgAAAKDtSOQAAIQxEjkAAGGMRA4AQBgjkQMAEMZI5AAAhDESOQAAYYxEDgBAGCORAwAQxkjkAACEMRI5AAAB0LNnTxmG0WqbNGmSJKmxsVGTJk1S586dlZSUpPz8fFVXV3t9HYNnrQMA4H9ffvmlHA6H6/Mnn3yiyy+/XG+++aZGjBihW2+9Va+88oqWL18uu92uyZMnKyoqSu+++65X1yGRAwDQDqZNm6YNGzZoz549qqur05lnnqmVK1fql7/8pSRp9+7dGjhwoLZu3aoLLrjA4/PGBCrg9uB0OnXw4EElJyfLMIxghwMA8JJpmjp69KgyMjIUFRW42d7GxkY1Nzf7fB7TNFvlm/j4eMXHx5/2uObmZv33f/+3CgsLZRiGSktL1dLSotzcXNc+AwYMUFZWlrUS+cGDB5WZmRnsMAAAPqqsrFT37t0Dcu7Gxkb16pGkqsOOH975ByQlJam+vt5tbM6cOZo7d+5pj1u3bp1qamp0ww03SJKqqqoUFxenlJQUt/3S0tJUVVXlVUxhnciTk5MlSV980FO2JNbtITL9ot/gYIcABMwxtegd/cX173kgNDc3q+qwQ1+U9pQtue25ou6oUz2yP1dlZaVsNptr/IeqcUlaunSpRo0apYyMjDZf/1TCOpGfaG/YkqJ8+j8HCGUxRmywQwAC57tVWu0xPZqUbCgpue3Xceq7nGOzuSXyH/LFF1/o9ddf15o1a1xj6enpam5uVk1NjVtVXl1drfT0dK/iIvsBACzBYTp93tpi2bJl6tKli37+85+7xrKzsxUbG6uSkhLXWFlZmSoqKpSTk+PV+cO6IgcAwFNOmXKq7TdqteVYp9OpZcuWqaCgQDEx/0y5drtdEydOVGFhoVJTU2Wz2TRlyhTl5OR4tdBNIpEDABAwr7/+uioqKnTjjTe2+m7BggWKiopSfn6+mpqalJeXp8cff9zra5DIAQCW4JRTbWuO//N4b11xxRU61eNaEhISVFxcrOLiYh+iIpEDACzCYZpy+PAMNF+ODSQWuwEAEMaoyAEAlhCMxW7tgUQOALAEp0w5IjCR01oHACCMUZEDACyB1joAAGGMVesAACDkUJEDACzB+d3my/GhiEQOALAEh4+r1n05NpBI5AAAS3CYxzdfjg9FzJEDABDGqMgBAJbAHDkAAGHMKUMOGT4dH4porQMAEMaoyAEAluA0j2++HB+KSOQAAEtw+Nha9+XYQKK1DgBAGKMiBwBYQqRW5CRyAIAlOE1DTtOHVes+HBtItNYBAAhjVOQAAEugtQ4AQBhzKEoOHxrRDj/G4k8kcgCAJZg+zpGbzJEDAAB/oyIHAFgCc+QAAIQxhxklh+nDHHmIPqKV1joAAGGMihwAYAlOGXL6UL86FZolOYkcAGAJkTpHTmsdAIAwRkUOALAE3xe70VoHACBojs+R+/DSFFrrAADA36jIAQCW4PTxWeusWgcAIIiYIwcAIIw5FRWR95EzRw4AQBijIgcAWILDNOTw4VWkvhwbSCRyAIAlOHxc7OagtQ4AAPyNihwAYAlOM0pOH1atO1m1DgBA8NBaBwAAXjlw4ICuv/56de7cWYmJiRo8eLB27Njh+t40Tc2ePVtdu3ZVYmKicnNztWfPHq+uQSIH
AFiCU/9cud6Wzenl9f7xj3/owgsvVGxsrF599VX9/e9/1yOPPKJOnTq59nnooYe0aNEiLVmyRNu3b1fHjh2Vl5enxsZGj69Dax0AYAm+PxDGu2P/8z//U5mZmVq2bJlrrFevXq7/Nk1TCxcu1O9//3uNGTNGkrRixQqlpaVp3bp1uvbaaz26DhU5AABeqKurc9uamppOut/LL7+s8847T7/61a/UpUsXnXPOOXrqqadc3+/bt09VVVXKzc11jdntdg0fPlxbt271OB4SOQDAEk48a92XTZIyMzNlt9tdW1FR0Umv99lnn2nx4sXq27evXnvtNd166626/fbb9eyzz0qSqqqqJElpaWlux6Wlpbm+8wStdQCAJfjrfeSVlZWy2Wyu8fj4+JPv73TqvPPO0wMPPCBJOuecc/TJJ59oyZIlKigoaHMc/4qKHABgCf6qyG02m9t2qkTetWtXDRo0yG1s4MCBqqiokCSlp6dLkqqrq932qa6udn3nCRI5AAABcOGFF6qsrMxt7NNPP1WPHj0kHV/4lp6erpKSEtf3dXV12r59u3Jycjy+Dq11AIAl+P5AGO+OnT59un7yk5/ogQce0DXXXKP3339fTz75pJ588klJkmEYmjZtmu677z717dtXvXr10qxZs5SRkaGxY8d6fB0SOQDAEpymIacPbzDz9tjzzz9fa9eu1cyZMzV//nz16tVLCxcu1Pjx41373HnnnWpoaNDvfvc71dTU6KKLLtLGjRuVkJDg8XVI5AAABMhVV12lq6666pTfG4ah+fPna/78+W2+BokcAGAJTh9b6748TCaQSOQAAEvw/e1noZnIQzMqAADgESpyAIAlOGTI4cMDYXw5NpBI5AAAS6C1DgAAQg4VOQDAEhzyrT3u8F8ofkUiBwBYQqS21knkAABL+P6LT9p6fCgKzagAAIBHqMgBAJZg+vg+cpPbzwAACB5a6wAAIORQkQMALKG9X2PaXkjkAABLcPj49jNfjg2k0IwKAAB4hIocAGAJtNYBAAhjTkXJ6UMj2pdjAyk0owIAAB6hIgcAWILDNOTwoT3uy7GBRCIHAFgCc+QAAIQx08e3n5k82Q0AAPgbFTkAwBIcMuTw4cUnvhwbSCRyAIAlOE3f5rmdph+D8SNa6wAAhDEqcrTy2x8PUvX+uFbjowu+1OSiA2puNPTkvAy99XIntTQZyh5xVFOK9qvTmceCEC3gu7OH1+tXt32pvoO/Uef0Y5p7Y09t3WgPdljwM6ePi918OTaQQiKq4uJi9ezZUwkJCRo+fLjef//9YIdkaYteLdOfd37i2opWlUuSLh5dK0laMrebtm2y6/dPfK7/WlOuI9Wxmj+xZxAjBnyT0MGpz/6WoD/+v+7BDgUB5JTh8xaKgp7IV69ercLCQs2ZM0cffPCBhg4dqry8PB0+fDjYoVlWSmeHUrscc23bX7era88mDcmpV0NdlF77c6r+fe4BDbuoXn2HfKvCRyv09x1J2lXaIdihA22y402bnn2oq96jCkcYCnoif/TRR3XzzTdrwoQJGjRokJYsWaIOHTromWeeCXZokNTSbOiNlzop79qvZRjSnv/roGMtUTrn4nrXPll9m9SlW7N2lXYMYqQAcHonnuzmyxaKgprIm5ubVVpaqtzcXNdYVFSUcnNztXXr1iBGhhPe22hXfV20rrjmiCTpyOEYxcY5lWR3uO2XcmaLjhxmyQWA0HVijtyXLRQF9V/er776Sg6HQ2lpaW7jaWlp2r17d6v9m5qa1NTU5PpcV1cX8Bit7rU/p+r8kXXqnM5CNgAIRaH558UpFBUVyW63u7bMzMxghxTRqvfH6sO3k3Xlb752jaV2OaaW5ijV10a77VvzZaxSu5DsAYQupwzX89bbtLHYrbUzzjhD0dHRqq6udhuvrq5Wenp6q/1nzpyp2tpa11ZZWdleoVrS/67qrJQzjml47j87H32HfKOYWKc+fCfJNVZZHq/DB+I0MLshGGECgEdM
H1esmyGayIPaWo+Li1N2drZKSko0duxYSZLT6VRJSYkmT57cav/4+HjFx8e3c5TW5HRK/7s6Vbm/OqLo7/2WdLQ5lXfdET05t5uSUxzqmOxQ8T3dNTC7QQOzvwlewIAPEjo4lNGr2fU5PbNZvc/6VkdrovXlgdbPVEB44u1nAVJYWKiCggKdd955+vGPf6yFCxeqoaFBEyZMCHZolvbhlmQdPhCnvGuPtPrulrkHFGWYuvfmnmppMnTeiKOaXLQ/CFEC/tFv6Ld6+KW9rs+3zDsoSfrf1Z30yPSsYIUFeCToifzXv/61vvzyS82ePVtVVVUaNmyYNm7c2GoBHNpX9oijeu3gzpN+F5dganLRAU0uOtC+QQEB8n9bk5SXMTTYYSDAIvXJbkFP5JI0efLkk7bSAQDwl0htrYfmnxcAAMAjIVGRAwAQaL4+Lz1Ubz8jkQMALIHWOgAACDkkcgCAJfj0VLc2VPNz586VYRhu24ABA1zfNzY2atKkSercubOSkpKUn5/f6gFpniCRAwAsob0TuSSdddZZOnTokGt75513XN9Nnz5d69ev1wsvvKDNmzfr4MGDGjdunNfXYI4cAIAAiYmJOekjx2tra7V06VKtXLlSl112mSRp2bJlGjhwoLZt26YLLrjA42tQkQMALMFfFXldXZ3b9v23cv6rPXv2KCMjQ71799b48eNVUVEhSSotLVVLS4vba7wHDBigrKwsr1/jTSIHAFiCKfn40pTjMjMz3d7EWVRUdNLrDR8+XMuXL9fGjRu1ePFi7du3TxdffLGOHj2qqqoqxcXFKSUlxe2YtLQ0VVVVefVz0VoHAFiCv24/q6yslM1mc42f6mVeo0aNcv33kCFDNHz4cPXo0UPPP/+8EhMT2xzHv6IiBwDACzabzW3z9K2cKSkp6tevn8rLy5Wenq7m5mbV1NS47XOq13ifDokcAGAJwVi1/n319fXau3evunbtquzsbMXGxqqkpMT1fVlZmSoqKpSTk+PVeWmtAwAsob2f7DZjxgyNHj1aPXr00MGDBzVnzhxFR0fruuuuk91u18SJE1VYWKjU1FTZbDZNmTJFOTk5Xq1Yl0jkAAAExP79+3Xdddfp66+/1plnnqmLLrpI27Zt05lnnilJWrBggaKiopSfn6+mpibl5eXp8ccf9/o6JHIAgCW0d0W+atWq036fkJCg4uJiFRcXtzkmiUQOALAI0zRk+pDIfTk2kFjsBgBAGKMiBwBYAu8jBwAgjPE+cgAAEHKoyAEAlhCpi91I5AAAS4jU1jqJHABgCZFakTNHDgBAGKMiBwBYguljaz1UK3ISOQDAEkxJpunb8aGI1joAAGGMihwAYAlOGTJ4shsAAOGJVesAACDkUJEDACzBaRoyeCAMAADhyTR9XLUeosvWaa0DABDGqMgBAJYQqYvdSOQAAEsgkQMAEMYidbEbc+QAAIQxKnIAgCVE6qp1EjkAwBKOJ3Jf5sj9GIwf0VoHACCMUZEDACyBVesAAIQxU769UzxEO+u01gEACGdU5AAAS6C1DgBAOIvQ3jqJHABgDT5W5ArRipw5cgAAwhgVOQDAEniyGwAAYSxSF7vRWgcAIIxRkQMArME0fFuwFqIVOYkcAGAJkTpHTmsdAIAwRkUOALAGKz8Q5uWXX/b4hFdffXWbgwEAIFAiddW6R4l87NixHp3MMAw5HA5f4gEAAF7wKJE7nc5AxwEAQOCFaHvcFz7NkTc2NiohIcFfsQAAEDCR2lr3etW6w+HQvffeq27duikpKUmfffaZJGnWrFlaunSp3wMEAMAvTD9sbfTggw/KMAxNmzbNNdbY2KhJkyapc+fOSkpKUn5+vqqrq70+t9eJ/P7779fy5cv10EMPKS4uzjV+9tln6+mnn/Y6AAAAItlf//pXPfHEExoyZIjb+PTp07V+/Xq98MIL2rx5sw4ePKhx48Z5fX6vE/mKFSv05JNPavz48YqOjnaNDx06VLt37/Y6AAAA2ofh
h8079fX1Gj9+vJ566il16tTJNV5bW6ulS5fq0Ucf1WWXXabs7GwtW7ZM7733nrZt2+bVNbxO5AcOHFCfPn1ajTudTrW0tHh7OgAA2oefWut1dXVuW1NT0ykvOWnSJP385z9Xbm6u23hpaalaWlrcxgcMGKCsrCxt3brVqx/L60Q+aNAgvf32263GX3zxRZ1zzjneng4AgLCSmZkpu93u2oqKik6636pVq/TBBx+c9PuqqirFxcUpJSXFbTwtLU1VVVVexeP1qvXZs2eroKBABw4ckNPp1Jo1a1RWVqYVK1Zow4YN3p4OAID24acnu1VWVspms7mG4+PjW+1aWVmpqVOnatOmTQG/u8vrinzMmDFav369Xn/9dXXs2FGzZ8/Wrl27tH79el1++eWBiBEAAN+dePuZL5skm83mtp0skZeWlurw4cM699xzFRMTo5iYGG3evFmLFi1STEyM0tLS1NzcrJqaGrfjqqurlZ6e7tWP1ab7yC+++GJt2rSpLYcCABDxfvrTn+rjjz92G5swYYIGDBigu+66S5mZmYqNjVVJSYny8/MlSWVlZaqoqFBOTo5X12rzA2F27NihXbt2STo+b56dnd3WUwEAEHDt+RrT5ORknX322W5jHTt2VOfOnV3jEydOVGFhoVJTU2Wz2TRlyhTl5OToggsu8CourxP5/v37dd111+ndd991TdLX1NToJz/5iVatWqXu3bt7e0oAAAIvxN5+tmDBAkVFRSk/P19NTU3Ky8vT448/7vV5vJ4jv+mmm9TS0qJdu3bpyJEjOnLkiHbt2iWn06mbbrrJ6wAAALCCt956SwsXLnR9TkhIUHFxsY4cOaKGhgatWbPG6/lxqQ0V+ebNm/Xee++pf//+rrH+/fvrD3/4gy6++GKvAwAAoF18b8Fam48PQV4n8szMzJM++MXhcCgjI8MvQQEA4G+GeXzz5fhQ5HVr/eGHH9aUKVO0Y8cO19iOHTs0depU/dd//ZdfgwMAwG+C+NKUQPKoIu/UqZMM458thYaGBg0fPlwxMccPP3bsmGJiYnTjjTdq7NixAQkUAAC05lEi//7kPAAAYcnKc+QFBQWBjgMAgMAKsdvP/KXND4SRjr8Uvbm52W3s+8+fBQAAgeX1YreGhgZNnjxZXbp0UceOHdWpUye3DQCAkBShi928TuR33nmn3njjDS1evFjx8fF6+umnNW/ePGVkZGjFihWBiBEAAN9FaCL3urW+fv16rVixQiNGjNCECRN08cUXq0+fPurRo4eee+45jR8/PhBxAgCAk/C6Ij9y5Ih69+4t6fh8+JEjRyRJF110kbZs2eLf6AAA8Bc/vcY01HidyHv37q19+/ZJkgYMGKDnn39e0vFK/cRLVAAACDUnnuzmyxaKvE7kEyZM0EcffSRJuvvuu1VcXKyEhARNnz5dd9xxh98DBAAAp+b1HPn06dNd/52bm6vdu3ertLRUffr00ZAhQ/waHAAAfsN95CfXo0cP9ejRwx+xAAAAL3mUyBctWuTxCW+//fY2BwMAQKAY8vHtZ36LxL88SuQLFizw6GSGYZDIAQBoRx4l8hOr1EPVL398oWKMuGCHAQTEp0/2C3YIQMA4v22Ubv+f9rmYlV+aAgBA2IvQxW5e334GAABCBxU5AMAaIrQiJ5EDACzB16ezRcyT3QAAQOhoUyJ/++23df311ysnJ0cHDhyQJP3pT3/SO++849fgAADwmwh9janXifyll15SXl6eEhMT9eGHH6qpqUmSVFtbqwceeMDvAQIA4Bck8uPuu+8+LVmyRE899ZRiY2Nd4xdeeKE++OADvwYHAABOz+vFbmVlZbrkkktajdvtdtXU1PgjJgAA/I7Fbt9JT09XeXl5q/F33nlHvXv39ktQAAD43Yknu/myhSCvE/nNN9+sqVOnavv27TIMQwcPHtRzzz2nGTNm6NZbbw1EjAAA+C5C58i9bq3ffffdcjqd+ulPf6pvvvlGl1xyieLj4zVjxgxN
mTIlEDECAIBT8DqRG4ahe+65R3fccYfKy8tVX1+vQYMGKSkpKRDxAQDgF5E6R97mJ7vFxcVp0KBB/owFAIDA4RGtx40cOVKGceoJ/zfeeMOngAAAgOe8TuTDhg1z+9zS0qKdO3fqk08+UUFBgb/iAgDAv3xsrUdMRb5gwYKTjs+dO1f19fU+BwQAQEBEaGvdby9Nuf766/XMM8/463QAAMADfnuN6datW5WQkOCv0wEA4F8RWpF7ncjHjRvn9tk0TR06dEg7duzQrFmz/BYYAAD+xO1n37Hb7W6fo6Ki1L9/f82fP19XXHGF3wIDAAA/zKtE7nA4NGHCBA0ePFidOnUKVEwAAMBDXi12i46O1hVXXMFbzgAA4SdCn7Xu9ar1s88+W5999lkgYgEAIGBOzJH7soUirxP5fffdpxkzZmjDhg06dOiQ6urq3DYAACAtXrxYQ4YMkc1mk81mU05Ojl599VXX942NjZo0aZI6d+6spKQk5efnq7q62uvreJzI58+fr4aGBv3sZz/TRx99pKuvvlrdu3dXp06d1KlTJ6WkpDBvDgAIbe3YVu/evbsefPBBlZaWaseOHbrssss0ZswY/e1vf5MkTZ8+XevXr9cLL7ygzZs36+DBg63uDPOEx4vd5s2bp1tuuUVvvvmm1xcBACDo2vk+8tGjR7t9vv/++7V48WJt27ZN3bt319KlS7Vy5UpddtllkqRly5Zp4MCB2rZtmy644AKPr+NxIjfN4z/BpZde6vHJAQCINP86jRwfH6/4+PjTHuNwOPTCCy+ooaFBOTk5Ki0tVUtLi3Jzc137DBgwQFlZWdq6datXidyrOfLTvfUMAIBQ5q/FbpmZmbLb7a6tqKjolNf8+OOPlZSUpPj4eN1yyy1au3atBg0apKqqKsXFxSklJcVt/7S0NFVVVXn1c3l1H3m/fv1+MJkfOXLEqwAAAGgXfmqtV1ZWymazuYZPV433799fO3fuVG1trV588UUVFBRo8+bNPgTRmleJfN68ea2e7AYAgJWcWIXuibi4OPXp00eSlJ2drb/+9a967LHH9Otf/1rNzc2qqalxq8qrq6uVnp7uVTxeJfJrr71WXbp08eoCAACEglB41rrT6VRTU5Oys7MVGxurkpIS5efnS5LKyspUUVGhnJwcr87pcSJnfhwAENbaedX6zJkzNWrUKGVlZeno0aNauXKl3nrrLb322muy2+2aOHGiCgsLlZqaKpvNpilTpignJ8erhW5SG1atAwCAH3b48GH99re/1aFDh2S32zVkyBC99tpruvzyyyVJCxYsUFRUlPLz89XU1KS8vDw9/vjjXl/H40TudDq9PjkAACGjnSvypUuXnvb7hIQEFRcXq7i42Ieg2vAaUwAAwlEozJEHAokcAGAN7VyRtxevX5oCAABCBxU5AMAaIrQiJ5EDACwhUufIaa0DABDGqMgBANZAax0AgPBFax0AAIQcKnIAgDXQWgcAIIxFaCKntQ4AQBijIgcAWILx3ebL8aGIRA4AsIYIba2TyAEAlsDtZwAAIORQkQMArIHWOgAAYS5Ek7EvaK0DABDGqMgBAJYQqYvdSOQAAGuI0DlyWusAAIQxKnIAgCXQWgcAIJzRWgcAAKGGihwAYAm01gEACGcR2lonkQMArCFCEzlz5AAAhDEqcgCAJTBHDgBAOKO1DgAAQg0VOQDAEgzTlGG2vaz25dhAIpEDAKyB1joAAAg1VOQAAEtg1ToAAOGM1joAAAg1VOQAAEugtQ4AQDiL0NY6iRwAYAmRWpEzRw4AQBijIgcAWEOEttapyAEAlnGivd6WzVtFRUU6//zzlZycrC5dumjs2LEqKytz26exsVGTJk1S586dlZSUpPz8fFVXV3t1HRI5AAABsHnzZk2aNEnbtm3Tpk2b1NLSoiuuuEINDQ2ufaZPn67169frhRde0ObNm3Xw4EGNGzfOq+vQWgcAWINpHt98Od4LGzdudPu8fPlydenSRaWlpbrk
kktUW1urpUuXauXKlbrsssskScuWLdPAgQO1bds2XXDBBR5dh4ocAGAJvrTVv99er6urc9uampo8un5tba0kKTU1VZJUWlqqlpYW5ebmuvYZMGCAsrKytHXrVo9/LhI5AABeyMzMlN1ud21FRUU/eIzT6dS0adN04YUX6uyzz5YkVVVVKS4uTikpKW77pqWlqaqqyuN4aK0DAKzBT6vWKysrZbPZXMPx8fE/eOikSZP0ySef6J133vEhgJMjkQMALMFwHt98OV6SbDabWyL/IZMnT9aGDRu0ZcsWde/e3TWenp6u5uZm1dTUuFXl1dXVSk9P9/j8tNYBAAgA0zQ1efJkrV27Vm+88YZ69erl9n12drZiY2NVUlLiGisrK1NFRYVycnI8vg4VObz2q5sqNKHwc61b0U1PPvijYIcDeM3+1mGlbD6smK+PL1JqzkjU1z/P0DeDUyRJsYcbdeaLlUoor5dxzKlvzrLr8HU95LDFBjFq+KydHwgzadIkrVy5Uv/zP/+j5ORk17y33W5XYmKi7Ha7Jk6cqMLCQqWmpspms2nKlCnKycnxeMW6FOSKfMuWLRo9erQyMjJkGIbWrVsXzHDggb5nH9Woaw7ps90dgx0K0GbHOsXpq3HdVXHPWaq45yx909+mbo+XK+7gtzKaHOq28FOZhrS/sL8q7xwo45ipbn/cIzlD9NFe8Ii/Vq17avHixaqtrdWIESPUtWtX17Z69WrXPgsWLNBVV12l/Px8XXLJJUpPT9eaNWu8uk5QE3lDQ4OGDh2q4uLiYIYBDyV0cOjOh3Zr0Zx+qq+jmYPw1TA0RQ2DU9SSlqCWtAR9/YvucsZHKeGzeiWW1yv26yZV39Bbzd07qLl7B1VN6KX4LxrUYXddsEOHL07cR+7L5tXlzJNuN9xwg2ufhIQEFRcX68iRI2poaNCaNWu8mh+XgtxaHzVqlEaNGhXMEOCF236/R+9vTtXOrZ107b9XBDscwD+cppJ3HJHR7FRj7yTFftkkGZIZY7h2MWOjJENKLK/XN4PsQQwWaC2syqqmpia3G+/r6vjruL1cMuqw+gyq19Rrzg12KIBfxO3/Rln/uUtGi1PO+GgdurWPmjMS5UiOkTMuWmes2a+vxnaTJJ2xZr8MpxRd2xLkqOGLSH2NaVgl8qKiIs2bNy/YYVjOGemN+veZe3XPTYPV0syNDogMzekJ+mLWWYr61qHk0iNKW7ZP+2cMUHNGog79+4/U5bkvlPJGtWRIR8/vrMasDtznE+4i9O1nYZXIZ86cqcLCQtfnuro6ZWZmBjEia+h7Vr06ndGiP7z4gWssOkY6+7xajf7NAY0ZdrGcTuM0ZwBCUEyUWrokSJKaenRU/OffKKWkWof/rae+Ocuuzx8YoqijLVK0IWeHGPWe8aFazkgNctBAa2GVyOPj4z16gg78a+fWFN16dbbb2PT7y7R/Xwe98HQmSRwRwTBNGcfcnxbiTD5+u1ni7jpFHz2m+qEpQYgM/kJrHZb17Tcx+qLc/Vel8dto1dXE6otybkND+DljTaUazk5RS2qcohodsr3/tRI/PaojU/tJkmzvfqnmrolyJMUo4bN6dVldoX/kpqklPTHIkcMn7fz2s/YS1EReX1+v8vJy1+d9+/Zp586dSk1NVVZWVhAjAxDJoo8eU/qyzxRd2yJnYrSaunXQgan9XCvS46obdcba/YpucKilc5y+/lmGanLTghw1cHJBTeQ7duzQyJEjXZ9PzH8XFBRo+fLlQYoKnrj7hqHBDgFos+qCXqf9/qtxmfpqHOtvIg2t9QAYMWKEzBBtVQAAIkyErlrnZgoAAMIYi90AAJZAax0AgHDmNH178U2IvjSHRA4AsAbmyAEAQKihIgcAWIIhH+fI/RaJf5HIAQDWEKFPdqO1DgBAGKMiBwBYArefAQAQzli1DgAAQg0VOQDAEgzTlOHDgjVfjg0kEjkAwBqc322+HB+CaK0DABDGqMgBAJZAax0AgHAWoavW
SeQAAGvgyW4AACDUUJEDACyBJ7sBABDOaK0DAIBQQ0UOALAEw3l88+X4UEQiBwBYA611AAAQaqjIAQDWwANhAAAIX5H6iFZa6wAAhDEqcgCANUToYjcSOQDAGkz59k7x0MzjJHIAgDUwRw4AAEIOFTkAwBpM+ThH7rdI/IpEDgCwhghd7EZrHQCAANiyZYtGjx6tjIwMGYahdevWuX1vmqZmz56trl27KjExUbm5udqzZ4/X1yGRAwCswemHzQsNDQ0aOnSoiouLT/r9Qw89pEWLFmnJkiXavn27OnbsqLy8PDU2Nnp1HVrrAABLaO9V66NGjdKoUaNO+p1pmlq4cKF+//vfa8yYMZKkFStWKC0tTevWrdO1117r8XWoyAEAaGf79u1TVVWVcnNzXWN2u13Dhw/X1q1bvToXFTkAwBr8tNitrq7ObTg+Pl7x8fFenaqqqkqSlJaW5jaelpbm+s5TVOQAAGs4kch92SRlZmbKbre7tqKioqD+WFTkAAB4obKyUjabzfXZ22pcktLT0yVJ1dXV6tq1q2u8urpaw4YN8+pcVOQAAGvwU0Vus9nctrYk8l69eik9PV0lJSWusbq6Om3fvl05OTlenYuKHABgDU5Jho/He6G+vl7l5eWuz/v27dPOnTuVmpqqrKwsTZs2Tffdd5/69u2rXr16adasWcrIyNDYsWO9ug6JHABgCe19+9mOHTs0cuRI1+fCwkJJUkFBgZYvX64777xTDQ0N+t3vfqeamhpddNFF2rhxoxISEry6DokcAIAAGDFihMzTJH/DMDR//nzNnz/fp+uQyAEA1hChz1onkQMArMFpSoYPydgZmomcVesAAIQxKnIAgDXQWgcAIJz5mMgVmomc1joAAGGMihwAYA201gEACGNOUz61x1m1DgAA/I2KHABgDabz+ObL8SGIRA4AsAbmyAEACGPMkQMAgFBDRQ4AsAZa6wAAhDFTPiZyv0XiV7TWAQAIY1TkAABroLUOAEAYczol+XAvuDM07yOntQ4AQBijIgcAWAOtdQAAwliEJnJa6wAAhDEqcgCANUToI1pJ5AAASzBNp0wf3mDmy7GBRCIHAFiDafpWVTNHDgAA/I2KHABgDaaPc+QhWpGTyAEA1uB0SoYP89whOkdOax0AgDBGRQ4AsAZa6wAAhC/T6ZTpQ2s9VG8/o7UOAEAYoyIHAFgDrXUAAMKY05SMyEvktNYBAAhjVOQAAGswTUm+3EcemhU5iRwAYAmm05TpQ2vdJJEDABBEplO+VeTcfgYAAPyMihwAYAm01gEACGcR2loP60R+4q+jY2ZzkCMBAsf5bWOwQwAC5sTvd3tUu8fU4tPzYI6pxX/B+JFhhmqvwAP79+9XZmZmsMMAAPiosrJS3bt3D8i5Gxsb1atXL1VVVfl8rvT0dO3bt08JCQl+iMw/wjqRO51OHTx4UMnJyTIMI9jhWEJdXZ0yMzNVWVkpm80W7HAAv+L3u/2ZpqmjR48qIyNDUVGBW3/d2Nio5mbfu7dxcXEhlcSlMG+tR0VFBewvOJyezWbjHzpELH6/25fdbg/4NRISEkIuAfsLt58BABDGSOQAAIQxEjm8Eh8frzlz5ig+Pj7YoQB+x+83wlFYL3YDAMDqqMgBAAhjJHIAAMIYiRwAgDBGIgcAIIyRyOGx4uJi9ezZUwkJCRo+fLjef//9YIcE+MWWLVs0evRoZWRkyDAMrVu3LtghAR4jkcMjq1evVmFhoebMmaMPPvhAQ4cOVV5eng4fPhzs0ACfNTQ0aOjQoSouLg52KIDXuP0MHhk+fLjOP/98/fGPf5R0/Dn3mZmZmjJliu6+++4gRwf4j2EYWrt2rcaOHRvsUACPUJHjBzU3N6u0tFS5ubmusaioKOXm5mrr1q1BjAwAQCLHD/rqq6/kcDiUlpbmNp6WluaX1wICANqORA4AQBgjkeMHnXHGGYqOjlZ1dbXbeHV1tdLT04MUFQBAIpHDA3FxccrOzlZJSYlr
zOl0qqSkRDk5OUGMDAAQE+wAEB4KCwtVUFCg8847Tz/+8Y+1cOFCNTQ0aMKECcEODfBZfX29ysvLXZ/37dunnTt3KjU1VVlZWUGMDPhh3H4Gj/3xj3/Uww8/rKqqKg0bNkyLFi3S8OHDgx0W4LO33npLI0eObDVeUFCg5cuXt39AgBdI5AAAhDHmyAEACGMkcgAAwhiJHACAMEYiBwAgjJHIAQAIYyRyAADCGIkcAIAwRiIHfHTDDTe4vbt6xIgRmjZtWrvH8dZbb8kwDNXU1JxyH8MwtG7dOo/POXfuXA0bNsynuD7//HMZhqGdO3f6dB4AJ0ciR0S64YYbZBiGDMNQXFyc+vTpo/nz5+vYsWMBv/aaNWt07733erSvJ8kXAE6HZ60jYl155ZVatmyZmpqa9Je//EWTJk1SbGysZs6c2Wrf5uZmxcXF+eW6qampfjkPAHiCihwRKz4+Xunp6erRo4duvfVW5ebm6uWXX5b0z3b4/fffr4yMDPXv31+SVFlZqWuuuUYpKSlKTU3VmDFj9Pnnn7vO6XA4VFhYqJSUFHXu3Fl33nmn/vUpx//aWm9qatJdd92lzMxMxcfHq0+fPlq6dKk+//xz1/O9O3XqJMMwdMMNN0g6/na5oqIi9erVS4mJiRo6dKhefPFFt+v85S9/Ub9+/ZSYmKiRI0e6xempu+66S/369VOHDh3Uu3dvzZo1Sy0tLa32e+KJJ5SZmakOHTrommuuUW1trdv3Tz/9tAYOHKiEhAQNGDBAjz/+uNexAGgbEjksIzExUc3Nza7PJSUlKisr06ZNm7Rhwwa1tLQoLy9PycnJevvtt/Xuu+8qKSlJV155peu4Rx55RMuXL9czzzyjd955R0eOHNHatWtPe93f/va3+vOf/6xFixZp165deuKJJ5SUlKTMzEy99NJLkqSysjIdOnRIjz32mCSpqKhIK1as0JIlS/S3v/1N06dP1/XXX6/NmzdLOv4Hx7hx4zR69Gjt3LlTN910k+6++26v/zdJTk7W8uXL9fe//12PPfaYnnrqKS1YsMBtn/Lycj3//PNav369Nm7cqA8//FC33Xab6/vnnntOs2fP1v33369du3bpgQce0KxZs/Tss896HQ+ANjCBCFRQUGCOGTPGNE3TdDqd5qZNm8z4+HhzxowZru/T0tLMpqYm1zF/+tOfzP79+5tOp9M11tTUZCYmJpqvvfaaaZqm2bVrV/Ohhx5yfd/S0mJ2797ddS3TNM1LL73UnDp1qmmapllWVmZKMjdt2nTSON98801TkvmPf/zDNdbY2Gh26NDBfO+999z2nThxonndddeZpmmaM2fONAcNGuT2/V133dXqXP9Kkrl27dpTfv/www+b2dnZrs9z5swxo6Ojzf3797vGXn31VTMqKso8dOiQaZqm+aMf/chcuXKl23nuvfdeMycnxzRN09y3b58pyfzwww9PeV0AbcccOSLWhg0blJSUpJaWFjmdTv3mN7/R3LlzXd8PHjzYbV78o48+Unl5uZKTk93O09jYqL1796q2tlaHDh1ye3VrTEyMzjvvvFbt9RN27typ6OhoXXrppR7HXV5erm+++UaXX36523hzc7POOeccSdKuXbtavUI2JyfH42ucsHr1ai1atEh79+5VfX29jh07JpvN5rZPVlaWunXr5nYdp9OpsrIyJScna+/evZo4caJuvvlm1z7Hjh2T3W73Oh4A3iORI2KNHDlSixcvVlxcnDIyMhQT4/7r3rFjR7fP9fX1ys7O1nPPPdfqXGeeeWabYkhMTPT6mPr6eknSK6+84pZApePz/v6ydetWjR8/XvPmzVNeXp7sdrtWrVqlRx55xOtYn3rqqVZ/WERHR/stVgCnRiJHxOrYsaP69Onj8f7nnnuuVq9erS5durSqSk/o2rWrtm/frksuuUTS8cqztLRU55577kn3Hzx4sJxOpzZv3qzc3NxW35/oCDgcDtfYoEGDFB8fr4qKilNW8gMHDnQt
3Dth27ZtP/xDfs97772nHj166J577nGNffHFF632q6io0MGDB5WRkeG6TlRUlPr376+0tDRlZGTos88+0/jx4726PgD/YLEb8J3x48frjDPO0JgxY/T2229r3759euutt3T77bdr//79kqSpU6fqwQcf1Lp167R7927ddtttp70HvGfPniooKNCNN96odevWuc75/PPPS5J69OghwzC0YcMGffnll6qvr1dycrJmzJih6dOn69lnn9XevXv1wQcf6A9/+INrAdktt9yiPXv26I477lBZWZlWrlyp5cuXe/Xz9u3bVxUVFVq1apX27t2rRYsWnXThXkJCggoKCvTRRx/p7bff1u23365rrrlG6enpkqR58+apqKhIixYt0qeffqqPP/5Yy5Yt06OPPupVPADahkQOfKdDhw7asmWLsrKyNG7cOA0cOFATJ05UY2Ojq0L/j//4D/3bv/2bCgoKlJOTo+TkZP3iF7847XkXL16sX/7yl7rttts0YMAA3XzzzWpoaJAkdevWTfPmzdPdd9+ttLQ0TZ48WZJ07733atasWSoqKtLAgQN15ZVX6pVXXlGvXr0kHZ+3fumll7Ru3ToNHTpUS5Ys0QMPPODVz3v11Vdr+vTpmjx5soYNG6b33ntPs2bNarVfnz59NG7cOP3sZz/TFVdcoSFDhrjdXnbTTTfp6aef1rJlyzR48GBdeumlWr58uStWAIFlmKdapQMAAEIeFTkAAGGMRA4AQBgjkQMAEMZI5AAAhDESOQAAYYxEDgBAGCORAwAQxkjkAACEMRI5AABhjEQOAEAYI5EDABDGSOQAAISx/w/I10mpOyvFlwAAAABJRU5ErkJggg==",
      "text/plain": [
       "<Figure size 640x480 with 2 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "cm = confusion_matrix(y_test, y_pred_rf)\n",
    "\n",
    "ConfusionMatrixDisplay(confusion_matrix=cm).plot()\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "138b94b3",
   "metadata": {},
   "source": [
    "From your confusion matrix, how many wrong predictions did the model make? Assign the count to `answer_cm_1` as an integer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "6ac58752",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "d50f8ddebddbeb56bcda4a420fcd7b48",
     "grade": false,
     "grade_id": "cell-c8ce2ca4a871f946",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5\n"
     ]
    }
   ],
   "source": [
    "answer_cm_1 = None\n",
    "\n",
    "# Off-diagonal entries of the confusion matrix are the misclassified samples.\n",
    "answer_cm_1 = int(cm[0, 1] + cm[1, 0])\n",
    "print(answer_cm_1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "id": "de36e00b",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "fdf9444646863dad24c34627304b1716",
     "grade": true,
     "grade_id": "rf_confusion_matrix_1",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "# check results - 1 points\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ffbc8d0",
   "metadata": {},
   "source": [
    "From your confusion matrix, how many samples were classified as benign by the model but are actually malignant? Assign the count to `answer_cm_2` as an integer:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "501491a6",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "4defefea4a3ecefab79c7852a24df6ec",
     "grade": false,
     "grade_id": "cell-3baa82187d2089f3",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4\n"
     ]
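    }
   ],
   "source": [
    "answer_cm_2 = None\n",
    "\n",
    "# cm[i, j] counts samples with true label i that were predicted as label j.\n",
    "# cm[1, 0] is the requested count only if malignant is encoded as 1 and benign as 0.\n",
    "answer_cm_2 = int(cm[1, 0])\n",
    "print(answer_cm_2)"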
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "6cfba7bf",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "6133d49de231bb75139468daff12af5f",
     "grade": true,
     "grade_id": "rf_confusion_matrix_2",
     "locked": true,
     "points": 1,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "# check results - 1 points\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0e5ded2",
   "metadata": {},
   "source": [
    "### Hyperparameters\n",
    "\n",
    "One essential part of any machine learning application is hyperparameter optimization. Hyperparameters are the settings of the learning algorithm itself, fixed before training rather than learned from the data; tuning them can improve the performance of the model.\n",
    "In the case of a random forest classifier, these include, for example:\\\n",
    "`n_estimators`: number of trees in the forest\\\n",
    "`criterion`: impurity measure\\\n",
    "`max_depth`: maximum depth of a tree\n",
    "\n",
    "Can you improve the accuracy `accuracy_rf` of the random forest classifier by finding more suitable hyperparameters?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "7677e39b",
   "metadata": {
    "deletable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "3941eecfab009b40a27077bcf5c3f604",
     "grade": false,
     "grade_id": "cell-290803e722f06c0c",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.956140350877193 0.9649122807017544\n"
     ]
    }
   ],
   "source": [
    "accuracy_rf_tuned = None\n",
    "\n",
    "# Limit the tree depth; the seed is fixed so the result is reproducible.\n",
    "rf_classifier = RandomForestClassifier(n_estimators=100, max_depth=3, random_state=42)\n",
    "rf_classifier.fit(X_train, y_train)\n",
    "y_pred_rf = rf_classifier.predict(X_test)\n",
    "# Accuracy = fraction of correctly predicted test samples.\n",
    "accuracy_rf_tuned = (y_pred_rf == y_test).sum() / len(y_test)\n",
    "print(accuracy_rf, accuracy_rf_tuned)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "71371943",
   "metadata": {
    "deletable": false,
    "editable": false,
    "nbgrader": {
     "cell_type": "code",
     "checksum": "5a337de897523b4a50a627fab26d67db",
     "grade": true,
     "grade_id": "rf_hyperopt",
     "locked": true,
     "points": 3,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "##### DO NOT CHANGE #####\n",
    "\n",
    "# check results - 3 points\n",
    "\n",
    "\n",
    "##### DO NOT CHANGE #####"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a56944cd",
   "metadata": {},
   "source": [
    "# Submitting your solution\n",
    "\n",
    "As a last step, upload the notebook to Ilias so that we can auto-grade it."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "formats": "ipynb,py:percent"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}