{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "2234ad31d4d5274ebc28534d5fa92ed9", "grade": false, "grade_id": "cell-06aa629a0207eeaf", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Exercise Sheet No. 5\n", "\n", "---\n", "\n", "> Machine Learning for Natural Sciences, Summer 2024, TT.-Prof. Pascal Friederich, pascal.friederich@kit.edu\n", "> \n", "> Deadline: May 27th 2024, 8:00 am\n", ">\n", "> Tutor: jonas.teufel@kit.edu\n", ">\n", "> **Please ask questions in the forum/discussion board and only contact the Tutor when there are issues with the grading**\n", "---\n", "\n", "**Topic**: This exercise sheet will focus on Bayesian statistics and Naive Bayes Classification" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "4e18e25140bf3e34b36ac2538a4e46a4", "grade": false, "grade_id": "cell-72d95f678d4226c2", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "⚠️ **NOTE.** In an attempt to increase your opportunities for pre-submission self-checks of the assignments, hash-based assert statements will be provided to you throughout this notebook. These assert statements are used to give you an indication of the correctness of specific numeric values:\n", "\n", "```python\n", "some_variable = 0.73\n", "assert hash(f'{some_variable:.2f}') == 4545130770134580, 'your value is likely incorrect!'\n", "```\n", "\n", "Also note that these hash-based checks do *NOT* check the value with the required precision of the hidden tests! Keep this in mind before attempting to simply brute-force the correct value. 
So even if a brute-forced value passes the self-check assert statement, it will likely *NOT* pass the hidden tests for the grading.\n", "\n", "Therefore, if the hash-based self-check fails, your computed value is most likely incorrect. However, a passing self-check does not fully guarantee correctness! The hash-based assert statements are a necessary but not a sufficient condition for a correct final result." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "5f5134aee51c2645e17e19ab3105c618", "grade": false, "grade_id": "cell-cab0717140f2e351", "locked": false, "schema_version": 3, "solution": false, "task": false }, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Please add here your group members' names and student IDs. \n", "\n", "Names: Nils Lennart Bruns \n", "\n", "IDs: 2460137" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "b5fa8f151535d9c077c80926cbc9b38f", "grade": false, "grade_id": "cell-dc8e622a586ed8d6", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "import io\n", "import csv\n", "import copy\n", "import hashlib\n", "import typing as t\n", "from collections import defaultdict\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from scipy.integrate import quad\n", "\n", "hashcheck = lambda v: hashlib.sha256(v.encode()).hexdigest()\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "9281bb21691eb5fc8a7e8cb192d74de3", "grade": false, "grade_id": "cell-b4b7e8190a5548e9", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# 5 Bayes' theorem\n", 
"\n", "This section will review Bayes' rule and some additional important formulas from lecture 2. Bayes rule defines how the posterior probability\n", "\n", "$$\n", "\\boxed{p(y|x) = \\frac{p(x|y) p(y)}{p(x)} }\n", "$$\n", "\n", "is defined by the likelihood $p(x|y)$, the evidence $p(x)$ and the prior $p(y)$. Here, $p(x|y)$ is a conditional probability, which gives the probability of event $x$ occurring given that the condition $y$ is true.\n", "\n", "**Practical Example.** Lets clarify this with a concrete example: Here we'll assume that $y \\in \\{ \\mathrm{healthy}, \\mathrm{sick} \\}$ is a random discrete variable if a person has actually caught a certain disease. Additionally, $x \\in \\{ \\mathrm{negative}, \\mathrm{positive} \\}$ is the outcome of a test that is supposed to detect that disease. By looking at population statistics we can see that only about 1 in 1000 people will actually ever catch this disease in their lives. Therefore we can say that $P(y=\\mathrm{sick}) = 1 / 1000 = 0.001$ and likewise $P(y=\\mathrm{healthy} = 1 - P(y=\\mathrm{sick}) = 0.999$. Additionally, through clinical tests we may also know the overall probability of a test being positive or negative $p(x)$ and the *sensitivity* of the test $p(x|y)$ which is the probability of the test showing a positive result when a person actually has the disease. All of this information can then be used to determine the conditional probability $p(y=\\mathrm{sick}|x=\\mathrm{positive})$, for example, of having the disease when the test result is positive. Interestingly, $p(y=\\mathrm{healthy}|x=\\mathrm{positive})$ often times turns out to be higher than intuitively expexted - especially for rare diseases. This is because for really rare diseases the term $p(y=\\mathrm{healthy}$ is so high that a possible test would need an incredibly high sensitivity to compensate for this.\n", "\n", "**Marginalization.** Marginalization is the process of \"eliminating a variable\" from a distribution. 
For a continuous variable $y$, the marginalized distribution \n", "\n", "$$\n", "p(x) = \\int\\limits_{-\\infty}^{+\\infty} p(x|y=s) p(y=s) ds\n", "$$\n", "\n", "is defined as the integral over all possible values of $y$. Similarly, for a discrete variable $y$, the marginalization\n", "\n", "$$\n", "p(x) = \\sum\\limits_k p(x|y=k) \\; p(y=k)\n", "$$\n", "\n", "is defined as the discrete sum over all possible realizations $k$ of the variable." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "ed0cfec2b7eb13f7ef7864f384786c3f", "grade": false, "grade_id": "task-5-1", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## 5.1 Exam Qualifications\n", "\n", "**The purpose of exams.** Students generally go to university to learn. Each lecture is designed to teach students about some specific aspect of a subject area (physics, chemistry, computer science etc.). In this process, each course has a specific set of learning goals about certain topics that students should be familiar with after completing the course. So there exists an abstract set of qualifications that a student should obtain by taking each course. At the end of each course there is an exam which is supposed to test whether a student sufficiently obtained these qualifications. However, the definition of these qualifications may be complex and it is easy to imagine that there can be a rift between these true qualifications and what an exam is able to test in a limited amount of time. For example, it is likely easy to recall a fellow student who managed to pass a certain exam even with a *shaky* understanding of the topic. There might also be the opposite case of students failing the exam even though they seemed to have a good grasp of the topic. Ultimately, one can say that the outcome of an exam is not always perfectly aligned with the knowledge that a person has actually acquired. 
\n", "\n", "**A case study.** While Prof. Friederich is grading the MLNS exam one day, he wonders how effective his own exam is at judging whether or not a student has actually acquired all the necessary qualifications from his lecture. After thinking about it for a while, he decides that this can be framed as a question of probabilities: *What is the probability that a student has acquired all the necessary qualifications, given that they passed the exam?*\n", "\n", "Formally, we can define two random variables $x$ and $y$. The discrete random variable $y \\in \\{ \\mathrm{qualified}, \\mathrm{unqualified} \\}$ captures whether a student has actually acquired the necessary qualifications or not. The second discrete random variable $x \\in \\{ \\mathrm{pass}, \\mathrm{fail} \\}$ captures whether a student has passed the exam or failed it. To answer the initial question we therefore have to calculate the probability \n", "\n", "$$\n", "P(y=\\mathrm{qualified}|x=\\mathrm{pass})\n", "$$\n", "\n", "that a student is sufficiently qualified *under the condition* that they have passed the exam. This can be done by applying Bayes' rule:\n", "\n", "$$\n", "P(y=\\mathrm{qualified}|x=\\mathrm{pass}) = \\frac{ P(x=\\mathrm{pass}|y=\\mathrm{qualified}) \\cdot P(y=\\mathrm{qualified})}{P(x=\\mathrm{pass})}\n", "$$\n", "\n", "This requires us to provide an estimate of the following 3 probabilities:\n", "\n", "- $P(x=\\mathrm{pass})$: The overall probability of passing the exam.\n", "- $P(y=\\mathrm{qualified})$: The prior probability of acquiring the necessary qualifications.\n", "- $P(x=\\mathrm{pass} | y=\\mathrm{qualified})$: The probability of passing the exam under the condition of possessing the necessary qualifications.\n", "\n", "To estimate the probability $P(x=\\mathrm{pass} | y=\\mathrm{qualified})$ of passing the exam when possessing the necessary qualifications, Prof. 
Friederich makes the assumption that all of his PhD students already possess all the necessary qualifications. All PhD students take the exam, and 9 out of 10 pass it.\n", "\n", "To estimate the probability $P(y=\\mathrm{qualified})$, Prof. Friederich draws from his experience of talking to various students throughout the semester and estimates that about $80\\%$ of the students generally acquire the necessary qualifications." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "60a8e449da82baa9a9a412cb41145640", "grade": false, "grade_id": "cell-c60c72088095b1a1", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.1 (1 point)** Fill in the numeric values for the probabilities using the information provided in the previous description. Set the value of the variable ``p_y`` to the prior probability of possessing the necessary qualifications and the value of the variable ``p_x_y`` to the conditional probability of passing the exam under the condition of possessing the necessary qualifications." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "5077b3f8eaec55947937523bfa04c3a7", "grade": false, "grade_id": "ans-5-1", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# TASK: Provide estimates for the task probabilities from the previous description.\n", "\n", "# HINT: Give the probabilities as float ratios between 0 and 1 and NOT as percentages\n", "\n", "p_y: float = None\n", "p_x_y: float = None\n", " \n", "p_y = 0.8\n", "p_x_y = 0.9" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "63e95482cec31027e2a48313c2b1854c", "grade": true, "grade_id": "test-5-1-probabilities-text", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-1-probabilities-text - possible points: 2\n", "\n", "assert isinstance(p_y, float)\n", "assert 0 <= p_y <= 1.0, 'give probabilities as ratios in the range [0, 1] and not as percentages!'\n", "assert hashcheck(f'{p_y:.1f}') == '1e9d7c27c8bbc8ddf0055c93e064a62fa995d177fee28cc8fa949bc8a4db06f4', 'p_y is likely incorrect!'\n", "\n", "assert isinstance(p_x_y, float)\n", "assert 0 <= p_x_y <= 1.0, 'give probabilities as ratios in the range [0, 1] and not as percentages!'\n", "assert hashcheck(f'{p_x_y:.1f}') == '8139b33952401b3ee0e2ca84651cb9a1d7f66d442bf908f9cf1f53ea746e5801', 'p_x_y is likely incorrect!'\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "c16b1d086b0a283f84f180b1e53a53fb", "grade": false, "grade_id": "task-5-2", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.2 (2 points)** Estimate the overall chance of 
passing the exam $P(x=\\mathrm{pass})$. This probability can be very accurately approximated by looking at the results of the previous years. Use the given statistics of the previous years to calculate an approximate value of $P(x=\\mathrm{pass})$ and assign the value to the variable ``p_x``" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "cabedfcfe372da8585577f3687563f3a", "grade": false, "grade_id": "ans-5-2", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "data": { "text/plain": [ "0.8494077834179357" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# A list that contains the exam statistics of the previous years. Each element is a dictionary that contains \n", "# the following string keys:\n", "# - year: The year in which the exam was written\n", "# - pass: the number of students that passed the exam\n", "# - fail: the number of students that failed the exam\n", "previous_statistics: list[dict[str, int]] = [\n", " {\n", " 'year': 2021,\n", " 'pass': 162,\n", " 'fail': 35,\n", " },\n", " {\n", " 'year': 2022,\n", " 'pass': 174,\n", " 'fail': 23,\n", " },\n", " {\n", " 'year': 2023,\n", " 'pass': 166,\n", " 'fail': 31,\n", " }\n", "]\n", "\n", " \n", "# TASK: Provide estimates for the task probabilities by calculating from the given information.\n", "\n", "# HINT 1: Give the probabilities as float ratios between 0 and 1 and NOT as percentages\n", "\n", "# HINT 2: One can think of *multiple* ways to estimate the overall probability of passing \n", "# from the given past statistics. 
If the self-check fails, try to think of an \n", "# an alternate approach.\n", " \n", "p_x: float = None\n", " \n", "p_x = np.sum(list(map(lambda year: year['pass'], previous_statistics)))/(np.sum(list(map(lambda year: year['pass'], previous_statistics))) + np.sum(list(map(lambda year: year['fail'], previous_statistics))))\n", "\n", "p_x" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "eabc49adb5886f133e9cc3b5624f5cc1", "grade": true, "grade_id": "test-5-2-exam-statistics", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-2-exam-statistics - possible points: 2\n", "\n", "assert isinstance(p_x, float)\n", "assert p_x < 1.0, 'give probabilities as ratios in the range [0, 1] and not as percentages!'\n", "assert hashcheck(f'{p_x:.2f}') == '1e181f0934d441445f03ff51c972ef44275b830c10a80401e53b27bf5baf327a', \"p_x answer is likely incorrect!\"\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "340e5965fa2963faf3002ae95d8f9761", "grade": false, "grade_id": "task-5-3", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.3 (1 point)** Given the probabilities from the previous task, now apply Bayes Rule and calculate the probability $P(y=\\mathrm{qualified}\\;|\\;x=\\mathrm{pass})$ that a student has acquired the necessary qualifications, given that they have passed the exam. Assign the resulting conditional probability to the variable ``p_y_x``." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "7d89f5eca00a42632b952b7483095712", "grade": false, "grade_id": "ans-5-3", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# TASK: Compute the conditional probability p(y=qualified|x=pass) using previously known / estimated \n", "# probabilities using bayes rule and assign it to the variable p_y_x\n", "\n", "p_y_x: float = None\n", "\n", "p_y_x = p_x_y*p_y/p_x" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6871a5d5b79a60a45a3e7c3142853418", "grade": true, "grade_id": "test-5-3-conditional-probability", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "If a student has successfully passed the exam, there is a 84.76% percent chance that they have actually acquired the necessary qualifications from the course!\n" ] } ], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-3-conditional-probability - possible points: 1\n", "\n", "print(f'If a student has successfully passed the exam, there is a {p_y_x*100:.2f}% percent '\n", " f'chance that they have actually acquired the necessary qualifications from the course!')\n", "assert isinstance(p_y_x, float)\n", "assert 0 <= p_y_x <= 1.0, 'give probabilities as ratios in the range [0, 1] and not as percentages!'\n", "assert hashcheck(f'{p_y_x:.2f}') == '1e181f0934d441445f03ff51c972ef44275b830c10a80401e53b27bf5baf327a'\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "d706c13b6391cec6bdba00640b07bd3c", "grade": false, "grade_id": "cell-9e6bec48c99422db", "locked": true, "schema_version": 3, "solution": 
false, "task": false } }, "source": [ "So in the end we have found out that there is a relatively high - yet not completely certain - chance of having acquired the necessary qualifications when passing the exam. Ultimately, every form of examination is trying to maximize this metric. As everyone might have already experienced, different exams achieve this ideal to different degrees.\n", "\n", "**The data-centric approach.** In the previous section, we have solved the initial question concerning exam \n", "qualifications in a *classic* fashion through a combination of educated guesses and statistics. In the end, such an approach has to be customized for each individual application. In other applications, it might be more difficult to estimate certain probabilities and past statistics might not be accessible. Overall, this kind of approach requires a lot of domain knowledge, which might or might not be available in certain situations. In contrast to this knowledge-centric approach stands the *data-centric* approach, on which machine learning methods are generally based. Instead of manually deriving a solution, it is automatically extracted from large amounts of raw data." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "e7409ac389bafa2884e9bd512ee0e27c", "grade": false, "grade_id": "cell-6e51825289e97a58", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## 5.2 Data Analysis\n", "\n", "**Data collection.** We return to our example of exam qualifications - trying to answer the question of how good Prof. Friederich's exam is at determining whether students have obtained the necessary qualifications. However, in the following sections we approach this question from a *data-centric* perspective. To do this, we first need the necessary raw data. \n", "\n", "**Student survey.** We assume that Prof. 
Friederich conducts a large-scale (hypothetical) survey of students. Following the most recent exam, he randomly selects a subset of the students and invites them to a personal interview. Based on each 3-hour personal interview about the content of the lecture, Prof. Friederich is certain about the true qualifications of each student. Slowly, over the course of multiple weeks, this survey results in a dataset consisting of each student's true qualification state $y_{\\mathrm{true}} \\in \\{ \\mathrm{qualified}, \\mathrm{unqualified} \\}$ in addition to their exam results $x \\in \\{ \\mathrm{pass}, \\mathrm{fail} \\}$. In addition, each student is asked to provide the following information:\n", "\n", "- The number of hours $t \\in [0, \\infty)$ that they have invested into studying.\n", "- The number of points $r \\in [0, 100]$ that a student has achieved in the exercise.\n", "- The boolean state $l \\in \\{ \\mathrm{seldom}, \\mathrm{regular} \\} = \\{0,1\\}$ of how frequently a student has attended the lecture.\n", "- The boolean state $g \\in \\{ \\mathrm{ignored}, \\mathrm{used} \\} = \\{0, 1\\}$ of whether a student has used old exams during study." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "a8a8dafc3d96b34f051ba19e6f692558", "grade": false, "grade_id": "cell-3f0b02ea73a9b2e5", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.4 (1 point)** This dataset is available at https://bwsyncandshare.kit.edu/s/YwPT62wGYtK7HCL in CSV format. Your task is to write the code to retrieve this dataset from the remote file storage server and load it as a ``pandas.DataFrame`` object into the local variable ``df`` for further processing." 
] }, { "cell_type": "code", "execution_count": 43, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "66bf579d531c0936a987a8db991dbcf7", "grade": false, "grade_id": "cell-7b3358711db7de06", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "import io\n", "import requests\n", "import pandas as pd\n", "\n", "\n", "def nextcloud_download(url: str) -> str:\n", " \"\"\"\n", " Downloads the *content* of a file from a nextcloud server.\n", " \n", " :param url: the absolute URL of the file on the nextcloud server\n", " \n", " :returns: the string content of the file\n", " \"\"\"\n", " response = requests.get(f'{url}/download')\n", " content = response.content.decode('utf-8')\n", " return content\n", "\n", "\n", "# TASK: Use the ``nextcloud_download`` function to download the dataset and then load\n", "# the dataset into the given ``df`` variable as a pandas dataframe object.\n", "\n", "df: pd.DataFrame = None\n", "\n", "csv_text = nextcloud_download(\"https://bwsyncandshare.kit.edu/s/YwPT62wGYtK7HCL\")\n", "df = pd.read_csv(io.StringIO(csv_text), sep=\",\")" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "c5ba93db1c334bbb01ecbbb084073f5d", "grade": true, "grade_id": "test-5-4-load-dataset", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-4-load-dataset - possible points: 1\n", "\n", "assert isinstance(df, pd.DataFrame)\n", "assert len(df) != 0\n", "assert len(df) == 264\n", "\n", "# NOTE: The hidden tests will test some randomly chosen example elements from the dataset.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": 
"266e0adc174579e591c9440522cceff2", "grade": false, "grade_id": "cell-c34411470370fab6", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.5 (1 point)** In the previous section we wanted to approximate the conditional probability of $P(y=\\mathrm{qualified}\\;|\\;x=\\mathrm{pass})$ of a student possessing the required qualification under the condition of having passed the exam. Empirically estimate this conditional probability directly from the student survey data and assign the value to the variable ``p_y_x_data``." ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "80b7db7c475839cfcb097c4c8b9b010c", "grade": false, "grade_id": "cell-2d3823960613d127", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "data": { "text/plain": [ "0.8133333333333334" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# TASK: Compute the conditional probability p(y=qualified | x=pass) directly from the dataset and \n", "# store the resulting float value in this variable.\n", "p_y_x_data: float = None\n", "\n", "p_y_x_data = np.sum(df.loc[df.passed==1][\"qualified\"])/np.sum(df[\"passed\"])\n", "p_y_x_data" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "2d66ff89d9c5600e6124e3d6102231ee", "grade": true, "grade_id": "test-5-5-direct-probability", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-5-direct-probability - possible points: 1\n", "\n", "assert isinstance(p_y_x_data, float), 'please give solution as a float'\n", "assert 0.0 <= p_y_x_data <= 1.0, 'please give probability in the range [0, 1]'\n", "assert hashcheck(f'{p_y_x_data:.1f}') == 
'1e9d7c27c8bbc8ddf0055c93e064a62fa995d177fee28cc8fa949bc8a4db06f4'\n", "\n", "# NOTE: The hidden tests will check for the exact value of this probability with a tolerance of \n", "# 4 decimals.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "39130e6a238044e5a9e247b4d2c6dd35", "grade": false, "grade_id": "cell-cc277c979997eea3", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Based on the student survey dataset, if a student has successfully passed the exam, there is a 81.33% percent chance that they have actually acquired the necessary qualifications from the course!\n" ] } ], "source": [ "##### DO NOT CHANGE #####\n", "print(f'Based on the student survey dataset, if a student has successfully passed the exam, there is a {p_y_x_data*100:.2f}% percent '\n", " f'chance that they have actually acquired the necessary qualifications from the course!')\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "3c2cd76bc777e097eaac0d6f313e0c9e", "grade": false, "grade_id": "cell-77e077a9d4692b83", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.6 (2 points).** Before continuing with the remaining exercises, it makes sense to do some data exploration to get a proper overview of the data." 
] }, { "cell_type": "code", "execution_count": 69, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "467baec40566261ecfd93b6f4bf9d7d9", "grade": false, "grade_id": "cell-ce06ed2db668e5db", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "data": { "text/plain": [ "{'hours_study': 184.3560606060606,\n", " 'exercise_points': 65.15530303030303,\n", " 'qualified': 0.7916666666666666,\n", " 'passed': 0.8522727272727273,\n", " 'lecture': 0.5303030303030303}" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# TASK: Plot the data distribution for every column in the student survey dataset to gain some basic \n", "# understanding of the data.\n", "# Additionally, fill in the mean values of each of the dataset columns to the ``mean_values`` \n", "# dictionary.\n", "\n", "# HINT: Try to think of a way to do the visualization automatically. Try to generically iterate over the \n", "# columns of the dataset. What are appropriate visualizations for continuous vs. discrete variables?\n", "# What is a simple method to automatically decide whether an unknown column likely contains \n", "# cont. vs. 
discrete data?\n", "\n", "mean_values: dict[str, float] = {\n", " 'hours_study': 0.0,\n", " 'exercise_points': 0.0,\n", " 'qualified': 0.0,\n", " 'passed': 0.0,\n", " 'lecture': 0.0,\n", "}\n", "\n", "mean_values: dict[str, float] = {\n", " 'hours_study': df.hours_study.mean(),\n", " 'exercise_points': df.exercise_points.mean(),\n", " 'qualified': df.qualified.mean(),\n", " 'passed': df.passed.mean(),\n", " 'lecture': df.lecture.mean(),\n", "}\n", "\n", "mean_values\n" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "cb061389e89e93d8464db39e6b450fd3", "grade": true, "grade_id": "test-5-6-exploration", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-6-exploration - possible points: 2\n", "\n", "assert isinstance(mean_values, dict)\n", "assert len(mean_values) >= 5\n", "\n", "# NOTE: The hidden tests will compare the values in the mean_values dict with the \n", "# true values with a tolerance of 3 decimal points\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "598ba24860e1211f4c5d2692845f7a88", "grade": false, "grade_id": "cell-cf8e9d1dd6ba1a63", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Practice makes perfect.** Since students provided additional information during the survey, we can now use this information to approximate numerous other conditional probabilities as well. Intuitively, the amount of time spent studying makes sense as a significant indicator of learning progress. 
Therefore in the following section we would like to investigate the conditional probability $P(y=\\mathrm{qualified}|t)$ of being qualified under the condition of having spent $t$ hours studying.\n", "\n" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "8c7416825231efb8cb54dd16b5e7eeaf", "grade": false, "grade_id": "cell-a5460b37faa1d026", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "##### DO NOT CHANGE #####\n", "\n", "qualified = df['qualified']\n", "hours_study = df['hours_study']\n", "\n", "fig, (ax1, ax2) = plt.subplots(\n", " ncols=2,\n", " nrows=1,\n", " figsize=(10, 5)\n", ")\n", "\n", "sns.histplot(hours_study, ax=ax1, binwidth=10)\n", "ax1.set_xlabel('$t$')\n", "ax1.set_title('Hours of study $t$')\n", "\n", "s = df['qualified'].value_counts(sort=False)\n", "sns.barplot(ax=ax2, x=s.index, y=s.values, order=s.index)\n", "ax2.bar_label(ax2.containers[0]);\n", "ax2.set_xlabel('$x$')\n", "ax2.set_title('Qualification $y$')\n", "\n", "fig\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "8a4be283bbd7c692cdae0f65050def82", "grade": false, "grade_id": "cell-a323ae983b8ad3ce", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.7 (2 points)** In this exercise we want to determine $P(y=\\mathrm{qualified}|t)$ for all possible values of $t \\in \\{0, \\dots, 400\\}$. Calculate the conditional probabilities from the student survey dataset and use them to fill the ``p_y_t`` array." 
] }, { "cell_type": "code", "execution_count": 117, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "0e10f026f49550a8447883a80aebfd62", "grade": false, "grade_id": "cell-bdeb340bea8a3142", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_22/3285982670.py:18: RuntimeWarning: invalid value encountered in scalar divide\n", " p_y_t[i] = (np.sum(df.loc[df.hours_study == t][\"qualified\"]) / np.sum(df.hours_study == t))\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA04AAAHYCAYAAAB6ALj2AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8fJSN1AAAACXBIWXMAAA9hAAAPYQGoP6dpAABg5klEQVR4nO3de3wU5d338e8mgSxoshAgBzBCRKvG1FCQYDxijQL6xMN9q9WKgPXwiHjEWkg9BLQKyKP1VmwUWkWLVu9asVI1VRFPFY0SI8YgVg2HYkLEwCYCCbA7zx80WzbZzexmD7Ob/bxfr3292JlrrvnNNZPr2h87e43NMAxDAAAAAAC/kqwOAAAAAABiHYkTAAAAAJggcQIAAAAAEyROAAAAAGCCxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmSJwAAAAAwASJEwAAAACYIHECAAAAABMkTugV7rvvPh111FFyu91Wh2KppUuXymazacOGDT7fd/joo490wgkn6KCDDpLNZlNNTY3fsj01Z84c2Wy2gMqOGDFCc+bM6dF+Hn30UR166KFqb2/v0fYAwq/j73/btm1WhxIQX31iLAqmX40lsdy+8Xathlssn5sOsXTdkzghZv3973+XzWbzvJKTkzVixAjdfPPN+uGHHzzlWlpatGDBAs2aNUtJSVzSZvbu3asLL7xQzc3N+u1vf6s//vGPGj58uNVhBcQwDN1111169913PcumTZumPXv26LHHHrMwMiD6Ov6zw263a8uWLV3Wjx8/XgUFBRZEFl/C3Se+//77mjNnjnbs2BG+IKMo3PFbNeb4Gi/gLZznJt6v+0DxKRMx69NPP5Ukzx/zY489pmOOOUYPPvigbrnlFk+5xx9/XPv27dMll1xiVagx67LLLtPu3bu9OsKvv/5aGzdu1C9/+UtdffXVmjx5sgYOHOizbKz58ssvVV5eroaGBs8yu92uqVOn6oEHHpBhGBZGB1ijvb1d8+fPtzqMuOWvT+yp999/X3Pnzo3bD5Dhjj/c7RsoX+MFvIXz3MT7dR8oEifErLVr1+qggw7SDTfcoMmTJ+vKK6/USy+9pOHDh+uvf/2rp9wTTzyhc845R3a73cJoY1NycrLsdrvXV9xNTU2SpAEDBpiWjTVr1qyRJI0ePdpr+UUXXaSNGzdq1apVVoQFWGrUqFFasmSJvv32W6tDiaqdO3eGpR5/fSLCIxLtG8i59zdexINwXdtmuPaDR+KEmPXpp5/q2GOP9br9Ljk5WZmZmWptbZUk1dfXa+3atSopKfHadsuWLbLb7frFL37htfyNN95Qnz59dPPNN0cs7vfee09jx46V3W7XyJEj9dhjj/m8P3fatGkaMWJEl+07l924caOuvfZaHXnkkerXr58GD
RqkCy+8MKDfInX+3dK0adN06qmnSpIuvPBC2Ww2jR8/3mdZaX87/uIXv1BWVpZSU1N1zDHH6PHHHw/omMOtqKhIl156qSTpiCOOkM1m83T2Y8aMUUZGhldCDSSKX//613K5XKbfOgXa53S8//LLLzV58mQ5HA4NGTJEd9xxhwzD0ObNm3XuuecqPT1d2dnZuv/++/3uc9u2bbrooouUnp6uQYMG6cYbb1RbW5tXmUD6mY6Y6urq9POf/1wDBw7USSedZNo2n3zyiSZNmqT09HQdfPDBOv300/XBBx94tYm/PtGX1tZW3XTTTRoxYoRSU1OVmZmpM844Q9XV1Z44b731VklSXl6e51bzDRs2BNz+HQLtV4Npv6+++krTpk3TgAED5HA4dPnll2vXrl1e5fzFb3bsvpi1r9n5OTD2YM59d+OFPzt27Oi2bYKJOZhz7e/4etLewcYYrmu/u+sm2PYI5LpftWqVbDabli9f3mXdM888I5vNptWrV3fXRD2WEpFagRDt2bNH69ev11VXXeW1fOvWrfr88889/4P0/vvvS+r6P0rDhg3TlVdeqcWLF6u8vFzDhw/XF198oQsvvFCTJk3yOdDv3btXTqczoPgyMjJ8/p7qs88+05lnnqkhQ4Zozpw52rdvn8rLy5WVlRVQvb589NFHev/993XxxRfrkEMO0YYNG1RRUaHx48errq5O/fv3D7iu//t//6+GDRume++9VzfccIPGjh3rN7atW7fq+OOPl81m03XXXachQ4bo1Vdf1RVXXKGWlhbddNNNETtmX2bNmqU5c+aovb1dd955pyTv/yUbPXq0/vGPf4R1n0A8yMvL05QpU7RkyRLNnj1bQ4cODUu9P/vZz3T00Udr/vz5evnll/Wb3/xGGRkZeuyxx/TTn/5UCxYs0NNPP61f/vKXGjt2rE455ZQudVx00UUaMWKE5s2bpw8++EAPPfSQtm/frqeeekpS4P1MhwsvvFBHHHGE7r33XtNbcz///HOdfPLJSk9P169+9Sv16dNHjz32mMaPH6+3335b48aNC6pPlKRrrrlGzz//vK677jrl5+fr+++/13vvvad169Zp9OjR+q//+i99+eWX+tOf/qTf/va3Gjx4sCRpyJAhQbV9oP1qsO130UUXKS8vT/PmzVN1dbV+//vfKzMzUwsWLJCkbuO/+uqruz12X7pr30DOz4GCOfdm44UvZm3Tk5iD0fn4zK41fwKNMZzXfrSv+/Hjxys3N1dPP/20zj//fK91Tz/9tEaOHKni4uKg9h0wA4hBn3zyiSHJuP/++43vvvvO2LJli/Haa68Zxx13nJGcnGy8/vrrhmEYxu23325IMlpbW7vU8a9//ctITU01pk+fbmzbts0YOXKkMWrUKOOHH37wuc9Vq1YZkgJ61dfX+6zjvPPOM+x2u7Fx40bPsrq6OiM5Odno/Oc2depUY/jw4V3qKC8v9yq7a9euLmVWr15tSDKeeuopr+VPPPGEV3yd3x94nH/+85+73faKK64wcnJyjG3btnmVu/jiiw2Hw+GJK5hj9mf48OFGeXm5ablDDz3UmDZtms91V199tdGvX7+A9gf0Bh1/sx999JHx9ddfGykpKcYNN9zgWX/qqacaxxxzjOd9oH1Ox/urr77as2zfvn3GIYccYthsNmP+/Pme5du3bzf69etnTJ061Wed55xzjtfya6+91pBkfPrpp4ZhBN7PdNR3ySWXBNg6+/umvn37Gl9//bVn2bfffmukpaUZp5xyimeZvz7RF4fDYcyYMaPbMgsXLvQ5TgTa/h2xB9KvBtt+v/jFL7zKnX/++cagQYMCij+QY/fFX/sGen56cu4No/vx4kDBtE2gMQdzrv0dX0/bO9AYDSO8176/68YwAm+PYD5PlJWVGampqcaOHTs8y5qamoyUlJSAPk/0FLfqISatXbtWknTLLbdoyJAhGjZsmM4880y1trbqpZde8tya9/333yslJUUHH3xwlzqGD
Rumq666So8//rjOPvts7d69W3/729900EEH+dxnYWGhXn/99YBe2dnZXbZ3uVz6+9//rvPOO0+HHnqoZ/nRRx+tCRMm9Lgt+vXr5/n33r179f333+vwww/XgAEDAvrKvicMw9Bf/vIXlZaWyjAMbdu2zfOaMGGCnE6nqqurI3bMvjidTm3atEnHHnusz/UDBw7U7t27fd5aAfR2hx12mC677DItXrw4bD+Gv/LKKz3/Tk5O1nHHHSfDMHTFFVd4lg8YMEBHHnmkvvnmG591zJgxw+v99ddfL0l65ZVXAu5nDnTNNdcEFLvL5dJrr72m8847T4cddphneU5Ojn7+85/rvffeU0tLS0B1HWjAgAH68MMPI/p7skD71XC038knn6zvv/8+oLYI57H35PwEeu4l8/HCF7O2idQ15W//PWnvSMYY6Ws/2M8TU6ZMUXt7u55//nnPsueee0779u3T5MmTIxKjxG+cEKM+/fRTpaSk6LXXXtPrr7+ut99+W998842++OILnXXWWQHX88tf/lLt7e1au3atXnrpJQ0bNsxv2YEDB6qkpCSgl6+JKL777jvt3r1bRxxxRJd1Rx55ZMAxd7Z7927deeedys3NVWpqqgYPHqwhQ4Zox44dAd9aGKzvvvtOO3bs0OLFizVkyBCv1+WXXy5p/49KI3XMvnQk0/4GQuPft27E8uQWQCTdfvvt2rdvX9hm2Dvww4skORwO2e12z204By7fvn27zzo69w0jR45UUlKSNmzYEHA/c6C8vLyAYv/uu++0a9cun/3Q0UcfLbfbrc2bNwdU14Huu+8+1dbWKjc3V0VFRZozZ47fpLGnAu1Xe9J+nc9pxwxq/s7fgcJ57D05P4Gee8l8vPDFrG0idU116Hx8PWnvSMYY6Ws/2M8TRx11lMaOHaunn37as+zpp5/W8ccfr8MPPzxscXXGb5wQk9auXavDDz9cZ5xxRrflBg0apH379qm1tVVpaWld1t9zzz2SpH379ikjI6Pbuvbs2aPm5uaA4hsyZIiSk5MDKuuPvw/4LpfL6/3111+vJ554QjfddJOKi4vlcDhks9l08cUXR+yBvx31Tp48WVOnTvVZ5thjj43qA4c7BsLCwkKf67dv367+/ft7fUMHJJLDDjtMkydP1uLFizV79uwu6wPtczr46uP89XuGyW9OfMUQaD9zIKv/vi+66CKdfPLJWr58uV577TUtXLhQCxYs0AsvvKBJkyZ1u22w7W+mJ+0XyvkL5djDIZhzbzZe+BLqtX2gnpzrzsdndXt3FkvXfocpU6boxhtv1L/+9S+1t7frgw8+0KJFi0Kq0wyJE2LS2rVrdfLJJ5uWO+qooyTtn12v8wCxcOFC/f73v9eiRYt066236p577tHvf/97v3W9//77Ou200wKKr76+vssMMUOGDFG/fv30z3/+s0v59evXd1k2cOBAn8872Lhxo9f7559/XlOnTvWa0KKtrS2iz0oYMmSI0tLS5HK5usxYeCCXyxXUMYdi7dq1ysnJ6fK/3R3q6+t19NFHh3WfQLy5/fbbtWzZMq8ftHcItM8Jp3/+859e/5P+1Vdfye12a8SIEQH3Mz0xZMgQ9e/f32c/9MUXXygpKUm5ubk9qjsnJ0fXXnutrr32WjU1NWn06NG65557PB8e/X1IDLT9Ax1LItV+3X1rb3bsgYrk+ZHMx4ueCCbmcP2tBdvekW7X7uLp7roJpD2C/QwlSRdffLFmzpypP/3pT9q9e7f69Omjn/3sZ8EfWBC4VQ8xp7GxUU1NTcrPzzct2zFryscff+y1/MUXX9Ts2bN19913a8aMGbr66qv11FNPqb6+3m9dof7GKTk5WRMmTNCLL76oTZs2eZavW7dOf//737uUHzlypJxOp+d/xiSpoaGhy/SaycnJXf7H6+GHHw75f2q6k5ycrP/+7//WX/7yF9XW1nZZ/91333nKBXPModi0aZMOOeQQv+urq6t1wgknhHWfQLwZOXKkJ
k+erMcee0yNjY1d1gXS54TTI4884vX+4YcfliRNmjQp4H6mJ5KTk3XmmWfqr3/9q9cjFrZu3apnnnlGJ510ktLT04Oq0+Vydbk9OjMzU0OHDlV7e7tnWcfvaDt/UAymzw+kX41U+/mKP9BjD1Qkzs+BzMaLnggm5lD/1nra3pFq10Di8XfdS4G1R08+TwwePFiTJk3SsmXL9PTTT2vixIlhTZZ94RsnxJxPP/1UknTMMceYlj3ssMNUUFCgN954w/PMpjVr1ujSSy/VpZdeqttuu02S9Ktf/UqPPvpot986dfzGKRRz585VZWWlTj75ZF177bXat2+fHn74YR1zzDFeHYa0/39KZs2apfPPP1833HCDdu3apYqKCv3oRz/y+kHv//k//0d//OMf5XA4lJ+fr9WrV+uNN97QoEGDQorVzPz587Vq1SqNGzdOV111lfLz89Xc3Kzq6mq98cYbntsagznmUOTl5enNN9/Ufffdp6FDh+roo4/WmDFjJO0/583NzTr33HPDtj8gXt1222364x//qPXr13v1o4H2OeFUX1+vc845RxMnTtTq1au1bNky/fznP/fcQhVoP9MTv/nNb/T666/rpJNO0rXXXquUlBQ99thjam9v13333Rd0fa2trTrkkEN0wQUXqLCwUAcffLDeeOMNffTRR153BHT0S7fddpsuvvhi9enTR6WlpUG1f6D9aiTaz1f8J598so488kjTYw9GuM/PgbobL0IRaMyh/q0Feq2FEmMwAonH33V/0EEHBdwePfk8MWXKFF1wwQWSpLvvvrtHxxeUiM3XB/TQfffdZ0gy1q5dG1D5Bx54wDj44IONXbt2GZs3bzZycnKME0880Whra/MqN336dKNPnz7GN998E4mwPd5++21jzJgxRt++fY3DDjvMePTRR31OQWoYhvHaa68ZBQUFRt++fY0jjzzSWLZsWZey27dvNy6//HJj8ODBxsEHH2xMmDDB+OKLL4zhw4d3mQI4nNORG4ZhbN261ZgxY4aRm5tr9OnTx8jOzjZOP/10Y/HixT0+Zl8CmY58y5YtxoQJE4yDDz7YkGQ89NBDnnWzZs0yDj30UMPtdge0P6A3OHA68s6mTp1qSPKajtwwAutzOt5/9913Xeo86KCDuuyr87TnB9ZRV1dnXHDBBUZaWpoxcOBA47rrrjN2797tVTaQfsZfTGaqq6s9/Ub//v2N0047zXj//fe9ygQ6JXN7e7tx6623GoWFhUZaWppx0EEHGYWFhcbvfve7LmXvvvtuY9iwYUZSUpJXvxpI+3cItF8Npf189fu+4l+/fn3Ax95Zd+0byPnpybnvbrzoLNi2CSRmwwj8XPvafzDXmi+Bxhjua9/fdR9MewT7eaK9vd0YOHCg4XA4uvQtkWAzjB786g2IIU6nU4cddpjuu+8+r2lyY8mcOXM0d+7cHv3INBGMGDFC06ZN05w5c4Letr29XSNGjNDs2bN14403hj84AAAQk/bt26ehQ4eqtLRUf/jDHyK+P37jhLjncDj0q1/9SgsXLozqLG+IDU888YT69OkT1DM+AABA/HvxxRf13XffacqUKVHZH4kTeoVZs2Z5ZoxBYrnmmmu0adMmpaamWh0KAACIgg8//FBLlizRzJkz9ZOf/ESnnnpqVPbLp0wAAAAAcaOiokLTp09XZmamnnrqqajtl984AQAAAIAJvnECAAAAABMkTgAAAABgIuEegOt2u/Xtt98qLS1NNpvN6nAAIKEYhqHW1lYNHTqUyVwOwNgEANYIZlxKuMTp22+/VW5urtVhAEBC27x5sw455BCrw4gZjE0AYK1AxqWES5zS0tIk7W+c9PR0i6MBgMTS0tKi3NxcT1+M/RibAMAawYxLCZc4ddwCkZ6ezuAEABbhdjRvjE0AYK1AxiVuMAcAAAAAEyROAAAAAGCCxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmSJwAAAAAwASJEwAAAACYIHECA
AAAABMpVgeA6HG5DVXVN6uptU2ZaXYV5WUoOcn8Kcnd1TFm+ECt2bg9pDpD2X/n/YXjGCMZX7i3j/Txmp3vUN+bnb9w1x/s8YV7e6uvTwCIRfSNiBeWJk7vvPOOFi5cqDVr1qihoUHLly/Xeeed1+02b731lmbOnKnPP/9cubm5uv322zVt2rSoxBvPKmsbNHdFnRqcbZ5lOQ67ykvzNbEgp8d1JNkkt/GfMsHWGQyzYwjHMUYyvnBvH+njDeR8h/re7PyFs/6eHF84t7f6+gSAWETfiHhiMwzDMC8WGa+++qr+8Y9/aMyYMfqv//ov08Spvr5eBQUFuuaaa3TllVdq5cqVuummm/Tyyy9rwoQJAe2zpaVFDodDTqdT6enpYTqS2FZZ26Dpy6rV+UR3/F9OxeTRpp2Tvzo6C6bOYJgdw9Wn5GnxO/UhHWMk4zPbf7Dbh+OcdifQ8x0qs/MXrvoDbb9IbW/19RlLErEPDgTtgkQU6bEMCEQw/a+lv3GaNGmSfvOb3+j8888PqPyjjz6qvLw83X///Tr66KN13XXX6YILLtBvf/vbCEcav1xuQ3NX1Pn8gNexbO6KOrnc/j8CdldHT+sMhtkxGJKWvOv7Q3ck4gk2PrP9B7t9OM5pd4I536EyO3/hqF8KvP0isb3V1ycAxKJIj2VAJMTV5BCrV69WSUmJ17IJEyZo9erVfrdpb29XS0uL1yuRVNU3e3393ZkhqcHZpqr65h7X0ZM6gxHI/rvrV8MdT2ehtnGw24fjnHYn2PMdDpEcF4Ntv3BvL1l7fQJALIr0WAZEQlwlTo2NjcrKyvJalpWVpZaWFu3evdvnNvPmzZPD4fC8cnNzoxFqzGhqDewDXnflAq0jXNvFej09rddfuWC3D8c5jcR2sS7Y9gv39oHWDwCJINJjGRAJcZU49URZWZmcTqfntXnzZqtDiqrMNHvI5QKtI1zbxXo9Pa3XX7lgtw/HOY3EdrEu2PYL9/aB1g8AiSDSYxkQCXGVOGVnZ2vr1q1ey7Zu3ar09HT169fP5zapqalKT0/3eiWSorwM5Tjs8jepp037Z68pysvocR09qTMYgew/yaaQjjEUobZxsNuH45x2J9jzHQ7dnb9QBdt+4d5esvb6BIBYFOmxDIiEuEqciouLtXLlSq9lr7/+uoqLiy2KKPYlJ9lUXpovqesHt4735aX53T4vobs6Ogu0zmCYHYNN0lUn5/ldH+54go3PbP/Bbh+Oc9qdYM53qMzOXzjqlwJvv0hsb/X1CQCxKNJjGRAJliZOP/zwg2pqalRTUyNp/3TjNTU12rRpk6T9t9lNmTLFU/6aa67RN998o1/96lf64osv9Lvf/U7/+7//q5tvvtmK8OPGxIIcVUwerWyH99fd2Q57wFN9+qujc38WTJ3BMDuGsrPyQz7GSMZntv9gtw/HOe1JPJ3Pd6jvzc5fuOoPtP0itb3V1ycAxKJIj2VAuFn6HKe33npLp512WpflU6dO1dKlSzVt2jRt2LBBb731ltc2N998s+rq6nTIIYfojjvuCOoBuIn8rIxwPJm7cx1jhg/Umo3bo/a0b7NjsPrp46HuP9jtI328Zuc71Pdm5y/c9Qd7fOHe3urrMxYkch/cHdoFiYy+EVYKpv+1NHGyAoMTAFiHPtg32gUArBE3D8AFAAAAgHhA4gQAAAAAJkicAAAAAMAEiRMAAAAAmCBxAgAAAAATJE4AAAAAYILECQAAAABMkDgBAAAAgAkSJwAAAAAwQeIEAAAAACZInAAAAADABIkTAAAAAJhIsToAANHjchuqqm9WU2ubMtPsKsrLUHKSzeqwAAAAYh7fOAEJorK2QScteFOXLPlANz5bo0uWfKCTFrypytoGq0MDYto777yj0tJSDR06VDabTS+++KLfstdcc41sNpsefPDBqMUHAIgOEicgAVTWNmj6smo1ONu8ljc62zR9WTXJE9CNnTt3qrCwUI888ki35ZYvX64PPvhAQ4cOj
VJkAIBo4lY9oJdzuQ3NXVEnw8c6Q5JN0twVdTojP5vb9gAfJk2apEmTJnVbZsuWLbr++uv197//XWeffXaUIgMARBPfOAG9XFV9c5dvmg5kSGpwtqmqvjl6QQG9iNvt1mWXXaZbb71VxxxzTEDbtLe3q6WlxesFAIhtJE5AL9fU6j9p6kk5AN4WLFiglJQU3XDDDQFvM2/ePDkcDs8rNzc3ghECAMKBxAno5TLT7GEtB+A/1qxZo//5n//R0qVLZbMFfqtrWVmZnE6n57V58+YIRgkACAcSJ6CXK8rLUI7DLn8f6WySchz7pyYHEJx3331XTU1NOvTQQ5WSkqKUlBRt3LhRt9xyi0aMGOF3u9TUVKWnp3u9AACxjckhgAiKhecmJSfZVF6ar+nLqmWTvCaJ6IikvDSfiSGAHrjssstUUlLitWzChAm67LLLdPnll1sUFQAgEkicgAiprG3Q3BV1XhMz5DjsKi/N18SCnKjGMrEgRxWTR3eJJ9uieIB48sMPP+irr77yvK+vr1dNTY0yMjJ06KGHatCgQV7l+/Tpo+zsbB155JHRDhUAEEEkTkAEdDw3qfMU4B3PTaqYPNqS5OmM/GzLvwED4s3HH3+s0047zfN+5syZkqSpU6dq6dKlFkUFAIg2EicgzML53KRw3+qXnGRT8chB5gUBeIwfP16G4esv2rcNGzZELhgAgGVInIAwC+a5Sd0lMbF0qx8AAECiY1Y9IMzC8dykjlv9OidgHbf6VdY2hBQjAAAAgkPiBIRZqM9NMrvVT9p/q5/LHfitQwAAAAgNt+oBYdbx3KRGZ5vP5Mem/bPZ+XtuUii3+gX7m6hYmC4dAAAgHpA4AWEW6nOTenqrX7C/ieI3VAAAAIHjVj0gAjqem5Tt8L4dL9thN52KvCe3+gX7myh+QwUAABAcvnECIqSnz00K9la/YKc/D+d06QAAAImCb5yACOp4btK5o4apeOSggBKRjlv9pP/c2tfB161+wfwmqiflAQAAQOIEWMrlNrT66+/115otWv31956Z8oK51S/Q30T946vv5HIbYZkuHQAAINFwqx5gEbPJGQK91S/Q30QtWvW1/lK9RRePzQ2ofKD1AgAAJAK+cQIsEOjkDIHc6tfxm6hAfo3U6GzTb9/4pwb07+O3vE37Ezh/06UDAAAkIhInIMrC/YDb7n4T5av+A8sE8hsqAAAAkDgBUReJyRn8/SbKX/07du3VTSU/6tF06QAAAImI3zgBURapyRk6fhP129e/1KJVX5mWHzG4v96b9dOgp0sHAABIRCROQJT15AG3gUpOsunEwwcHlDhlptk9v6ECAABA97hVD4gys8kcQp2cIdL1AwAAJCISJyDKgn3AbazVDwAAkIhInAALBPOA21isHwAAINHwGyfAIoE+4DZW6wcAAEgkJE6AhSI9OQOTPwAAAIQHt+oBAAAAgAkSJwAAAAAwQeIEAAAAACb4jRN6NZfbYHIEAAAAhIzECb1WZW2D5q6oU4OzzbMsx2FXeWk+03EDAAAgKNyqh16psrZB05dVeyVNktTobNP0ZdWqrG2wKDIAAADEIxIn9Dout6G5K+pk+FjXsWzuijq53L5KAAAAAF2ROKHXqapv7vJN04EMSQ3ONlXVN0cvKAAAAMQ1Eif0Ok2t/pOmnpQDAAAASJzQ62Sm2cNaDgAAACBxQq9TlJehHIdd/iYdt2n/7HpFeRnRDAsAAABxjMQJvU5ykk3lpfmS1CV56nhfXprP85wAAAAQMBIn9EoTC3JUMXm0sh3et+NlO+yqmDya5zgBAAAgKDwAF73WxIIcnZGfrar6ZjW1tikzbf/teXzTBAAAgGCROCGmudxGt4mP2frkJJuKRw6yInSfzOJNNLQHAISXFf0qfTkSheWJ0yOPPKKFCxeqsbFRhYWFevjhh1VUVOS3/IMPPqiKigpt2rRJgwcP1gUXXKB58+bJbmeGtN6msrZBc1fUeT2TKcdhV3lpviYW5Jiuj
zXxFm+k0R6IF++8844WLlyoNWvWqKGhQcuXL9d5550nSdq7d69uv/12vfLKK/rmm2/kcDhUUlKi+fPna+jQodYGjoRjRb9KX45EYulvnJ577jnNnDlT5eXlqq6uVmFhoSZMmKCmpiaf5Z955hnNnj1b5eXlWrdunf7whz/oueee069//esoR45Iq6xt0PRl1V0eZNvobNP0ZdWa90pdt+sraxuiGa4ps+OJtXgjjfZAPNm5c6cKCwv1yCOPdFm3a9cuVVdX64477lB1dbVeeOEFrV+/Xuecc44FkSKRWdGv0pcj0dgMwzCs2vm4ceM0duxYLVq0SJLkdruVm5ur66+/XrNnz+5S/rrrrtO6deu0cuVKz7JbbrlFH374od57772A9tnS0iKHwyGn06n09PTwHAjCyuU2dNKCN7t0xAdKskluP1euTfsngXhv1k9j4lYBs+OJtXgjjfZIbPHeB9tsNq9vnHz56KOPVFRUpI0bN+rQQw8NqN54bxdYy4p+lb4cvUUw/a9l3zjt2bNHa9asUUlJyX+CSUpSSUmJVq9e7XObE044QWvWrFFVVZUk6ZtvvtErr7yis846y+9+2tvb1dLS4vVCbKuqb+42aZL8J02SZEhqcLapqr45vIH1kNnxxFq8kUZ7oLdzOp2y2WwaMGCA3zKMTQgnK/pV+nIkIssSp23btsnlcikrK8treVZWlhobG31u8/Of/1x33XWXTjrpJPXp00cjR47U+PHju71Vb968eXI4HJ5Xbm5uWI8D4dfU2n3SFO16QhVoHLESb6TRHujN2traNGvWLF1yySXd/s8lYxPCyYp+lb4ciSiunuP01ltv6d5779Xvfvc7z73kL7/8su6++26/25SVlcnpdHpemzdvjmLE6InMtPBM9BGuekIVaByxEm+k0R7orfbu3auLLrpIhmGooqKi27KMTQgnK/pV+nIkIstm1Rs8eLCSk5O1detWr+Vbt25Vdna2z23uuOMOXXbZZbryyislST/+8Y+1c+dOXX311brtttuUlNQ1D0xNTVVqamr4DwARU5SXoRyHXY3ONvm7Iy/JJhmGfK7vuK+6KC8jglEGzux4Yi3eSKM90Bt1JE0bN27Um2++aXqfPGMTwsmKfpW+HInIsm+c+vbtqzFjxnhN9OB2u7Vy5UoVFxf73GbXrl1dkqPk5GRJkoVzXCDMkpNsKi/Nl7S/4z2Q7d+vq07O87tekspL82Pmx6hmxyPFVryRRnugt+lImv75z3/qjTfe0KBBsfPsOCQGK/pV+nIkIktv1Zs5c6aWLFmiJ598UuvWrdP06dO1c+dOXX755ZKkKVOmqKyszFO+tLRUFRUVevbZZ1VfX6/XX39dd9xxh0pLSz0JFHqHiQU5qpg8WtkO76/4sx12VUwerbKz8rtdH2vPjjA7nliLN9JoD8STH374QTU1NaqpqZEk1dfXq6amRps2bdLevXt1wQUX6OOPP9bTTz8tl8ulxsZGNTY2as+ePdYGjoRiRb9KX45EY+l05JK0aNEizwNwR40apYceekjjxo2TJI0fP14jRozQ0qVLJUn79u3TPffcoz/+8Y/asmWLhgwZotLSUt1zzz3dzl50IKZ8jS9mTyOPt6eVx1u8kUZ7JJ547IPfeustnXbaaV2WT506VXPmzFFeXp7P7VatWqXx48cHtI94bBfEJiv6VfpyxLNg+l/LE6doY3ACAOvQB/tGuwCANeLiOU4AAAAAEC9InAAAAADABIkTAAAAAJggcQIAAAAAEyROAAAAAGCCxAkAAAAATKRYHQASC896CE2k24/zAwAA4BuJE6KmsrZBc1fUqcHZ5lmW47CrvDSfp4sHINLtx/kBAADwj1v1EBWVtQ2avqza60O5JDU62zR9WbUqaxssiiw+RLr9OD8AAADdI3FCxLnchuauqJPhY13Hsrkr6uRy+yqBSLcf5wcAAMAciRMirqq+ucs3GQcyJDU421RV3xy9oOJIpNuP8wMAAGCOx
AkR19Tq/0N5T8olmki3H+cHAADAHJNDIOIy0+xhLZdoIt1+nB8ASBzMngr0HIkTIq4oL0M5DrsanW0+f0djk5Tt2N95o6tItx/nBwASA7OnAqHhVj1EXHKSTeWl+ZL2fwg/UMf78tJ8/sfLj0i3H+cHAHo/Zk8FQkfihKiYWJCjismjle3wvt0r22FXxeTR/E+XiUi3H+cHAHovZk8FwoNb9RA1EwtydEZ+NvdW91Ck24/zAwC9UzCzpxaPHBS9wIA4Q+KEqEpOstEphyDS7cf5AYDeh9lTgfDgVj0AAIBejNlTgfAgcQIAAOjFOmZP9XfjtU37Z9dj9lSgeyROAAAAvRizpwLhQeIEAADQyzF7KhA6JocAAABIAMyeCoSGxAkJzeU2uh1AzNYDABBPmD0V6DkSJySsytoGzV1R5/VsixyHXeWl+ZpYkGO6HgAAAImD3zghIVXWNmj6suouDwRsdLZp+rJqzXulrtv1lbUN0QwXAAAAFiNxQsJxuQ3NXVEnw8c649+vJe/W+10vSXNX1Mnl9lUCAAAAvRGJExJOVX1zl2+SOusuJzIkNTjbVFXfHN7AAAAAELNInJBwmlq7T5qiXQ8AAABiH4kTEk5mmt28UBTrAQAAQOwjcULCKcrLUI7D3uXp6QdKsnV9unoHm/bPrleUlxGB6AAAABCLSJyQcJKTbCovzZfUNTmy/ft11cl5ftdLUnlpPs9zAgAASCAkTkhIEwtyVDF5tLId3rfbZTvsqpg8WmVn5Xe7nuc4AQAAJBYegIuENbEgR2fkZ6uqvllNrW3KTNt/+13HN0lm6wEAAJA4SJyQ0JKTbCoeOajH6wEAAJAYSJwAAABihMttdHunQ+f1Y4YP1JqN26N6Z4RZjEBvReIEAEA33nnnHS1cuFBr1qxRQ0ODli9frvPOO8+z3jAMlZeXa8mSJdqxY4dOPPFEVVRU6IgjjrAuaMSlytoGzV1R5/WQ9hyHXeWl+ZpYkONzfZLN+6HtB5a3IkagN2NyCAAAurFz504VFhbqkUce8bn+vvvu00MPPaRHH31UH374oQ466CBNmDBBbW08JBuBq6xt0PRl1V4JiSQ1Ots0fVm15r1S53P9gUnTgeUraxuiHmMk9gnEEr5xAgCgG5MmTdKkSZN8rjMMQw8++KBuv/12nXvuuZKkp556SllZWXrxxRd18cUXRzNUxCmX29DcFXUyfKzrWLbk3Xqf632Vt0mau6JOZ+Rnh+0WOrMYI7FPINbwjRMAAD1UX1+vxsZGlZSUeJY5HA6NGzdOq1ev9rtde3u7WlpavF5IXFX1zV2+xems8zdL3TEkNTjbVFXfHFpgBzCLMRL7BGINiRMAAD3U2NgoScrKyvJanpWV5Vnny7x58+RwODyv3NzciMaJ2NbUGpnbOsNZb6B1RepYgFhA4gQAQJSVlZXJ6XR6Xps3b7Y6JFgoM81uXsjiegOtK1LHAsQCEicAAHooOztbkrR161av5Vu3bvWs8yU1NVXp6eleLySuorwM5Tjs6u6XQUk2dbv+QDbtn+muKC8jDNHtZxZjJPYJxBoSJwAAeigvL0/Z2dlauXKlZ1lLS4s+/PBDFRcXWxgZ4klykk3lpfmSuiZHtn+/rjo5z+f6zjrWl5fmh3WSBrMYI7FPINaQOAEA0I0ffvhBNTU1qqmpkbR/Qoiamhpt2rRJNptNN910k37zm9/opZde0meffaYpU6Zo6NChXs96AsxMLMhRxeTRynZ43+qW7bCrYvJolZ2V73N95zylo3wknqlkFiPPcUJvZzMMI4h5WuJfS0uLHA6HnE4nt0YAQJTFYx/81ltv6bTTTuuyfOrUqVq6dKnnAbiLFy/Wjh07dNJJJ+l3v/udfvSjHwW8j3hsF0SGy22oqr5ZTa1tykzbf+vbgd/idF4/ZvhArdm43W95K2IE4kkw/S+JEwAgauiDfaNdAMAawfS/3KoHAAAAACZInAAAAADABIkTAAAAAJggcQIAAAAAE
yROAAAAAGCCxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmSJwAAAAAwASJEwAAAACYSLE6APQuLrehqvpmNbW2KTPNrqK8DCUn2aK2faTFenwAAMSaYMdOxlrEKssTp0ceeUQLFy5UY2OjCgsL9fDDD6uoqMhv+R07dui2227TCy+8oObmZg0fPlwPPvigzjrrrChGDV8qaxs0d0WdGpxtnmU5DrvKS/M1sSAn4ttHWqzHBwBArAl27GSsRSyz9Fa95557TjNnzlR5ebmqq6tVWFioCRMmqKmpyWf5PXv26IwzztCGDRv0/PPPa/369VqyZImGDRsW5cjRWWVtg6Yvq/bq6CSp0dmm6cuqVVnbENHtIy3W4wMAINYEO3Yy1iLWWZo4PfDAA7rqqqt0+eWXKz8/X48++qj69++vxx9/3Gf5xx9/XM3NzXrxxRd14oknasSIETr11FNVWFgY5chxIJfb0NwVdTJ8rOtYNndFnVxuXyVC3z7SYj0+AABiTbBjJ2Mt4oFlidOePXu0Zs0alZSU/CeYpCSVlJRo9erVPrd56aWXVFxcrBkzZigrK0sFBQW699575XK5/O6nvb1dLS0tXi+EV1V9c5f/HTqQIanB2aaq+uaIbB9psR4fAACxJtixk7EW8cCyxGnbtm1yuVzKysryWp6VlaXGxkaf23zzzTd6/vnn5XK59Morr+iOO+7Q/fffr9/85jd+9zNv3jw5HA7PKzc3N6zHAamp1X9HF0i5ULePtFiPDwCAWBPs2MlYi3gQV9ORu91uZWZmavHixRozZox+9rOf6bbbbtOjjz7qd5uysjI5nU7Pa/PmzVGMODFkptlDKhfq9pEW6H63tbZzCwEAAAp+bI/1zwKAZGHiNHjwYCUnJ2vr1q1ey7du3ars7Gyf2+Tk5OhHP/qRkpOTPcuOPvpoNTY2as+ePT63SU1NVXp6utcL4VWUl6Ech13+Jgq1af+MOEV5GRHZPtLM4utw98vrdNKCN/nxKgAg4QU7tsf6ZwFAsjBx6tu3r8aMGaOVK1d6lrndbq1cuVLFxcU+tznxxBP11Vdfye12e5Z9+eWXysnJUd++fSMeM3xLTrKpvDRfkrp0eB3vy0vz/T6DIdTtI627+Dpj5h8AAIIf22P9swAgWXyr3syZM7VkyRI9+eSTWrdunaZPn66dO3fq8ssvlyRNmTJFZWVlnvLTp09Xc3OzbrzxRn355Zd6+eWXde+992rGjBlWHQL+bWJBjiomj1a2w/sr9GyHXRWTR5s+eyHU7SPNX3ydMfMPAAD7BTu2x/pnAcBmGIaln+4WLVrkeQDuqFGj9NBDD2ncuHGSpPHjx2vEiBFaunSpp/zq1at18803q6amRsOGDdMVV1yhWbNmed2+152WlhY5HA45nU5u24uAUJ/2HetPC3e5DS39R73ufnmdadk/XXW8ikcOikJUQPygD/aNdkFvFuzYHuufBdC7BNP/Wp44RRuDE0L115otuvHZGtNy/3PxKJ07ioczAweiD/aNdgEAawTT/8bVrHpALGDmHwAAgMRD4gQEiZl/AAAAEg+JExAkZv4BAABIPCROQA8w8w8AAEBiSbE6ACBeTSzI0Rn52cz8AwAAkABInIAQJCfZmHIcAAAgAXCrHgAAAACYIHECAAAAABMkTgAAAABggsQJAAAAAEyQOAEAAACACRInAAAAADDBdOQAAADwcLmNiD6jMNL1A5FC4gQAQAhcLpfmzJmjZcuWqbGxUUOHDtW0adN0++23y2bjwyDiS2Vtg+auqFODs82zLMdhV3lpviYW5MR8/UAkcaseAAAhWLBggSoqKrRo0SKtW7dOCxYs0H333aeHH37Y6tCAoFTWNmj6smqvpEaSGp1tmr6sWpW1DTFdPxBpJE4AAITg/fff17nnnquzzz5bI0aM0AUXXKAzzzxTVVVVVocGBMzlNjR3RZ0MH+s6ls1dUSeX21cJ6+sHooHECQCAEJxwwglauXKlv
vzyS0nSp59+qvfee0+TJk3yu017e7taWlq8XoCVquqbu3wTdCBDUoOzTVX1zTFZPxAN/MYJANArtbW16eGHH9aqVavU1NQkt9vttb66ujos+5k9e7ZaWlp01FFHKTk5WS6XS/fcc48uvfRSv9vMmzdPc+fODcv+gXBoavWf1PSkXLTrB6KBxAkA0CtdccUVeu2113TBBReoqKgoYhM1/O///q+efvppPfPMMzrmmGNUU1Ojm266SUOHDtXUqVN9blNWVqaZM2d63re0tCg3Nzci8QGByEyzh7VctOsHooHECQDQK/3tb3/TK6+8ohNPPDGi+7n11ls1e/ZsXXzxxZKkH//4x9q4caPmzZvnN3FKTU1VampqROMCglGUl6Ech12Nzjafv0OyScp27J86PBbrB6KB3zgBAHqlYcOGKS0tLeL72bVrl5KSvIfT5OTkLrcGArEsOcmm8tJ8SfuTmAN1vC8vze/x85YiXT8QDSROAIBe6f7779esWbO0cePGiO6ntLRU99xzj15++WVt2LBBy5cv1wMPPKDzzz8/ovsFwm1iQY4qJo9WtsP7drlsh10Vk0eH/JylSNcPRJrNMIyEmvexpaVFDodDTqdT6enpVocDAAklmn3wd999p4suukjvvPOO+vfvrz59+nitb24Oz+xdra2tuuOOO7R8+XI1NTVp6NChuuSSS3TnnXeqb9++AdXB2IRY4nIbqqpvVlNrmzLT9t8+F85vgiJdPxCMYPpfEqcIireOId7iBcxwTceeaPbBJSUl2rRpk6644gplZWV1mRzC3++PrEDiBADWCKb/ZXKICKmsbdDcFXVezyzIcdhVXpofk19Fx1u8gBmuabz//vtavXq1CgsLrQ4FANAL8BunCKisbdD0ZdVdHvTW6GzT9GXVqqxtsCgy3+ItXsAM1zQk6aijjtLu3butDgMA0EuEnDgZhqEEu9uvWy63obkr6nxOtdmxbO6KOrncsdFm8RYvYIZrGh3mz5+vW265RW+99Za+//57tbS0eL0AAAhGjxOnP/zhDyooKJDdbpfdbldBQYF+//vfhzO2uFRV39zlf7kPZEhqcLapqj48P0oOVbzFC5jhmkaHiRMnavXq1Tr99NOVmZmpgQMHauDAgRowYIAGDhxodXgAgDjTo9843XnnnXrggQd0/fXXq7i4WJK0evVq3Xzzzdq0aZPuuuuusAYZT5pa/X9g60m5SIu3eAEzXNPosGrVKqtDAAD0Ij1KnCoqKrRkyRJdcsklnmXnnHOOjj32WF1//fUJnThlptnNCwVRLtLiLV7ADNc0Opx66qlWhwAA6EV6lDjt3btXxx13XJflY8aM0b59+0IOKp4V5WUox2FXo7PN528sbNr/oLeivIxoh+ZTvMULmOGaRod33nmn2/WnnHJKlCIBAPQGPUqcLrvsMlVUVOiBBx7wWr548WJdeumlYQksXiUn2VRemq/py6plk7w+uHU8QaS8ND9mniUTb/ECZrim0WH8+PFdlh34LCeXyxXFaAAA8S7kySGuvPJKXXnllfrxj3+sJUuWKCkpSTNnzvS8EtHEghxVTB6tbIf3rUDZDrsqJo+OuWfIxFu8gBmuaUjS9u3bvV5NTU2qrKzU2LFj9dprr1kdHgAgztiMHswlftpppwVWuc2mN998M+igIimaT2d3uQ1V1TerqbVNmWn7bw2K5f/ljrd4ATNc07Enmn2wP2+//bZmzpypNWvWWLJ/X2KhXQAgEQXT//boVj1mKgpMcpJNxSMHWR1GwOItXsAM1zR8ycrK0vr1660OAwAQZ3qUOAEAEOvWrl3r9d4wDDU0NGj+/PkaNWqUNUEBQeKbcyB2kDgBAHqlUaNGyWazqfMd6ccff7wef/xxi6ICAldZ26C5K+q8Huqd47CrvDSf32oCFiBxAgD0SvX19V7vk5KSNGTIENntPMMLsa+ytkHTl1V3eaxCo7NN05dVM9ENYAESJwBArzR8+HCtXLlSK1euVFNTk
9xut9d6vnVCrHK5Dc1dUefzWXSG9j9aYe6KOp2Rn81te0AU9Xg6cgAAYtncuXN15plnauXKldq2bVuX6cmBWFVV3+x1e15nhqQGZ5uq6pujFxQAvnECAPROjz76qJYuXarLLrvM6lCAoDS1+k+aelIOQHjwjRMAoFfas2ePTjjhBKvDAIKWmRbY7/ACLQcgPEicAAC90pVXXqlnnnnG6jCAoBXlZSjHYZe/Xy/ZtH92vaK8jGiGBSQ8btUDAPRKbW1tWrx4sd544w0de+yx6tOnj9f6Bx54wKLIgO4lJ9lUXpqv6cuqZZO8JonoSKbKS/OZGAKIMhInAECvtHbtWs+Dbmtra73W2Wx84ERsm1iQo4rJo7s8xymb5zgBliFxAgD0SqtWrbI6BCAkEwtydEZ+tqrqm9XU2qbMtP235/FNE2ANEicAAIAYlZxkU/HIQVaHAUBMDgEAAAAApvjGCTiAy21wSwQAAAC6IHEC/q2ytqHLj3Bz+BEuAAAAxK16gKT9SdP0ZdVeSZMkNTrbNH1ZtSprGyyKDAAAALGAxAkJz+U2NHdFnddzMjp0LJu7ok4ut68SAAAASAQkTkh4VfXNXb5pOpAhqcHZpqr65ugFBQAAgJhC4oSE19TqP2nqSTkAAAD0PiROSHiZafawlgMAAEDvQ+KEhFeUl6Ech13+Jh23af/sekV5GdEMCwAAADGExAkJLznJpvLSfEnqkjx1vC8vzed5TgAAAAmMxAmQNLEgRxWTRyvb4X07XrbDrorJo3mOEwAAQILjAbjAv00syNEZ+dmqqm9WU2ubMtP2357HN00AgHBxuQ3GmRCZtWGo6wF/YiJxeuSRR7Rw4UI1NjaqsLBQDz/8sIqKiky3e/bZZ3XJJZfo3HPP1Ysvvhj5QNHrJSfZVDxykNVhAIgzW7Zs0axZs/Tqq69q165dOvzww/XEE0/ouOOOszo0xJDK2gbNXVHn9QiMHIdd5aX53NkQILM2DHU90B3Lb9V77rnnNHPmTJWXl6u6ulqFhYWaMGGCmpqaut1uw4YN+uUvf6mTTz45SpECANDV9u3bdeKJJ6pPnz569dVXVVdXp/vvv18DBw60OjTEkMraBk1fVt3luYGNzjZNX1atytoGiyKLH2ZtOO+VupDWcw5gxmYYhmFlAOPGjdPYsWO1aNEiSZLb7VZubq6uv/56zZ492+c2LpdLp5xyin7xi1/o3Xff1Y4dOwL+xqmlpUUOh0NOp1Pp6enhOgwAQAB6Yx88e/Zs/eMf/9C7777b4zp6Y7vgP1xuQycteNPvw9Zt2v+b2vdm/ZRbxvwwa0NJSrJJ7m4+1Xa3nnOQuILpfy39xmnPnj1as2aNSkpKPMuSkpJUUlKi1atX+93urrvuUmZmpq644grTfbS3t6ulpcXrBQBAuLz00ks67rjjdOGFFyozM1M/+clPtGTJkm63YWxKLFX1zd1+4DckNTjbVFXfHL2g4oxZG0rdJ01m6zkHCISlidO2bdvkcrmUlZXltTwrK0uNjY0+t3nvvff0hz/8wXRQ6jBv3jw5HA7PKzc3N+S4AQDo8M0336iiokJHHHGE/v73v2v69Om64YYb9OSTT/rdhrEpsTS1dv+BP9hyiShabcM5QHcs/41TMFpbW3XZZZdpyZIlGjx4cEDblJWVyel0el6bN2+OcJQAgETidrs1evRo3XvvvfrJT36iq6++WldddZUeffRRv9swNiWWzDS7eaEgyiWiaLUN5wDdsXRWvcGDBys5OVlbt271Wr5161ZlZ2d3Kf/1119rw4YNKi0t9Sxzu92SpJSUFK1fv14jR4702iY1NVWpqakRiB4AACknJ0f5+fley44++mj95S9/8bsNY1NiKcrLUI7DrkZnm3zdLdbx+5qivIxohxY3zNpQ2v8bJsNQj9ZzDhAIS79x6tu3r8aMGaOVK1d6lrndbq1cuVLFxcVdyh911FH67LPPVFNT43mdc845Ou2001RTU8OtDgCAqDvxx
BO1fv16r2Vffvmlhg8fblFEiDXJSTaVl+5PrjtPO9Dxvrw0n0kJumHWhjZJV52c1+P1EucA5iy/VW/mzJlasmSJnnzySa1bt07Tp0/Xzp07dfnll0uSpkyZorKyMkmS3W5XQUGB12vAgAFKS0tTQUGB+vbta+WhAAAS0M0336wPPvhA9957r7766is988wzWrx4sWbMmGF1aIghEwtyVDF5tLId3reCZTvsqpg8mmcIBcCsDcvOyg9pPecAZix/AO7PfvYzfffdd7rzzjvV2NioUaNGqbKy0jNhxKZNm5SUZHl+BwCAT2PHjtXy5ctVVlamu+66S3l5eXrwwQd16aWXWh0aYszEghydkZ+tqvpmNbW2KTNt/61hfMsROLM2DHU90B3Ln+MUbTwrAwCsQx/sG+0CANaIm+c4AQAAAEA8IHECAAAAABMkTgAAAABggsQJAAAAAEyQOAEAAACACRInAAAAADBB4gQAAAAAJkicAAAAAMAEiRMAAAAAmCBxAgAAAAATJE4AAAAAYILECQAAAABMpFgdAKzjchuqqm9WU2ubMtPsKsrLUHKSLe72AQCAP6GOQ523HzN8oNZs3O63vlgb98ziibV4YwFtAn9InBJUZW2D5q6oU4OzzbMsx2FXeWm+JhbkxM0+AADwJ9RxyNf2STbJbfynzIH1xdq4ZxZPrMUbC2gTdMdmGIZhXqz3aGlpkcPhkNPpVHp6utXhWKKytkHTl1Wr84nv+L+UismjQ+4corEPAPGHPtg32iX8Qh2H/G3fWUd9V5+Sp8Xv1MfMuGd2/LEWbyzgs0tiCqb/5TdOCcblNjR3RZ3PgaBj2dwVdXK5e55PR2MfAAD4E+o41N32vuozJC15t2sSEuj+ws3s+GMt3ljAZxcEgsQpwVTVN3t9/dyZIanB2aaq+uaY3gcAAP6EOg6Zbe9Ld5+noz3uBRJ/LMUbC/jsgkCQOCWYptbABoJAy1m1DwAA/Al1HIrU+BStcS9c+0mkcZrPLggEiVOCyUyzh7WcVfsAAMCfUMehSI1P0Rr3wrWfRBqn+eyCQJA4JZiivAzlOOzyN6mmTftnjynKy4jpfQAA4E+o45DZ9r4k2RQz414g8cdSvLGAzy4IBIlTgklOsqm8NF9S1w6z4315aX5IzyuIxj4AAPAn1HGou+07s/37ddXJeT3eX7iZHX+sxRsL+OyCQJA4JaCJBTmqmDxa2Q7vr5uzHfawTbUZjX0AAOBPqOOQv+07f27uqK/srPyYGvfMjj/W4o0FfHaBGZ7jlMCi8WRsnr4N4ED0wb7RLpET6jjUefsxwwdqzcbtfuuLtXHPLJ5YizcW0CaJJZj+l8QJABA19MG+0S4AYA0egAsAAAAAYUTiBAAAAAAmSJwAAAAAwASJEwAAAACYIHECAAAAABMkTgAAAABggsQJAAAAAEyQOAEAAACACRInAAAAADBB4gQAAAAAJkicAAAAAMBEitUBAAAAWMXlNlRV36ym1jZlptlVlJeh5CSb1WHFlGDbKNxtalZfqOsjvX/0HiROAACE0fz581VWVqYbb7xRDz74oNXhoBuVtQ2au6JODc42z7Ich13lpfmaWJBjYWSxI9g2CnebmtUX6vpI7x+9i80wDMPqIKKppaVFDodDTqdT6enpVocDAAmlt/fBH330kS666CKlp6frtNNOCzhx6u3tEosqaxs0fVm1On8I6vieoGLy6IT/4BtsG4W7Tc3qu/qUPC1+p77H683iCXX/XEPxIZj+l984AQAQBj/88IMuvfRSLVmyRAMHDrQ6HHTD5TY0d0Vdlw+8kjzL5q6ok8udUP+37CXYNgp3m5rVZ0ha8m7XpCXQ9WbxhLp/s/oRn0icAAAIgxkzZujss89WSUmJadn29na1tLR4vRA9VfXNXrdWdWZIanC2qaq+OXpBxZhg2yjcbWpWnySZ5STdrTeLJ9T9cw31TvzGCQCAED377LOqrq7WRx99FFD5efPmae7cuRGOCv40t
Xb/gTjYcr1RsG0U7jaNVtv720+49p/I11BvxDdOAACEYPPmzbrxxhv19NNPy263B7RNWVmZnE6n57V58+YIR4kDZaYFdp4CLdcbBdtG4W7TaLW9v/2Ea/+JfA31RiROAACEYM2aNWpqatLo0aOVkpKilJQUvf3223rooYeUkpIil8vVZZvU1FSlp6d7vRA9RXkZynHY5W/CaJv2z4xWlJcRzbBiSrBtFO42NatPkpJs6vF6s3hC3T/XUO9E4gQAQAhOP/10ffbZZ6qpqfG8jjvuOF166aWqqalRcnKy1SGik+Qkm8pL8yV1/eDb8b68ND+hn8UTbBuFu03N6rNJuurkvB6vN4sn1P2b1Y/4ROIEAEAI0tLSVFBQ4PU66KCDNGjQIBUUFFgdHvyYWJCjismjle3wvpUq22FnGul/C7aNwt2mZvWVnZUf0nqzeELdP9dQ78NznAAAUZMoffD48eM1atQonuMUB1xuQ1X1zWpqbVNm2v5bq/iWwFuwbRTuNjWrL9T1kd4/Ylsw/S+JEwAgauiDfaNdAMAaPAAXAAAAAMKIxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmSJwAAAAAwASJEwAAAACYIHECAAAAABMkTgAAAABggsQJAAAAAEyQOAEAAACAiRSrAwAAAOjgchuqqm9WU2ubMtPsKsrLUHKSLWrlg60PiYdrJHHFROL0yCOPaOHChWpsbFRhYaEefvhhFRUV+Sy7ZMkSPfXUU6qtrZUkjRkzRvfee6/f8gAAID5U1jZo7oo6NTjbPMtyHHaVl+ZrYkFOxMsHWx8SD9dIYrP8Vr3nnntOM2fOVHl5uaqrq1VYWKgJEyaoqanJZ/m33npLl1xyiVatWqXVq1crNzdXZ555prZs2RLlyAEAQLhU1jZo+rJqrw+kktTobNP0ZdWqrG2IaPl5r9QFVR8ST7DXHHofm2EYhpUBjBs3TmPHjtWiRYskSW63W7m5ubr++us1e/Zs0+1dLpcGDhyoRYsWacqUKablW1pa5HA45HQ6lZ6eHnL8AIDA0Qf7lujt4nIbOmnBm10+kHawScp22PXerJ8qOckW9vKSlGST3H4+EXWuD4kn2GsO8SOY/tfSb5z27NmjNWvWqKSkxLMsKSlJJSUlWr16dUB17Nq1S3v37lVGRobP9e3t7WppafF6AQCA2FFV39xtUmNIanC2qaq+OSLlJf9Jk6/6kHiCvebQO1maOG3btk0ul0tZWVley7OystTY2BhQHbNmzdLQoUO9kq8DzZs3Tw6Hw/PKzc0NOW4AABA+Ta3dJzWdy0WqfKD1IfEEe82hd7L8N06hmD9/vp599lktX75cdrvdZ5mysjI5nU7Pa/PmzVGOEgAAdCczzfcY7q9cpMoHWh8ST7DXHHonSxOnwYMHKzk5WVu3bvVavnXrVmVnZ3e77f/7f/9P8+fP12uvvaZjjz3Wb7nU1FSlp6d7vQAAQOwoystQjsMuf78MsWn/zGVFeRkRKS/t/41ToPUh8QR7zaF3sjRx6tu3r8aMGaOVK1d6lrndbq1cuVLFxcV+t7vvvvt09913q7KyUscdd1w0QgUAABGSnGRTeWm+pK7JS8f78tJ8z4/uw13eJumqk/MCrg+JJ9hrDr2T5bfqzZw5U0uWLNGTTz6pdevWafr06dq5c6cuv/xySdKUKVNUVlbmKb9gwQLdcccdevzxxzVixAg1NjaqsbFRP/zwg1WHAAAAQjSxIEcVk0cr2+F9q1O2w66KyaO7PCMn3OXLzsoPqj4knmCvOfQ+lk9HLkmLFi3yPAB31KhReuihhzRu3DhJ0vjx4zVixAgtXbpUkjRixAht3LixSx3l5eWaM2eO6b4SfcpXALASfbBvtMt/uNyGquqb1dTapsy0/bc+dfe/+OEuH2x9SDxcI71LMP1vTCRO0cTgBADWoQ/2jXYBAGvEzXOcAAAAACAekDgBAAAAgAkSJwAAAAAwQeIEAAAAACZInAAAA
ADABIkTAAAAAJggcQIAAAAAEyROAAAAAGCCxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmUqwOAAAARI/LbaiqvllNrW3KTLOrKC9DyUk2q8PyK97i7cws/ng/vt4g3Oegc31jhg/Umo3b/b7nnMcPEicAAEI0b948vfDCC/riiy/Ur18/nXDCCVqwYIGOPPJIq0PzUlnboLkr6tTgbPMsy3HYVV6ar4kFORZG5lu8xduZWfzxfny9QbjPga/6kmyS25Df95zz+GEzDMMwL9Z7tLS0yOFwyOl0Kj093epwACCh9NY+eOLEibr44os1duxY7du3T7/+9a9VW1ururo6HXTQQabbR6NdKmsbNH1ZtToP+h3/z10xeXRMfXCLt3g7M4v/6lPytPid+rg9vt4g3NeYv/rMcM6tFUz/y2+cAAAIUWVlpaZNm6ZjjjlGhYWFWrp0qTZt2qQ1a9ZYHZqk/bcOzV1R5/MDXceyuSvq5HLHxv+lxlu8nZnFb0ha8m7XpKljvRTbx9cbhPsa664+M5zz+EHiBABAmDmdTklSRkaGz/Xt7e1qaWnxekVSVX2z161DnRmSGpxtqqpvjmgcgYq3eDszi1/yvlWrs1g/vt4g3NdYIOe8O5zz+EDiBABAGLndbt1000068cQTVVBQ4LPMvHnz5HA4PK/c3NyIxtTUGtgHukDLRVq8xdtZuOKK1ePrDcJ9jXHOEwOJEwAAYTRjxgzV1tbq2Wef9VumrKxMTqfT89q8eXNEY8pMs4e1XKTFW7ydhSuuWD2+3iDc1xjnPDGQOAEAECbXXXed/va3v2nVqlU65JBD/JZLTU1Venq61yuSivIylOOwy9+Exzbtn9mrKM/3rYXRFm/xdmYWv7R/ZrV4Pb7eINzXWCDnvDuc8/hA4gQAQIgMw9B1112n5cuX680331ReXp7VIXlJTrKpvDRfUtcP6x3vy0vzY+ZZMvEWb2dm8dskXXVynt/1UmwfX28Q7musu/rMcM7jB4kTAAAhmjFjhpYtW6ZnnnlGaWlpamxsVGNjo3bv3m11aB4TC3JUMXm0sh3etwJlO+wxOQ1yvMXbmVn8ZWflx/Xx9Qbhvsb81dc5F+r8nnMeP3iOEwAganprH2yz+f5f4ieeeELTpk0z3T6a7eJyG6qqb1ZTa5sy0/bfGhTL/8sdb/F2ZhZ/vB9fbxDuc9C5vjHDB2rNxu1+33POrRVM/0viBACIGvpg32gXALAGD8AFAAAAgDAicQIAAAAAEyROAAAAAGCCxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmSJwAAAAAwASJEwAAAACYIHECAAAAABMkTgAAAABgIsXqAAAAQOxwuQ1V1TerqbVNmWl2jRk+UGs2bvf7vigvQ8lJtrDtL9bqi7X9ofcz+xs0u8aC3Z5rOHAkTgAAQJJUWduguSvq1OBs8yxLskluQ37f5zjsKi/N18SCnLDsL5bqi7X9ofcL5G+wu2ss2O25hoNjMwzDMC/We7S0tMjhcMjpdCo9Pd3qcAAgodAH+xYL7VJZ26Dpy6oV7IeCjv+Xrpg8OqgPWv72Fyv1xdr+0PsF+jfo7xoLdvurT8nT4nfqE/4aDqb/5TdOAAAkOJfb0NwVdUEnTZI828xdUSeXO7AauttfLNQXa/tD7xfM36CvayzY7Q1JS97tmjT5qx/7kTgBAJDgquqbvW7VCZYhqcHZpqr65rDsz+r6Ym1/6P2C/RvsfI315G+4u5yIa9g3EicAABJcU2vPk6ae1BPr5WJtf+j9enqtdGwXqWuNa9gbiRMAAAkuM80e1XpivVys7Q+9X0+vlY7tInWtcQ17I3ECACDBFeVlKMdhV08nILZp/0xcRXkZYdmf1fXF2v7Q+wX7N9j5GuvJ33CSTVzDQSJxAgAgwSUn2VRemi/J/wcpfzrKl5fmB/zsl+72Fwv1xdr+0PsF8zfo6xoLdnubpKtOzvNZnmvYPxInAACgiQU5q
pg8WtkO71tzOn9u6vw+22Hv0bTF/vYXK/XF2v7Q+wX6N+jvGgt2+7Kz8rmGg8RznAAAUUMf7FsstYvLbaiqvllNrW3KTLNrzPCBWrNxu9/3RXkZIf2vdOf9xVp9sbY/9H5mf4Nm11iw2yf6NRxM/0viBACIGvpg32gXALAGD8AFAAAAgDAicQIAAAAAEyROAAAAAGCCxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmSJwAAAAAwASJEwAAAACYIHECAAAAABMkTgAAAABgIsXqAOKJy23og6+/1z++/k5btu+WYRja9sMete1zyZ6SrEEH9dX3O//zfvDBqbLZ/rO9WflIv+8cj5nO8Qa7fazsI5z7s9lsynKkqnX3PjW1tKttr0v5Q9P1Q3vvfV8wzCFH/z76oqFFW3a0adgAu47MTtP6xta4e/+v7buVmpLk+dtod7l1yIB+XcofnZOu7bv36LPNzm7/vpp37VW/vknKTLPr4NQUff5ti9f72i3Obst3Xh/N/sBX/yCZ92lJSTYNG9hPJ4wcrOMPG6TkpAj+wSIgHWPTu181aa3JNevrug/276L2X91f58Feh2Z/R8H+XZltH8z7dHsfNTh3W32KEWFWj+279+xTxgF/G/36pOiYIMp3/tsKZPtBB/T5Zn1CrIxT3fVpuQP7679HH6ITDh8c0XHJZhiGEbHaA/TII49o4cKFamxsVGFhoR5++GEVFRX5Lf/nP/9Zd9xxhzZs2KAjjjhCCxYs0FlnnRXQvlpaWuRwOOR0OpWenh5wjJW1DZr9wmfasWtvwNsAQG82oH8fzf+vH2tiQU7A2/S0D44HwY5lB2JsAoDQHdQ3WfdfVBixccnyW/Wee+45zZw5U+Xl5aqurlZhYaEmTJigpqYmn+Xff/99XXLJJbriiiv0ySef6LzzztN5552n2traiMVYWduga5ZVMzABwAF27Nqra5ZVq7K2wepQLBfsWBYOjE0A4G3nHldExyXLv3EaN26cxo4dq0WLFkmS3G63cnNzdf3112v27Nldyv/sZz/Tzp079be//c2z7Pjjj9eoUaP06KOPmu4v2P/Vc7kNnTh/pRpb2oM4KgBIHDkOu96b9dOAbo/ord84BTuWdcbYBADhE6lxydJvnPbs2aM1a9aopKTEsywpKUklJSVavXq1z21Wr17tVV6SJkyY4Ld8e3u7WlpavF7BqKpvZmACgG40ONtUVd9sdRiW6clYxtgEAJETqXHJ0sRp27ZtcrlcysrK8lqelZWlxsZGn9s0NjYGVX7evHlyOByeV25ublAxNrW2BVUeABJRIveVPRnLGJsAILIi0U9a/hunSCsrK5PT6fS8Nm/eHNT2mWn2CEUGAL0HfWVwGJsAILIi0U9aOh354MGDlZycrK1bt3ot37p1q7Kzs31uk52dHVT51NRUpaam9jjGorwMZaencksEAPiR47CrKC/D6jAs05OxjLEJACInUuOSpd849e3bV2PGjNHKlSs9y9xut1auXKni4mKf2xQXF3uVl6TXX3/db/lQJSfZNOecYyJSNwD0BuWl+Qn9PKeejGWhYmwCAP8iNS5ZfqvezJkztWTJEj355JNat26dpk+frp07d+ryyy+XJE2ZMkVlZWWe8jfeeKMqKyt1//3364svvtCcOXP08ccf67rrrotYjBMLcvTo5NEa0L9PxPYBAPFmYP8+enTy6KCel9FbmY1lkcDYBADeDkpNjui4ZOmtetL+6cW/++473XnnnWpsbNSoUaNUWVnp+ZHtpk2blJT0n/zuhBNO0DPPPKPbb79dv/71r3XEEUfoxRdfVEFBQUTjnFiQozPys/XB19/rH19/py3bd8sw9j9x2d8TjQcfnCrbAcmuWflIv+8cj5nO8Qa7fazsI5z7s/rp4la8LxjmkKN/H33R0KItO9o0bIDd83TxeHvf+WnonZ+W3lH+6Jx0bd+9R59tdnb799W8a6/69U1SZppdB6em6PNvW7ze1
25xdlu+83orn8A++ICnyHdXJinJpmED++mEkYN1/GGDEvqbpgOZjWWRcuDY9O5XTVprcs36uu6D/buo/Vf313mw16HZ31Gwf1dm2wfzPt3eRw3O3RE9h7Ce1WP77j37lHHA30a/Pik6Jojynf+2Atl+0AF9vlmfECvjVHd9Wu7A/vrv0YfohMMHR3Rcsvw5TtHWW58hAgDxgD7YN9oFAKwRN89xAgAAAIB4QOIEAAAAACZInAAAAADABIkTAAAAAJggcQIAAAAAEyROAAAAAGCCxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmUqwOINoMw5AktbS0WBwJACSejr63oy/GfoxNAGCNYMalhEucWltbJUm5ubkWRwIAiau1tVUOh8PqMGIGYxMAWCuQcclmJNh/+7ndbn377bdKS0uTzWYLevuWlhbl5uZq8+bNSk9Pj0CEvR9tGBraLzS0X2hCbT/DMNTa2qqhQ4cqKYm7xTswNlmL9gsdbRga2i80obRfMONSwn3jlJSUpEMOOSTketLT07mwQ0Qbhob2Cw3tF5pQ2o9vmrpibIoNtF/oaMPQ0H6h6Wn7BTou8d99AAAAAGCCxAkAAAAATJA4BSk1NVXl5eVKTU21OpS4RRuGhvYLDe0XGtovNnFeQkP7hY42DA3tF5potV/CTQ4BAAAAAMHiGycAAAAAMEHiBAAAAAAmSJwAAAAAwASJEwAAAACYIHEK0iOPPKIRI0bIbrdr3LhxqqqqsjqkmPDOO++otLRUQ4cOlc1m04svvui13jAM3XnnncrJyVG/fv1UUlKif/7zn15lmpubdemllyo9PV0DBgzQFVdcoR9++CGKR2GdefPmaezYsUpLS1NmZqbOO+88rV+/3qtMW1ubZsyYoUGDBunggw/Wf//3f2vr1q1eZTZt2qSzzz5b/fv3V2Zmpm699Vbt27cvmodiiYqKCh177LGeB98VFxfr1Vdf9ayn7YIzf/582Ww23XTTTZ5ltGFsY2zyjbGp5xiXQsfYFF4xMTYZCNizzz5r9O3b13j88ceNzz//3LjqqquMAQMGGFu3brU6NMu98sorxm233Wa88MILhiRj+fLlXuvnz59vOBwO48UXXzQ+/fRT45xzzjHy8vKM3bt3e8pMnDjRKCwsND744APj3XffNQ4//HDjkksuifKRWGPChAnGE088YdTW1ho1NTXGWWedZRx66KHGDz/84ClzzTXXGLm5ucbKlSuNjz/+2Dj++OONE044wbN+3759RkFBgVFSUmJ88sknxiuvvGIMHjzYKCsrs+KQouqll14yXn75ZePLL7801q9fb/z61782+vTpY9TW1hqGQdsFo6qqyhgxYoRx7LHHGjfeeKNnOW0Yuxib/GNs6jnGpdAxNoVPrIxNJE5BKCoqMmbMmOF573K5jKFDhxrz5s2zMKrY03lwcrvdRnZ2trFw4ULPsh07dhipqanGn/70J8MwDKOurs6QZHz00UeeMq+++qphs9mMLVu2RC32WNHU1GRIMt5++23DMPa3V58+fYw///nPnjLr1q0zJBmrV682DGP/B4SkpCSjsbHRU6aiosJIT0832tvbo3sAMWDgwIHG73//e9ouCK2trcYRRxxhvP7668app57qGZxow9jG2BQYxqbQMC6FB2NT8GJpbOJWvQDt2bNHa9asUUlJiWdZUlKSSkpKtHr1agsji3319fVqbGz0ajuHw6Fx48Z52m716tUaMGCAjjvuOE+ZkpISJSUl6cMPP4x6zFZzOp2SpIyMDEnSmjVrtHfvXq82POqoo3TooYd6teGPf/xjZWVlecpMmDBBLS0t+vzzz6MYvbVcLpeeffZZ7dy5U8XFxbRdEGbMmKGzzz7bq60krr9YxtjUc4xNwWFcCg1jU8/F0tiU0sNjSDjbtm2Ty+XyanhJysrK0hdffGFRVPGhsbFRkny2Xce6xsZGZWZmeq1PSUlRRkaGp0yicLvduummm3TiiSeqo
KBA0v726du3rwYMGOBVtnMb+mrjjnW93Weffabi4mK1tbXp4IMP1vLly5Wfn6+amhraLgDPPvusqqur9dFHH3VZx/UXuxibeo6xKXCMSz3H2BSaWBubSJyAGDNjxgzV1tbqvffeszqUuHLkkUeqpqZGTqdTzz//vKZOnaq3337b6rDiwubNm3XjjTfq9ddfl91utzocADGGcannGJt6LhbHJm7VC9DgwYOVnJzcZaaOrVu3Kjs726Ko4kNH+3TXdtnZ2WpqavJav2/fPjU3NydU+1533XX629/+plWrVumQQw7xLM/OztaePXu0Y8cOr/Kd29BXG3es6+369u2rww8/XGPGjNG8efNUWFio//mf/6HtArBmzRo1NTVp9OjRSklJUUpKit5++2099NBDSklJUVZWFm0Yoxibeo6xKTCMS6FhbOq5WBybSJwC1LdvX40ZM0YrV670LHO73Vq5cqWKi4stjCz25eXlKTs726vtWlpa9OGHH3rarri4WDt27NCaNWs8Zd5880253W6NGzcu6jFHm2EYuu6667R8+XK9+eabysvL81o/ZswY9enTx6sN169fr02bNnm14WeffeY1yL/++utKT09Xfn5+dA4khrjdbrW3t9N2ATj99NP12WefqaamxvM67rjjdOmll3r+TRvGJsamnmNs6h7jUmQwNgUuJsemUGa5SDTPPvuskZqaaixdutSoq6szrr76amPAgAFeM3UkqtbWVuOTTz4xPvnkE0OS8cADDxiffPKJsXHjRsMw9k/5OmDAAOOvf/2rsXbtWuPcc8/1OeXrT37yE+PDDz803nvvPeOII45IiClfDcMwpk+fbjgcDuOtt94yGhoaPK9du3Z5ylxzzTXGoYcearz55pvGxx9/bBQXFxvFxcWe9R1Tbp555plGTU2NUVlZaQwZMiQhpi2dPXu28fbbbxv19fXG2rVrjdmzZxs2m8147bXXDMOg7XriwJmLDIM2jGWMTf4xNvUc41LoGJvCz+qxicQpSA8//LBx6KGHGn379jWKioqMDz74wOqQYsKqVasMSV1eU6dONQxj/7Svd9xxh5GVlWWkpqYap59+urF+/XqvOr7//nvjkksuMQ4++GAjPT3duPzyy43W1lYLjib6fLWdJOOJJ57wlNm9e7dx7bXXGgMHDjT69+9vnH/++UZDQ4NXPRs2bDAmTZpk9OvXzxg8eLBxyy23GHv37o3y0UTfL37xC2P48OFG3759jSFDhhinn366Z2AyDNquJzoPTrRhbGNs8o2xqecYl0LH2BR+Vo9NNsMwjOC/pwIAAACAxMFvnAAAAADABIkTAAAAAJggcQIAAAAAEyROAAAAAGCCxAkAAAAATJA4AQAAAIAJEicAAAAAMEHiBAAAAAAmSJwAAAAAwASJExBHxo8fr5tuusnqMAAA8GBsQqIgcQIAAAAAEzbDMAyrgwBgbtq0aXryySe9ltXX12vEiBHWBAQASHiMTUgkJE5AnHA6nZo0aZIKCgp01113SZKGDBmi5ORkiyMDACQqxiYkkhSrAwAQGIfDob59+6p///7Kzs62OhwAABibkFD4jRMAAAAAmCBxAgAAAAATJE5AHOnbt69cLpfVYQAA4MHYhERB4gTEkREjRujDDz/Uhg0btG3bNrndbqtDAgAkOMYmJAoSJyCO/PKXv1RycrLy8/M1ZMgQbdq0yeqQAAAJjrEJiYLpyAEAAADABN84AQAAAIAJEicAAAAAMEHiBAAAAAAmSJwAAAAAwASJEwAAAACYIHECAAAAABMkTgAAAABggsQJAAAAAEyQOAEAAACACRInAAAAADBB4gQAAAAAJv4/7bFFQ3ZdbIAAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "\n", "# This is an array that contains all possible integer values for the hours of study \n", "# (all possible *t* values) in ascending order.\n", "# [0 1 2 3 4 ... 400]\n", "ts: np.ndarray = np.arange(0, 400)\n", " \n", "# TASK: Fill this array with the empirically estimated conditional probabilities\n", "# p(x=qualified|t) that a student has achieved the necessary qualifications \n", "# under the condition of having studied *exactly* t hours.\n", "\n", "# HINT: If there are no students having exactly studied a given number of hours \n", "# Leave the value in this array at zero.\n", "\n", "p_y_t: np.ndarray = np.zeros_like(ts, dtype=float)\n", " \n", "num_t: np.ndarray = np.zeros_like(ts, dtype=float)\n", "\n", "for i,t in enumerate(ts):\n", " p_y_t[i] = (np.sum(df.loc[df.hours_study == t][\"qualified\"]) / np.sum(df.hours_study == t))\n", " num_t[i] = np.sum(df.hours_study == t)\n", "\n", "p_y_t[np.isnan(p_y_t)] = 0.0\n", "\n", "# ~ plotting the solution\n", "\n", "fig, (ax1, ax2) = plt.subplots(\n", " ncols=2,\n", " nrows=1,\n", " figsize=(10, 5)\n", ")\n", "\n", "# p(x=qualified) over different t values\n", "ax1.scatter(ts, p_y_t)\n", "ax1.set_ylabel('p')\n", "ax1.set_xlabel('t')\n", "#ax1.legend()\n", "ax1.set_title('$P(x=\\mathrm{qualified}\\;|\\;t)$')\n", "\n", "# number of students having studied exactly t hours\n", "ax2.scatter(ts, num_t)\n", "ax2.set_ylabel('num')\n", "ax2.set_xlabel('t')\n", "#ax2.legend()\n", "ax2.set_title('Number of students for $t$ hours of study')\n", "\n", "fig" ] }, { "cell_type": "code", "execution_count": 118, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "e933d29e3689f9c86e278e77649dc55a", "grade": true, "grade_id": "test-5-7-hours-study", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: 
test-5-7-hours-study - possible points: 3\n", "\n", "assert isinstance(p_y_t, np.ndarray), 'solution must be array'\n", "assert len(p_y_t) == 400, 'solution is missing elements'\n", "assert np.max(p_y_t) > 0.1, 'solution is likely still empty'\n", "assert np.isclose(p_y_t[0], 0.0), 'solution is likely incorrect'\n", "\n", "# NOTE: The hidden tests will check some selected values from the array to match\n", "# the expected values with a tolerance of 2 decimals\n", "\n", "# HINT: Only the \"p_y_t\" array is relevant to the solution.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "1a1daae995821e36cfe4b2f5023f4203", "grade": false, "grade_id": "cell-d644cd5dbaa35a77", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Unfortunately, the resulting plot doesn't seem to be very informative. The main problem is that there is simply not enough data available to properly justify an empirical approximation of these probabilities. For most exact values of $t$ there exists not even a single sample in the given student survey dataset to estimate from and there seem to be no more than 20 students for any value of $t$.\n", "\n", "So to improve these results we'd need a *much* larger dataset size, such that there exists a reasonable number of students for each possible number of hours $t$. *Alternatively*, we can slightly modify the problem. Instead of asking for the probability for each exact value of $t$ we can consider a certain range of values. So for example one could calculate the probability $P(x=\\mathrm{qualified}\\;|\\;\\tau < t < \\tau + 50)$ of having the necessary qualification when having studied between $\\tau$ and $\\tau + 50$ hours. In this formulation, a factor $\\times 50$ more samples are used for the empirical estimation of each probability." 
] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "9f1b6125b0037ff9b34263a9d3ae46a3", "grade": false, "grade_id": "cell-7a59e5326140d5bc", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.8 (2 points)** In this exercise we want to determine $P(y=\\mathrm{qualified}|\\tau < t < \\tau + 50)$ for all possible values of $\\tau \\in \\{0, \\dots, 400\\}$. Here we use a possible range of values instead of exact values for $t$ to increase the effective sample size to get better probability estimates. Calculate the conditional probabilities from the student survey dataset and use them to fill the ``p_y_tau`` array." ] }, { "cell_type": "code", "execution_count": 120, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "6e882c394b4e23fbbdfc13aa8b0be909", "grade": false, "grade_id": "cell-a9717e93a1e26f13", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/tmp/ipykernel_22/4013558153.py:16: RuntimeWarning: invalid value encountered in scalar divide\n", " p_y_tau[i] = (np.sum(df.loc[(df.hours_study > t)*(df.hours_study < (t+50))][\"qualified\"]) / np.sum((df.hours_study > t)*(df.hours_study < (t+50))))\n", "No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.\n" ] }, { "data": { "text/plain": [ "Text(0.5, 1.0, '$P(x=\\\\mathrm{qualified}\\\\;|\\\\; \\\\tau < t < \\\\tau + 50)$')" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# This is an array that contains all possible integer values for the hours of study \n", "# (all possible *t* values) in ascending order.\n", "# [0 1 2 3 4 ... 400]\n", "ts: np.ndarray = np.arange(0, 400)\n", " \n", "# TASK: Fill this array with the empirically estimated conditional probabilities\n", "# p(x=qualified|tau < t < tau + 50) that a student has achieved the necessary qualifications \n", "# under the condition of having studied between tau and tau+50 hours.\n", "\n", "# HINT: If there are no students having exactly studied a given number of hours \n", "# Leave the value in this array at zero.\n", "\n", "p_y_tau: np.ndarray = np.zeros_like(ts, dtype=float)\n", "\n", "for i,t in enumerate(ts):\n", " p_y_tau[i] = (np.sum(df.loc[(df.hours_study > t)*(df.hours_study < (t+50))][\"qualified\"]) / np.sum((df.hours_study > t)*(df.hours_study < (t+50))))\n", "\n", "p_y_tau[np.isnan(p_y_tau)] = 0.0\n", "\n", "\n", "fig, ax = plt.subplots(\n", " ncols=1,\n", " nrows=1,\n", " figsize=(10, 5)\n", ")\n", "\n", "# p(x=qualified) over different tau values\n", "ts_ = [t for i, t in enumerate(ts) if p_y_tau[i] > 0]\n", "ps_ = [p for i, p in enumerate(p_y_tau) if p_y_tau[i] > 0]\n", "ax.scatter(ts_, ps_, alpha=0.3)\n", "ax.plot(ts_, ps_)\n", "ax.set_ylabel('p')\n", "ax.set_xlabel(r'$\\tau$')\n", "ax.legend()\n", "ax.set_title(r'$P(x=\\mathrm{qualified}\\;|\\; \\tau < t < \\tau + 50)$')" ] }, { "cell_type": "code", "execution_count": 121, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6993e41348f87f26f443fc1331862d70", "grade": true, "grade_id": "test-5-8-smoothing", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-8-smoothing - possible points: 2\n", "\n", "assert isinstance(p_y_tau, np.ndarray), 'solution must be array'\n", "assert 
len(p_y_tau) == 400, 'solution is missing elements'\n", "assert np.max(p_y_tau) > 0.1, 'solution is likely still empty'\n", "assert np.isclose(p_y_tau[0], 0.2), 'solution is likely incorrect'\n", "\n", "# NOTE: The hidden tests will test a selection of 5 random elements of the p_y_tau array \n", "# against their true values with a tolerance of 2 decimals.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "f61d2aaeddc065648dabce348397558a", "grade": false, "grade_id": "cell-0e9d655bc6a60b86", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "The resulting plot is much more informative! Due to the increased sample size, there are no more missing values and we can clearly see a trend. For $\\tau = 0$ (study between 0 and 50 hours), the probability starts off relatively low and then rises more or less steadily with an increasing amount of studying until it saturates at almost $P(x=\\mathrm{qualified}) \\approx 100\\%$ for $\\tau > 300$." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "4a16be2f9284e5357205ed14610051fd", "grade": false, "grade_id": "cell-9a0a3a7a04d6a269", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Flipping the condition.** In the previous section we have computed the conditional probability $P(x=\\mathrm{qualified} \\;|\\; t)$ of a student having obtained the necessary qualifications after studying for $t$ hours. In the following section we want to look at the reverse case $P(t \\;|\\; x=\\mathrm{qualified})$ which models the likelihood of having worked $t$ hours under the condition of belonging to the group of students that have obtained the necessary qualifications.
This might seem counterintuitive at first, but we'll see that this statistic will be useful to make future predictions." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "1c4fe7f3426d9a5a5d2b71d40c34ad7a", "grade": false, "grade_id": "cell-e92b94e164446ba9", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.9 (2 points)** In this task you'll have to compute the flipped conditional probabilities $P(t \\;|\\; x=\\mathrm{unqualified})$ and $P(t \\;|\\; x=\\mathrm{qualified})$ from the student survey dataset. Save the results for all values $t \\in \\{0, \\dots, 400\\}$ into the two variables ``p_t_y0`` and ``p_t_y1`` respectively." ] }, { "cell_type": "code", "execution_count": 133, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "bf7d957a6f8dcea661cfc05737ffa291", "grade": false, "grade_id": "cell-cfbc0a3b3372ce3a", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.\n", "No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# This is an array that contains all possible integer values for the hours of study \n", "# (all possible *t* values) in ascending order.\n", "# [0 1 2 3 4 ... 400]\n", "ts: np.ndarray = np.arange(0, 400)\n", "\n", "p_t_y0: np.ndarray = np.zeros_like(ts, dtype=float)\n", "p_t_y1: np.ndarray = np.zeros_like(ts, dtype=float)\n", " \n", "for i,t in enumerate(ts):\n", " p_t_y0[i] = (np.sum(df.loc[df.qualified == 0].hours_study == t) / np.sum(df.qualified == 0))\n", " p_t_y1[i] = (np.sum(df.loc[df.qualified == 1].hours_study == t) / np.sum(df.qualified == 1))\n", "\n", "p_t_y0[np.isnan(p_t_y0)] = 0.0\n", "p_t_y1[np.isnan(p_t_y1)] = 0.0\n", "\n", "\n", "fig_t_y, (ax_y0, ax_y1) = plt.subplots(\n", " ncols=2,\n", " nrows=1,\n", " figsize=(10, 5)\n", ")\n", "\n", "# p(t | y=unqualified)\n", "ax_y0.plot(ts, p_t_y0, color='tab:blue')\n", "ax_y0.set_ylabel(r'$p$')\n", "ax_y0.set_xlabel(r'$t$')\n", "ax_y0.set_title(r'$P(t\\;|\\;y=\\mathrm{unqualified})$')\n", "ax_y0.legend()\n", "\n", "# p(t | y=qualified)\n", "ax_y1.plot(ts, p_t_y1, color='tab:orange')\n", "ax_y1.set_ylabel(r'$p$')\n", "ax_y1.set_xlabel(r'$t$')\n", "ax_y1.set_title(r'$P(t\\;|\\;y=\\mathrm{qualified})$')\n", "ax_y1.legend()" ] }, { "cell_type": "code", "execution_count": 134, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "7ba67bb2aa71e8e486f3500856567a71", "grade": true, "grade_id": "test-5-9-conditional-qualified", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-9-conditional-qualified - possible points: 1\n", "\n", "assert isinstance(p_t_y1, np.ndarray), 'solution must be array'\n", "assert len(p_t_y1) == 400, 'solution is missing elements'\n", "assert np.isclose(np.sum(p_t_y1), 1.0), 'solution is not a proper density distribution!'\n", "\n", "# NOTE: The hidden tests will 
check the statistical properties of the value arrays (mean, variance)\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": 135, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "06217a8d56c2d401d1670d03aed5b86f", "grade": true, "grade_id": "test-5-9-conditional-unqualified", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-9-conditional-unqualified - possible points: 1\n", "\n", "assert isinstance(p_t_y0, np.ndarray), 'solution must be array'\n", "assert len(p_t_y0) == 400, 'solution is missing elements'\n", "assert np.isclose(np.sum(p_t_y0), 1.0), 'solution is not a proper density distribution!'\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "1f5b579bcf8f7a36463b57a44a7bcd5f", "grade": false, "grade_id": "cell-50144eb78ec4632c", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "The first thing we notice here is that we encounter much the same problem as in the previous section. For some values of $t$ the probability is zero since there weren't any students who reported that exact study time. However, it is still possible to recognize the overall shape of the distributions - especially in the case of the second plot $P(t\\;|\\;y=\\mathrm{qualified})$ since it is based on more samples overall. In this case, we can vaguely recognize the shape as a [gaussian distribution](https://en.wikipedia.org/wiki/Normal_distribution). This intuitively makes sense, since many random variables are normally distributed.\n", "\n", "**Curve fitting.** Based on this insight, another option to obtain a smoother probability distribution is to simply fit a continuous gaussian to the existing data.
Based on the formula for the gaussian distribution\n", "\n", "$$\n", "p(t;\\mu,\\sigma) = \\frac{1}{\\sqrt{2 \\pi \\sigma^2}} \\exp\\left(-\\frac{(t - \\mu)^2}{2 \\sigma^2}\\right)\n", "$$\n", "\n", "this only requires determining the *mean* $\\mu$ and the *standard deviation* $\\sigma$ from the given data." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "07be2a656b94c087ad00ff3fcb137118", "grade": false, "grade_id": "cell-0a64ce09be3b3fd6", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.10 (2 points)** For this task, you'll have to implement the two functions ``gauss_y0`` and ``gauss_y1`` which should implement gaussian distributions fitting the conditional probabilities $P(t \\;|\\; y=\\mathrm{unqualified})$ and $P(t \\;|\\; y=\\mathrm{qualified})$ respectively." ] }, { "cell_type": "code", "execution_count": 160, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "e6512d2239075aee12d5ccf62671b1da", "grade": false, "grade_id": "cell-d82bb1959a8e402c", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "execution_count": 160, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# TASK: Since the direct estimations of the conditional probabilities p(t|y=0) and p(t|y=1) have \n", "# a lot of missing values we want to fit gaussian distributions instead. The two functions \n", "# below should be implemented to return the probability density value for these fitted gaussian \n", "# distributions for any continuous value of t.\n", "\n", "def gaussian(x, mu, sig):\n", " return 1./(np.sqrt(2.*np.pi)*sig)*np.exp(-np.power((x - mu)/sig, 2.)/2)\n", "\n", "\n", "def gauss_y0(t: float) -> float:\n", " data = p_t_y0#[p_t_y0 != 0] \n", " loc = ts#[p_t_y0 != 0]\n", " from scipy.optimize import curve_fit\n", " coeff, var_matrix = curve_fit(gaussian, loc, data, p0=(100,30))\n", " return gaussian(t, *coeff)\n", " \n", "def gauss_y1(t: float) -> float:\n", " data = p_t_y1#[p_t_y1 != 0] \n", " loc = ts#[p_t_y1 != 0]\n", " from scipy.optimize import curve_fit\n", " \n", " coeff, var_matrix = curve_fit(gaussian, loc, data, p0=(100,30))\n", " return gaussian(t, *coeff)\n", " \n", " \n", "fig = copy.deepcopy(fig_t_y)\n", "ax_y0, ax_y1 = fig.get_axes()\n", " \n", "# p(t | y=unqualified)\n", "values_y0 = [gauss_y0(t) for t in ts]\n", "ax_y0.plot(ts, values_y0, color='blue')\n", "ax_y0.fill_between(ts, values_y0, color='blue')\n", "\n", "avg_y0, _ = quad(lambda t: t * gauss_y0(t), -np.inf, np.inf)\n", "var_y0, _ = quad(lambda t: (t - avg_y0)**2 * gauss_y0(t), -np.inf, np.inf)\n", "std_y0 = np.sqrt(var_y0)\n", "\n", "ax_y0.axvline(avg_y0, label=r'$\\mu$' + f' = {avg_y0:.2f} hrs', color='black', ls='-', zorder=10)\n", "ax_y0.hlines(np.mean(ax_y0.get_ylim()), avg_y0 - std_y0, avg_y0 + std_y0, \n", " label=r'$\\sigma$' + f' = {std_y0:.2f} hrs', color='black', ls='--')\n", "#ax_y0.axvline(np.mean(df['hours_study']))\n", "ax_y0.legend(loc='upper right')\n", "ax_y0.set_title(r'$P(t\\;|\\;y=\\mathrm{unqualified})$')\n", "\n", "# p(t | y=qualified)\n", "values_y1 = [gauss_y1(t) 
for t in ts]\n", "ax_y1.plot(ts, values_y1, color='red')\n", "ax_y1.fill_between(ts, values_y1, color='red')\n", "\n", "avg_y1, _ = quad(lambda t: t * gauss_y1(t), -np.inf, np.inf)\n", "var_y1, _ = quad(lambda t: (t - avg_y1)**2 * gauss_y1(t), -np.inf, np.inf)\n", "std_y1 = np.sqrt(var_y1)\n", "\n", "ax_y1.axvline(avg_y1, label=r'$\\mu$' + f' = {avg_y1:.2f} hrs', color='black', ls='-')\n", "ax_y1.hlines(np.mean(ax_y1.get_ylim()), avg_y1 - std_y1, avg_y1 + std_y1, \n", " label=r'$\\sigma$' + f' = {std_y1:.2f} hrs', color='black', ls='--')\n", "ax_y1.legend(loc='upper right')\n", "\n", "\n", "fig" ] }, { "cell_type": "code", "execution_count": 161, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6f700e6438c2c54101ccdd0f499d1d0b", "grade": true, "grade_id": "test-5-10-curve-fitting", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-10-curve-fitting - possible points: 2\n", "\n", "assert callable(gauss_y0), 'solution must be callable function'\n", "assert np.isclose(quad(gauss_y0, -1000, 1000)[0], 1.0), 'not a proper probability density function'\n", "value_y0 = gauss_y0(200)\n", "assert 0.005 > value_y0 > 0.002, 'fitted function likely incorrect'\n", "\n", "assert callable(gauss_y1), 'solution must be callable function'\n", "assert np.isclose(quad(gauss_y1, -1000, 1000)[0], 1.0), 'not a proper probability density function'\n", "value_y1 = gauss_y1(200)\n", "assert 0.008 > value_y1 > 0.004, 'fitted function likely incorrect'\n", "\n", "# NOTE: The hidden tests will evaluate the statistical attributes of the fitted gaussian distributions \n", "# (mean and standard deviation) and will compare them to the expected values with a tolerance \n", "# of 2 decimals.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": {
"cell_type": "markdown", "checksum": "efdb71fa0cabfeb2e034d8982b830ca2", "grade": false, "grade_id": "cell-d20627b99ea7be54", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "From these results we can see that the groups of \"qualified\" and \"unqualified\" students seem to differ slightly in their studying behavior. On average, an unqualified student spends about 40 hours less studying. Additionally, the highest reported study time in the group of unqualified students is ~270 hours while for the qualified students this is ~350. Generally, these results support the intuitive interpretation that the longer a student studies, the higher the chance that they obtain the necessary qualifications.\n", "\n", "In the next section, we can use statistical differences like these to not only analyze past data but to build a predictor model for future data points as well." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "88febf9449f3e836fd59497bf1202bc4", "grade": false, "grade_id": "cell-35ea3e838e87e13c", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "## 5.3 Naive Bayes Estimator\n", "\n", "At the end of the previous section, we've established that amongst the two groups / \"classes\" of students we can find differences regarding their statistical properties. In this section, we want to use these insights not only to analyze past data, but to build an *estimator* model for future data.\n", "\n", "**Predicting exam outcomes.** Previously, we investigated the question of a student's true qualifications $y \\in \\{\\mathrm{qualified}, \\mathrm{unqualified} \\}$. Arguably, while this is an important question from the perspective of a *professor*, a *student* likely cares more about simply passing the exam.
Therefore, we'll try to build a model which predicts $x \\in \\{ \\mathrm{pass}, \\mathrm{fail} \\}$, that is, whether a student will pass or fail, based on previously available information such as the number of hours spent studying, the exercise points and lecture attendance.\n", "\n", "**Naive Bayes estimation.** There are various machine learning methods available to solve the aforementioned classification task, but the one which we'll be focusing on is a [Naive Bayes Classifier](https://en.wikipedia.org/wiki/Naive_Bayes_classifier). To understand the Naive Bayes classifier, we can assume a generic classification problem with $K$ possible classes $C_k$. To make this classification, we have a concrete vector of observations $\\mathbf{x} = [x_1 \\; x_2 \\; x_3]$. We can pose the classification problem as a probabilistic problem in which we want to find the conditional probability\n", "\n", "$$\n", "p(C_k \\;|\\; \\mathbf{x}) = p(C_k \\;|\\; x_1,x_2,x_3)\n", "$$\n", "\n", "of classifying as $C_k$ when having observed $\\mathbf{x}$. Given these probabilities, we can then determine the predicted class\n", "\n", "$$\n", "k_{\\mathrm{pred}} = \\mathrm{argmax}_{k} p(C_k \\;|\\; \\mathbf{x})\n", "$$\n", "\n", "as the one with the highest probability given the specific observation $\\mathbf{x}$. To determine this conditional probability $p(C_k \\;|\\; \\mathbf{x})$ we can apply *Bayes' rule*:\n", "\n", "$$\n", "p(C_k \\;|\\; \\mathbf{x}) = \\frac{p(C_k) p(\\mathbf{x} \\;|\\; C_k)}{p(\\mathbf{x})}\n", "$$\n", "\n", "To determine the argmax we can furthermore drop the denominator $p(\\mathbf{x})$ of this expression as it is the same value for all the different classes with respect to the same observation. Usually, the term $p(\\mathbf{x} \\;|\\; C_k)$ would be a complex expression which would not only have to take into account the dependence of each observation on the class but also the interdependence between the observations.
This is where the *naive* part of the method comes into play: We will simply assume that all the features of our observation $\\mathbf{x}$ are *independent* of each other given the class. Under this assumption, the term can be simplified like this:\n", "\n", "$$\n", "p(C_k \\;|\\; \\mathbf{x}) \\propto p(C_k) \\cdot p(x_1 \\;|\\; C_k) \\cdot p(x_2 \\;|\\; C_k) \\cdot p(x_3 \\;|\\; C_k)\n", "$$\n", "\n", "In the end, this formula only requires the prior probability $p(C_k)$ of each class and the independent conditional probability of each individual observation. \n", "\n", "---\n", "\n", "For a more in-depth understanding of the Naive Bayes Classifier, you can watch the following two videos which explain the topic in more detail:\n", "\n", "- [Naive Bayes, Clearly Explained](https://www.youtube.com/watch?v=O2L2Uv9pdDA)\n", "- [Gaussian Naive Bayes, Clearly Explained](https://www.youtube.com/watch?v=H3EjCKtlVog)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "fa249653fc9e23aeb754d3bb3fede5fc", "grade": false, "grade_id": "cell-5bfc053b0d4ac9d4", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "from IPython.display import YouTubeVideo\n", "YouTubeVideo('O2L2Uv9pdDA', width=800, height=500)\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6eb0100f6bd92e8d276946fce7ce55a4", "grade": false, "grade_id": "cell-92c65f39f15247bf", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "from IPython.display import YouTubeVideo\n", "YouTubeVideo('H3EjCKtlVog', width=800, height=500)\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": {
"deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "0e45509a83dff45bd51dc9797be87b1e", "grade": false, "grade_id": "cell-db46dbd0af1170a3", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Types of observations.** There are basically two different types of observations on which the Naive Bayes estimator can base its predictions: *continuous* and *categorical* variables. In our example of predicting the exam outcome, we can use the students' other survey questions as observations. An example of a continuous observation is the number of hours spent studying - theoretically, a student might answer with any real number and there is a continuous spectrum of possible answers. A categorical observation on the other hand only has a discrete set of possible answers. The question of whether a student has regularly attended the lectures or not, for example, can only be answered with either \"yes\" or \"no\". For the implementation of the Naive Bayes classifier, both of these will have to be treated differently. However, in both cases we want to obtain some function $p(x \\;|\\; C_k)$ that returns the likelihood of a single observation $x$ given the class $C_k$.\n", "\n", "**Gaussian fitting for continuous data.** For continuous observations, we want to model this likelihood by fitting the parameters of a gaussian distribution to approximately model the probability density function of the observation variable. More specifically, we want to fit a *family* of multiple gaussian distributions to model the conditional probabilities for each possible target class separately. Given $K$ possible classes, we need to fit exactly $K$ different gaussian distributions $p(x \\;|\\; C_k) \\sim \\mathcal{N}(\\mu_k, \\sigma_k)$."
] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "66814dfb94882c04b8fa32b4e0cf16aa", "grade": false, "grade_id": "cell-417b1f5ed0b040c6", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.11 (4 points)** For this task, you'll have to implement a function ``fit_gaussian``, which given a dataset of observations and corresponding target classes, fits a family of gaussian distributions $p(\\mathbf{x} \\;|\\; C_k)$. Specifically, this function is itself supposed to return a function ``p(x: float, y: int) -> float`` which implements this conditional probability density function for a specific continuous observation value ``x`` and a target class ``y``." ] }, { "cell_type": "code", "execution_count": 188, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "b7de039e68edde5ab7054301392aa0ea", "grade": false, "grade_id": "cell-08ca74ce77d2b306", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "import typing as t\n", "\n", "# TASK: This function accepts two arrays which constitute a dataset on which to fit the gaussian\n", "# distributions. The first array \"xs\" contains the continuous / float observations and the \n", "# second array \"ys\" contains the targets for the conditioning. The function should then \n", "# return another function p(x, y) which outputs the probability DENSITY value p(x) of a \n", "# gaussian distribution - but a different one depending on the value of the second parameter \n", "# y.\n", "\n", "# HINT: You can simply define a local function inside of another function by using the \"def\"\n", "# statement and then return that local function.\n", "\n", "# HINT: First think about how to generically obtain all the possible target values that appear\n", "# throughout the target array \"ys\". 
The final function p(x, y) must implement a different \n", "# gaussian for all of them. \n", "\n", "def fit_gaussian(xs: np.ndarray,\n", "                 ys: np.ndarray,\n", "                 ) -> t.Callable[[float, int], float]:\n", "    \"\"\"\n", "    This function is supposed to fit a family of conditional gaussian probability density functions\n", "    based on a given array of observations ``xs`` of shape (num_elements, ) and a given array of the associated \n", "    categorical (integer) targets ``ys`` of the shape (num_elements, ).\n", "    \n", "    This function should itself return a function ``p(x: float, y: int) -> float`` which receives the \n", "    observation value x as the first parameter and the target value y as the second parameter and the \n", "    function should implement the evaluation of the conditional gaussian p(x|y) for the \n", "    given combination.\n", "    \n", "    :param xs: array of shape (num_elements, ) - continuous / float observations\n", "    :param ys: array of shape (num_elements, ) - categorical / integer targets\n", "    \n", "    :returns: callable function p(x|y)\n", "    \"\"\"\n", "    # Estimate one (mean, standard deviation) pair per target class. np.std uses ddof=0,\n", "    # i.e. the maximum-likelihood estimate of the standard deviation (not the variance).\n", "    params = {}\n", "    for k in np.unique(ys):\n", "        loc = xs[ys == k]\n", "        params[int(k)] = (float(np.mean(loc)), float(np.std(loc)))\n", "\n", "    def p(x: float, y: int) -> float:\n", "        # int(y) also accepts targets that are passed in as floats (e.g. 0.0).\n", "        mu, sig = params[int(y)]\n", "        return float(np.exp(-0.5 * ((x - mu) / sig) ** 2) / (np.sqrt(2.0 * np.pi) * sig))\n", "\n", "    return p\n" ] }, { "cell_type": "code", "execution_count": 189, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "51784291858bd04824ed6a5e71fda496", "grade": true, "grade_id": "test-5-11-gaussian-fit", "locked": true, "points": 4, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: 
test-5-11-gaussian-fit - possible points: 4\n", "\n", "\n", "xs = np.array([1, 2, 3, 3, 4], dtype=float)\n", "ys = np.array([0, 0, 0, 1, 1], dtype=int)\n", "\n", "p = fit_gaussian(xs, ys)\n", "assert callable(p), 'fit is not a callable function'\n", "assert isinstance(p(0, 0), float), 'function output is not a float'\n", "p_int, _ = quad(lambda x: p(x, 0), -100, 100) # integral over the whole distribution\n", "assert np.isclose(p_int, 1.0), 'fitted function not a proper density function'\n", "p_mean, _ = quad(lambda x: x * p(x, 0), -100, 100) # mean value for the distribution (1.moment)\n", "assert np.isclose(p_mean, 2.0), 'gaussian has incorrect mean value'\n", "\n", "# NOTE: The hidden tests will perform similar tests, but for a more complex example, which \n", "# includes >2 possible target values.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "570a0d293059ca5856f7966c19bf947f", "grade": false, "grade_id": "cell-0ee4d8594d48e9da", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**Categorical fit.** Categorical observations are the alternative to continuous observations. A categorical observation has only a discrete set of values that can be observed. Therefore, the categorical fit is much easier to model, since one only has to determine a single probability value for each possible combination of discrete target and observation values."
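, "\n", "\n", "As a sketch of this idea (the counts $N_k$ and $N_{v,k}$ are notation introduced here, not part of the sheet), the probability of observing the discrete value $v$ given the class $C_k$ can be estimated by the relative frequency\n", "\n", "$$p(x = v \\;|\\; C_k) = \\frac{N_{v,k}}{N_k}$$\n", "\n", "where $N_k$ is the number of training samples with target $C_k$ and $N_{v,k}$ is the number of those samples whose observation equals $v$. For a fixed class the values sum to one, $\\sum_v p(x = v \\;|\\; C_k) = 1$. For example, if three of four samples of class $0$ have $x = 0$ and one has $x = 1$, then $p(x = 1 \\;|\\; C_0) = 1/4$."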
] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "ce12e199d0a5af876db9439993de1b61", "grade": false, "grade_id": "cell-ec0cec4f10bd78d6", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.12 (4 points)** For this task, you'll have to implement a function ``fit_categorical``, which, given a dataset of observations and corresponding target classes, fits a family of conditional probability values $p(\\mathbf{x} \\;|\\; C_k)$. Specifically, this function is itself supposed to return a function ``p(x: int, y: int) -> float`` which implements this conditional probability function for a specific discrete observation value ``x`` and a discrete target class ``y``." ] }, { "cell_type": "code", "execution_count": 192, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "f73aaa75962322c9c8a969f2dc71a61d", "grade": false, "grade_id": "cell-914053ae87f91bae", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# TASK: This function accepts two arrays which constitute a dataset on which to fit the categorical\n", "# distributions. The first array \"xs\" contains the categorical / integer observations and the \n", "# second array \"ys\" contains the targets for the conditioning. The function should then \n", "# return another function p(x, y) which outputs the probability value p(x|y) of observing \n", "# the discrete value x - but a different one depending on the value of the second parameter \n", "# y.\n", "\n", "# HINT: You can simply define a local function inside of another function by using the \"def\"\n", "# statement and then return that local function.\n", "\n", "# HINT: First think about how to generically obtain all the possible discrete values that appear\n", "# throughout the target array \"ys\" and the observation array \"xs\". 
The fitted function \n", "# p(x, y) has to return a distinct value for every possible pairwise combination!\n", "\n", "\n", "def fit_categorical(xs: np.ndarray,\n", "                    ys: np.ndarray,\n", "                    ) -> t.Callable[[int, int], float]:\n", "    \"\"\"\n", "    This function is supposed to fit a family of conditional probability values \n", "    based on a given array of observations ``xs`` of shape (num_elements, ) and a given array of the associated \n", "    categorical (integer) targets ``ys`` of the shape (num_elements, ).\n", "    \n", "    This function should itself return a function ``p(x: int, y: int) -> float`` which receives the \n", "    observation value x as the first parameter and the target value y as the second parameter and the \n", "    function should implement the evaluation of the conditional probability p(x|y) for the \n", "    given combination.\n", "    \n", "    :param xs: array of shape (num_elements, ) - categorical / integer observations\n", "    :param ys: array of shape (num_elements, ) - categorical / integer targets\n", "    \n", "    :returns: callable function p(x|y)\n", "    \n", "    \"\"\"\n", "\n", "    def p(x: int, y: int) -> float:\n", "        # Relative frequency of the value x among all samples with target y.\n", "        in_class = (ys == y)\n", "        return float(np.sum(xs[in_class] == x) / np.sum(in_class))\n", "\n", "    return p\n" ] }, { "cell_type": "code", "execution_count": 193, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "14b7f7e7930e2a3f57dcbdb498d4c7e2", "grade": true, "grade_id": "test-5-12-categorical-fit", "locked": true, "points": 4, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-12-categorical-fit - possible points: 4\n", "\n", "\n", "xs = np.array([0, 0, 0, 1, 0, 0, 1, 1], dtype=int)\n", "ys = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=int)\n", "\n", "p = fit_categorical(xs, ys)\n", "assert callable(p), 'fit is not a callable function'\n", "assert isinstance(p(0, 0), float), 'function output is not a float'\n", "p_sum = p(0, 0) + p(1, 0)\n", "assert np.isclose(p_sum, 1.0), 'fitted function not a proper density function - must 
add to 1'\n", "assert np.isclose(p(1, 0), 0.25)\n", "\n", "\n", "# NOTE: The hidden tests will perform similar tests, but for a more complex example, which \n", "# includes >2 possible target values and observations.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": 194, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "54cd88a0d47a5a573246b5e71b3763c2", "grade": false, "grade_id": "cell-1bd512e3e592e3b9", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# NOTE: You will have to understand, but not modify the content of this cell!\n", "\n", "\n", "class NaiveBayesClassifier:\n", " \"\"\"\n", " Custom implementation of a Naive Bayes classifier.\n", " \"\"\"\n", " \n", " # :const CONTINUOUS: The constant string literal that can be used as a value in the ``observation_types`` \n", " # dictionary to indicate a continuous / float observation.\n", " CONTINUOUS: str = 'continuous'\n", " \n", " # :const CATEGORICAL: The constant string literal that can be used as a value in the ``observation_types`` \n", " # dictionary to indicate a categorical / integer observation.\n", " CATEGORICAL: str = 'categorical'\n", " \n", " def __init__(self, \n", " num_targets: int,\n", " observation_types: dict[int, str],\n", " ) -> None:\n", " \"\"\"\n", " :param num_targets: The number of targets to be predicted.\n", " :param observation_type: A dict whose keys are the integer indices of the observations and \n", " the values is either of the string literals \"continuous\" or \"categorical\" which \n", " indicate whether the corresponding observation is a continuous random variable \n", " (requires gaussian fit) or a categorical random variable (requires categorical fit)\n", " \"\"\"\n", " self.num_targets = num_targets\n", " # This list contains the target indices aka the possible target values / classes.\n", " 
self.target_indices: list[int] = np.arange(num_targets)\n", " \n", " # The keys of this dict are the integer indices of the observations within the \n", " # observation vectors and the values are string literals which identify whether that \n", " # corresponding observation is continuous (needing gaussian fit) or categorical \n", " # (needing categorical fit).\n", " self.observation_types: dict[int, str] = observation_types\n", " # This list contains the observation indices that will be considered by the \n", " # model.\n", " self.observation_indices: list[int] = list(observation_types.keys())\n", " \n", " # During the fit process this dictionary will be populated, where the \n", " # integers are the observation indices and the values are the corrsponding \n", " # conditional probability functions p(x|y).\n", " self.observation_funcs: dict[int, t.Callable] = {}\n", " \n", " # During the fit process this dict will be populated, where the keys are the \n", " # target keys and the values are the prior probabilities of the targets\n", " self.target_priors: dict[int, float] = {}\n", " \n", " def fit(self, xs: np.ndarray, ys: np.ndarray) -> None:\n", " \"\"\"\n", " fits the model for the given observations ``xs`` of shape (num_elements, num_observations) \n", " and the corresponding target classes ``ys`` of shape (num_elements, ).\n", " \n", " :param xs: The list of observation vectors with shape (num_elements, num_observations)\n", " :param ys: The list of target classes with shape (num_elements, )\n", " \n", " :returns: None\n", " \"\"\"\n", " # x: (num_elements, num_attributes)\n", " # y: (num_elements, num_targets)\n", " \n", " # ~ fitting the prior target probabilities\n", " for target_index in self.target_indices:\n", " self.target_priors[target_index] = len([y for y in ys if y == target_index]) / len(ys)\n", " \n", " # ~ fitting the conditional probability densities\n", " for observation_index, observation_type in self.observation_types.items():\n", " \n", " if 
observation_type == 'categorical':\n", " func = self.fit_categorical(xs[:, observation_index], ys)\n", " elif observation_type == 'continuous':\n", " func = self.fit_gaussian(xs[:, observation_index], ys)\n", " \n", " self.observation_funcs[observation_index] = func\n", " \n", " def fit_gaussian(self, xs: np.ndarray, ys: np.ndarray) -> t.Callable[[float, int], float]:\n", " return fit_gaussian(xs, ys)\n", " \n", " def fit_categorical(self, xs: np.ndarray, ys: np.ndarray) -> t.Callable[[int, int], float]:\n", " return fit_categorical(xs, ys)\n", " \n", " def predict_single(self, x: np.ndarray) -> int:\n", " \"\"\"\n", " performs a single prediction based on the given vector of observations ``x`` with the shape \n", " (num_observations, ). Method returns the integer index of the predicted target class.\n", " \n", " :param x: observation array of shape (num_observations, )\n", " \n", " :returns: integer target index\n", " \"\"\"\n", " assert len(x.shape) == 1, 'observation array needs to have shape (num_observations, )!'\n", " num_observations = x.shape[0]\n", " \n", " # In this dictionary we will store the accumulated likelihoods corresponding to the different \n", " # target values. The keys of this dict are the target indices and the values are the \n", " # corresponding likelihoods that have been calculated according to bayes rule.\n", " target_ps: dict[int, float] = defaultdict(int)\n", " \n", " # log likelihoods:\n", " # Instead of multiplicatively accumulating the raw likelihood values in the range [0, 1] \n", " # we will actually additively accumulate the log likelihoods. 
This does not make a difference \n", " # for the argmax computation.\n", " for target_index in self.target_indices:\n", " target_ps[target_index] = np.log(self.target_priors[target_index])\n", " \n", " for observation_index in range(num_observations):\n", " func = self.observation_funcs[observation_index]\n", " p = func(float(x[observation_index]), float(target_index)) + 1e-6\n", " log_p = np.log(p)\n", " target_ps[target_index] += log_p\n", " \n", " return max(self.target_indices, key=lambda i: target_ps[i])\n", " \n", " def predict(self, xs: np.ndarray) -> np.ndarray:\n", " \"\"\"\n", " performs multiple predictions for all of the elements contained in the given array \n", " of observations ``xs``.\n", " \n", " :param xs: An array of the shape (num_elements, num_observations)\n", " \n", " :returns: The array of the resulting target class predictions of the shape (num_elements, )\n", " where all elements are an integer number indicating the predicted class.\n", " \"\"\"\n", " assert isinstance(xs, np.ndarray)\n", " assert len(xs.shape) == 2, 'observation array needs to have shape (num_elements, num_observations)'\n", " \n", " # For multiple predictions we simply iterate over the list of all individual observations \n", " # and use the ``predict_single`` function.\n", " ys = []\n", " for x in xs:\n", " ys.append(self.predict_single(x))\n", " \n", " return np.array(ys)\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "d93ab3219eb03b24bdff55d852103c86", "grade": false, "grade_id": "cell-c76b37fce585a35a", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.13 (1 point)** For this task, you'll have to implement a function ``dataset_from_df``. Currently the student survey dataset is still available as a DataFrame object, which the ``NaiveBayesClassifier`` cannot be fitted on directly. 
It will have to be processed into a dataset consisting of two numpy arrays ``xs`` and ``ys`` containing the observations and targets respectively. The observation array of the dataset should be structured like this example:\n", "\n", "```python\n", "xs = [\n", "    [\n", "        210, # hours of study\n", "        93, # points in the exercise\n", "        0, # lecture attendance\n", "        1, # studying uses old exams\n", "    ],\n", "    # ...\n", "]\n", "```\n", "\n", "For the target value, we want to predict the **exam outcome** $\\in \\{\\mathrm{fail}, \\mathrm{pass}\\} = \\{0, 1\\}$. The target array should therefore be structured like this example:\n", "\n", "```python\n", "ys = [\n", "    0, # fail\n", "    1, # pass\n", "    # ...\n", "]\n", "```" ] }, { "cell_type": "code", "execution_count": 208, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "475aead2250a9febc527428615c64f81", "grade": false, "grade_id": "cell-4b373d0419ac1b83", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "# TASK: Implement the following function which receives the student survey dataset in the form \n", "# of a data frame and returns a tuple of two numpy arrays that represent the observations and \n", "# targets respectively in the form needed to train the classifier.\n", "\n", "\n", "def dataset_from_df(data_frame: pd.DataFrame\n", " ) -> tuple[np.ndarray, np.ndarray]:\n", " \"\"\"\n", " This function takes a pandas DataFrame ``data_frame`` object as an input and is supposed \n", " to return two numeric numpy arrays which contain the observation vectors and target classes \n", " for each sample of the dataset respectively.\n", " \n", " The observation array ``xs`` has to have the shape (num_elements, num_observations)\n", " The target array ``ys`` has to have the shape (num_elements, )\n", " \n", " :param data_frame: A dataframe object containing the student survey dataset.\n", " \n", " :returns: A tuple of numpy arrays (xs, ys)\n", " 
\"\"\"\n", " \n", " ys = np.array(data_frame.passed)\n", " xs = np.array(data_frame[[\"hours_study\", \"exercise_points\", \"lecture\", \"old_exams\"]])\n", " return xs, ys\n" ] }, { "cell_type": "code", "execution_count": 209, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "271c2ad05a71dfbc397a8f0d99c55b34", "grade": true, "grade_id": "test-5-13-convert-dataset", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-13-convert-dataset - possible points: 1\n", "\n", "\n", "df_ = pd.DataFrame({\n", " 'hours_study': [200, 300], \n", " 'exercise_points': [87, 45],\n", " 'lecture': [0, 1],\n", " 'old_exams': [1, 0],\n", " 'passed': [1, 0],\n", "})\n", "\n", "xs_, ys_ = dataset_from_df(df_)\n", "\n", "assert isinstance(xs_, np.ndarray)\n", "assert len(xs_) == 2\n", "\n", "assert isinstance(ys_, np.ndarray)\n", "assert len(ys_) == 2\n", "\n", "# NOTE: The hidden tests will construct a more complex example df_ and check for the exact \n", "# shapes as well as the exact values.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "500e12555697d74659ad0e169c9ed8ad", "grade": false, "grade_id": "cell-43fd6caf454a67c7", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "**🛠️ Task 5.14 (2 points)** For this task, we will actually instantiate a new ``NaiveBayesClassifier`` model and fit it with the data from the student survey dataset." 
] }, { "cell_type": "code", "execution_count": 212, "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "f631bb097f8a623be424413f73f13a3d", "grade": false, "grade_id": "cell-0020b31412e327a1", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model training accuracy: 86.00%\n" ] } ], "source": [ "from sklearn.metrics import accuracy_score\n", "\n", "xs, ys = dataset_from_df(df)\n", "# preparing the training and validation split for the model training.\n", "xs_train, xs_val = xs[:200], xs[200:]\n", "ys_train, ys_val = ys[:200], ys[200:]\n", "\n", "# TASK: Instantiate a new model into this variable ``model`` and use the ``fit`` method to train \n", "# the classifier on the training split of the dataset.\n", "\n", "model: NaiveBayesClassifier = None\n", "\n", "# YOUR CODE HERE\n", "# Observations 0 and 1 (hours of study, exercise points) are continuous; observations 2 and 3\n", "# (lecture attendance, use of old exams) are categorical.\n", "model = NaiveBayesClassifier(\n", "    num_targets=2,\n", "    observation_types={0: \"continuous\", 1: \"continuous\", 2: \"categorical\", 3: \"categorical\"},\n", ")\n", "# Fit on the training split only, so that the validation split stays unseen.\n", "model.fit(xs_train, ys_train)\n", "\n", "ys_pred = model.predict(xs_train)\n", "acc_train = accuracy_score(ys_train, ys_pred)\n", "print(f'Model training accuracy: {acc_train*100:.2f}%')" ] }, { "cell_type": "code", "execution_count": 213, "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "8ca65c4206f9070841e66f784248e058", "grade": true, "grade_id": "test-5-14-train-model", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model validation accuracy: 82.81%\n" ] } ], "source": [ "##### DO NOT CHANGE #####\n", "# ID: test-5-14-train-model - possible points: 2\n", "\n", "from sklearn.metrics import accuracy_score\n", "\n", "ys_pred = model.predict(xs_val)\n", "assert ys_pred.shape == ys_val.shape, 'multi-element prediction failed'\n", "\n", "acc_val = accuracy_score(ys_val, ys_pred)\n", "print(f'Model validation 
accuracy: {acc_val*100:.2f}%')\n", "assert acc_val >= 0.8, 'validation accuracy too low'\n", "\n", "# Note: The hidden tests will download an independent and unseen test set (w. same distribution) and \n", "# test the accuracy of the fitted model! The model will not be checked for an exact accuracy \n", "# but will have to pass a minimal accuracy threshold of 80% on this unseen test set.\n", "\n", "# Note: The hidden tests rely on a working implementation of ``dataset_from_df`` to process the \n", "# downloaded test set - so make sure that it is working properly.\n", "\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "3bf618e40df9ca745c51beff4d16bc77", "grade": false, "grade_id": "cell-a0b208d5291352cb", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "⚠️ **DISCLAIMER.** The student survey dataset used in this exercise was *synthetically* created. The data is not based on any real individuals or events. Consequently, it is not advisable to use the trained predictive models to predict your own exam outcome!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.2" } }, "nbformat": 4, "nbformat_minor": 4 }