{ "cells": [ { "cell_type": "markdown", "id": "84b2ed6d", "metadata": { "id": "84b2ed6d" }, "source": [ "# Exercise Sheet No. 1\n", "\n", "---\n", "\n", "> Machine Learning for Natural Sciences, Summer 2024, Jun.-Prof. Pascal Friederich, pascal.friederich@kit.edu\n", "\n", "> Instructor: André Eberhard (andre.eberhard@kit.edu)\n", "\n", "---\n", "\n", "**Topic**: This exercise sheet will not be graded and serves as an\n", "introduction to explain the online exercise regulations and to help you\n", "familiarize yourself with Python, Jupyter and NumPy. The exercises in this\n", "sheet are meant as an appetizer to show you what future exercises could cover." ] }, { "cell_type": "markdown", "id": "eee3c2a3", "metadata": { "id": "eee3c2a3" }, "source": [ "## Preliminaries\n", "If you are not familiar with Python, you may want to learn more about Python\n", "and its basic syntax. Since there are a lot of free and well-written tutorials\n", "online, we refer you to one of the following:\n", "\n", "* http://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook\n", "* https://www.learnpython.org/\n", "* https://automatetheboringstuff.com/" ] }, { "cell_type": "markdown", "id": "20c7a0a0", "metadata": { "id": "20c7a0a0" }, "source": [ "## 1.1 Corona (not graded)\n", "\n", "*Disclaimer*: If you are in any way personally affected by the Corona crisis,\n", "you do not have to participate in this exercise. It will not be graded and is not\n", "necessary for the progress of this course.\n", "\n", "To get to know Python's data science workflows, we briefly analyze some data from the\n", "corona epidemic. First, download a historical dataset on worldwide corona\n", "infections in 2020 from the European Centre for Disease Prevention and\n", "Control ([link](https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx)).\n", "We can do this in Python via the ``requests`` package." 
] }, { "cell_type": "code", "execution_count": 1, "id": "84713313", "metadata": { "id": "84713313" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/lib/python3/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.17.3 and <1.25.0 is required for this version of SciPy (detected version 1.26.4\n", " warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion}\"\n" ] } ], "source": [ "import os\n", "from datetime import datetime\n", "\n", "import matplotlib.dates as mdates\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import requests\n", "import scipy.optimize\n", "from sklearn.neural_network import MLPRegressor" ] }, { "cell_type": "code", "execution_count": 2, "id": "4e3050ba", "metadata": { "id": "4e3050ba" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading dataset ...\n", "Downloading dataset done.\n" ] } ], "source": [ "data_url = \"https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx\"\n", "data_file = \"COVID-19-geographic-disbtribution-worldwide.xlsx\"\n", "# Download only once; skip if the file is already present locally.\n", "if not os.path.exists(data_file):\n", "    print(\"Downloading dataset ...\")\n", "    response = requests.get(data_url, timeout=60)\n", "    # Fail early on HTTP errors instead of saving an error page to disk.\n", "    response.raise_for_status()\n", "    with open(data_file, \"wb\") as f:\n", "        f.write(response.content)\n", "    print(\"Downloading dataset done.\")" ] }, { "cell_type": "markdown", "id": "b38f90bf", "metadata": { "id": "b38f90bf" }, "source": [ "Now we load the dataset with the ``pandas`` library, which returns a ``DataFrame`` object. We print the first rows of the table with ``.head()``:" ] }, { "cell_type": "code", "execution_count": 4, "id": "2c83c30e", "metadata": { "id": "2c83c30e" }, "outputs": [ { "data": { "text/html": [ "
| | dateRep | day | month | year | cases | deaths | countriesAndTerritories | geoId | countryterritoryCode | popData2019 | continentExp | Cumulative_number_for_14_days_of_COVID-19_cases_per_100000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-12-14 | 14 | 12 | 2020 | 746 | 6 | Afghanistan | AF | AFG | 38041757.0 | Asia | 9.013779 |
| 1 | 2020-12-13 | 13 | 12 | 2020 | 298 | 9 | Afghanistan | AF | AFG | 38041757.0 | Asia | 7.052776 |
| 2 | 2020-12-12 | 12 | 12 | 2020 | 113 | 11 | Afghanistan | AF | AFG | 38041757.0 | Asia | 6.868768 |
| 3 | 2020-12-11 | 11 | 12 | 2020 | 63 | 10 | Afghanistan | AF | AFG | 38041757.0 | Asia | 7.134266 |
| 4 | 2020-12-10 | 10 | 12 | 2020 | 202 | 16 | Afghanistan | AF | AFG | 38041757.0 | Asia | 6.968658 |