{ "cells": [ { "cell_type": "markdown", "id": "40a164cb-2b01-4b40-b6c1-e51d5ef39c06", "metadata": {}, "source": [ "### Multithreading and multiprocessing in Python\n", "\n", "Functions can be executed in parallel in Python to make better use of the available CPU \n", "resources. There are two mechanisms: *multi-threading* within one instance of the *Python* \n", "interpreter and *multi-processing*, where each process runs its own interpreter. The methods of the two \n", "approaches are very similar, although there are differences in detail.\n", "\n", "The following very simple examples illustrate how to use multithreading and multiprocessing\n", "and demonstrate the differences for two extreme cases: functions that are limited either by \n", "waiting times for input or output (I/O-bound) or by CPU needs (CPU-bound).\n" ] }, { "cell_type": "code", "execution_count": null, "id": "8e519c5d-ae52-409e-805e-f512f2046f02", "metadata": {}, "outputs": [], "source": [ "# imports\n", "\n", "import time, os\n", "# for multithreading\n", "from threading import Thread, current_thread\n", "# for multiprocessing\n", "from multiprocessing import Process, current_process\n", " " ] }, { "cell_type": "code", "execution_count": null, "id": "7262f009-e3ff-4a35-b86b-0cdc22503eed", "metadata": {}, "outputs": [], "source": [ "# simple example with an I/O-bound and a CPU-bound function \n", "\n", "COUNT = 100000000\n", "SLEEP = 5\n", "\n", "def io_bound(sec):\n", "# this function does almost nothing (except wait) \n", " pid = os.getpid()\n", " threadName = current_thread().name\n", " processName = current_process().name\n", " \n", " print(f\"{pid}:{processName}.{threadName} \\\n", " ---> Start sleeping...\")\n", " time.sleep(sec)\n", " print(f\"{pid}:{processName}.{threadName} \\\n", " ---> Finished sleeping...\")\n", "\n", " \n", "def cpu_bound(n):\n", "# this function heavily uses the CPU (for counting)\n", " pid = os.getpid()\n", " threadName = current_thread().name\n", " processName = current_process().name\n", " \n", " 
print(f\"{pid}:{processName}.{threadName} \\\n", " ---> Start counting...\")\n", " \n", " while n > 0:\n", "     n -= 1\n", " \n", " print(f\"{pid}:{processName}.{threadName} \\\n", " ---> Finished counting...\")\n" ] }, { "cell_type": "markdown", "id": "b2c74e77-5974-4e82-a29e-6cc66145d75c", "metadata": {}, "source": [ "Call the I/O-bound function twice in sequence" ] }, { "cell_type": "code", "execution_count": null, "id": "d9fd4b78-adcb-4d1a-b2c3-10e1030adc79", "metadata": {}, "outputs": [], "source": [ "start_time = time.time()\n", "io_bound(SLEEP)\n", "io_bound(SLEEP)\n", "print(\"time taken in s\", time.time() - start_time)" ] }, { "cell_type": "markdown", "id": "1e819bd1-4319-458d-bdbe-0151176b9696", "metadata": {}, "source": [ "Now with multi-threading" ] }, { "cell_type": "code", "execution_count": null, "id": "5a4e6012-ebc9-4796-a54c-396c18397549", "metadata": { "scrolled": true }, "outputs": [], "source": [ "\n", "# I/O-bound with multi-threading\n", "\n", "start_time = time.time()\n", "\n", "t1 = Thread(target = io_bound, args =(SLEEP, ))\n", "t2 = Thread(target = io_bound, args =(SLEEP, ))\n", "t1.start()\n", "t2.start()\n", "t1.join()\n", "t2.join()\n", "\n", "print(\"time taken in s\", time.time() - start_time)" ] }, { "cell_type": "markdown", "id": "b7232573-ac12-4c04-b29b-62146aa7fccb", "metadata": {}, "source": [ "The time needed is significantly shorter, as the waiting time of one thread can be used by the other one.\n", "\n", "Now check with CPU-bound functions" ] }, { "cell_type": "code", "execution_count": null, "id": "eff44d4a-c2fc-4040-a07a-fb7b9f663e81", "metadata": {}, "outputs": [], "source": [ "start_time = time.time()\n", "cpu_bound(COUNT)\n", "cpu_bound(COUNT)\n", "print(\"time taken in s\", time.time() - start_time)" ] }, { "cell_type": "markdown", "id": "fd1b8851-63e7-4cb8-8f9a-3d8d996df801", "metadata": {}, "source": [ "CPU-bound with multi-threading" ] }, { "cell_type": "code", "execution_count": null, "id": "64a7e45e-d70e-478c-9d4a-e766095d6bcf", "metadata": 
{}, "outputs": [], "source": [ "start_time = time.time()\n", "\n", "t1 = Thread(target = cpu_bound, args =(COUNT, ))\n", "t2 = Thread(target = cpu_bound, args =(COUNT, ))\n", "t1.start()\n", "t2.start()\n", "t1.join()\n", "t2.join()\n", "\n", "print(\"time taken in s\", time.time() - start_time)" ] }, { "cell_type": "markdown", "id": "c95baf50-7e90-4595-9f31-84fe5967cf59", "metadata": {}, "source": [ "No gain at all; it may even take longer due to the overhead of managing the threads. In the standard CPython interpreter, the global interpreter lock (GIL) allows only one thread at a time to execute Python code, so CPU-bound threads cannot run in parallel.\n", "\n", "Now try with multi-processing" ] }, { "cell_type": "code", "execution_count": null, "id": "3ac8f48e-902a-4bda-8f40-9fa4d67a8570", "metadata": {}, "outputs": [], "source": [ "start_time = time.time()\n", "\n", "p1 = Process(target = cpu_bound, args =(COUNT, ))\n", "p2 = Process(target = cpu_bound, args =(COUNT, ))\n", "p1.start()\n", "p2.start()\n", "p1.join()\n", "p2.join()\n", "\n", "print(\"time taken in s\", time.time() - start_time)" ] }, { "cell_type": "markdown", "id": "e1d6b396-d1ee-4abd-81cc-551dbb90a6ad", "metadata": {}, "source": [ "Speed-up by almost a factor of two, because each process runs in its own interpreter and the second one can use a second CPU core." 
] }, { "cell_type": "markdown", "id": "e3122e68-c76f-47ed-b763-fe9bf7d50872", "metadata": {}, "source": [ "**Remarks:** \n", "\n", " - *multithreading* is well suited to tasks that spend most of their time waiting (I/O-bound)\n", "   - switching between the different tasks, called *threads* in this case,\n", "     takes place within the *Python* interpreter, which runs as a single process;\n", "     only one thread executes Python code at any time\n", "   - threads can use the CPU while other threads are waiting\n", "   - variables in the name space of the calling process are available in all threads \n", " \n", " - *multiprocessing* makes it possible to use CPU resources from all available cores\n", "   - each task runs in its own sub-process, and task switching is done by the operating system\n", "   - processes each use one core if available; if all cores are used,\n", "     CPU allocation is handled by the task scheduler of the operating system\n", "   - the Python environment and variable name spaces are copied when a process\n", "     starts; later changes in one process are not visible in the others \n", "   - messaging methods (Queue, Pipe) must be used to transfer data between\n", "     processes\n", "   - shared memory areas may also be used to provide access to common memory\n", "     for all processes\n", "   - initializing a Process is more resource-demanding than creating a Thread\n", "\n", " - Some resources, like hardware devices or shared memory, require exclusive access by only one thread or process at a time. This is achieved with a *Lock* object provided by both the *multiprocessing* and *threading* modules." ] }, { "cell_type": "markdown", "id": "b5130faf-1854-48ec-86e1-decf4461e592", "metadata": {}, "source": [ "### Example for *Lock*\n", "\n", "A very illustrative example for exclusive locking of a resource is printing. \n", "To avoid mixed output from different threads or processes, a thread or process must acquire a lock before accessing the resource. The lock is only granted if no other thread or process is holding it. When done, the lock must be released." 
] }, { "cell_type": "code", "execution_count": null, "id": "d344a19f-9d47-4b8a-a8e4-008a22008556", "metadata": {}, "outputs": [], "source": [ "from multiprocessing import Lock\n", "\n", "# create one lock instance that is shared by all callers\n", "lock = Lock()\n", "\n", "def threadsafe_print(text):\n", "    # acquire the lock, print, and release the lock again, even if print() raises\n", "    with lock:\n", "        print(text)" ] }, { "cell_type": "markdown", "id": "a4503766-636f-4bb8-addf-fe8f9148a28b", "metadata": {}, "source": [ "**Warning**: Do not try to implement your own locking mechanism!\n", "Locking relies on a dedicated instruction at machine-language level (\"*test_and_set*\"), which returns the old value of a memory location and \n", "sets it to true. This so-called \"atomic\" instruction can under no circumstances\n", "be interrupted by the task scheduler of the operating system and is therefore\n", "\"thread-safe\". Thread-safe methods for access to the above-mentioned Queues, Pipes and shared-memory areas exist and must be used in applications to ensure the thread safety of your programs. If in doubt, use the *Lock* mechanism described above, but note that this may cost efficiency. \n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "2126b410-ac2e-4bda-a5a5-2c0888dfe75f", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }