{ "cells": [ { "cell_type": "markdown", "id": "8efd9368", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "05e4f11e64c7e938fd33e7c7164b04ed", "grade": false, "grade_id": "cell-e12df818732a12f6", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Exercise Sheet No. 7\n", "\n", "---\n", "\n", "> Machine Learning for Natural Sciences, Summer 2024, Jun.-Prof. Pascal Friederich\n", ">\n", "> Tutor: navid.haghmoradi@kit.edu\n", ">\n", "> **Please ask questions in the forum/discussion board and for the grading issue contact the Tutor **\n", "------\n", "**Topic**: This exercise sheet will use neural networks for a molecular dynamics simulation" ] }, { "cell_type": "markdown", "id": "36550b61", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "c17023961d609cdb5b4f0c98185a7041", "grade": false, "grade_id": "cell-b2499662f79185ba", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Please add here your group members' names and student IDs. \n", "\n", "You are encouraged to work in groups of a maximum of 3 people, however **each of you** has to submit a solution.\n", "\n", "Names: \n", "\n", "IDs:" ] }, { "cell_type": "markdown", "id": "258c1db5", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "40556eb5d04c55fd84d4eb60fa25640e", "grade": false, "grade_id": "cell-74e50a2e55c8739f", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# Molecular dynamics simulation\n", "\n", "\"Molecular dynamics (MD) is a computer simulation method for analyzing the physical movements of atoms and molecules [that are interacting to each other and creating forces among themselves]. (...) In the most common version, the trajectories (movements) of atoms and molecules are determined by numerically solving Newton's equations of motion for a system of interacting particles, where forces between the particles and their potential energies are often calculated using interatomic potentials or molecular mechanics force fields.\" [wikipedia](https://en.wikipedia.org/wiki/Molecular_dynamics)\n", "\n", "In this exercise we will perform a MD simulation of a single very simple molecule, namely methanol (CH3OH). The propagation of the atomic positions in time is usually treated classically and is given by Newton's Equation for an ensemble of particles:\n", "\n", "$$ F(X) = - \\nabla U(X) = M \\dot{V}(t)$$\n", "\n", "$$ V(t) = \\dot{X}(t) $$\n", "\n", "A molecular dynamics simulation therfore requires the definition of a potential function $U(X)$, or a description of the force terms $F(X)$ by which the particles in the simulation will interact. To correctly capture the molecular interactions, different methods are used for MD simulations (different methods of calculating the energy of atoms and then calculating the forces between them), depending on system. The most common are:\n", "\n", "* (Classical) force fields are empirical energy functions and consist of a summation of bonded interactions (associated with chemical bonds, bond angles, and dihedral angles), and non-bonded interactions (associated with van der Waals interaction, Pauli repulsion and electrostatic interaction)\n", "\n", "* Pair potentials between particles, in which the total potential energy can be calculated from the sum of energy contributions between pairs of atoms. An example of such a pair potential is the non-bonded Lennard–Jones potential.\n", "\n", "* Semi-empirical potentials are based on quantum mechanical methods, but use empirical parameterizations to estimate the energy contributions of orbitals, e.g. tight-binding potentials.\n", "\n", "* Ab Initio Molecular Dynamics (AIMD) simulations use quantum mechanical methods to compute energies and forces, which is more accurate than classical force fields and can describe chemical processes (e.g. bond breaking or formation) but quantum mechanical methods are much more expensive and thus limited in system size and time scale.\n", "\n", "* QM/MM methods are hybrid methods between quantum mechanical (QM) and molecular or classical mechanics (MM), where only a small (and important) part is modeled using QM methods.\n", "\n", "The goal of this exercise is to derive the correct potential energy by a neural network, which is fast in prediction and, if trained on a dataset from quantum mechanical (QM) calculations, also ideally as precise a AIMD/QM methods. Only with QM calculations effects such as bond breaking and reactions can be captured in a MD simulation. To work for arbitrary molecules and inter-molecular interactions the neural network potentials have to be convolutional (e.g. deep convolutional filter or graph networks) or atom-centered but which goes beyond the scope of this exercise. " ] }, { "cell_type": "markdown", "id": "6468d74b", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "54a115dfccbd0c1cfa8c1d0a65498967", "grade": false, "grade_id": "cell-6813759251cd9611", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Why do we want to use Neural networks for MD simulations?\n", "\n", "1. Because Neural networks are always cool.\n", "2. Neural networks are fast in prediction and in principle differentiable and could have QM accuracy if trained on QM data.\n", "3. Force Fields do not work in MD simulation" ] }, { "cell_type": "code", "execution_count": 1, "id": "29c6b706", "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "9cc25ef088810c820defab2965cc86fb", "grade": false, "grade_id": "cell-7f0687c72ba0fb72", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "answer_md = 0 # please pick your answer\n", "\n", "answer_md = 1" ] }, { "cell_type": "code", "execution_count": 2, "id": "b0fdb40b", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "e3e7a1c3a046979786a7613276282f58", "grade": true, "grade_id": "answer_md", "locked": true, "points": 1, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: answer_md - possible points: 1\n", "\n", "# 1 Point\n", "assert answer_md != 0\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "id": "e6910632", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "c2b5987720b3fd223cbcf20b61441b67", "grade": false, "grade_id": "cell-ef4a78c9d794f007", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "We start with loading and inspecting different geometries of methanol that has been sampled via distorting the molecule (the random distortiaon of the molecule gives many differnt molecules with diffetn shapes, in which the position of the atoms of the moleules is given in text file). A very common format are `.xyz` files. The format of a single xyz-file is:\n", "1. 1st Line: Number of atoms (data format: integer)\n", "2. 2nd Line: comment (data format: string)\n", "3. 3rd Line and following: Elements and x- y- z-coordinates (data format: string and float, respectively)" ] }, { "cell_type": "code", "execution_count": 3, "id": "d984d1f7", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "6bc2c227ec4cc044912ad0a460de8f3f", "grade": false, "grade_id": "cell-8006b2cfebecdb34", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;31merror\u001b[0m: \u001b[1mexternally-managed-environment\u001b[0m\n", "\n", "\u001b[31m×\u001b[0m This environment is externally managed\n", "\u001b[31m╰─>\u001b[0m To install Python packages system-wide, try apt install\n", "\u001b[31m \u001b[0m python3-xyz, where xyz is the package you are trying to\n", "\u001b[31m \u001b[0m install.\n", "\u001b[31m \u001b[0m \n", "\u001b[31m \u001b[0m If you wish to install a non-Debian-packaged Python package,\n", "\u001b[31m \u001b[0m create a virtual environment using python3 -m venv path/to/venv.\n", "\u001b[31m \u001b[0m Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make\n", "\u001b[31m \u001b[0m sure you have python3-full installed.\n", "\u001b[31m \u001b[0m \n", "\u001b[31m \u001b[0m If you wish to install a non-Debian packaged Python application,\n", "\u001b[31m \u001b[0m it may be easiest to use pipx install xyz, which will manage a\n", "\u001b[31m \u001b[0m virtual environment for you. Make sure you have pipx installed.\n", "\u001b[31m \u001b[0m \n", "\u001b[31m \u001b[0m See /usr/share/doc/python3.11/README.venv for more information.\n", "\n", "\u001b[1;35mnote\u001b[0m: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.\n", "\u001b[1;36mhint\u001b[0m: See PEP 668 for the detailed specification.\n" ] } ], "source": [ "##### DO NOT CHANGE #####\n", "# importing and installing the necessary python libraries\n", "!pip install py3Dmol\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import py3Dmol\n", "\n", "molecule_name = \"methanol\"\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": 4, "id": "d6936f08", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "dc54485f58b23d987fecd1698d8d2a45", "grade": false, "grade_id": "cell-3ba2a65d3dc3782d", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# Defining a function which reads the xyz file and put the data in the value list\n", "# This function will be needed to read the files we have, regarding the atomic positions, energy values etc.\n", "def load_csv(filepath, delimeter=\" \"):\n", " values = []\n", " # open file\n", " infile = open(filepath,\"r\")\n", " lines = infile.readlines()\n", " # read separate entries\n", " for line in lines:\n", " line_list = line.strip().split(delimeter)\n", " line_list = [x.strip() for x in line_list if x != '']\n", " values.append(line_list)\n", " # close file\n", " infile.close()\n", " return values\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": 5, "id": "660d8b06", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "c2600f9be036dcd6d14dd111aeb070a8", "grade": false, "grade_id": "cell-ecd04da13e2a0805", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "text/plain": [ "[['6'],\n", " [],\n", " ['C', '-0.37100', '0.00770', '-0.00860'],\n", " ['O', '0.91670', '-0.50480', '-0.29720'],\n", " ['H', '-0.52000', '0.03280', '1.07370'],\n", " ['H', '-0.46500', '1.01430', '-0.42340'],\n", " ['H', '-1.12230', '-0.64270', '-0.46260'],\n", " ['H', '1.56170', '0.09260', '0.11810'],\n", " ['6'],\n", " [],\n", " ['C', '0.00000', '0.00000', '0.00000'],\n", " ['O', '1.46290', '0.00000', '0.00000'],\n", " ['H', '-0.20120', '0.00000', '-1.06120'],\n", " ['H', '-0.39850', '-0.92730', '0.18950'],\n", " ['H', '-0.47420', '0.67650', '0.67230'],\n", " ['H', '1.65910', '-0.71020', '-0.59430'],\n", " ['6'],\n", " [],\n", " ['C', '0.00000', '0.00000', '0.00000'],\n", " ['O', '1.60470', '0.00000', '0.00000'],\n", " ['H', '-0.01090', '0.00000', '-1.20830'],\n", " ['H', '-0.29620', '-0.95610', '0.28810'],\n", " ['H', '-0.54470', '0.58050', '0.58660'],\n", " ['H', '1.61590', '-0.80740', '-0.31430'],\n", " ['6'],\n", " [],\n", " ['C', '0.00000', '0.00000', '0.00000'],\n", " ['O', '1.32260', '0.00000', '0.00000'],\n", " ['H', '-0.11950', '0.00000', '-1.20520'],\n", " ['H', '-0.23320', '-0.71730', '0.91280']]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "##### DO NOT CHANGE #####\n", "# Here we show the first 30 elements (list elelemts, not the atoms) of the \"values\" list\n", "# The 1st element is the number of the atoms in the first molecule, the 2nd one is a an empty line, \n", "# the 3rd element is a list of:'C' and its coordinates...\n", "lines = load_csv(molecule_name+\"_conformers.xyz\")\n", "lines[:30]\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "id": "26bd5d72", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "06dd94bf35afc1c3b6c719be594f62d5", "grade": false, "grade_id": "cell-7a5780a6f8aa55b7", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "Now your task is to split separated lines into a nested list that runs over the molecules and looks like the example below. The coordinates shoud be given as floats and the comment line (empty line) removed. The number of atoms tells you how many lines to add to the list. Note: your function should also work for arbitrary number of molecules and with different number of atoms in the same xyz-file.\n", "```\n", "[[['C', -0.371, 0.0077, -0.0086],\n", " ['O', 0.9167, -0.5048, -0.2972],\n", " ['H', -0.52, 0.0328, 1.0737],\n", " ['H', -0.465, 1.0143, -0.4234],\n", " ['H', -1.1223, -0.6427, -0.4626],\n", " ['H', 1.5617, 0.0926, 0.1181]],\n", " [['C', 0.0, 0.0, 0.0],\n", " ['O', 1.4629, 0.0, 0.0],\n", " ['H', -0.2012, 0.0, -1.0612],\n", " ['H', -0.3985, -0.9273, 0.1895],\n", " ['H', -0.4742, 0.6765, 0.6723],\n", " ['H', 1.6591, -0.7102, -0.5943]], ...]\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "id": "56ab749c", "metadata": { "deletable": false, "nbgrader": { "cell_type": "code", "checksum": "f549ea5dddb71078f54f878b8ded61c7", "grade": false, "grade_id": "cell-b3bae9d621aae81c", "locked": false, "schema_version": 3, "solution": true, "task": false } }, "outputs": [], "source": [ "def lines_to_xyz(values):\n", " convert_list = []\n", " curr = []\n", " for value in values:\n", " if len(value) == 4:\n", " a,b,c,d = value\n", " curr.append([a,float(b),float(c), float(d)])\n", " elif len(curr) > 0:\n", " assert len(value) <= 1\n", " convert_list.append(curr)\n", " curr = []\n", " convert_list.append(curr)\n", " return convert_list" ] }, { "cell_type": "code", "execution_count": 7, "id": "ea15d3c5", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "3e43430946b5a99475a84c56e0e042cb", "grade": false, "grade_id": "cell-3ff56d53f55277f9", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "mols = lines_to_xyz(lines)\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": 8, "id": "6d0ddeba", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "fcead66e6707a636dda728fe9ca1fdc3", "grade": true, "grade_id": "mols", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: mols - possible points: 2\n", "\n", "# 2 Points\n", "assert np.sum(np.abs(np.array(mols[1][2][1:]) - np.array([-0.2012, 0.0, -1.0612]))) < 0.001\n", "assert mols[0][1][0] == 'O' and mols[0][3][0] == 'H'\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": 9, "id": "974f0b55", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "a48d594612929dcfd3e4602304eb7cf3", "grade": true, "grade_id": "lines_to_mols", "locked": true, "points": 2, "schema_version": 3, "solution": false, "task": false } }, "outputs": [], "source": [ "##### DO NOT CHANGE #####\n", "# ID: lines_to_mols - possible points: 2\n", "\n", "# 2 Points\n", "assert lines_to_xyz([['1'],['my Comment'],['C','0.0','0.0','0.0'],\n", " ['2'],['my Comment'],['C','0.0','0.0','0.0'],['O','1.0','1.0','1.0']])[1][1][0] == 'O'\n", "assert lines_to_xyz([['1'],['my Comment'],['C','0.0','0.0','0.0'],\n", " ['2'],['my Comment'],['C','0.0','0.0','0.0'],['O','1.0','1.0','1.0']])[1][1][1] - 1.0 < 1e-5\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "markdown", "id": "c117fdbd", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "markdown", "checksum": "a3caf42777cd66f03163bf199255db88", "grade": false, "grade_id": "cell-797d83f0e110d963", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "source": [ "# 1. Learn Energies and Gradients of the molecule\n", "\n", "## 1.1 Load Data\n", "\n", "So that you can continue without solving previous exercise, load the numpy arrays below." ] }, { "cell_type": "code", "execution_count": 10, "id": "b56ca4fa", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "e5bbee9d6f2024f1ace18a4f4e4436e6", "grade": false, "grade_id": "cell-9b6fb98d2845c18d", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Geometries: (6001, 6, 3)\n", "Energies: (6001,)\n", "Gradients: (6001, 6, 3)\n", "Elements: ['C', 'O', 'H', 'H', 'H', 'H']\n" ] } ], "source": [ "##### DO NOT CHANGE #####\n", "# Already prepared. In case that you have not got the necessary data structure for the rest of the notebook\n", "import numpy as np\n", "geos = np.load(molecule_name+\"_coordinates.npy\") # in A\n", "energies = np.load(molecule_name+\"_energy.npy\") # in eV\n", "grads = np.load(molecule_name+\"_gradients.npy\")*27.21138624598853/0.52917721090380 # from H/B to eV/A\n", "\n", "\n", "elements = []\n", "number_of_elements = geos.shape[1]\n", "for lineidx, line in enumerate(open(\"methanol_conformers.xyz\", \"r\")):\n", " if lineidx>=2 and lineidx < number_of_elements+2:\n", " elements.append(line.split()[0])\n", "# look at the shape of the loaded objects\n", "print(\"Geometries: \", geos.shape)\n", "print(\"Energies: \", energies.shape)\n", "print(\"Gradients: \", grads.shape)\n", "print(\"Elements: \", elements)\n", "\n", "##### DO NOT CHANGE #####" ] }, { "cell_type": "code", "execution_count": 11, "id": "8bcf7413", "metadata": { "deletable": false, "editable": false, "nbgrader": { "cell_type": "code", "checksum": "604aa5c3dbf023318a48c7b13d240ce4", "grade": false, "grade_id": "cell-247e9966aa04d3e2", "locked": true, "schema_version": 3, "solution": false, "task": false } }, "outputs": [ { "data": { "application/3dmoljs_load.v0": "
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
3Dmol.js failed to load for some reason. Please check your browser console for error messages.