{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Processing data with Pandas\n",
    "\n",
    "```{attention}\n",
    "Finnish university students are encouraged to use the CSC Notebooks platform.<br/>\n",
    "<a href=\"https://notebooks.csc.fi/#/blueprint/7e62ac3bddf74483b7ac7333721630e2\"><img alt=\"CSC badge\" src=\"https://img.shields.io/badge/launch-CSC%20notebook-blue.svg\" style=\"vertical-align:text-bottom\"></a>\n",
    "\n",
    "Others can follow the lesson and fill in their student notebooks using Binder.<br/>\n",
    "<a href=\"https://mybinder.org/v2/gh/geo-python/notebooks/master?urlpath=lab/tree/L4/functions.ipynb\"><img alt=\"Binder badge\" src=\"https://img.shields.io/badge/launch-binder-red.svg\" style=\"vertical-align:text-bottom\"></a>\n",
    "```\n",
    "\n",
    "During the first part of this lesson you learned the basics of pandas data structures (*Series* and *DataFrame*) and got familiar with basic methods loading and exploring data.\n",
    "Here, we will continue with basic data manipulation and analysis methods such calculations and selections.\n",
    "\n",
    "We are now working in a new notebook file and we need to import pandas again. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's work with the same input data `'Kumpula-June-2016-w-metadata.txt'` and load it using the `pd.read_csv()` method. Remember, that the first 8 lines contain metadata so we can skip those. This time, let's store the filepath into a separate variable in order to make the code more readable and easier to change afterwards: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# Define file path:\n",
    "fp = 'Kumpula-June-2016-w-metadata.txt'\n",
    "\n",
    "# Read in the data from the file (starting at row 9):\n",
    "data = pd.read_csv(fp, skiprows=8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember to always check the data after reading it in:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "      <td>70.9</td>\n",
       "      <td>47.3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "      <td>58.3</td>\n",
       "      <td>43.2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   YEARMODA  TEMP   MAX   MIN\n",
       "0  20160601  65.5  73.6  54.7\n",
       "1  20160602  65.8  80.8  55.0\n",
       "2  20160603  68.4   NaN  55.6\n",
       "3  20160604  57.5  70.9  47.3\n",
       "4  20160605  51.4  58.3  43.2"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "````{admonition} Filepaths\n",
    "Note, that our input file `'Kumpula-June-2016-w-metadata.txt'` is located **in the same folder** as the notebook we are running. Furthermore, the same folder is the working directory for our Python session (you can check this by running the `pwd()`command).\n",
    "For these two reasons, we are able to pass only the filename to `.read_csv()` function and pandas is able to find the file and read it in. In fact, we are using a **relative filepath** when reading in the file.\n",
    "    \n",
    "The **absolute filepath** to the input data file in the CSC cloud computing environment is `/home/jovyan/work/notebooks/L5/Kumpula-June-2016-w-metadata.txt`, and we could also use this as input when reading in the file. When working with absolute filepaths, it's good practice to pass the file paths as a [raw string](https://docs.python.org/3/reference/lexical_analysis.html#literals) using the prefix `r` in order to avoid problems with escape characters such as `\"\\n\"`.\n",
    "\n",
    "```     \n",
    "# Define file path as a raw string:\n",
    "fp = r'/home/jovyan/work/notebooks/L5/Kumpula-June-2016-w-metadata.txt'\n",
    "\n",
    "# Read in the data from the file (starting at row 9):\n",
    "data = pd.read_csv(fp, skiprows=8)\n",
    "``` \n",
    "````"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Basic calculations\n",
    "\n",
    "One of the most common things to do in pandas is to create new columns based on calculations between different variables (columns).\n",
    "\n",
    "We can create a new column `DIFF` in our DataFrame by specifying the name of the column and giving it some default value (in this case the decimal number `0.0`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "      <td>70.9</td>\n",
       "      <td>47.3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "      <td>58.3</td>\n",
       "      <td>43.2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>20160606</td>\n",
       "      <td>52.2</td>\n",
       "      <td>59.7</td>\n",
       "      <td>42.8</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>20160607</td>\n",
       "      <td>56.9</td>\n",
       "      <td>65.1</td>\n",
       "      <td>45.9</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>20160608</td>\n",
       "      <td>54.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.5</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>20160609</td>\n",
       "      <td>49.4</td>\n",
       "      <td>54.1</td>\n",
       "      <td>45.7</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>20160610</td>\n",
       "      <td>49.5</td>\n",
       "      <td>55.9</td>\n",
       "      <td>43.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>20160611</td>\n",
       "      <td>54.0</td>\n",
       "      <td>62.1</td>\n",
       "      <td>41.7</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>20160612</td>\n",
       "      <td>55.4</td>\n",
       "      <td>64.2</td>\n",
       "      <td>46.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>20160613</td>\n",
       "      <td>58.3</td>\n",
       "      <td>68.2</td>\n",
       "      <td>47.3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>20160614</td>\n",
       "      <td>59.7</td>\n",
       "      <td>67.8</td>\n",
       "      <td>47.8</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>20160615</td>\n",
       "      <td>63.4</td>\n",
       "      <td>70.3</td>\n",
       "      <td>49.3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>20160616</td>\n",
       "      <td>57.8</td>\n",
       "      <td>67.5</td>\n",
       "      <td>55.6</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>20160617</td>\n",
       "      <td>60.4</td>\n",
       "      <td>70.7</td>\n",
       "      <td>55.9</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>20160618</td>\n",
       "      <td>57.3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>54.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>18</td>\n",
       "      <td>20160619</td>\n",
       "      <td>56.3</td>\n",
       "      <td>59.2</td>\n",
       "      <td>54.1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>20160620</td>\n",
       "      <td>59.3</td>\n",
       "      <td>69.1</td>\n",
       "      <td>52.2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>20160621</td>\n",
       "      <td>62.6</td>\n",
       "      <td>71.4</td>\n",
       "      <td>50.4</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>21</td>\n",
       "      <td>20160622</td>\n",
       "      <td>61.7</td>\n",
       "      <td>70.2</td>\n",
       "      <td>55.4</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>22</td>\n",
       "      <td>20160623</td>\n",
       "      <td>60.9</td>\n",
       "      <td>67.1</td>\n",
       "      <td>54.9</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>23</td>\n",
       "      <td>20160624</td>\n",
       "      <td>61.1</td>\n",
       "      <td>68.9</td>\n",
       "      <td>56.7</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>24</td>\n",
       "      <td>20160625</td>\n",
       "      <td>65.7</td>\n",
       "      <td>75.4</td>\n",
       "      <td>57.9</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>25</td>\n",
       "      <td>20160626</td>\n",
       "      <td>69.6</td>\n",
       "      <td>77.7</td>\n",
       "      <td>60.3</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>26</td>\n",
       "      <td>20160627</td>\n",
       "      <td>60.7</td>\n",
       "      <td>70.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>27</td>\n",
       "      <td>20160628</td>\n",
       "      <td>65.4</td>\n",
       "      <td>73.0</td>\n",
       "      <td>55.8</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>28</td>\n",
       "      <td>20160629</td>\n",
       "      <td>65.8</td>\n",
       "      <td>73.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>29</td>\n",
       "      <td>20160630</td>\n",
       "      <td>65.7</td>\n",
       "      <td>72.7</td>\n",
       "      <td>59.2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YEARMODA  TEMP   MAX   MIN  DIFF\n",
       "0   20160601  65.5  73.6  54.7   0.0\n",
       "1   20160602  65.8  80.8  55.0   0.0\n",
       "2   20160603  68.4   NaN  55.6   0.0\n",
       "3   20160604  57.5  70.9  47.3   0.0\n",
       "4   20160605  51.4  58.3  43.2   0.0\n",
       "5   20160606  52.2  59.7  42.8   0.0\n",
       "6   20160607  56.9  65.1  45.9   0.0\n",
       "7   20160608  54.2   NaN  47.5   0.0\n",
       "8   20160609  49.4  54.1  45.7   0.0\n",
       "9   20160610  49.5  55.9  43.0   0.0\n",
       "10  20160611  54.0  62.1  41.7   0.0\n",
       "11  20160612  55.4  64.2  46.0   0.0\n",
       "12  20160613  58.3  68.2  47.3   0.0\n",
       "13  20160614  59.7  67.8  47.8   0.0\n",
       "14  20160615  63.4  70.3  49.3   0.0\n",
       "15  20160616  57.8  67.5  55.6   0.0\n",
       "16  20160617  60.4  70.7  55.9   0.0\n",
       "17  20160618  57.3   NaN  54.0   0.0\n",
       "18  20160619  56.3  59.2  54.1   0.0\n",
       "19  20160620  59.3  69.1  52.2   0.0\n",
       "20  20160621  62.6  71.4  50.4   0.0\n",
       "21  20160622  61.7  70.2  55.4   0.0\n",
       "22  20160623  60.9  67.1  54.9   0.0\n",
       "23  20160624  61.1  68.9  56.7   0.0\n",
       "24  20160625  65.7  75.4  57.9   0.0\n",
       "25  20160626  69.6  77.7  60.3   0.0\n",
       "26  20160627  60.7  70.0   NaN   0.0\n",
       "27  20160628  65.4  73.0  55.8   0.0\n",
       "28  20160629  65.8  73.2   NaN   0.0\n",
       "29  20160630  65.7  72.7  59.2   0.0"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define a new column \"DIFF\"\n",
    "data['DIFF'] = 0.0\n",
    "\n",
    "# Check how the dataframe looks like:\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Let's check the datatype of our new column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('float64')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['DIFF'].dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Okey, so we see that Pandas created a new column and recognized automatically that the data type is float as we passed a 0.0 value to it.\n",
    "\n",
    "Let's update the column `DIFF` by calculating the difference between `MAX` and `MIN` columns to get an idea how much the temperatures have\n",
    "been varying during different days:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "      <td>18.9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "      <td>25.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "      <td>70.9</td>\n",
       "      <td>47.3</td>\n",
       "      <td>23.6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "      <td>58.3</td>\n",
       "      <td>43.2</td>\n",
       "      <td>15.1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   YEARMODA  TEMP   MAX   MIN  DIFF\n",
       "0  20160601  65.5  73.6  54.7  18.9\n",
       "1  20160602  65.8  80.8  55.0  25.8\n",
       "2  20160603  68.4   NaN  55.6   NaN\n",
       "3  20160604  57.5  70.9  47.3  23.6\n",
       "4  20160605  51.4  58.3  43.2  15.1"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Calculate max min difference\n",
    "data['DIFF'] = data['MAX'] - data['MIN']\n",
    "\n",
    "# Check the result\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "The calculations were stored into the ``DIFF`` column as planned. \n",
    "\n",
    "You can also create new columns on-the-fly at the same time when doing the calculation (the column does not have to exist before). Furthermore, it is possible to use any kind of math\n",
    "algebra (e.g. subtracttion, addition, multiplication, division, exponentiation, etc.) when creating new columns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "We can for example convert the Fahrenheit temperatures in the `TEMP` column into Celsius using the formula that we have seen already many times. Let's do that and store it in a new column called `TEMP_CELSIUS`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "      <td>18.9</td>\n",
       "      <td>18.611111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "      <td>25.8</td>\n",
       "      <td>18.777778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.222222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "      <td>70.9</td>\n",
       "      <td>47.3</td>\n",
       "      <td>23.6</td>\n",
       "      <td>14.166667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "      <td>58.3</td>\n",
       "      <td>43.2</td>\n",
       "      <td>15.1</td>\n",
       "      <td>10.777778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS\n",
       "0  20160601  65.5  73.6  54.7  18.9     18.611111\n",
       "1  20160602  65.8  80.8  55.0  25.8     18.777778\n",
       "2  20160603  68.4   NaN  55.6   NaN     20.222222\n",
       "3  20160604  57.5  70.9  47.3  23.6     14.166667\n",
       "4  20160605  51.4  58.3  43.2  15.1     10.777778"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a new column and convert temp fahrenheit to celsius:\n",
    "data['TEMP_CELSIUS'] = (data['TEMP'] - 32) / (9/5)\n",
    "\n",
    "#Check output\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check your understanding\n",
    "\n",
    "Calculate the temperatures in Kelvins using the Celsius values **and store the result a new column** calle `TEMP_KELVIN` in our dataframe.\n",
    "    \n",
    "0 Kelvins is is -273.15 degrees Celsius as we learned during [Lesson 4](https://geo-python-site.readthedocs.io/en/latest/notebooks/L4/functions.html#let-s-make-another-function)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "      <td>18.9</td>\n",
       "      <td>18.611111</td>\n",
       "      <td>291.761111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "      <td>25.8</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>291.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.222222</td>\n",
       "      <td>293.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "      <td>70.9</td>\n",
       "      <td>47.3</td>\n",
       "      <td>23.6</td>\n",
       "      <td>14.166667</td>\n",
       "      <td>287.316667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "      <td>58.3</td>\n",
       "      <td>43.2</td>\n",
       "      <td>15.1</td>\n",
       "      <td>10.777778</td>\n",
       "      <td>283.927778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "0  20160601  65.5  73.6  54.7  18.9     18.611111   291.761111\n",
       "1  20160602  65.8  80.8  55.0  25.8     18.777778   291.927778\n",
       "2  20160603  68.4   NaN  55.6   NaN     20.222222   293.372222\n",
       "3  20160604  57.5  70.9  47.3  23.6     14.166667   287.316667\n",
       "4  20160605  51.4  58.3  43.2  15.1     10.777778   283.927778"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Solution\n",
    "data['TEMP_KELVIN'] = data['TEMP_CELSIUS'] + 273.15\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Selecting rows and columns\n",
    "\n",
    "We often want to select only specific rows from a DataFrame for further analysis. There are multiple ways of selecting subsets of a pandas DataFrame. In this section we will go through most useful tricks for selecting specific rows, columns and individual values.\n",
    "\n",
    "### Selecting several rows\n",
    "\n",
    "One common way of selecting only specific rows from your DataFrame is done via **index slicing** to extract part of the DataFrame. Slicing in pandas can be done in a similar manner as with normal Python lists, i.e. you specify index range you want to select inside the square brackets: ``dataframe[start_index:stop_index]``.\n",
    "\n",
    "Let's select the first five rows and assign them to a variable called `selection`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "      <td>18.9</td>\n",
       "      <td>18.611111</td>\n",
       "      <td>-254.538889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "      <td>25.8</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.222222</td>\n",
       "      <td>-252.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "      <td>70.9</td>\n",
       "      <td>47.3</td>\n",
       "      <td>23.6</td>\n",
       "      <td>14.166667</td>\n",
       "      <td>-258.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "      <td>58.3</td>\n",
       "      <td>43.2</td>\n",
       "      <td>15.1</td>\n",
       "      <td>10.777778</td>\n",
       "      <td>-262.372222</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "0  20160601  65.5  73.6  54.7  18.9     18.611111  -254.538889\n",
       "1  20160602  65.8  80.8  55.0  25.8     18.777778  -254.372222\n",
       "2  20160603  68.4   NaN  55.6   NaN     20.222222  -252.927778\n",
       "3  20160604  57.5  70.9  47.3  23.6     14.166667  -258.983333\n",
       "4  20160605  51.4  58.3  43.2  15.1     10.777778  -262.372222"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select first five rows of dataframe using row index values\n",
    "selection = data[0:5]\n",
    "selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "```{note}\n",
    "Here we have selected the first five rows (index 0-4) using the integer index.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "### Selecting several rows and columns\n",
    "\n",
    "It is also possible to control which columns are chosen when selecting a subset of rows. In this case we will use [pandas.DataFrame.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) which selects data based on axis labels (row labels and column labels). \n",
    "\n",
    "Let's select temperature values (column `TEMP`) from rows 0-5:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    65.5\n",
       "1    65.8\n",
       "2    68.4\n",
       "3    57.5\n",
       "4    51.4\n",
       "5    52.2\n",
       "Name: TEMP, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select temp column values on rows 0-5\n",
    "selection = data.loc[0:5, 'TEMP']\n",
    "selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{note}\n",
    "In this case, we get six rows of data (index 0-5)! We are now doing the selection based on axis labels instead of the integer index.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "It is also possible to select multiple columns when using `loc`. Here, we select the `TEMP` and `TEMP_CELSIUS` columns from a set of rows by passing them inside a list (`.loc[start_index:stop_index, list_of_columns]`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TEMP</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>65.5</td>\n",
       "      <td>18.611111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>65.8</td>\n",
       "      <td>18.777778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>68.4</td>\n",
       "      <td>20.222222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>57.5</td>\n",
       "      <td>14.166667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>51.4</td>\n",
       "      <td>10.777778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>52.2</td>\n",
       "      <td>11.222222</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   TEMP  TEMP_CELSIUS\n",
       "0  65.5     18.611111\n",
       "1  65.8     18.777778\n",
       "2  68.4     20.222222\n",
       "3  57.5     14.166667\n",
       "4  51.4     10.777778\n",
       "5  52.2     11.222222"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select columns temp and temp_celsius on rows 0-5\n",
    "selection = data.loc[0:5, ['TEMP', 'TEMP_CELSIUS']]\n",
    "selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check your understanding\n",
    "\n",
    "Find the mean temperatures (in Celsius) for the last seven days of June. Do the selection using the row index values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "18.253968253968257"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Here is the solution\n",
    "data.loc[23:29, 'TEMP_CELSIUS'].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting a single row\n",
    "\n",
    "You can also select an individual row from specific position using the `.loc[]` indexing. Here we select all the data values using index 4 (the 5th row):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "YEARMODA        2.016060e+07\n",
       "TEMP            5.140000e+01\n",
       "MAX             5.830000e+01\n",
       "MIN             4.320000e+01\n",
       "DIFF            1.510000e+01\n",
       "TEMP_CELSIUS    1.077778e+01\n",
       "TEMP_KELVIN    -2.623722e+02\n",
       "Name: 4, dtype: float64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select one row using index\n",
    "row = data.loc[4]\n",
    "row"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "``.loc[]`` indexing returns the values from that position as a ``pd.Series`` where the indices are actually the column names of those variables. Hence, you can access the value of an individual column by referring to its index using following format (both should work):\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "51.4"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#Print one attribute from the selected row\n",
    "row['TEMP']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting a single value based on row and column\n",
    "\n",
    "Sometimes it is enough to access a single value in a DataFrame. In this case, we can use [DataFrame.at](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.at.html#pandas-dataframe-at) instead of `Data.Frame.loc`.\n",
    "\n",
    "Let's select the temperature (column `TEMP`) on the first row (index `0`) of our DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "65.5"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "selection.at[0, \"TEMP\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### EXTRA: Selections by integer position\n",
    "\n",
    "```{admonition} .iloc\n",
    "`.loc` and `.at` are based on the *axis labels* - the names of columns and rows. Axis labels can be also something else than \"traditional\" index values. For example, datetime is commonly used as the row index.\n",
    "`.iloc` is another indexing operator which is based on *integer value* indices. Using `.iloc`, it is possible to refer also to the columns based on their index value. For example,  `data.iloc[0,0]` would return `20160601` in our example data frame.\n",
    "    \n",
    "See the pandas documentation for more information about [indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-and-selecting-data).\n",
    "```\n",
    "\n",
    "For example, we could select select `TEMP` and the `TEMP_CELSIUS` columns from a set of rows based on their index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   YEARMODA  TEMP\n",
       "0  20160601  65.5\n",
       "1  20160602  65.8\n",
       "2  20160603  68.4\n",
       "3  20160604  57.5\n",
       "4  20160605  51.4"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.iloc[0:5:,0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To access the value on the first row and second column (`TEMP`), the syntax for `iloc` would be:\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "65.5"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.iloc[0,1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also access individual rows using `iloc`. Let's check out the last row of data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "YEARMODA        2.016063e+07\n",
       "TEMP            6.570000e+01\n",
       "MAX             7.270000e+01\n",
       "MIN             5.920000e+01\n",
       "DIFF            1.350000e+01\n",
       "TEMP_CELSIUS    1.872222e+01\n",
       "TEMP_KELVIN    -2.544278e+02\n",
       "Name: 29, dtype: float64"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.iloc[-1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Filtering and updating data\n",
    "\n",
    "One really useful feature in pandas is the ability to easily filter and select rows based on a conditional statement.\n",
    "The following example shows how to select rows when the Celsius temperature has been higher than 15 degrees into variable `warm_temps` (warm temperatures). Pandas checks if the condition is `True` or `False` for each row, and returns those rows where the condition is `True`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      True\n",
       "1      True\n",
       "2      True\n",
       "3     False\n",
       "4     False\n",
       "5     False\n",
       "6     False\n",
       "7     False\n",
       "8     False\n",
       "9     False\n",
       "10    False\n",
       "11    False\n",
       "12    False\n",
       "13     True\n",
       "14     True\n",
       "15    False\n",
       "16     True\n",
       "17    False\n",
       "18    False\n",
       "19     True\n",
       "20     True\n",
       "21     True\n",
       "22     True\n",
       "23     True\n",
       "24     True\n",
       "25     True\n",
       "26     True\n",
       "27     True\n",
       "28     True\n",
       "29     True\n",
       "Name: TEMP_CELSIUS, dtype: bool"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Check the condition\n",
    "data['TEMP_CELSIUS'] > 15"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "      <td>18.9</td>\n",
       "      <td>18.611111</td>\n",
       "      <td>-254.538889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "      <td>25.8</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.222222</td>\n",
       "      <td>-252.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>20160614</td>\n",
       "      <td>59.7</td>\n",
       "      <td>67.8</td>\n",
       "      <td>47.8</td>\n",
       "      <td>20.0</td>\n",
       "      <td>15.388889</td>\n",
       "      <td>-257.761111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>20160615</td>\n",
       "      <td>63.4</td>\n",
       "      <td>70.3</td>\n",
       "      <td>49.3</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.444444</td>\n",
       "      <td>-255.705556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>20160617</td>\n",
       "      <td>60.4</td>\n",
       "      <td>70.7</td>\n",
       "      <td>55.9</td>\n",
       "      <td>14.8</td>\n",
       "      <td>15.777778</td>\n",
       "      <td>-257.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>20160620</td>\n",
       "      <td>59.3</td>\n",
       "      <td>69.1</td>\n",
       "      <td>52.2</td>\n",
       "      <td>16.9</td>\n",
       "      <td>15.166667</td>\n",
       "      <td>-257.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>20160621</td>\n",
       "      <td>62.6</td>\n",
       "      <td>71.4</td>\n",
       "      <td>50.4</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>-256.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>21</td>\n",
       "      <td>20160622</td>\n",
       "      <td>61.7</td>\n",
       "      <td>70.2</td>\n",
       "      <td>55.4</td>\n",
       "      <td>14.8</td>\n",
       "      <td>16.500000</td>\n",
       "      <td>-256.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>22</td>\n",
       "      <td>20160623</td>\n",
       "      <td>60.9</td>\n",
       "      <td>67.1</td>\n",
       "      <td>54.9</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.055556</td>\n",
       "      <td>-257.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>23</td>\n",
       "      <td>20160624</td>\n",
       "      <td>61.1</td>\n",
       "      <td>68.9</td>\n",
       "      <td>56.7</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.166667</td>\n",
       "      <td>-256.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>24</td>\n",
       "      <td>20160625</td>\n",
       "      <td>65.7</td>\n",
       "      <td>75.4</td>\n",
       "      <td>57.9</td>\n",
       "      <td>17.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>25</td>\n",
       "      <td>20160626</td>\n",
       "      <td>69.6</td>\n",
       "      <td>77.7</td>\n",
       "      <td>60.3</td>\n",
       "      <td>17.4</td>\n",
       "      <td>20.888889</td>\n",
       "      <td>-252.261111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>26</td>\n",
       "      <td>20160627</td>\n",
       "      <td>60.7</td>\n",
       "      <td>70.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15.944444</td>\n",
       "      <td>-257.205556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>27</td>\n",
       "      <td>20160628</td>\n",
       "      <td>65.4</td>\n",
       "      <td>73.0</td>\n",
       "      <td>55.8</td>\n",
       "      <td>17.2</td>\n",
       "      <td>18.555556</td>\n",
       "      <td>-254.594444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>28</td>\n",
       "      <td>20160629</td>\n",
       "      <td>65.8</td>\n",
       "      <td>73.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>29</td>\n",
       "      <td>20160630</td>\n",
       "      <td>65.7</td>\n",
       "      <td>72.7</td>\n",
       "      <td>59.2</td>\n",
       "      <td>13.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "0   20160601  65.5  73.6  54.7  18.9     18.611111  -254.538889\n",
       "1   20160602  65.8  80.8  55.0  25.8     18.777778  -254.372222\n",
       "2   20160603  68.4   NaN  55.6   NaN     20.222222  -252.927778\n",
       "13  20160614  59.7  67.8  47.8  20.0     15.388889  -257.761111\n",
       "14  20160615  63.4  70.3  49.3  21.0     17.444444  -255.705556\n",
       "16  20160617  60.4  70.7  55.9  14.8     15.777778  -257.372222\n",
       "19  20160620  59.3  69.1  52.2  16.9     15.166667  -257.983333\n",
       "20  20160621  62.6  71.4  50.4  21.0     17.000000  -256.150000\n",
       "21  20160622  61.7  70.2  55.4  14.8     16.500000  -256.650000\n",
       "22  20160623  60.9  67.1  54.9  12.2     16.055556  -257.094444\n",
       "23  20160624  61.1  68.9  56.7  12.2     16.166667  -256.983333\n",
       "24  20160625  65.7  75.4  57.9  17.5     18.722222  -254.427778\n",
       "25  20160626  69.6  77.7  60.3  17.4     20.888889  -252.261111\n",
       "26  20160627  60.7  70.0   NaN   NaN     15.944444  -257.205556\n",
       "27  20160628  65.4  73.0  55.8  17.2     18.555556  -254.594444\n",
       "28  20160629  65.8  73.2   NaN   NaN     18.777778  -254.372222\n",
       "29  20160630  65.7  72.7  59.2  13.5     18.722222  -254.427778"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select rows with temp celsius higher than 15 degrees\n",
    "warm_temps = data.loc[data['TEMP_CELSIUS'] > 15]\n",
    "warm_temps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "It is also possible to combine multiple criteria at the same time. Here, we select temperatures above 15 degrees that were recorded on the second half of June in 2016 (i.e. `YEARMODA >= 20160615`).\n",
    "Combining multiple criteria can be done with the `&` operator (AND) or the `|` operator (OR). Notice, that it is often useful to separate the different clauses inside the parentheses `()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>20160615</td>\n",
       "      <td>63.4</td>\n",
       "      <td>70.3</td>\n",
       "      <td>49.3</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.444444</td>\n",
       "      <td>-255.705556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>20160617</td>\n",
       "      <td>60.4</td>\n",
       "      <td>70.7</td>\n",
       "      <td>55.9</td>\n",
       "      <td>14.8</td>\n",
       "      <td>15.777778</td>\n",
       "      <td>-257.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>20160620</td>\n",
       "      <td>59.3</td>\n",
       "      <td>69.1</td>\n",
       "      <td>52.2</td>\n",
       "      <td>16.9</td>\n",
       "      <td>15.166667</td>\n",
       "      <td>-257.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>20160621</td>\n",
       "      <td>62.6</td>\n",
       "      <td>71.4</td>\n",
       "      <td>50.4</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>-256.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>21</td>\n",
       "      <td>20160622</td>\n",
       "      <td>61.7</td>\n",
       "      <td>70.2</td>\n",
       "      <td>55.4</td>\n",
       "      <td>14.8</td>\n",
       "      <td>16.500000</td>\n",
       "      <td>-256.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>22</td>\n",
       "      <td>20160623</td>\n",
       "      <td>60.9</td>\n",
       "      <td>67.1</td>\n",
       "      <td>54.9</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.055556</td>\n",
       "      <td>-257.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>23</td>\n",
       "      <td>20160624</td>\n",
       "      <td>61.1</td>\n",
       "      <td>68.9</td>\n",
       "      <td>56.7</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.166667</td>\n",
       "      <td>-256.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>24</td>\n",
       "      <td>20160625</td>\n",
       "      <td>65.7</td>\n",
       "      <td>75.4</td>\n",
       "      <td>57.9</td>\n",
       "      <td>17.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>25</td>\n",
       "      <td>20160626</td>\n",
       "      <td>69.6</td>\n",
       "      <td>77.7</td>\n",
       "      <td>60.3</td>\n",
       "      <td>17.4</td>\n",
       "      <td>20.888889</td>\n",
       "      <td>-252.261111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>26</td>\n",
       "      <td>20160627</td>\n",
       "      <td>60.7</td>\n",
       "      <td>70.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15.944444</td>\n",
       "      <td>-257.205556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>27</td>\n",
       "      <td>20160628</td>\n",
       "      <td>65.4</td>\n",
       "      <td>73.0</td>\n",
       "      <td>55.8</td>\n",
       "      <td>17.2</td>\n",
       "      <td>18.555556</td>\n",
       "      <td>-254.594444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>28</td>\n",
       "      <td>20160629</td>\n",
       "      <td>65.8</td>\n",
       "      <td>73.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>29</td>\n",
       "      <td>20160630</td>\n",
       "      <td>65.7</td>\n",
       "      <td>72.7</td>\n",
       "      <td>59.2</td>\n",
       "      <td>13.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "14  20160615  63.4  70.3  49.3  21.0     17.444444  -255.705556\n",
       "16  20160617  60.4  70.7  55.9  14.8     15.777778  -257.372222\n",
       "19  20160620  59.3  69.1  52.2  16.9     15.166667  -257.983333\n",
       "20  20160621  62.6  71.4  50.4  21.0     17.000000  -256.150000\n",
       "21  20160622  61.7  70.2  55.4  14.8     16.500000  -256.650000\n",
       "22  20160623  60.9  67.1  54.9  12.2     16.055556  -257.094444\n",
       "23  20160624  61.1  68.9  56.7  12.2     16.166667  -256.983333\n",
       "24  20160625  65.7  75.4  57.9  17.5     18.722222  -254.427778\n",
       "25  20160626  69.6  77.7  60.3  17.4     20.888889  -252.261111\n",
       "26  20160627  60.7  70.0   NaN   NaN     15.944444  -257.205556\n",
       "27  20160628  65.4  73.0  55.8  17.2     18.555556  -254.594444\n",
       "28  20160629  65.8  73.2   NaN   NaN     18.777778  -254.372222\n",
       "29  20160630  65.7  72.7  59.2  13.5     18.722222  -254.427778"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Select rows with temp celsius higher than 15 degrees from late June 2016\n",
    "warm_temps = data.loc[(data['TEMP_CELSIUS'] > 15) & (data['YEARMODA'] >= 20160615)]\n",
    "warm_temps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Now we have a subset of our DataFrame with only rows where the `TEMP_CELSIUS` is above 15 and the dates in `YEARMODA` column start from 15th of June.\n",
    "\n",
    "Notice, that the index values (numbers on the left) are still showing the positions from the original DataFrame. It is possible to **reset** the index using `reset_index()` function that\n",
    "might be useful in some cases to be able to slice the data in a similar manner as above. By default the `reset_index()` would make a new column called `index` to keep track on the previous\n",
    "index which might be useful in some cases but here not, so we can omit that by passing parameter `drop=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160615</td>\n",
       "      <td>63.4</td>\n",
       "      <td>70.3</td>\n",
       "      <td>49.3</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.444444</td>\n",
       "      <td>-255.705556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160617</td>\n",
       "      <td>60.4</td>\n",
       "      <td>70.7</td>\n",
       "      <td>55.9</td>\n",
       "      <td>14.8</td>\n",
       "      <td>15.777778</td>\n",
       "      <td>-257.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160620</td>\n",
       "      <td>59.3</td>\n",
       "      <td>69.1</td>\n",
       "      <td>52.2</td>\n",
       "      <td>16.9</td>\n",
       "      <td>15.166667</td>\n",
       "      <td>-257.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160621</td>\n",
       "      <td>62.6</td>\n",
       "      <td>71.4</td>\n",
       "      <td>50.4</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>-256.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160622</td>\n",
       "      <td>61.7</td>\n",
       "      <td>70.2</td>\n",
       "      <td>55.4</td>\n",
       "      <td>14.8</td>\n",
       "      <td>16.500000</td>\n",
       "      <td>-256.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>20160623</td>\n",
       "      <td>60.9</td>\n",
       "      <td>67.1</td>\n",
       "      <td>54.9</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.055556</td>\n",
       "      <td>-257.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>20160624</td>\n",
       "      <td>61.1</td>\n",
       "      <td>68.9</td>\n",
       "      <td>56.7</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.166667</td>\n",
       "      <td>-256.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>20160625</td>\n",
       "      <td>65.7</td>\n",
       "      <td>75.4</td>\n",
       "      <td>57.9</td>\n",
       "      <td>17.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>20160626</td>\n",
       "      <td>69.6</td>\n",
       "      <td>77.7</td>\n",
       "      <td>60.3</td>\n",
       "      <td>17.4</td>\n",
       "      <td>20.888889</td>\n",
       "      <td>-252.261111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>20160627</td>\n",
       "      <td>60.7</td>\n",
       "      <td>70.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15.944444</td>\n",
       "      <td>-257.205556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>20160628</td>\n",
       "      <td>65.4</td>\n",
       "      <td>73.0</td>\n",
       "      <td>55.8</td>\n",
       "      <td>17.2</td>\n",
       "      <td>18.555556</td>\n",
       "      <td>-254.594444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>20160629</td>\n",
       "      <td>65.8</td>\n",
       "      <td>73.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>20160630</td>\n",
       "      <td>65.7</td>\n",
       "      <td>72.7</td>\n",
       "      <td>59.2</td>\n",
       "      <td>13.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "0   20160615  63.4  70.3  49.3  21.0     17.444444  -255.705556\n",
       "1   20160617  60.4  70.7  55.9  14.8     15.777778  -257.372222\n",
       "2   20160620  59.3  69.1  52.2  16.9     15.166667  -257.983333\n",
       "3   20160621  62.6  71.4  50.4  21.0     17.000000  -256.150000\n",
       "4   20160622  61.7  70.2  55.4  14.8     16.500000  -256.650000\n",
       "5   20160623  60.9  67.1  54.9  12.2     16.055556  -257.094444\n",
       "6   20160624  61.1  68.9  56.7  12.2     16.166667  -256.983333\n",
       "7   20160625  65.7  75.4  57.9  17.5     18.722222  -254.427778\n",
       "8   20160626  69.6  77.7  60.3  17.4     20.888889  -252.261111\n",
       "9   20160627  60.7  70.0   NaN   NaN     15.944444  -257.205556\n",
       "10  20160628  65.4  73.0  55.8  17.2     18.555556  -254.594444\n",
       "11  20160629  65.8  73.2   NaN   NaN     18.777778  -254.372222\n",
       "12  20160630  65.7  72.7  59.2  13.5     18.722222  -254.427778"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Reset index\n",
    "warm_temps = warm_temps.reset_index(drop=True)\n",
    "warm_temps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As can be seen, now the index values goes from 0 to 12."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Check your understanding\n",
    "\n",
    "Find the mean temperatures (in Celsius) for the last seven days of June again. This time you should select the rows based on a condition for the `YEARMODA` column!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "tags": [
     "hide-cell"
    ]
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "18.253968253968257"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Here's the solution\n",
    "data['TEMP_CELSIUS'].loc[data['YEARMODA'] >= 20160624].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{admonition} Deep copy\n",
    "In this lesson, we have stored subsets of a DataFrame as a new variable. In some cases, we are still referring to the original data and any modifications made to the new variable might influence the original DataFrame.\n",
    "    \n",
    "If you want to be extra careful to not modify the original DataFrame, then you should take a proper copy of the data before proceeding using the `.copy()` method. You can read more about indexing, selecting data and deep and shallow copies in [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html) and in [this excellent blog post](https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-part-4-c4216f84d388).\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Dealing with missing data\n",
    "\n",
    "As you may have noticed by now, we have several missing values for the temperature minimum, maximum, and difference columns (`MIN`, `MAX`, `DIFF`, and `DIFF_MIN`). These missing values are indicated as `NaN` (not-a-number). Having missing data in your datafile is really common situation and typically you want to deal with it somehow. Common procedures to deal with `NaN` values are to either **remove** them from\n",
    "the DataFrame or **fill** them with some value. In Pandas both of these options are really easy to do.\n",
    "\n",
    "Let's first see how we can remove the NoData values (i.e. clean the data) using the [.dropna()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html) function. Inside the function you can pass a list of column(s) from which the `NaN` values should found using the `subset` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160615</td>\n",
       "      <td>63.4</td>\n",
       "      <td>70.3</td>\n",
       "      <td>49.3</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.444444</td>\n",
       "      <td>-255.705556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160617</td>\n",
       "      <td>60.4</td>\n",
       "      <td>70.7</td>\n",
       "      <td>55.9</td>\n",
       "      <td>14.8</td>\n",
       "      <td>15.777778</td>\n",
       "      <td>-257.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160620</td>\n",
       "      <td>59.3</td>\n",
       "      <td>69.1</td>\n",
       "      <td>52.2</td>\n",
       "      <td>16.9</td>\n",
       "      <td>15.166667</td>\n",
       "      <td>-257.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160621</td>\n",
       "      <td>62.6</td>\n",
       "      <td>71.4</td>\n",
       "      <td>50.4</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>-256.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160622</td>\n",
       "      <td>61.7</td>\n",
       "      <td>70.2</td>\n",
       "      <td>55.4</td>\n",
       "      <td>14.8</td>\n",
       "      <td>16.500000</td>\n",
       "      <td>-256.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>20160623</td>\n",
       "      <td>60.9</td>\n",
       "      <td>67.1</td>\n",
       "      <td>54.9</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.055556</td>\n",
       "      <td>-257.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>20160624</td>\n",
       "      <td>61.1</td>\n",
       "      <td>68.9</td>\n",
       "      <td>56.7</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.166667</td>\n",
       "      <td>-256.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>20160625</td>\n",
       "      <td>65.7</td>\n",
       "      <td>75.4</td>\n",
       "      <td>57.9</td>\n",
       "      <td>17.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>20160626</td>\n",
       "      <td>69.6</td>\n",
       "      <td>77.7</td>\n",
       "      <td>60.3</td>\n",
       "      <td>17.4</td>\n",
       "      <td>20.888889</td>\n",
       "      <td>-252.261111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>20160628</td>\n",
       "      <td>65.4</td>\n",
       "      <td>73.0</td>\n",
       "      <td>55.8</td>\n",
       "      <td>17.2</td>\n",
       "      <td>18.555556</td>\n",
       "      <td>-254.594444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>20160630</td>\n",
       "      <td>65.7</td>\n",
       "      <td>72.7</td>\n",
       "      <td>59.2</td>\n",
       "      <td>13.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "0   20160615  63.4  70.3  49.3  21.0     17.444444  -255.705556\n",
       "1   20160617  60.4  70.7  55.9  14.8     15.777778  -257.372222\n",
       "2   20160620  59.3  69.1  52.2  16.9     15.166667  -257.983333\n",
       "3   20160621  62.6  71.4  50.4  21.0     17.000000  -256.150000\n",
       "4   20160622  61.7  70.2  55.4  14.8     16.500000  -256.650000\n",
       "5   20160623  60.9  67.1  54.9  12.2     16.055556  -257.094444\n",
       "6   20160624  61.1  68.9  56.7  12.2     16.166667  -256.983333\n",
       "7   20160625  65.7  75.4  57.9  17.5     18.722222  -254.427778\n",
       "8   20160626  69.6  77.7  60.3  17.4     20.888889  -252.261111\n",
       "10  20160628  65.4  73.0  55.8  17.2     18.555556  -254.594444\n",
       "12  20160630  65.7  72.7  59.2  13.5     18.722222  -254.427778"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop no data values based on the MIN column\n",
    "warm_temps_clean = warm_temps.dropna(subset=['MIN'])\n",
    "warm_temps_clean"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "As you can see by looking at the table above (and the change in index values), we now have a DataFrame without the NoData values.\n",
    "\n",
    "````{note}\n",
    "Note that we replaced the original `warm_temps` variable with version where no data are removed. The `.dropna()` function, among other pandas functions can also be applied \"inplace\" which means that the function updates the DataFrame object and returns `None`:\n",
    "    \n",
    "```python\n",
    "warm_temps.dropna(subset=['MIN'], inplace=True)\n",
    "```\n",
    "````\n",
    "\n",
    "Another option is to fill the NoData with some value using the `fillna()` function. Here we can fill the missing values in the with value -9999. Note that we are not giving the `subset` parameter this time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160615</td>\n",
       "      <td>63.4</td>\n",
       "      <td>70.3</td>\n",
       "      <td>49.3</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.444444</td>\n",
       "      <td>-255.705556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160617</td>\n",
       "      <td>60.4</td>\n",
       "      <td>70.7</td>\n",
       "      <td>55.9</td>\n",
       "      <td>14.8</td>\n",
       "      <td>15.777778</td>\n",
       "      <td>-257.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160620</td>\n",
       "      <td>59.3</td>\n",
       "      <td>69.1</td>\n",
       "      <td>52.2</td>\n",
       "      <td>16.9</td>\n",
       "      <td>15.166667</td>\n",
       "      <td>-257.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160621</td>\n",
       "      <td>62.6</td>\n",
       "      <td>71.4</td>\n",
       "      <td>50.4</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>-256.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160622</td>\n",
       "      <td>61.7</td>\n",
       "      <td>70.2</td>\n",
       "      <td>55.4</td>\n",
       "      <td>14.8</td>\n",
       "      <td>16.500000</td>\n",
       "      <td>-256.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>20160623</td>\n",
       "      <td>60.9</td>\n",
       "      <td>67.1</td>\n",
       "      <td>54.9</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.055556</td>\n",
       "      <td>-257.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>20160624</td>\n",
       "      <td>61.1</td>\n",
       "      <td>68.9</td>\n",
       "      <td>56.7</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.166667</td>\n",
       "      <td>-256.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>20160625</td>\n",
       "      <td>65.7</td>\n",
       "      <td>75.4</td>\n",
       "      <td>57.9</td>\n",
       "      <td>17.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>20160626</td>\n",
       "      <td>69.6</td>\n",
       "      <td>77.7</td>\n",
       "      <td>60.3</td>\n",
       "      <td>17.4</td>\n",
       "      <td>20.888889</td>\n",
       "      <td>-252.261111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>20160627</td>\n",
       "      <td>60.7</td>\n",
       "      <td>70.0</td>\n",
       "      <td>-9999.0</td>\n",
       "      <td>-9999.0</td>\n",
       "      <td>15.944444</td>\n",
       "      <td>-257.205556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>20160628</td>\n",
       "      <td>65.4</td>\n",
       "      <td>73.0</td>\n",
       "      <td>55.8</td>\n",
       "      <td>17.2</td>\n",
       "      <td>18.555556</td>\n",
       "      <td>-254.594444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>20160629</td>\n",
       "      <td>65.8</td>\n",
       "      <td>73.2</td>\n",
       "      <td>-9999.0</td>\n",
       "      <td>-9999.0</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>20160630</td>\n",
       "      <td>65.7</td>\n",
       "      <td>72.7</td>\n",
       "      <td>59.2</td>\n",
       "      <td>13.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YEARMODA  TEMP   MAX     MIN    DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "0   20160615  63.4  70.3    49.3    21.0     17.444444  -255.705556\n",
       "1   20160617  60.4  70.7    55.9    14.8     15.777778  -257.372222\n",
       "2   20160620  59.3  69.1    52.2    16.9     15.166667  -257.983333\n",
       "3   20160621  62.6  71.4    50.4    21.0     17.000000  -256.150000\n",
       "4   20160622  61.7  70.2    55.4    14.8     16.500000  -256.650000\n",
       "5   20160623  60.9  67.1    54.9    12.2     16.055556  -257.094444\n",
       "6   20160624  61.1  68.9    56.7    12.2     16.166667  -256.983333\n",
       "7   20160625  65.7  75.4    57.9    17.5     18.722222  -254.427778\n",
       "8   20160626  69.6  77.7    60.3    17.4     20.888889  -252.261111\n",
       "9   20160627  60.7  70.0 -9999.0 -9999.0     15.944444  -257.205556\n",
       "10  20160628  65.4  73.0    55.8    17.2     18.555556  -254.594444\n",
       "11  20160629  65.8  73.2 -9999.0 -9999.0     18.777778  -254.372222\n",
       "12  20160630  65.7  72.7    59.2    13.5     18.722222  -254.427778"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Fill na values\n",
    "warm_temps.fillna(-9999)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "As a result we now have a DataFrame where NoData values are filled with the value -9999."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "```{warning}\n",
    "In many cases filling the data with a specific value is dangerous because you end up modifying the actual data, which might affect the results of your analysis. For example, in the case above we would have dramatically changed the temperature difference columns because the -9999 values not an actual temperature difference! Hence, use caution when filling missing values. \n",
    "    \n",
    "You might have to fill in no data values for the purposes of saving the data to file in a spesific format. For example, some GIS software don't accept missing values.  Always pay attention to potential no data values when reading in data files and doing further analysis!\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data type conversions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true
   },
   "source": [
    "There are occasions where you'll need to convert data stored within a Series to another data type, for example, from floating point to integer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Remember, that we already did data type conversions using the [built-in Python functions](https://docs.python.org/3/library/functions.html#built-in-functions) such as `int()` or `str()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For values in pandas DataFrames and Series, we can use the `astype()` method."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "editable": true
   },
   "source": [
    "```{admonition} Truncating versus rounding up\n",
    "**Be careful with type conversions from floating point values to integers.** The conversion simply drops the stuff to the right of the decimal point, so all values are rounded down to the nearest whole number. For example, 99.99 will be truncated to 99 as an integer, when it should be rounded up to 100.\n",
    "\n",
    "Chaining the round and type conversion functions solves this issue as the `.round(0).astype(int)` command first rounds the values with zero decimals and then converts those values into integers.\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Original values:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0    65.5\n",
       "1    65.8\n",
       "2    68.4\n",
       "3    57.5\n",
       "4    51.4\n",
       "Name: TEMP, dtype: float64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Original values:\")\n",
    "data['TEMP'].head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Truncated integer values:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0    65\n",
       "1    65\n",
       "2    68\n",
       "3    57\n",
       "4    51\n",
       "Name: TEMP, dtype: int64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Truncated integer values:\")\n",
    "data['TEMP'].astype(int).head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Rounded integer values:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0    66\n",
       "1    66\n",
       "2    68\n",
       "3    58\n",
       "4    51\n",
       "Name: TEMP, dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(\"Rounded integer values:\")\n",
    "data['TEMP'].round(0).astype(int).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looks correct now."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Unique values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Sometimes it is useful to extract the unique values that you have in your column.\n",
    "We can do that by using `unique()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([65.5, 65.8, 68.4, 57.5, 51.4, 52.2, 56.9, 54.2, 49.4, 49.5, 54. ,\n",
       "       55.4, 58.3, 59.7, 63.4, 57.8, 60.4, 57.3, 56.3, 59.3, 62.6, 61.7,\n",
       "       60.9, 61.1, 65.7, 69.6, 60.7, 65.4])"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Get unique celsius values\n",
    "unique = data['TEMP'].unique()\n",
    "unique"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "As a result we get an array of unique values in that column."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "```{note}\n",
    "Sometimes if you have a long list of unique values, you don't necessarily see all the unique values directly as IPython/Jupyter may hide them with an elipsis `...`. It is, however, possible to see all those values by printing them as a list\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[65.5,\n",
       " 65.8,\n",
       " 68.4,\n",
       " 57.5,\n",
       " 51.4,\n",
       " 52.2,\n",
       " 56.9,\n",
       " 54.2,\n",
       " 49.4,\n",
       " 49.5,\n",
       " 54.0,\n",
       " 55.4,\n",
       " 58.3,\n",
       " 59.7,\n",
       " 63.4,\n",
       " 57.8,\n",
       " 60.4,\n",
       " 57.3,\n",
       " 56.3,\n",
       " 59.3,\n",
       " 62.6,\n",
       " 61.7,\n",
       " 60.9,\n",
       " 61.1,\n",
       " 65.7,\n",
       " 69.6,\n",
       " 60.7,\n",
       " 65.4]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# unique values as list\n",
    "list(unique)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "How many days with unique mean temperature did we have in June 2016? We can check that!\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There were 28 days with unique mean temperatures in June 2016.\n"
     ]
    }
   ],
   "source": [
    "# Number of unique values\n",
    "unique_temps = len(unique)\n",
    "print(\"There were\", unique_temps, \"days with unique mean temperatures in June 2016.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sorting data\n",
    "\n",
    "Quite often it is useful to be able to sort your data (descending/ascending) based on values in some column\n",
    "This can be easily done with Pandas using `sort_values(by='YourColumnName')` -function.\n",
    "\n",
    "Let's first sort the values on ascending order based on the `TEMP` column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>20160609</td>\n",
       "      <td>49.4</td>\n",
       "      <td>54.1</td>\n",
       "      <td>45.7</td>\n",
       "      <td>8.4</td>\n",
       "      <td>9.666667</td>\n",
       "      <td>-263.483333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>20160610</td>\n",
       "      <td>49.5</td>\n",
       "      <td>55.9</td>\n",
       "      <td>43.0</td>\n",
       "      <td>12.9</td>\n",
       "      <td>9.722222</td>\n",
       "      <td>-263.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "      <td>58.3</td>\n",
       "      <td>43.2</td>\n",
       "      <td>15.1</td>\n",
       "      <td>10.777778</td>\n",
       "      <td>-262.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>20160606</td>\n",
       "      <td>52.2</td>\n",
       "      <td>59.7</td>\n",
       "      <td>42.8</td>\n",
       "      <td>16.9</td>\n",
       "      <td>11.222222</td>\n",
       "      <td>-261.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>20160611</td>\n",
       "      <td>54.0</td>\n",
       "      <td>62.1</td>\n",
       "      <td>41.7</td>\n",
       "      <td>20.4</td>\n",
       "      <td>12.222222</td>\n",
       "      <td>-260.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>20160608</td>\n",
       "      <td>54.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.5</td>\n",
       "      <td>NaN</td>\n",
       "      <td>12.333333</td>\n",
       "      <td>-260.816667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>20160612</td>\n",
       "      <td>55.4</td>\n",
       "      <td>64.2</td>\n",
       "      <td>46.0</td>\n",
       "      <td>18.2</td>\n",
       "      <td>13.000000</td>\n",
       "      <td>-260.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>18</td>\n",
       "      <td>20160619</td>\n",
       "      <td>56.3</td>\n",
       "      <td>59.2</td>\n",
       "      <td>54.1</td>\n",
       "      <td>5.1</td>\n",
       "      <td>13.500000</td>\n",
       "      <td>-259.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>20160607</td>\n",
       "      <td>56.9</td>\n",
       "      <td>65.1</td>\n",
       "      <td>45.9</td>\n",
       "      <td>19.2</td>\n",
       "      <td>13.833333</td>\n",
       "      <td>-259.316667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>20160618</td>\n",
       "      <td>57.3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>54.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.055556</td>\n",
       "      <td>-259.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "      <td>70.9</td>\n",
       "      <td>47.3</td>\n",
       "      <td>23.6</td>\n",
       "      <td>14.166667</td>\n",
       "      <td>-258.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>20160616</td>\n",
       "      <td>57.8</td>\n",
       "      <td>67.5</td>\n",
       "      <td>55.6</td>\n",
       "      <td>11.9</td>\n",
       "      <td>14.333333</td>\n",
       "      <td>-258.816667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>20160613</td>\n",
       "      <td>58.3</td>\n",
       "      <td>68.2</td>\n",
       "      <td>47.3</td>\n",
       "      <td>20.9</td>\n",
       "      <td>14.611111</td>\n",
       "      <td>-258.538889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>20160620</td>\n",
       "      <td>59.3</td>\n",
       "      <td>69.1</td>\n",
       "      <td>52.2</td>\n",
       "      <td>16.9</td>\n",
       "      <td>15.166667</td>\n",
       "      <td>-257.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>20160614</td>\n",
       "      <td>59.7</td>\n",
       "      <td>67.8</td>\n",
       "      <td>47.8</td>\n",
       "      <td>20.0</td>\n",
       "      <td>15.388889</td>\n",
       "      <td>-257.761111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>20160617</td>\n",
       "      <td>60.4</td>\n",
       "      <td>70.7</td>\n",
       "      <td>55.9</td>\n",
       "      <td>14.8</td>\n",
       "      <td>15.777778</td>\n",
       "      <td>-257.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>26</td>\n",
       "      <td>20160627</td>\n",
       "      <td>60.7</td>\n",
       "      <td>70.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15.944444</td>\n",
       "      <td>-257.205556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>22</td>\n",
       "      <td>20160623</td>\n",
       "      <td>60.9</td>\n",
       "      <td>67.1</td>\n",
       "      <td>54.9</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.055556</td>\n",
       "      <td>-257.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>23</td>\n",
       "      <td>20160624</td>\n",
       "      <td>61.1</td>\n",
       "      <td>68.9</td>\n",
       "      <td>56.7</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.166667</td>\n",
       "      <td>-256.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>21</td>\n",
       "      <td>20160622</td>\n",
       "      <td>61.7</td>\n",
       "      <td>70.2</td>\n",
       "      <td>55.4</td>\n",
       "      <td>14.8</td>\n",
       "      <td>16.500000</td>\n",
       "      <td>-256.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>20160621</td>\n",
       "      <td>62.6</td>\n",
       "      <td>71.4</td>\n",
       "      <td>50.4</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>-256.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>20160615</td>\n",
       "      <td>63.4</td>\n",
       "      <td>70.3</td>\n",
       "      <td>49.3</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.444444</td>\n",
       "      <td>-255.705556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>27</td>\n",
       "      <td>20160628</td>\n",
       "      <td>65.4</td>\n",
       "      <td>73.0</td>\n",
       "      <td>55.8</td>\n",
       "      <td>17.2</td>\n",
       "      <td>18.555556</td>\n",
       "      <td>-254.594444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "      <td>18.9</td>\n",
       "      <td>18.611111</td>\n",
       "      <td>-254.538889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>24</td>\n",
       "      <td>20160625</td>\n",
       "      <td>65.7</td>\n",
       "      <td>75.4</td>\n",
       "      <td>57.9</td>\n",
       "      <td>17.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>29</td>\n",
       "      <td>20160630</td>\n",
       "      <td>65.7</td>\n",
       "      <td>72.7</td>\n",
       "      <td>59.2</td>\n",
       "      <td>13.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "      <td>25.8</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>28</td>\n",
       "      <td>20160629</td>\n",
       "      <td>65.8</td>\n",
       "      <td>73.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.222222</td>\n",
       "      <td>-252.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>25</td>\n",
       "      <td>20160626</td>\n",
       "      <td>69.6</td>\n",
       "      <td>77.7</td>\n",
       "      <td>60.3</td>\n",
       "      <td>17.4</td>\n",
       "      <td>20.888889</td>\n",
       "      <td>-252.261111</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "8   20160609  49.4  54.1  45.7   8.4      9.666667  -263.483333\n",
       "9   20160610  49.5  55.9  43.0  12.9      9.722222  -263.427778\n",
       "4   20160605  51.4  58.3  43.2  15.1     10.777778  -262.372222\n",
       "5   20160606  52.2  59.7  42.8  16.9     11.222222  -261.927778\n",
       "10  20160611  54.0  62.1  41.7  20.4     12.222222  -260.927778\n",
       "7   20160608  54.2   NaN  47.5   NaN     12.333333  -260.816667\n",
       "11  20160612  55.4  64.2  46.0  18.2     13.000000  -260.150000\n",
       "18  20160619  56.3  59.2  54.1   5.1     13.500000  -259.650000\n",
       "6   20160607  56.9  65.1  45.9  19.2     13.833333  -259.316667\n",
       "17  20160618  57.3   NaN  54.0   NaN     14.055556  -259.094444\n",
       "3   20160604  57.5  70.9  47.3  23.6     14.166667  -258.983333\n",
       "15  20160616  57.8  67.5  55.6  11.9     14.333333  -258.816667\n",
       "12  20160613  58.3  68.2  47.3  20.9     14.611111  -258.538889\n",
       "19  20160620  59.3  69.1  52.2  16.9     15.166667  -257.983333\n",
       "13  20160614  59.7  67.8  47.8  20.0     15.388889  -257.761111\n",
       "16  20160617  60.4  70.7  55.9  14.8     15.777778  -257.372222\n",
       "26  20160627  60.7  70.0   NaN   NaN     15.944444  -257.205556\n",
       "22  20160623  60.9  67.1  54.9  12.2     16.055556  -257.094444\n",
       "23  20160624  61.1  68.9  56.7  12.2     16.166667  -256.983333\n",
       "21  20160622  61.7  70.2  55.4  14.8     16.500000  -256.650000\n",
       "20  20160621  62.6  71.4  50.4  21.0     17.000000  -256.150000\n",
       "14  20160615  63.4  70.3  49.3  21.0     17.444444  -255.705556\n",
       "27  20160628  65.4  73.0  55.8  17.2     18.555556  -254.594444\n",
       "0   20160601  65.5  73.6  54.7  18.9     18.611111  -254.538889\n",
       "24  20160625  65.7  75.4  57.9  17.5     18.722222  -254.427778\n",
       "29  20160630  65.7  72.7  59.2  13.5     18.722222  -254.427778\n",
       "1   20160602  65.8  80.8  55.0  25.8     18.777778  -254.372222\n",
       "28  20160629  65.8  73.2   NaN   NaN     18.777778  -254.372222\n",
       "2   20160603  68.4   NaN  55.6   NaN     20.222222  -252.927778\n",
       "25  20160626  69.6  77.7  60.3  17.4     20.888889  -252.261111"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sort dataframe, ascending\n",
    "data.sort_values(by='TEMP')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Of course, it is also possible to sort them in descending order with ``ascending=False`` parameter:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>YEARMODA</th>\n",
       "      <th>TEMP</th>\n",
       "      <th>MAX</th>\n",
       "      <th>MIN</th>\n",
       "      <th>DIFF</th>\n",
       "      <th>TEMP_CELSIUS</th>\n",
       "      <th>TEMP_KELVIN</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>25</td>\n",
       "      <td>20160626</td>\n",
       "      <td>69.6</td>\n",
       "      <td>77.7</td>\n",
       "      <td>60.3</td>\n",
       "      <td>17.4</td>\n",
       "      <td>20.888889</td>\n",
       "      <td>-252.261111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>20160603</td>\n",
       "      <td>68.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>55.6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20.222222</td>\n",
       "      <td>-252.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>20160602</td>\n",
       "      <td>65.8</td>\n",
       "      <td>80.8</td>\n",
       "      <td>55.0</td>\n",
       "      <td>25.8</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>28</td>\n",
       "      <td>20160629</td>\n",
       "      <td>65.8</td>\n",
       "      <td>73.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>18.777778</td>\n",
       "      <td>-254.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>29</td>\n",
       "      <td>20160630</td>\n",
       "      <td>65.7</td>\n",
       "      <td>72.7</td>\n",
       "      <td>59.2</td>\n",
       "      <td>13.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>24</td>\n",
       "      <td>20160625</td>\n",
       "      <td>65.7</td>\n",
       "      <td>75.4</td>\n",
       "      <td>57.9</td>\n",
       "      <td>17.5</td>\n",
       "      <td>18.722222</td>\n",
       "      <td>-254.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>20160601</td>\n",
       "      <td>65.5</td>\n",
       "      <td>73.6</td>\n",
       "      <td>54.7</td>\n",
       "      <td>18.9</td>\n",
       "      <td>18.611111</td>\n",
       "      <td>-254.538889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>27</td>\n",
       "      <td>20160628</td>\n",
       "      <td>65.4</td>\n",
       "      <td>73.0</td>\n",
       "      <td>55.8</td>\n",
       "      <td>17.2</td>\n",
       "      <td>18.555556</td>\n",
       "      <td>-254.594444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>20160615</td>\n",
       "      <td>63.4</td>\n",
       "      <td>70.3</td>\n",
       "      <td>49.3</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.444444</td>\n",
       "      <td>-255.705556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>20160621</td>\n",
       "      <td>62.6</td>\n",
       "      <td>71.4</td>\n",
       "      <td>50.4</td>\n",
       "      <td>21.0</td>\n",
       "      <td>17.000000</td>\n",
       "      <td>-256.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>21</td>\n",
       "      <td>20160622</td>\n",
       "      <td>61.7</td>\n",
       "      <td>70.2</td>\n",
       "      <td>55.4</td>\n",
       "      <td>14.8</td>\n",
       "      <td>16.500000</td>\n",
       "      <td>-256.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>23</td>\n",
       "      <td>20160624</td>\n",
       "      <td>61.1</td>\n",
       "      <td>68.9</td>\n",
       "      <td>56.7</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.166667</td>\n",
       "      <td>-256.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>22</td>\n",
       "      <td>20160623</td>\n",
       "      <td>60.9</td>\n",
       "      <td>67.1</td>\n",
       "      <td>54.9</td>\n",
       "      <td>12.2</td>\n",
       "      <td>16.055556</td>\n",
       "      <td>-257.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>26</td>\n",
       "      <td>20160627</td>\n",
       "      <td>60.7</td>\n",
       "      <td>70.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>15.944444</td>\n",
       "      <td>-257.205556</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>20160617</td>\n",
       "      <td>60.4</td>\n",
       "      <td>70.7</td>\n",
       "      <td>55.9</td>\n",
       "      <td>14.8</td>\n",
       "      <td>15.777778</td>\n",
       "      <td>-257.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>20160614</td>\n",
       "      <td>59.7</td>\n",
       "      <td>67.8</td>\n",
       "      <td>47.8</td>\n",
       "      <td>20.0</td>\n",
       "      <td>15.388889</td>\n",
       "      <td>-257.761111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>20160620</td>\n",
       "      <td>59.3</td>\n",
       "      <td>69.1</td>\n",
       "      <td>52.2</td>\n",
       "      <td>16.9</td>\n",
       "      <td>15.166667</td>\n",
       "      <td>-257.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>20160613</td>\n",
       "      <td>58.3</td>\n",
       "      <td>68.2</td>\n",
       "      <td>47.3</td>\n",
       "      <td>20.9</td>\n",
       "      <td>14.611111</td>\n",
       "      <td>-258.538889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>20160616</td>\n",
       "      <td>57.8</td>\n",
       "      <td>67.5</td>\n",
       "      <td>55.6</td>\n",
       "      <td>11.9</td>\n",
       "      <td>14.333333</td>\n",
       "      <td>-258.816667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>20160604</td>\n",
       "      <td>57.5</td>\n",
       "      <td>70.9</td>\n",
       "      <td>47.3</td>\n",
       "      <td>23.6</td>\n",
       "      <td>14.166667</td>\n",
       "      <td>-258.983333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>20160618</td>\n",
       "      <td>57.3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>54.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.055556</td>\n",
       "      <td>-259.094444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>20160607</td>\n",
       "      <td>56.9</td>\n",
       "      <td>65.1</td>\n",
       "      <td>45.9</td>\n",
       "      <td>19.2</td>\n",
       "      <td>13.833333</td>\n",
       "      <td>-259.316667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>18</td>\n",
       "      <td>20160619</td>\n",
       "      <td>56.3</td>\n",
       "      <td>59.2</td>\n",
       "      <td>54.1</td>\n",
       "      <td>5.1</td>\n",
       "      <td>13.500000</td>\n",
       "      <td>-259.650000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>20160612</td>\n",
       "      <td>55.4</td>\n",
       "      <td>64.2</td>\n",
       "      <td>46.0</td>\n",
       "      <td>18.2</td>\n",
       "      <td>13.000000</td>\n",
       "      <td>-260.150000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>20160608</td>\n",
       "      <td>54.2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.5</td>\n",
       "      <td>NaN</td>\n",
       "      <td>12.333333</td>\n",
       "      <td>-260.816667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>20160611</td>\n",
       "      <td>54.0</td>\n",
       "      <td>62.1</td>\n",
       "      <td>41.7</td>\n",
       "      <td>20.4</td>\n",
       "      <td>12.222222</td>\n",
       "      <td>-260.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>20160606</td>\n",
       "      <td>52.2</td>\n",
       "      <td>59.7</td>\n",
       "      <td>42.8</td>\n",
       "      <td>16.9</td>\n",
       "      <td>11.222222</td>\n",
       "      <td>-261.927778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>20160605</td>\n",
       "      <td>51.4</td>\n",
       "      <td>58.3</td>\n",
       "      <td>43.2</td>\n",
       "      <td>15.1</td>\n",
       "      <td>10.777778</td>\n",
       "      <td>-262.372222</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>20160610</td>\n",
       "      <td>49.5</td>\n",
       "      <td>55.9</td>\n",
       "      <td>43.0</td>\n",
       "      <td>12.9</td>\n",
       "      <td>9.722222</td>\n",
       "      <td>-263.427778</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>20160609</td>\n",
       "      <td>49.4</td>\n",
       "      <td>54.1</td>\n",
       "      <td>45.7</td>\n",
       "      <td>8.4</td>\n",
       "      <td>9.666667</td>\n",
       "      <td>-263.483333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    YEARMODA  TEMP   MAX   MIN  DIFF  TEMP_CELSIUS  TEMP_KELVIN\n",
       "25  20160626  69.6  77.7  60.3  17.4     20.888889  -252.261111\n",
       "2   20160603  68.4   NaN  55.6   NaN     20.222222  -252.927778\n",
       "1   20160602  65.8  80.8  55.0  25.8     18.777778  -254.372222\n",
       "28  20160629  65.8  73.2   NaN   NaN     18.777778  -254.372222\n",
       "29  20160630  65.7  72.7  59.2  13.5     18.722222  -254.427778\n",
       "24  20160625  65.7  75.4  57.9  17.5     18.722222  -254.427778\n",
       "0   20160601  65.5  73.6  54.7  18.9     18.611111  -254.538889\n",
       "27  20160628  65.4  73.0  55.8  17.2     18.555556  -254.594444\n",
       "14  20160615  63.4  70.3  49.3  21.0     17.444444  -255.705556\n",
       "20  20160621  62.6  71.4  50.4  21.0     17.000000  -256.150000\n",
       "21  20160622  61.7  70.2  55.4  14.8     16.500000  -256.650000\n",
       "23  20160624  61.1  68.9  56.7  12.2     16.166667  -256.983333\n",
       "22  20160623  60.9  67.1  54.9  12.2     16.055556  -257.094444\n",
       "26  20160627  60.7  70.0   NaN   NaN     15.944444  -257.205556\n",
       "16  20160617  60.4  70.7  55.9  14.8     15.777778  -257.372222\n",
       "13  20160614  59.7  67.8  47.8  20.0     15.388889  -257.761111\n",
       "19  20160620  59.3  69.1  52.2  16.9     15.166667  -257.983333\n",
       "12  20160613  58.3  68.2  47.3  20.9     14.611111  -258.538889\n",
       "15  20160616  57.8  67.5  55.6  11.9     14.333333  -258.816667\n",
       "3   20160604  57.5  70.9  47.3  23.6     14.166667  -258.983333\n",
       "17  20160618  57.3   NaN  54.0   NaN     14.055556  -259.094444\n",
       "6   20160607  56.9  65.1  45.9  19.2     13.833333  -259.316667\n",
       "18  20160619  56.3  59.2  54.1   5.1     13.500000  -259.650000\n",
       "11  20160612  55.4  64.2  46.0  18.2     13.000000  -260.150000\n",
       "7   20160608  54.2   NaN  47.5   NaN     12.333333  -260.816667\n",
       "10  20160611  54.0  62.1  41.7  20.4     12.222222  -260.927778\n",
       "5   20160606  52.2  59.7  42.8  16.9     11.222222  -261.927778\n",
       "4   20160605  51.4  58.3  43.2  15.1     10.777778  -262.372222\n",
       "9   20160610  49.5  55.9  43.0  12.9      9.722222  -263.427778\n",
       "8   20160609  49.4  54.1  45.7   8.4      9.666667  -263.483333"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sort dataframe, descending\n",
    "data.sort_values(by='TEMP', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "## Writing data to a file\n",
    "\n",
    "Lastly, it is of course important to be able to write the data that you have analyzed into your computer. This is really handy in Pandas as it [supports many different data formats\n",
    "by default](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).\n",
    "\n",
    "**The most typical output format by far is CSV file.** Function `to_csv()` can be used to easily save your data in CSV format.\n",
    "Let's first save the data from our `data` DataFrame into a file called `Kumpula_temp_results_June_2016.csv`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# define output filename\n",
    "output_fp = \"Kumpula_temps_June_2016.csv\"\n",
    "\n",
    "# Save dataframe to csv\n",
    "data.to_csv(output_fp, sep=',')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Now we have the data from our DataFrame saved to a file:\n",
    "![Text file output1](img/pandas-save-file-1.png)\n",
    "\n",
    "As you can see, the first value in the datafile contains now the index value of the rows. There are also quite many decimals present in the new columns\n",
    "that we created. Let's deal with these and save the temperature values from `warm_temps` DataFrame without the index and with only 1 decimal in the floating point numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true,
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "outputs": [],
   "source": [
    "# define output filename\n",
    "output_fp2 = \"Kumpula_temps_above15_June_2016.csv\"\n",
    "\n",
    "# Save dataframe to csv\n",
    "warm_temps.to_csv(output_fp2, sep=',', index=False, float_format=\"%.1f\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Omitting the index can be with `index=False` parameter. Specifying how many decimals should be written can be done with `float_format` parameter where text `%.1f` defines Pandas to use 1 decimals\n",
    "in all columns when writing the data to a file (changing the value 1 to 2 would write 2 decimals etc.)\n",
    "\n",
    "![Output after float fomatting](img/pandas-save-file-2.png)\n",
    "\n",
    "As a results you have a \"cleaner\" output file without the index column, and with only 1 decimal for floating point numbers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's it for this week. We will dive deeper into data analysis with Pandas in the following Lesson."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}