{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# More advanced plotting with pandas/Matplotlib\n", "\n", "```{attention}\n", "Finnish university students are encouraged to use the CSC Notebooks platform.
\n", "\n", "Others can follow the lesson and fill in their student notebooks using Binder.
\n", "```\n", "\n", "At this point you should know the basics of making plots with pandas and the Matplotlib module. Now we will expand on our basic plotting skills to learn how to create more advanced plots. In this part, we will show how to visualize data using pandas/Matplotlib and create plots such as the one below.\n", "\n", "![Subplot example in Matplotlib](img/subplots.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The data\n", "\n", "In this part of the lesson we'll continue working with our weather observation data from the Helsinki-Vantaa airport [downloaded from NOAA](https://www7.ncdc.noaa.gov/CDO/cdopoemain.cmd?datasetabbv=DS3505&countryabbv=&georegionabbv=&resolution=40)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting started\n", "\n", "Let's start again by importing the libraries we'll need." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading the data\n", "\n", "Now we'll load the data just as we did in the first part of the lesson: \n", "- Set whitespace as the delimiter\n", "- Specify the no data values\n", "- Specify a subset of columns\n", "- Parse the `'YR--MODAHRMN'` column as a datetime index\n", "\n", "Reading in the data might take a few moments." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fp = r\"data/029740.txt\"\n", "\n", "data = pd.read_csv(\n", " fp,\n", " delim_whitespace=True,\n", " na_values=[\"*\", \"**\", \"***\", \"****\", \"*****\", \"******\"],\n", " usecols=[\"YR--MODAHRMN\", \"TEMP\", \"MAX\", \"MIN\"],\n", " parse_dates=[\"YR--MODAHRMN\"],\n", " index_col=\"YR--MODAHRMN\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"Number of rows: {len(data)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, we are dealing with a relatively large data set.\n", "\n", "Let's have a closer look at the first rows of the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's rename the `'TEMP'` column to `'TEMP_F'`, since we'll later convert our temperatures from Fahrenheit to Celsius." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "new_names = {\"TEMP\": \"TEMP_F\"}\n", "data = data.rename(columns=new_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check the first rows of the data again to confirm that the renaming was successful:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparing the data\n", "\n", "First, we have to deal with no data values. Let's check how many no data values we have:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"Number of no data values per column:\\n{data.isna().sum()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, we have 1644 missing values in the `'TEMP_F'` column. Let's get rid of those. 
We need not worry about the no data values in the `'MAX'` and `'MIN'` columns since we won't be using them for plotting. \n", "\n", "We can remove the rows of our DataFrame where the `'TEMP_F'` value is missing using the `dropna()` method, as shown below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.dropna(subset=[\"TEMP_F\"], inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"Number of rows after removing no data values: {len(data)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's better." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check your understanding\n", "\n", "What would happen if we removed all rows with any no data values from our data (also considering no data values in the `MAX` and `MIN` columns)?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# After removing all no data values we are left with only a fraction of the original data.\n", "# Note! Here we are not applying .dropna() \"inplace\"\n", "# so we are not making any permanent changes to our DataFrame.\n", "len(data.dropna())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# We continue with the DataFrame where rows with missing TEMP_F values have been removed\n", "len(data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Converting temperatures to Celsius\n", "\n", "Now that we have loaded our data, we can convert the temperature values from Fahrenheit to Celsius, as we have done in earlier lessons." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data[\"TEMP_C\"] = (data[\"TEMP_F\"] - 32.0) / 1.8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check what our DataFrame looks like at this point:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using subplots\n", "\n", "Let's continue working with the weather data and learn how to use *subplots*. Subplots are figures where you have multiple plots in different panels of the same figure, as was shown at the start of the lesson." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extracting seasonal temperatures\n", "\n", "Let's now select data from different seasons of the year in 2012/2013:\n", "\n", "- Winter (December 2012 - February 2013)\n", "- Spring (March 2013 - May 2013)\n", "- Summer (June 2013 - August 2013)\n", "- Autumn (September 2013 - November 2013)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "winter = data.loc[(data.index >= \"201212010000\") & (data.index < \"201303010000\")]\n", "winter_temps = winter[\"TEMP_C\"]\n", "\n", "spring = data.loc[(data.index >= \"201303010000\") & (data.index < \"201306010000\")]\n", "spring_temps = spring[\"TEMP_C\"]\n", "\n", "summer = data.loc[(data.index >= \"201306010000\") & (data.index < \"201309010000\")]\n", "summer_temps = summer[\"TEMP_C\"]\n", "\n", "autumn = data.loc[(data.index >= \"201309010000\") & (data.index < \"201312010000\")]\n", "autumn_temps = autumn[\"TEMP_C\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can plot our data to see how the different seasons look separately." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax1 = winter_temps.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax2 = spring_temps.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax3 = summer_temps.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax4 = autumn_temps.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, so from these plots we can already see that the temperatures in the four seasons are quite different, which is rather obvious of course. It is also important to notice that the scale of the *y*-axis changes between these four plots. If we would like to compare the seasons to each other, we need to make sure that the temperature scale is the same in the plot for each season.\n", "\n", "### Finding data bounds\n", "\n", "Let's set our *y*-axis limits so that the upper limit is the maximum temperature in our data (full year) plus 5 degrees, and the lower limit is the minimum temperature minus 5 degrees." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Find lower limit for y-axis\n", "min_temp = min(\n", " winter_temps.min(), spring_temps.min(), summer_temps.min(), autumn_temps.min()\n", ")\n", "min_temp = min_temp - 5.0\n", "\n", "# Find upper limit for y-axis\n", "max_temp = max(\n", " winter_temps.max(), spring_temps.max(), summer_temps.max(), autumn_temps.max()\n", ")\n", "max_temp = max_temp + 5.0\n", "\n", "# Print y-axis min, max\n", "print(f\"Min: {min_temp}, Max: {max_temp}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use this temperature range to standardize the *y*-axis range on our plots.\n", "\n", "### Creating our first set of subplots\n", "\n", "Let's now continue and see how we can plot all these different plots into the same figure. 
We can create a 2x2 panel for our visualization using Matplotlib’s `subplots()` function where we specify how many rows and columns we want to have in our figure. We can also specify the size of our figure with the `figsize` parameter as we have seen earlier with pandas. As a reminder, `figsize` takes the `width` and `height` values (in inches) as inputs." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))\n", "axs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that as a result we now have a 2x2 array of axes (a NumPy array rather than a plain list), where the first row contains the axes for columns 1 and 2 on **row 1** and the second row contains the axes for columns 1 and 2 on **row 2**.\n", "\n", "We can assign these axes to their own variables so it is easier to work with them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ax11 = axs[0][0]\n", "ax12 = axs[0][1]\n", "ax21 = axs[1][0]\n", "ax22 = axs[1][1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we have four axis variables for the different panels in our figure. Next we can use them to plot the seasonal data. Let's begin by plotting the seasons, giving the lines different colors and setting the *y*-axis range to be the same for all subplots. We can do this using what we already know and the parameters below:\n", "\n", "- The `c` parameter changes the color of the line. 
Matplotlib has a [large list of named colors](https://matplotlib.org/stable/gallery/color/named_colors.html) you can consult to customize your color scheme.\n", "- The `lw` parameter controls the width of the lines.\n", "- The `ylim` parameter controls the *y*-axis range." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set the plot line width\n", "line_width = 1.5\n", "\n", "# Plot data\n", "winter_temps.plot(ax=ax11, c=\"blue\", lw=line_width, ylim=[min_temp, max_temp])\n", "spring_temps.plot(ax=ax12, c=\"orange\", lw=line_width, ylim=[min_temp, max_temp])\n", "summer_temps.plot(ax=ax21, c=\"green\", lw=line_width, ylim=[min_temp, max_temp])\n", "autumn_temps.plot(ax=ax22, c=\"brown\", lw=line_width, ylim=[min_temp, max_temp])\n", "\n", "# Display the figure\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Great, now we have all the plots in the same figure! However, we can see that there are some problems with our *x*-axis labels and that a few items, such as a title and axis labels, are still missing. Let's fix that below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the new figure and subplots\n", "fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))\n", "\n", "# Rename the axes for ease of use\n", "ax11 = axs[0][0]\n", "ax12 = axs[0][1]\n", "ax21 = axs[1][0]\n", "ax22 = axs[1][1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we'll add our seasonal temperatures to the plot commands for each time period." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set plot line width\n", "line_width = 1.5\n", "\n", "# Plot data\n", "winter_temps.plot(\n", " ax=ax11, c=\"blue\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "spring_temps.plot(\n", " ax=ax12, c=\"orange\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "summer_temps.plot(\n", " ax=ax21, c=\"green\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "autumn_temps.plot(\n", " ax=ax22, c=\"brown\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "\n", "# Set figure title\n", "fig.suptitle(\"2012-2013 Seasonal temperature observations - Helsinki-Vantaa airport\")\n", "\n", "# Rotate the x-axis labels so they don't overlap\n", "plt.setp(ax11.xaxis.get_majorticklabels(), rotation=20)\n", "plt.setp(ax12.xaxis.get_majorticklabels(), rotation=20)\n", "plt.setp(ax21.xaxis.get_majorticklabels(), rotation=20)\n", "plt.setp(ax22.xaxis.get_majorticklabels(), rotation=20)\n", "\n", "# Axis labels\n", "ax21.set_xlabel(\"Date\")\n", "ax22.set_xlabel(\"Date\")\n", "ax11.set_ylabel(\"Temperature [°C]\")\n", "ax21.set_ylabel(\"Temperature [°C]\")\n", "\n", "# Season label text\n", "ax11.text(pd.to_datetime(\"20130215\"), -25, \"Winter\")\n", "ax12.text(pd.to_datetime(\"20130515\"), -25, \"Spring\")\n", "ax21.text(pd.to_datetime(\"20130815\"), -25, \"Summer\")\n", "ax22.text(pd.to_datetime(\"20131115\"), -25, \"Autumn\")\n", "\n", "# Display plot\n", "fig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Not bad." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check your understanding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Visualize winter and summer temperatures in a 1x2 panel figure.\n", "Save the figure as a .png file." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Two subplots side-by-side\n", "fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))\n", "\n", "# Set plot line width\n", "line_width = 1.5\n", "\n", "# Plot data\n", "winter_temps.plot(\n", " ax=axs[0], c=\"blue\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "summer_temps.plot(\n", " ax=axs[1], c=\"green\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "\n", "# Set figure title\n", "fig.suptitle(\n", " \"2012-2013 Winter and summer temperature observations - Helsinki-Vantaa airport\"\n", ")\n", "\n", "# Rotate the x-axis labels so they don't overlap\n", "plt.setp(axs[0].xaxis.get_majorticklabels(), rotation=20)\n", "plt.setp(axs[1].xaxis.get_majorticklabels(), rotation=20)\n", "\n", "# Axis labels\n", "axs[0].set_xlabel(\"Date\")\n", "axs[1].set_xlabel(\"Date\")\n", "axs[0].set_ylabel(\"Temperature [°C]\")\n", "axs[1].set_ylabel(\"Temperature [°C]\")\n", "\n", "# Season label text\n", "axs[0].text(pd.to_datetime(\"20130215\"), -25, \"Winter\")\n", "axs[1].text(pd.to_datetime(\"20130815\"), -25, \"Summer\")\n", "\n", "plt.savefig(\"HelsinkiVantaa_WinterSummer_2012-2013.png\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extra: pandas/Matplotlib plot style sheets\n", "\n", "One cool thing about plotting with pandas/Matplotlib is that it is possible to change the overall style of your plot very easily by selecting one of several available plot styles. Let's consider an example below using the four-panel plot we produced in this lesson.\n", "\n", "We will start by defining the plot style, using the `'dark_background'` plot style here." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.style.use(\"dark_background\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is no output from this command, but now when we create a plot it will use the `dark_background` styling. Let's see what that looks like." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the new figure and subplots\n", "fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(14, 9))\n", "\n", "# Rename the axes for ease of use\n", "ax11 = axs[0][0]\n", "ax12 = axs[0][1]\n", "ax21 = axs[1][0]\n", "ax22 = axs[1][1]\n", "\n", "# Set plot line width\n", "line_width = 1.5\n", "\n", "# Plot data\n", "winter_temps.plot(\n", " ax=ax11, c=\"blue\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "spring_temps.plot(\n", " ax=ax12, c=\"orange\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "summer_temps.plot(\n", " ax=ax21, c=\"green\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "autumn_temps.plot(\n", " ax=ax22, c=\"brown\", lw=line_width, ylim=[min_temp, max_temp], grid=True\n", ")\n", "\n", "# Set figure title\n", "fig.suptitle(\"2012-2013 Seasonal temperature observations - Helsinki-Vantaa airport\")\n", "\n", "# Rotate the x-axis labels so they don't overlap\n", "plt.setp(ax11.xaxis.get_majorticklabels(), rotation=20)\n", "plt.setp(ax12.xaxis.get_majorticklabels(), rotation=20)\n", "plt.setp(ax21.xaxis.get_majorticklabels(), rotation=20)\n", "plt.setp(ax22.xaxis.get_majorticklabels(), rotation=20)\n", "\n", "# Axis labels\n", "ax21.set_xlabel(\"Date\")\n", "ax22.set_xlabel(\"Date\")\n", "ax11.set_ylabel(\"Temperature [°C]\")\n", "ax21.set_ylabel(\"Temperature [°C]\")\n", "\n", "# Season label text\n", "ax11.text(pd.to_datetime(\"20130215\"), -25, \"Winter\")\n", "ax12.text(pd.to_datetime(\"20130515\"), -25, \"Spring\")\n", "ax21.text(pd.to_datetime(\"20130815\"), -25, 
\"Summer\")\n", "ax22.text(pd.to_datetime(\"20131115\"), -25, \"Autumn\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the plot format has now changed to use the `dark_background` style. You can find other plot style options in the [complete list of available Matplotlib style sheets](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html). Have fun!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }