Plotting with Matplotlib

Though there are many options for plotting data in Python, we will be using Matplotlib. In particular, we will be using the pyplot module in Matplotlib, which provides MATLAB-like plotting. The reason for this is simple: Matplotlib is the most common module used for plotting in Python and many examples of plotting you may find online will be using Matplotlib.

Downloading the data and preparing

  1. For our first lesson plotting data using Matplotlib we will again be using the weather data file from Lesson 5.
  2. If you have not already started Spyder you should do so now. You can find directions on how to open Spyder at the start of Lesson 1.
  3. Navigate in Spyder to the directory where you have stored the downloaded data file. You can do this most easily in Spyder by using the filesystem browser above the File/Variable explorer panel. Click on the file folder to select the directory where you would like to work, click Open, then click on the listed file path that is now displated beside the file folder and press Enter.

Spyder plots in separate windows

By default, Spyder plots will be shown in the IPython console, but this can be annoying when saving and interacting with the plots we make. We can change how plots are displayed in Spyder to have them show up in a separate window by changing the preferences.

  1. Start by opening the Spyder preferences.
    • Mac users can go to python -> Preferences… in the menubar
    • Linux/Windows users can go to Tools -> Preferences
  2. In the Preferences window, click on IPython console on the left side of the window, then on the Graphics tab.
  3. Under Graphics backend, select Automatic for the backend.
  4. Restart Spyder.

Plotting data with Matplotlib

  1. To start, we will need to import both Pandas and pyplot.

    In [1]: import pandas as pd
    
    In [2]: import matplotlib.pyplot as plt
    

    Note again that we are renaming the modules when we import them. Perhaps now it is more clear why you might want to rename a module on import. Having to type matplotlib.pyplot every time you use one of its methods would be a pain.

  2. With our modules imported, we now can read in the data file in the same way we had for Lesson 5.

    In [3]: dataFrame = pd.read_csv('Kumpula-June-2016-w-metadata.txt', skiprows=8)
    

    As you may recall, we will now have a Pandas DataFrame with 4 columns.

    In [4]: print(dataFrame.columns)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-4-a60faba7eaca> in <module>()
    ----> 1 print(dataFrame.columns)
    
    NameError: name 'dataFrame' is not defined
    
  3. OK, so let’s get to plotting! We can start by using the Matplotlib plt.plot() function.

    In [5]: x = dataFrame['YEARMODA']
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-5-1903336fb770> in <module>()
    ----> 1 x = dataFrame['YEARMODA']
    
    NameError: name 'dataFrame' is not defined
    
    In [6]: y = dataFrame['TEMP']
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-6-67865b05ba4e> in <module>()
    ----> 1 y = dataFrame['TEMP']
    
    NameError: name 'dataFrame' is not defined
    
    In [7]: plt.plot(x, y)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-7-d0d1dfa5badf> in <module>()
    ----> 1 plt.plot(x, y)
    
    NameError: name 'x' is not defined
    
    In [8]: plt.show()
    
    ../../_images/plt_temp.png

    If all goes well, you should see the plot above.

    OK, so what happened here? Well, first we assigned the values we would like to plot, the year and temperature, to the variables x and y. This isn’t necessary, per se, but does make it easier to see what is plotted. Next, it is perhaps pretty obvious that plt.plot() is a function in pyplot that produces a simple x-y plot. However, just like most variables in Python, creating the plot simply stores the information about the plot in memory. The plot is not displayed on the screen until you type plt.show().

  4. We can make our plot look a bit nicer and provide more information by using a few additional pyplot options.

    In [9]: plt.plot(x, y, 'ro--')
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-9-7f7aa66b70da> in <module>()
    ----> 1 plt.plot(x, y, 'ro--')
    
    NameError: name 'x' is not defined
    
    In [10]: plt.title('Kumpula temperatures in June 2016')
    Out[10]: Text(0.5, 1.0, 'Kumpula temperatures in June 2016')
    
    In [11]: plt.xlabel('Date')
    Out[11]: Text(0.5, 0, 'Date')
    
    In [12]: plt.ylabel('Temperature [°F]')
    Out[12]: Text(0, 0.5, 'Temperature [°F]')
    
    In [13]: plt.show()
    
    ../../_images/plt_temp_annotated.png

    This should produce the plot above.

    Now we see our temperature data as a red dashed line with circles showing the data points. This comes from the additional ro-- used with plt.plot(). In this case, r tells the plt.plot() function to use red color, o tells it to show circles at the points, and -- says to use a dashed line. You can use help(plt.plot) to find out more about formatting plots. Better yet, check out the documentation for plt.plot() online. We have also added a title and axis labels, but their use is straightforward.

  5. Adding text to plots can be done using plt.text().

    In [14]: plt.text(20160604.0, 68.0, 'High temperature in early June')
    Out[14]: Text(20160604.0, 68.0, 'High temperature in early June')
    

    This will display the text “High temperature in early June” at the location x = 20160604.0 (i.e., June 4, 2016), y = 68.0 on the plot. You would need to type the other plotting commands and plt.show() again to display the plot.

  6. Changing the plot axes can be done using the plt.axis() function.

    In [15]: plt.axis([20160615, 20160630, 55.0, 70.0])
    Out[15]: (20160615.0, 20160630.0, 55.0, 70.0)
    

    The format for plt.axis() is [xmin, xmax, ymin, ymax] enclosed in square brackets (i.e., a Python list). Here, the x range would be changed to the equivalents of June 15, 2016 to June 30, 2016 and the y range would be 55.0-70.0. The complete set of commands to plot would thus be:

    In [16]: plt.plot(x, y, 'ro--')
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-16-7f7aa66b70da> in <module>()
    ----> 1 plt.plot(x, y, 'ro--')
    
    NameError: name 'x' is not defined
    
    In [17]: plt.title('Kumpula temperatures in June 2016')
    Out[17]: Text(0.5, 1.0, 'Kumpula temperatures in June 2016')
    
    In [18]: plt.xlabel('Date')
    Out[18]: Text(0.5, 0, 'Date')
    
    In [19]: plt.ylabel('Temperature [°F]')
    Out[19]: Text(0, 0.5, 'Temperature [°F]')
    
    In [20]: plt.text(20160604.0, 68.0, 'High temperature in early June')
    Out[20]: Text(20160604.0, 68.0, 'High temperature in early June')
    
    In [21]: plt.axis([20160615, 20160630, 55.0, 70.0])
    Out[21]: (20160615.0, 20160630.0, 55.0, 70.0)
    
    In [22]: plt.show()
    
    ../../_images/plt_temp_annotated_w_text.png

    Note that the text does not appear here because of the axis range.

  7. In addition to line plots, there are many other options for plotting in Matplotlib. Bar charts are one option, which can be used quite similarly to line plots.

    In [23]: plt.bar(x, y)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-23-a6615192c27a> in <module>()
    ----> 1 plt.bar(x, y)
    
    NameError: name 'x' is not defined
    
    In [24]: plt.title('Kumpula temperatures in June 2016')
    Out[24]: Text(0.5, 1.0, 'Kumpula temperatures in June 2016')
    
    In [25]: plt.xlabel('Date')
    Out[25]: Text(0.5, 0, 'Date')
    
    In [26]: plt.ylabel('Temperature [°F]')
    Out[26]: Text(0, 0.5, 'Temperature [°F]')
    
    In [27]: plt.text(20160604.0, 68.0, 'High temperature in early June')
    Out[27]: Text(20160604.0, 68.0, 'High temperature in early June')
    
    In [28]: plt.axis([20160615, 20160630, 55.0, 70.0])
    Out[28]: (20160615.0, 20160630.0, 55.0, 70.0)
    
    In [29]: plt.show()
    
    ../../_images/plt_temp_annotated_w_bars.png

    You can find more about how to format bar charts on the Matplotlib documentation website.

  8. Saving plots created using Matplotlib done several ways, but the easiest is simply to click on the disk icon on the pyplot window when a plot is displayed, as shown below.

    Saving a plot in Python

    This brings up a familiar file saving window. Matplotlib plots can be saved in a number of useful file formats, including JPEG, PNG, PDF, and EPS, as you can see below.

    Matplotlib plot types

    PNG is a nice format for raster images, and EPS is probably easiest to use for vector graphics.

Attention

Plotting like the “pros”

We’re only introducing a tiny amount of what can be done with pyplot. In most cases, when we would like to create some more complicated type of plot, we would search using Google or visit the Matplotlib plot gallery. The great thing about the Matplotlib plot gallery is that not only can you find example plots there, but you can also find the Python commands used to create the plots. This makes it easy to take a working example from the gallery and modify it for your use.

The Matplotlib plot gallery

The Matplotlib plot gallery

Your job in this task is to:

  1. Visit the Matplotlib plot gallery
  2. Find an interesting plot and click on it
  3. Copy the code you find listed beneath the plot on the page that loads
  4. Paste that into an IPython window or the IPython console in Spyder to reproduce the plot.

After you have reproduced the plot, you are welcome to try to make a small change to the plot commands and see what happens. For this, it may be easiest to save a copy of the commands in a .py script file that you can edit and run.

Attention

Task 3: Plotting only part of a dataset

For this task, you should use the values for arrays x and y calculated earlier in this part of the lesson, and use plt.axis() to limit the plot to the following x and y ranges: x = June 7-14, y = 45.0 to 65.0.

  • What do you expect to see in this case?
  • Note: In order to get the plot to display properly, you will need to first type in the plt.plot() command, then plt.axis(), and finally plt.show().