Basic plotting with pandas and Matplotlib

Attention

Finnish university students are encouraged to use the CSC Notebooks platform.
CSC badge

Others can follow the lesson and fill in their student notebooks using Binder.
Binder badge

As we’re now familiar with some of the features of pandas, we will wade into visualizing our data in Python using the built-in plotting options available directly in pandas. Much like the case of pandas being built upon NumPy, plotting in pandas takes advantage of plotting features from the Matplotlib plotting library. Plotting in pandas provides a basic framework for visualizing our data, but as you’ll see we will sometimes need to also use features from Matplotlib to enhance our plots. In particular, we will use features from the the pyplot module in Matplotlib, which provides MATLAB-like plotting.

Toward the end of the lesson we will also briefly explore creating interactive plots using the Pandas-Bokeh plotting backend, which allows us to produce plots similar to those available in the Bokeh plotting library using plotting syntax similar to that used normally in pandas. This is an optional part of the lesson, but will allow you to see an example for further exploration of interactive plotting using Pandas-Bokeh.

Input data

In the lesson this week we are using some of the same weather observation data from Finland downloaded from NOAA that we used in Lesson 6. In this case we’ll focus on weather observation station data from the Helsinki-Vantaa airport.

Downloading the data

Attention

It is recommended to use the Geo-Python Lite blueprint for this lesson.

Just like last week, the first step for today’s lesson is to get the data. Unlike last week, we’ll all download and use the same data.

You can download the data by opening a new terminal window in Jupyter Lab by going to File -> New -> Terminal in the JupyterLab menu bar. Once the terminal is open, you will need to navigate to the directory for Lesson 7 by typing

cd notebooks/L7/

or the equivalent command to navigate to the location of the Lesson 7 files on your computer (for those running Jupyter on their own computers).

You can now confirm you’re in the correct directory by typing

ls

You should see something like the following output:

advanced-plotting.ipynb matplotlib.ipynb
img                     metadata

If so, you’re in the correct directory and you can download the data files by typing

wget https://davewhipp.github.io/data/Finland-weather-data-L7.tar.gz

After the download completes, you can extract the data files by typing

tar zxvf Finland-weather-data-L7.tar.gz

At this stage you should have a new directory called data that contains the data for this week’s lesson. You can confirm this by typing

ls data

You should see something like the following:

029740.txt           6367598020644inv.txt
3505doc.txt          6367598020644stn.txt

Now you should be all set to proceed with the lesson!

Binder users

It is not recommended to complete this lesson using Binder.

About the data

As part of the download there are a number of files that describe the weather data. These metadata files include:

  • A list of stations: data/6367598020644stn.txt

  • Details about weather observations at each station: data/6367598020644inv.txt

  • A data description (i.e., column names): data/3505doc.txt

The input data for this week are separated with varying number of spaces (i.e., fixed width). The first lines and columns of the data look like following:

  USAF  WBAN YR--MODAHRMN DIR SPD GUS CLG SKC L M H  VSB MW MW MW MW AW AW AW AW W TEMP DEWP    SLP   ALT    STP MAX MIN PCP01 PCP06 PCP24 PCPXX SD
029740 99999 195201010000 200  23 ***  15 OVC 7 2 *  5.0 63 ** ** ** ** ** ** ** 6   36   32  989.2 ***** ****** *** *** ***** ***** ***** ***** **
029740 99999 195201010600 220  18 ***   8 OVC 7 2 *  2.2 63 ** ** ** ** ** ** ** 6   37   37  985.9 ***** ****** ***  34 ***** ***** ***** ***** **
029740 99999 195201011200 220  21 ***   5 OVC 7 * *  3.8 59 ** ** ** ** ** ** ** 5   39   36  988.1 ***** ****** *** *** ***** ***** ***** ***** **
029740 99999 195201011800 250  16 *** 722 CLR 0 0 0 12.5 02 ** ** ** ** ** ** ** 5   36   27  991.9 ***** ******  39 *** ***** ***** ***** ***** **

Getting started

Let’s start by importing Pandas and reading our data file.

import pandas as pd

Datetime in Python

For the lesson this week we will be using a datetime index for our weather observations. We did not cover the datetime data type in detail in Lesson 6, but you can find a brief introduction to datetime in the Lesson 6 materials.

Just as we did last week, we’ll read our data file by passing a few parameters to the Pandas read_csv() function. In this case, however, we’ll include a few additional parameters in order to read the data with a datetime index. Let’s read the data first, then see what happened.

fp = r"data/029740.txt"

data = pd.read_csv(
    fp,
    delim_whitespace=True,
    na_values=["*", "**", "***", "****", "*****", "******"],
    usecols=["YR--MODAHRMN", "TEMP", "MAX", "MIN"],
    parse_dates=["YR--MODAHRMN"],
    index_col="YR--MODAHRMN",
)

So what’s different here? Well, we have added two new parameters: parse_dates and index_col.

  • parse_dates takes a Python list of column name(s) containing date data that Pandas will parse and convert to the datetime data type. For many common date formats this parameter will automatically recognize and convert the date data.

  • index_col is used to state a column that should be used to index the data in the DataFrame. In this case, we end up with our date data as the DataFrame index. This is a very useful feature in Pandas as we’ll see below.

Having read in the data, let’s have a quick look at what we have using data.head().

data.head()
TEMP MAX MIN
YR--MODAHRMN
1952-01-01 00:00:00 36.0 NaN NaN
1952-01-01 06:00:00 37.0 NaN 34.0
1952-01-01 12:00:00 39.0 NaN NaN
1952-01-01 18:00:00 36.0 39.0 NaN
1952-01-02 00:00:00 36.0 NaN NaN

As mentioned above, you can now see that the index column for our DataFrame (the first column) contains date values related to each row in the DataFrame.

Basic x-y plot

Now we’re ready for our first plot. We can start by using the basic line plot in Pandas to look at our temperature data.

ax = data.plot()
../../_images/matplotlib_10_0.png

If all goes well, you should see the plot above.

OK, so what happened here?

  1. We first created the plot object using the plot() method of the data DataFrame. Without any parameters given, this makes the plot of all columns in the DataFrame as lines of different color on the y-axis with the index, time in this case, on the x-axis.

  2. In case we want to be able to modify the plot or add anything, we assign the plot object to the variable ax. We can check its type below.

In fact, let’s check the type of the ax variable now.

type(ax)
matplotlib.axes._subplots.AxesSubplot

OK, so it looks like we have some kind of plot data type that is part of Matplotlib. Clearly, pandas is using Matplotlib for generating our plots.

Selecting our plotted data

Now, let’s make a few small changes to our plot and plot the data again. First, let’s only plot the observed temperatures in the data['TEMP'] column, and let’s restrict ourselves to observations from the afternoon of October 1, 2019 (the last day in our dataset). We can do this by selecting the desired data column and date range first, then plotting our selection.

oct1_temps = data["TEMP"].loc[data.index >= "201910011200"]
ax = oct1_temps.plot()
../../_images/matplotlib_14_0.png

So, what did we change?

  1. Well, we selected only the 'TEMP' column now by using data['TEMP'] instead of data.

  2. We’ve added a restriction to the date range using loc[] to select only rows where the index value data.index is greater than '201910011200'. In that case, the number in the string is in the format 'YYYYMMDDHHMM', where YYYY is the year, MM is the month, DD is the day, HH is the hour, and MM is the minute. Now we have all observations from noon onward on October 1, 2019.

  3. By saving this selection to the DataFrame oct1_temps we’re able to now use oct1_temps.plot() to plot only our selection. This is cool, but we can do even better…

Basic plot formatting

We can make our plot look a bit nicer and provide more information by using a few additional plotting options to pandas/Matplotlib.

# Change line and symbol format, and add axis labels/title
ax = oct1_temps.plot(
    style="ro--",
    title="Helsinki-Vantaa temperatures",
    xlabel="Date",
    ylabel="Temperature [°F]",
)
../../_images/matplotlib_17_0.png

Now we see our temperature data as a red dashed line with circles showing the data points. This comes from the additional style='ro--' used with oct1_temps.plot(). In this case, r tells the oct1_temps.plot() function to use red color for the lines and symbols, o tells it to show circles at the points, and -- says to use a dashed line. You can use help(oct1_temps.plot) to find out more about formatting plots or have a look at the documentation on the pandas website. We have also added a title using the title parameter, and axis labels using the xlabel and ylabel parameters.

help(oct1_temps.plot)
Help on PlotAccessor in module pandas.plotting._core object:

class PlotAccessor(pandas.core.base.PandasObject)
 |  PlotAccessor(data)
 |  
 |  Make plots of Series or DataFrame.
 |  
 |  Uses the backend specified by the
 |  option ``plotting.backend``. By default, matplotlib is used.
 |  
 |  Parameters
 |  ----------
 |  data : Series or DataFrame
 |      The object for which the method is called.
 |  x : label or position, default None
 |      Only used if data is a DataFrame.
 |  y : label, position or list of label, positions, default None
 |      Allows plotting of one column versus another. Only used if data is a
 |      DataFrame.
 |  kind : str
 |      The kind of plot to produce:
 |  
 |      - 'line' : line plot (default)
 |      - 'bar' : vertical bar plot
 |      - 'barh' : horizontal bar plot
 |      - 'hist' : histogram
 |      - 'box' : boxplot
 |      - 'kde' : Kernel Density Estimation plot
 |      - 'density' : same as 'kde'
 |      - 'area' : area plot
 |      - 'pie' : pie plot
 |      - 'scatter' : scatter plot (DataFrame only)
 |      - 'hexbin' : hexbin plot (DataFrame only)
 |  ax : matplotlib axes object, default None
 |      An axes of the current figure.
 |  subplots : bool, default False
 |      Make separate subplots for each column.
 |  sharex : bool, default True if ax is None else False
 |      In case ``subplots=True``, share x axis and set some x axis labels
 |      to invisible; defaults to True if ax is None otherwise False if
 |      an ax is passed in; Be aware, that passing in both an ax and
 |      ``sharex=True`` will alter all x axis labels for all axis in a figure.
 |  sharey : bool, default False
 |      In case ``subplots=True``, share y axis and set some y axis labels to invisible.
 |  layout : tuple, optional
 |      (rows, columns) for the layout of subplots.
 |  figsize : a tuple (width, height) in inches
 |      Size of a figure object.
 |  use_index : bool, default True
 |      Use index as ticks for x axis.
 |  title : str or list
 |      Title to use for the plot. If a string is passed, print the string
 |      at the top of the figure. If a list is passed and `subplots` is
 |      True, print each item in the list above the corresponding subplot.
 |  grid : bool, default None (matlab style default)
 |      Axis grid lines.
 |  legend : bool or {'reverse'}
 |      Place legend on axis subplots.
 |  style : list or dict
 |      The matplotlib line style per column.
 |  logx : bool or 'sym', default False
 |      Use log scaling or symlog scaling on x axis.
 |      .. versionchanged:: 0.25.0
 |  
 |  logy : bool or 'sym' default False
 |      Use log scaling or symlog scaling on y axis.
 |      .. versionchanged:: 0.25.0
 |  
 |  loglog : bool or 'sym', default False
 |      Use log scaling or symlog scaling on both x and y axes.
 |      .. versionchanged:: 0.25.0
 |  
 |  xticks : sequence
 |      Values to use for the xticks.
 |  yticks : sequence
 |      Values to use for the yticks.
 |  xlim : 2-tuple/list
 |      Set the x limits of the current axes.
 |  ylim : 2-tuple/list
 |      Set the y limits of the current axes.
 |  xlabel : label, optional
 |      Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the
 |      x-column name for planar plots.
 |  
 |      .. versionadded:: 1.1.0
 |  
 |      .. versionchanged:: 1.2.0
 |  
 |         Now applicable to planar plots (`scatter`, `hexbin`).
 |  
 |  ylabel : label, optional
 |      Name to use for the ylabel on y-axis. Default will show no ylabel, or the
 |      y-column name for planar plots.
 |  
 |      .. versionadded:: 1.1.0
 |  
 |      .. versionchanged:: 1.2.0
 |  
 |         Now applicable to planar plots (`scatter`, `hexbin`).
 |  
 |  rot : int, default None
 |      Rotation for ticks (xticks for vertical, yticks for horizontal
 |      plots).
 |  fontsize : int, default None
 |      Font size for xticks and yticks.
 |  colormap : str or matplotlib colormap object, default None
 |      Colormap to select colors from. If string, load colormap with that
 |      name from matplotlib.
 |  colorbar : bool, optional
 |      If True, plot colorbar (only relevant for 'scatter' and 'hexbin'
 |      plots).
 |  position : float
 |      Specify relative alignments for bar plot layout.
 |      From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
 |      (center).
 |  table : bool, Series or DataFrame, default False
 |      If True, draw a table using the data in the DataFrame and the data
 |      will be transposed to meet matplotlib's default layout.
 |      If a Series or DataFrame is passed, use passed data to draw a
 |      table.
 |  yerr : DataFrame, Series, array-like, dict and str
 |      See :ref:`Plotting with Error Bars <visualization.errorbars>` for
 |      detail.
 |  xerr : DataFrame, Series, array-like, dict and str
 |      Equivalent to yerr.
 |  stacked : bool, default False in line and bar plots, and True in area plot
 |      If True, create stacked plot.
 |  sort_columns : bool, default False
 |      Sort column names to determine plot ordering.
 |  secondary_y : bool or sequence, default False
 |      Whether to plot on the secondary y-axis if a list/tuple, which
 |      columns to plot on secondary y-axis.
 |  mark_right : bool, default True
 |      When using a secondary_y axis, automatically mark the column
 |      labels with "(right)" in the legend.
 |  include_bool : bool, default is False
 |      If True, boolean values can be plotted.
 |  backend : str, default None
 |      Backend to use instead of the backend specified in the option
 |      ``plotting.backend``. For instance, 'matplotlib'. Alternatively, to
 |      specify the ``plotting.backend`` for the whole session, set
 |      ``pd.options.plotting.backend``.
 |  
 |      .. versionadded:: 1.0.0
 |  
 |  **kwargs
 |      Options to pass to matplotlib plotting method.
 |  
 |  Returns
 |  -------
 |  :class:`matplotlib.axes.Axes` or numpy.ndarray of them
 |      If the backend is not the default matplotlib one, the return value
 |      will be the object returned by the backend.
 |  
 |  Notes
 |  -----
 |  - See matplotlib documentation online for more on this subject
 |  - If `kind` = 'bar' or 'barh', you can specify relative alignments
 |    for bar plot layout by `position` keyword.
 |    From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
 |    (center)
 |  
 |  Method resolution order:
 |      PlotAccessor
 |      pandas.core.base.PandasObject
 |      pandas.core.accessor.DirNamesMixin
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __call__(self, *args, **kwargs)
 |      Make plots of Series or DataFrame.
 |      
 |      Uses the backend specified by the
 |      option ``plotting.backend``. By default, matplotlib is used.
 |      
 |      Parameters
 |      ----------
 |      data : Series or DataFrame
 |          The object for which the method is called.
 |      x : label or position, default None
 |          Only used if data is a DataFrame.
 |      y : label, position or list of label, positions, default None
 |          Allows plotting of one column versus another. Only used if data is a
 |          DataFrame.
 |      kind : str
 |          The kind of plot to produce:
 |      
 |          - 'line' : line plot (default)
 |          - 'bar' : vertical bar plot
 |          - 'barh' : horizontal bar plot
 |          - 'hist' : histogram
 |          - 'box' : boxplot
 |          - 'kde' : Kernel Density Estimation plot
 |          - 'density' : same as 'kde'
 |          - 'area' : area plot
 |          - 'pie' : pie plot
 |          - 'scatter' : scatter plot (DataFrame only)
 |          - 'hexbin' : hexbin plot (DataFrame only)
 |      ax : matplotlib axes object, default None
 |          An axes of the current figure.
 |      subplots : bool, default False
 |          Make separate subplots for each column.
 |      sharex : bool, default True if ax is None else False
 |          In case ``subplots=True``, share x axis and set some x axis labels
 |          to invisible; defaults to True if ax is None otherwise False if
 |          an ax is passed in; Be aware, that passing in both an ax and
 |          ``sharex=True`` will alter all x axis labels for all axis in a figure.
 |      sharey : bool, default False
 |          In case ``subplots=True``, share y axis and set some y axis labels to invisible.
 |      layout : tuple, optional
 |          (rows, columns) for the layout of subplots.
 |      figsize : a tuple (width, height) in inches
 |          Size of a figure object.
 |      use_index : bool, default True
 |          Use index as ticks for x axis.
 |      title : str or list
 |          Title to use for the plot. If a string is passed, print the string
 |          at the top of the figure. If a list is passed and `subplots` is
 |          True, print each item in the list above the corresponding subplot.
 |      grid : bool, default None (matlab style default)
 |          Axis grid lines.
 |      legend : bool or {'reverse'}
 |          Place legend on axis subplots.
 |      style : list or dict
 |          The matplotlib line style per column.
 |      logx : bool or 'sym', default False
 |          Use log scaling or symlog scaling on x axis.
 |          .. versionchanged:: 0.25.0
 |      
 |      logy : bool or 'sym' default False
 |          Use log scaling or symlog scaling on y axis.
 |          .. versionchanged:: 0.25.0
 |      
 |      loglog : bool or 'sym', default False
 |          Use log scaling or symlog scaling on both x and y axes.
 |          .. versionchanged:: 0.25.0
 |      
 |      xticks : sequence
 |          Values to use for the xticks.
 |      yticks : sequence
 |          Values to use for the yticks.
 |      xlim : 2-tuple/list
 |          Set the x limits of the current axes.
 |      ylim : 2-tuple/list
 |          Set the y limits of the current axes.
 |      xlabel : label, optional
 |          Name to use for the xlabel on x-axis. Default uses index name as xlabel, or the
 |          x-column name for planar plots.
 |      
 |          .. versionadded:: 1.1.0
 |      
 |          .. versionchanged:: 1.2.0
 |      
 |             Now applicable to planar plots (`scatter`, `hexbin`).
 |      
 |      ylabel : label, optional
 |          Name to use for the ylabel on y-axis. Default will show no ylabel, or the
 |          y-column name for planar plots.
 |      
 |          .. versionadded:: 1.1.0
 |      
 |          .. versionchanged:: 1.2.0
 |      
 |             Now applicable to planar plots (`scatter`, `hexbin`).
 |      
 |      rot : int, default None
 |          Rotation for ticks (xticks for vertical, yticks for horizontal
 |          plots).
 |      fontsize : int, default None
 |          Font size for xticks and yticks.
 |      colormap : str or matplotlib colormap object, default None
 |          Colormap to select colors from. If string, load colormap with that
 |          name from matplotlib.
 |      colorbar : bool, optional
 |          If True, plot colorbar (only relevant for 'scatter' and 'hexbin'
 |          plots).
 |      position : float
 |          Specify relative alignments for bar plot layout.
 |          From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
 |          (center).
 |      table : bool, Series or DataFrame, default False
 |          If True, draw a table using the data in the DataFrame and the data
 |          will be transposed to meet matplotlib's default layout.
 |          If a Series or DataFrame is passed, use passed data to draw a
 |          table.
 |      yerr : DataFrame, Series, array-like, dict and str
 |          See :ref:`Plotting with Error Bars <visualization.errorbars>` for
 |          detail.
 |      xerr : DataFrame, Series, array-like, dict and str
 |          Equivalent to yerr.
 |      stacked : bool, default False in line and bar plots, and True in area plot
 |          If True, create stacked plot.
 |      sort_columns : bool, default False
 |          Sort column names to determine plot ordering.
 |      secondary_y : bool or sequence, default False
 |          Whether to plot on the secondary y-axis if a list/tuple, which
 |          columns to plot on secondary y-axis.
 |      mark_right : bool, default True
 |          When using a secondary_y axis, automatically mark the column
 |          labels with "(right)" in the legend.
 |      include_bool : bool, default is False
 |          If True, boolean values can be plotted.
 |      backend : str, default None
 |          Backend to use instead of the backend specified in the option
 |          ``plotting.backend``. For instance, 'matplotlib'. Alternatively, to
 |          specify the ``plotting.backend`` for the whole session, set
 |          ``pd.options.plotting.backend``.
 |      
 |          .. versionadded:: 1.0.0
 |      
 |      **kwargs
 |          Options to pass to matplotlib plotting method.
 |      
 |      Returns
 |      -------
 |      :class:`matplotlib.axes.Axes` or numpy.ndarray of them
 |          If the backend is not the default matplotlib one, the return value
 |          will be the object returned by the backend.
 |      
 |      Notes
 |      -----
 |      - See matplotlib documentation online for more on this subject
 |      - If `kind` = 'bar' or 'barh', you can specify relative alignments
 |        for bar plot layout by `position` keyword.
 |        From 0 (left/bottom-end) to 1 (right/top-end). Default is 0.5
 |        (center)
 |  
 |  __init__(self, data)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  area(self, x=None, y=None, **kwargs)
 |      Draw a stacked area plot.
 |      
 |      An area plot displays quantitative data visually.
 |      This function wraps the matplotlib area function.
 |      
 |      Parameters
 |      ----------
 |      x : label or position, optional
 |          Coordinates for the X axis. By default uses the index.
 |      y : label or position, optional
 |          Column to plot. By default uses all columns.
 |      stacked : bool, default True
 |          Area plots are stacked by default. Set to False to create a
 |          unstacked plot.
 |      **kwargs
 |          Additional keyword arguments are documented in
 |          :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      matplotlib.axes.Axes or numpy.ndarray
 |          Area plot, or array of area plots if subplots is True.
 |      
 |      See Also
 |      --------
 |      DataFrame.plot : Make plots of DataFrame using matplotlib / pylab.
 |      
 |      Examples
 |      --------
 |      Draw an area plot based on basic business metrics:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> df = pd.DataFrame({
 |          ...     'sales': [3, 2, 3, 9, 10, 6],
 |          ...     'signups': [5, 5, 6, 12, 14, 13],
 |          ...     'visits': [20, 42, 28, 62, 81, 50],
 |          ... }, index=pd.date_range(start='2018/01/01', end='2018/07/01',
 |          ...                        freq='M'))
 |          >>> ax = df.plot.area()
 |      
 |      Area plots are stacked by default. To produce an unstacked plot,
 |      pass ``stacked=False``:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax = df.plot.area(stacked=False)
 |      
 |      Draw an area plot for a single column:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax = df.plot.area(y='sales')
 |      
 |      Draw with a different `x`:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> df = pd.DataFrame({
 |          ...     'sales': [3, 2, 3],
 |          ...     'visits': [20, 42, 28],
 |          ...     'day': [1, 2, 3],
 |          ... })
 |          >>> ax = df.plot.area(x='day')
 |  
 |  bar(self, x=None, y=None, **kwargs)
 |      Vertical bar plot.
 |      
 |      A bar plot is a plot that presents categorical data with
 |      rectangular bars with lengths proportional to the values that they
 |      represent. A bar plot shows comparisons among discrete categories. One
 |      axis of the plot shows the specific categories being compared, and the
 |      other axis represents a measured value.
 |      
 |      Parameters
 |      ----------
 |      x : label or position, optional
 |          Allows plotting of one column versus another. If not specified,
 |          the index of the DataFrame is used.
 |      y : label or position, optional
 |          Allows plotting of one column versus another. If not specified,
 |          all numerical columns are used.
 |      color : str, array-like, or dict, optional
 |          The color for each of the DataFrame's columns. Possible values are:
 |      
 |          - A single color string referred to by name, RGB or RGBA code,
 |              for instance 'red' or '#a98d19'.
 |      
 |          - A sequence of color strings referred to by name, RGB or RGBA
 |              code, which will be used for each column recursively. For
 |              instance ['green','yellow'] each column's bar will be filled in
 |              green or yellow, alternatively. If there is only a single column to
 |              be plotted, then only the first color from the color list will be
 |              used.
 |      
 |          - A dict of the form {column name : color}, so that each column will be
 |              colored accordingly. For example, if your columns are called `a` and
 |              `b`, then passing {'a': 'green', 'b': 'red'} will color bars for
 |              column `a` in green and bars for column `b` in red.
 |      
 |          .. versionadded:: 1.1.0
 |      
 |      **kwargs
 |          Additional keyword arguments are documented in
 |          :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      matplotlib.axes.Axes or np.ndarray of them
 |          An ndarray is returned with one :class:`matplotlib.axes.Axes`
 |          per column when ``subplots=True``.
 |      
 |              See Also
 |              --------
 |              DataFrame.plot.barh : Horizontal bar plot.
 |              DataFrame.plot : Make plots of a DataFrame.
 |              matplotlib.pyplot.bar : Make a bar plot with matplotlib.
 |      
 |              Examples
 |              --------
 |              Basic plot.
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> df = pd.DataFrame({'lab':['A', 'B', 'C'], 'val':[10, 30, 20]})
 |                  >>> ax = df.plot.bar(x='lab', y='val', rot=0)
 |      
 |              Plot a whole dataframe to a bar plot. Each column is assigned a
 |              distinct color, and each row is nested in a group along the
 |              horizontal axis.
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
 |                  >>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
 |                  >>> index = ['snail', 'pig', 'elephant',
 |                  ...          'rabbit', 'giraffe', 'coyote', 'horse']
 |                  >>> df = pd.DataFrame({'speed': speed,
 |                  ...                    'lifespan': lifespan}, index=index)
 |                  >>> ax = df.plot.bar(rot=0)
 |      
 |              Plot stacked bar charts for the DataFrame
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> ax = df.plot.bar(stacked=True)
 |      
 |              Instead of nesting, the figure can be split by column with
 |              ``subplots=True``. In this case, a :class:`numpy.ndarray` of
 |              :class:`matplotlib.axes.Axes` are returned.
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> axes = df.plot.bar(rot=0, subplots=True)
 |                  >>> axes[1].legend(loc=2)  # doctest: +SKIP
 |      
 |              If you don't like the default colours, you can specify how you'd
 |              like each column to be colored.
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> axes = df.plot.bar(
 |                  ...     rot=0, subplots=True, color={"speed": "red", "lifespan": "green"}
 |                  ... )
 |                  >>> axes[1].legend(loc=2)  # doctest: +SKIP
 |      
 |              Plot a single column.
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> ax = df.plot.bar(y='speed', rot=0)
 |      
 |              Plot only selected categories for the DataFrame.
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> ax = df.plot.bar(x='lifespan', rot=0)
 |  
 |  barh(self, x=None, y=None, **kwargs)
 |      Make a horizontal bar plot.
 |      
 |      A horizontal bar plot is a plot that presents quantitative data with
 |      rectangular bars with lengths proportional to the values that they
 |      represent. A bar plot shows comparisons among discrete categories. One
 |      axis of the plot shows the specific categories being compared, and the
 |      other axis represents a measured value.
 |      
 |      Parameters
 |      ----------
 |      x : label or position, optional
 |          Allows plotting of one column versus another. If not specified,
 |          the index of the DataFrame is used.
 |      y : label or position, optional
 |          Allows plotting of one column versus another. If not specified,
 |          all numerical columns are used.
 |      color : str, array-like, or dict, optional
 |          The color for each of the DataFrame's columns. Possible values are:
 |      
 |          - A single color string referred to by name, RGB or RGBA code,
 |              for instance 'red' or '#a98d19'.
 |      
 |          - A sequence of color strings referred to by name, RGB or RGBA
 |              code, which will be used for each column recursively. For
 |              instance ['green','yellow'] each column's bar will be filled in
 |              green or yellow, alternatively. If there is only a single column to
 |              be plotted, then only the first color from the color list will be
 |              used.
 |      
 |          - A dict of the form {column name : color}, so that each column will be
 |              colored accordingly. For example, if your columns are called `a` and
 |              `b`, then passing {'a': 'green', 'b': 'red'} will color bars for
 |              column `a` in green and bars for column `b` in red.
 |      
 |          .. versionadded:: 1.1.0
 |      
 |      **kwargs
 |          Additional keyword arguments are documented in
 |          :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      matplotlib.axes.Axes or np.ndarray of them
 |          An ndarray is returned with one :class:`matplotlib.axes.Axes`
 |          per column when ``subplots=True``.
 |      
 |              See Also
 |              --------
 |              DataFrame.plot.bar: Vertical bar plot.
 |              DataFrame.plot : Make plots of DataFrame using matplotlib.
 |              matplotlib.axes.Axes.bar : Plot a vertical bar plot using matplotlib.
 |      
 |              Examples
 |              --------
 |              Basic example
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> df = pd.DataFrame({'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]})
 |                  >>> ax = df.plot.barh(x='lab', y='val')
 |      
 |              Plot a whole DataFrame to a horizontal bar plot
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
 |                  >>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
 |                  >>> index = ['snail', 'pig', 'elephant',
 |                  ...          'rabbit', 'giraffe', 'coyote', 'horse']
 |                  >>> df = pd.DataFrame({'speed': speed,
 |                  ...                    'lifespan': lifespan}, index=index)
 |                  >>> ax = df.plot.barh()
 |      
 |              Plot stacked barh charts for the DataFrame
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> ax = df.plot.barh(stacked=True)
 |      
 |              We can specify colors for each column
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> ax = df.plot.barh(color={"speed": "red", "lifespan": "green"})
 |      
 |              Plot a column of the DataFrame to a horizontal bar plot
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
 |                  >>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
 |                  >>> index = ['snail', 'pig', 'elephant',
 |                  ...          'rabbit', 'giraffe', 'coyote', 'horse']
 |                  >>> df = pd.DataFrame({'speed': speed,
 |                  ...                    'lifespan': lifespan}, index=index)
 |                  >>> ax = df.plot.barh(y='speed')
 |      
 |              Plot DataFrame versus the desired column
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> speed = [0.1, 17.5, 40, 48, 52, 69, 88]
 |                  >>> lifespan = [2, 8, 70, 1.5, 25, 12, 28]
 |                  >>> index = ['snail', 'pig', 'elephant',
 |                  ...          'rabbit', 'giraffe', 'coyote', 'horse']
 |                  >>> df = pd.DataFrame({'speed': speed,
 |                  ...                    'lifespan': lifespan}, index=index)
 |                  >>> ax = df.plot.barh(x='lifespan')
 |  
 |  box(self, by=None, **kwargs)
 |      Make a box plot of the DataFrame columns.
 |      
 |      A box plot is a method for graphically depicting groups of numerical
 |      data through their quartiles.
 |      The box extends from the Q1 to Q3 quartile values of the data,
 |      with a line at the median (Q2). The whiskers extend from the edges
 |      of box to show the range of the data. The position of the whiskers
 |      is set by default to 1.5*IQR (IQR = Q3 - Q1) from the edges of the
 |      box. Outlier points are those past the end of the whiskers.
 |      
 |      For further details see Wikipedia's
 |      entry for `boxplot <https://en.wikipedia.org/wiki/Box_plot>`__.
 |      
 |      A consideration when using this chart is that the box and the whiskers
 |      can overlap, which is very common when plotting small sets of data.
 |      
 |      Parameters
 |      ----------
 |      by : str or sequence
 |          Column in the DataFrame to group by.
 |      **kwargs
 |          Additional keywords are documented in
 |          :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      :class:`matplotlib.axes.Axes` or numpy.ndarray of them
 |      
 |      See Also
 |      --------
 |      DataFrame.boxplot: Another method to draw a box plot.
 |      Series.plot.box: Draw a box plot from a Series object.
 |      matplotlib.pyplot.boxplot: Draw a box plot in matplotlib.
 |      
 |      Examples
 |      --------
 |      Draw a box plot from a DataFrame with four columns of randomly
 |      generated data.
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> data = np.random.randn(25, 4)
 |          >>> df = pd.DataFrame(data, columns=list('ABCD'))
 |          >>> ax = df.plot.box()
 |  
 |  density = kde(self, bw_method=None, ind=None, **kwargs)
 |  
 |  hexbin(self, x, y, C=None, reduce_C_function=None, gridsize=None, **kwargs)
 |      Generate a hexagonal binning plot.
 |      
 |      Generate a hexagonal binning plot of `x` versus `y`. If `C` is `None`
 |      (the default), this is a histogram of the number of occurrences
 |      of the observations at ``(x[i], y[i])``.
 |      
 |      If `C` is specified, specifies values at given coordinates
 |      ``(x[i], y[i])``. These values are accumulated for each hexagonal
 |      bin and then reduced according to `reduce_C_function`,
 |      having as default the NumPy's mean function (:meth:`numpy.mean`).
 |      (If `C` is specified, it must also be a 1-D sequence
 |      of the same length as `x` and `y`, or a column label.)
 |      
 |      Parameters
 |      ----------
 |      x : int or str
 |          The column label or position for x points.
 |      y : int or str
 |          The column label or position for y points.
 |      C : int or str, optional
 |          The column label or position for the value of `(x, y)` point.
 |      reduce_C_function : callable, default `np.mean`
 |          Function of one argument that reduces all the values in a bin to
 |          a single number (e.g. `np.mean`, `np.max`, `np.sum`, `np.std`).
 |      gridsize : int or tuple of (int, int), default 100
 |          The number of hexagons in the x-direction.
 |          The corresponding number of hexagons in the y-direction is
 |          chosen in a way that the hexagons are approximately regular.
 |          Alternatively, gridsize can be a tuple with two elements
 |          specifying the number of hexagons in the x-direction and the
 |          y-direction.
 |      **kwargs
 |          Additional keyword arguments are documented in
 |          :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      matplotlib.AxesSubplot
 |          The matplotlib ``Axes`` on which the hexbin is plotted.
 |      
 |      See Also
 |      --------
 |      DataFrame.plot : Make plots of a DataFrame.
 |      matplotlib.pyplot.hexbin : Hexagonal binning plot using matplotlib,
 |          the matplotlib function that is used under the hood.
 |      
 |      Examples
 |      --------
 |      The following examples are generated with random data from
 |      a normal distribution.
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> n = 10000
 |          >>> df = pd.DataFrame({'x': np.random.randn(n),
 |          ...                    'y': np.random.randn(n)})
 |          >>> ax = df.plot.hexbin(x='x', y='y', gridsize=20)
 |      
 |      The next example uses `C` and `np.sum` as `reduce_C_function`.
 |      Note that `'observations'` values ranges from 1 to 5 but the result
 |      plot shows values up to more than 25. This is because of the
 |      `reduce_C_function`.
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> n = 500
 |          >>> df = pd.DataFrame({
 |          ...     'coord_x': np.random.uniform(-3, 3, size=n),
 |          ...     'coord_y': np.random.uniform(30, 50, size=n),
 |          ...     'observations': np.random.randint(1,5, size=n)
 |          ...     })
 |          >>> ax = df.plot.hexbin(x='coord_x',
 |          ...                     y='coord_y',
 |          ...                     C='observations',
 |          ...                     reduce_C_function=np.sum,
 |          ...                     gridsize=10,
 |          ...                     cmap="viridis")
 |  
 |  hist(self, by=None, bins=10, **kwargs)
 |      Draw one histogram of the DataFrame's columns.
 |      
 |      A histogram is a representation of the distribution of data.
 |      This function groups the values of all given Series in the DataFrame
 |      into bins and draws all bins in one :class:`matplotlib.axes.Axes`.
 |      This is useful when the DataFrame's Series are in a similar scale.
 |      
 |      Parameters
 |      ----------
 |      by : str or sequence, optional
 |          Column in the DataFrame to group by.
 |      bins : int, default 10
 |          Number of histogram bins to be used.
 |      **kwargs
 |          Additional keyword arguments are documented in
 |          :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      class:`matplotlib.AxesSubplot`
 |          Return a histogram plot.
 |      
 |      See Also
 |      --------
 |      DataFrame.hist : Draw histograms per DataFrame's Series.
 |      Series.hist : Draw a histogram with Series' data.
 |      
 |      Examples
 |      --------
 |      When we draw a dice 6000 times, we expect to get each value around 1000
 |      times. But when we draw two dices and sum the result, the distribution
 |      is going to be quite different. A histogram illustrates those
 |      distributions.
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> df = pd.DataFrame(
 |          ...     np.random.randint(1, 7, 6000),
 |          ...     columns = ['one'])
 |          >>> df['two'] = df['one'] + np.random.randint(1, 7, 6000)
 |          >>> ax = df.plot.hist(bins=12, alpha=0.5)
 |  
 |  kde(self, bw_method=None, ind=None, **kwargs)
 |      Generate Kernel Density Estimate plot using Gaussian kernels.
 |      
 |      In statistics, `kernel density estimation`_ (KDE) is a non-parametric
 |      way to estimate the probability density function (PDF) of a random
 |      variable. This function uses Gaussian kernels and includes automatic
 |      bandwidth determination.
 |      
 |      .. _kernel density estimation:
 |          https://en.wikipedia.org/wiki/Kernel_density_estimation
 |      
 |      Parameters
 |      ----------
 |      bw_method : str, scalar or callable, optional
 |          The method used to calculate the estimator bandwidth. This can be
 |          'scott', 'silverman', a scalar constant or a callable.
 |          If None (default), 'scott' is used.
 |          See :class:`scipy.stats.gaussian_kde` for more information.
 |      ind : NumPy array or int, optional
 |          Evaluation points for the estimated PDF. If None (default),
 |          1000 equally spaced points are used. If `ind` is a NumPy array, the
 |          KDE is evaluated at the points passed. If `ind` is an integer,
 |          `ind` number of equally spaced points are used.
 |      **kwargs
 |          Additional keyword arguments are documented in
 |          :meth:`pandas.%(this-datatype)s.plot`.
 |      
 |      Returns
 |      -------
 |      matplotlib.axes.Axes or numpy.ndarray of them
 |      
 |      See Also
 |      --------
 |      scipy.stats.gaussian_kde : Representation of a kernel-density
 |          estimate using Gaussian kernels. This is the function used
 |          internally to estimate the PDF.
 |      
 |      Examples
 |      --------
 |      Given a Series of points randomly sampled from an unknown
 |      distribution, estimate its PDF using KDE with automatic
 |      bandwidth determination and plot the results, evaluating them at
 |      1000 equally spaced points (default):
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> s = pd.Series([1, 2, 2.5, 3, 3.5, 4, 5])
 |          >>> ax = s.plot.kde()
 |      
 |      A scalar bandwidth can be specified. Using a small bandwidth value can
 |      lead to over-fitting, while using a large bandwidth value may result
 |      in under-fitting:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax = s.plot.kde(bw_method=0.3)
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax = s.plot.kde(bw_method=3)
 |      
 |      Finally, the `ind` parameter determines the evaluation points for the
 |      plot of the estimated PDF:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax = s.plot.kde(ind=[1, 2, 3, 4, 5])
 |      
 |      For DataFrame, it works in the same way:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> df = pd.DataFrame({
 |          ...     'x': [1, 2, 2.5, 3, 3.5, 4, 5],
 |          ...     'y': [4, 4, 4.5, 5, 5.5, 6, 6],
 |          ... })
 |          >>> ax = df.plot.kde()
 |      
 |      A scalar bandwidth can be specified. Using a small bandwidth value can
 |      lead to over-fitting, while using a large bandwidth value may result
 |      in under-fitting:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax = df.plot.kde(bw_method=0.3)
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax = df.plot.kde(bw_method=3)
 |      
 |      Finally, the `ind` parameter determines the evaluation points for the
 |      plot of the estimated PDF:
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax = df.plot.kde(ind=[1, 2, 3, 4, 5, 6])
 |  
 |  line(self, x=None, y=None, **kwargs)
 |      Plot Series or DataFrame as lines.
 |      
 |      This function is useful to plot lines using DataFrame's values
 |      as coordinates.
 |      
 |      Parameters
 |      ----------
 |      x : label or position, optional
 |          Allows plotting of one column versus another. If not specified,
 |          the index of the DataFrame is used.
 |      y : label or position, optional
 |          Allows plotting of one column versus another. If not specified,
 |          all numerical columns are used.
 |      color : str, array-like, or dict, optional
 |          The color for each of the DataFrame's columns. Possible values are:
 |      
 |          - A single color string referred to by name, RGB or RGBA code,
 |              for instance 'red' or '#a98d19'.
 |      
 |          - A sequence of color strings referred to by name, RGB or RGBA
 |              code, which will be used for each column recursively. For
 |              instance ['green','yellow'] each column's line will be filled in
 |              green or yellow, alternatively. If there is only a single column to
 |              be plotted, then only the first color from the color list will be
 |              used.
 |      
 |          - A dict of the form {column name : color}, so that each column will be
 |              colored accordingly. For example, if your columns are called `a` and
 |              `b`, then passing {'a': 'green', 'b': 'red'} will color lines for
 |              column `a` in green and lines for column `b` in red.
 |      
 |          .. versionadded:: 1.1.0
 |      
 |      **kwargs
 |          Additional keyword arguments are documented in
 |          :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      matplotlib.axes.Axes or np.ndarray of them
 |          An ndarray is returned with one :class:`matplotlib.axes.Axes`
 |          per column when ``subplots=True``.
 |      
 |              See Also
 |              --------
 |              matplotlib.pyplot.plot : Plot y versus x as lines and/or markers.
 |      
 |              Examples
 |              --------
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  >>> s = pd.Series([1, 3, 2])
 |                  >>> s.plot.line()
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  The following example shows the populations for some animals
 |                  over the years.
 |      
 |                  >>> df = pd.DataFrame({
 |                  ...    'pig': [20, 18, 489, 675, 1776],
 |                  ...    'horse': [4, 25, 281, 600, 1900]
 |                  ...    }, index=[1990, 1997, 2003, 2009, 2014])
 |                  >>> lines = df.plot.line()
 |      
 |              .. plot::
 |                 :context: close-figs
 |      
 |                 An example with subplots, so an array of axes is returned.
 |      
 |                 >>> axes = df.plot.line(subplots=True)
 |                 >>> type(axes)
 |                 <class 'numpy.ndarray'>
 |      
 |              .. plot::
 |                 :context: close-figs
 |      
 |                 Let's repeat the same example, but specifying colors for
 |                 each column (in this case, for each animal).
 |      
 |                 >>> axes = df.plot.line(
 |                 ...     subplots=True, color={"pig": "pink", "horse": "#742802"}
 |                 ... )
 |      
 |              .. plot::
 |                  :context: close-figs
 |      
 |                  The following example shows the relationship between both
 |                  populations.
 |      
 |                  >>> lines = df.plot.line(x='pig', y='horse')
 |  
 |  pie(self, **kwargs)
 |      Generate a pie plot.
 |      
 |      A pie plot is a proportional representation of the numerical data in a
 |      column. This function wraps :meth:`matplotlib.pyplot.pie` for the
 |      specified column. If no column reference is passed and
 |      ``subplots=True`` a pie plot is drawn for each numerical column
 |      independently.
 |      
 |      Parameters
 |      ----------
 |      y : int or label, optional
 |          Label or position of the column to plot.
 |          If not provided, ``subplots=True`` argument must be passed.
 |      **kwargs
 |          Keyword arguments to pass on to :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      matplotlib.axes.Axes or np.ndarray of them
 |          A NumPy array is returned when `subplots` is True.
 |      
 |      See Also
 |      --------
 |      Series.plot.pie : Generate a pie plot for a Series.
 |      DataFrame.plot : Make plots of a DataFrame.
 |      
 |      Examples
 |      --------
 |      In the example below we have a DataFrame with the information about
 |      planet's mass and radius. We pass the 'mass' column to the
 |      pie function to get a pie plot.
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> df = pd.DataFrame({'mass': [0.330, 4.87 , 5.97],
 |          ...                    'radius': [2439.7, 6051.8, 6378.1]},
 |          ...                   index=['Mercury', 'Venus', 'Earth'])
 |          >>> plot = df.plot.pie(y='mass', figsize=(5, 5))
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> plot = df.plot.pie(subplots=True, figsize=(11, 6))
 |  
 |  scatter(self, x, y, s=None, c=None, **kwargs)
 |      Create a scatter plot with varying marker point size and color.
 |      
 |      The coordinates of each point are defined by two dataframe columns and
 |      filled circles are used to represent each point. This kind of plot is
 |      useful to see complex correlations between two variables. Points could
 |      be for instance natural 2D coordinates like longitude and latitude in
 |      a map or, in general, any pair of metrics that can be plotted against
 |      each other.
 |      
 |      Parameters
 |      ----------
 |      x : int or str
 |          The column name or column position to be used as horizontal
 |          coordinates for each point.
 |      y : int or str
 |          The column name or column position to be used as vertical
 |          coordinates for each point.
 |      s : str, scalar or array-like, optional
 |          The size of each point. Possible values are:
 |      
 |          - A string with the name of the column to be used for marker's size.
 |      
 |          - A single scalar so all points have the same size.
 |      
 |          - A sequence of scalars, which will be used for each point's size
 |            recursively. For instance, when passing [2,14] all points size
 |            will be either 2 or 14, alternatively.
 |      
 |            .. versionchanged:: 1.1.0
 |      
 |      c : str, int or array-like, optional
 |          The color of each point. Possible values are:
 |      
 |          - A single color string referred to by name, RGB or RGBA code,
 |            for instance 'red' or '#a98d19'.
 |      
 |          - A sequence of color strings referred to by name, RGB or RGBA
 |            code, which will be used for each point's color recursively. For
 |            instance ['green','yellow'] all points will be filled in green or
 |            yellow, alternatively.
 |      
 |          - A column name or position whose values will be used to color the
 |            marker points according to a colormap.
 |      
 |      **kwargs
 |          Keyword arguments to pass on to :meth:`DataFrame.plot`.
 |      
 |      Returns
 |      -------
 |      :class:`matplotlib.axes.Axes` or numpy.ndarray of them
 |      
 |      See Also
 |      --------
 |      matplotlib.pyplot.scatter : Scatter plot using multiple input data
 |          formats.
 |      
 |      Examples
 |      --------
 |      Let's see how to draw a scatter plot using coordinates from the values
 |      in a DataFrame's columns.
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
 |          ...                    [6.4, 3.2, 1], [5.9, 3.0, 2]],
 |          ...                   columns=['length', 'width', 'species'])
 |          >>> ax1 = df.plot.scatter(x='length',
 |          ...                       y='width',
 |          ...                       c='DarkBlue')
 |      
 |      And now with the color determined by a column as well.
 |      
 |      .. plot::
 |          :context: close-figs
 |      
 |          >>> ax2 = df.plot.scatter(x='length',
 |          ...                       y='width',
 |          ...                       c='species',
 |          ...                       colormap='viridis')
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from pandas.core.base.PandasObject:
 |  
 |  __repr__(self) -> 'str'
 |      Return a string representation for a particular object.
 |  
 |  __sizeof__(self) -> 'int'
 |      Generates the total memory usage for an object that returns
 |      either a value or Series of values
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from pandas.core.base.PandasObject:
 |  
 |  __annotations__ = {'_cache': 'dict[str, Any]'}
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from pandas.core.accessor.DirNamesMixin:
 |  
 |  __dir__(self) -> 'list[str]'
 |      Provide method name lookup and completion.
 |      
 |      Notes
 |      -----
 |      Only provide 'public' methods.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pandas.core.accessor.DirNamesMixin:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)

Embiggening* the plot

While the plot sizes we’re working with are OK, it would be nice to have them displayed a bit larger. Fortunately, there is an easy way to make the plots larger in pandas/Matplotlib. We can simply add the figsize parameter with the desired figure size listed as a tuple (a set of values in normal parentheses) that lists the width and height of the figure (in inches!).

# Adjust the figure size
ax = oct1_temps.plot(
    style="ro--",
    title="Helsinki-Vantaa temperatures",
    xlabel="Date",
    ylabel="Temperature [°F]",
    figsize=(12, 6),
)
../../_images/matplotlib_21_0.png

The cell above sets the default plot size to be 12 inches wide by 6 inches tall. Feel free to change these values if you prefer.

Note

It is also possible to change the default figure size for all figures in a Jupyter Notebook by importing the pyplot module from matplotlib (i.e., import matplotlib.pyplot as plt)and then defining the default figure size using plt.rcParams['figure.figsize'] = [12, 6]. In this case the figure size should be given as a Python list.

* To embiggen means to enlarge. It’s a perfectly cromulent word.

Other common plot formatting operations

Adding text to the plot

Adding text to plots can be done using ax.text().

ax.text(x, y, 'Text to display')

This would display “Text to display” at the location x, y on the plot. We’ll see how to do this in a live example in just a second.

Changing the axis ranges

Changing the plot axes can be done using the xlim and ylim parameters of the plot() function

df.plot(xlim=[xmin, xmax], ylim=[ymin, ymax])

where xmin should be the minimum bound of the x-axis, xmax should be the maximum bound, and the same goes for the y-axis with ymin and ymax.

Dealing with datetime axes

One issue we will encounter with both placing text on the plot and changing the axis ranges is our datetime index for our DataFrame. In order to do either thing, we need to define x-values using a datetime object. The easiest way to do this is to use the Pandas pd.to_datetime() function, which converts a character string date to a datetime object. For example, we can convert 13:00 on October 1, 2019 from the character string '201910011300' to a datetime equivalent by typing

pd.to_datetime("201910011300")
Timestamp('2019-10-01 13:00:00')

With this datetime issue in mind, let’s now consider a modified version of the plot above, we can

  1. Limit our time range to 12:00 to 15:00 on October 1, 2019

  2. Only look at temperatures between 40-46° Fahrenheit

  3. Add text to note the coldest part of the early afternoon.

# Define the start, end, and cold times
start_time = pd.to_datetime("201910011200")
end_time = pd.to_datetime("201910011500")
cold_time = pd.to_datetime("201910011205")

# Create the plot, including the axis limits
ax = oct1_temps.plot(
    style="ro--",
    title="Helsinki-Vantaa temperatures",
    xlabel="Date",
    ylabel="Temperature [°F]",
    figsize=(12, 6),
    xlim=[start_time, end_time],
    ylim=[40.0, 46.0],
)

# Add text to display the coldest temperature
ax.text(cold_time, 42.0, "<- Coldest temperature in early afternoon")
Text(2019-10-01 12:05:00, 42.0, '<- Coldest temperature in early afternoon')
../../_images/matplotlib_26_1.png

Check your understanding

Create a line plot similar to our examples above with the following attributes:

  • Temperature data from 18:00-24:00 on October 1, 2019

  • A dotted black line connecting the observations (do not show the data points)

  • A title that reads “Evening temperatures on October 1, Helsinki-Vantaa”

  • A text label indicating the warmest temperature in the evening

# Solution
# Define start, end, and cold times
start_time = pd.to_datetime("201910011800")
end_time = pd.to_datetime("201910020000")
warm_time = pd.to_datetime("201910012120")

# Create the plot, including the axis limits

ax = oct1_temps.plot(
    style="k:",
    title="Evening temperatures on October 1, Helsinki-Vantaa",
    xlabel="Date",
    ylabel="Temperature [°F]",
    figsize=(12, 6),
    xlim=[start_time, end_time],
    ylim=[35.0, 44.0],
)

# Display text on plot
ax.text(warm_time, 43.0, "Warmest time of the evening ->")
Text(2019-10-01 21:20:00, 43.0, 'Warmest time of the evening ->')
../../_images/matplotlib_28_1.png

Bar plots in pandas

In addition to line plots, there are many other options for plotting in pandas. Bar plots are one option, which can be used quite similarly to line plots with the addition of the kind=bar parameter. Note that it is easiest to plot our selected time range for a bar plot by selecting the dates in our data series first, rather than adjusting the plot limits. pandas sees bar plot data as categorical, so the date range is more difficult to define for x-axis limits. For the y-axis, we can still define its range using the ylim=[ymin, ymax] parameter. Similarly, text placement on a bar plot is more difficult, and most easily done using the index value of the bar where the text should be placed.

# Define time range
oct1_afternoon = oct1_temps.loc[oct1_temps.index <= "201910011500"]

# Create bar plot
ax = oct1_afternoon.plot(
    kind="bar",
    title="Helsinki-Vantaa temperatures",
    xlabel="Date",
    ylabel="Temperature [°F]",
    figsize=(12, 6),
    ylim=[40, 46],
)

# Add plot text
ax.text(0, 42.1, "Coldest \ntemp \nv")
Text(0, 42.1, 'Coldest \ntemp \nv')
../../_images/matplotlib_30_1.png

You can find more about how to format bar charts on the pandas documentation website.

Saving your plots as image files

Saving plots created using pandas can be done in several ways. The recommendation for use outside of Jupyter notebooks is to use Matplotlib’s plt.savefig() function. When using plt.savefig(), you simply give a list of commands to generate a plot and include plt.savefig() with some parameters as the last command in the Python cell. The file name is required, and the image format will be determined based on the listed file extension. Note that because we have not used Matplotlib for any of the earlier plots, we need to import it here first.

Matplotlib plots can be saved in a number of useful file formats, including PNG, PDF, and EPS. PNG is a nice format for raster images, and EPS is probably easiest to use for vector graphics. Let’s check out an example and save our lovely bar plot.

# Import matplotlib
import matplotlib.pyplot as plt

# Create bar plot
ax = oct1_afternoon.plot(
    kind="bar",
    title="Helsinki-Vantaa temperatures",
    xlabel="Date",
    ylabel="Temperature [°F]",
    figsize=(12, 6),
    ylim=[40, 46],
)

# Add plot text
ax.text(0, 42.1, "Coldest \ntemp \nv")

# Save plot to file
plt.savefig("bar-plot.png")
../../_images/matplotlib_33_0.png

If you refresh your Files tab on the left side of the JupyterLab window you should now see bar-plot.png listed. We could try to save another version in higher resolution with a minor change to our plot commands above.

# Create bar plot
ax = oct1_afternoon.plot(
    kind="bar",
    title="Helsinki-Vantaa temperatures",
    xlabel="Date",
    ylabel="Temperature [°F]",
    figsize=(12, 6),
    ylim=[40, 46],
)

# Add plot text
ax.text(0, 42.1, "Coldest \ntemp \nv")

# Save plot to file (high resolution, PDF)
plt.savefig("bar-plot-hi-res.pdf", dpi=600)
../../_images/matplotlib_35_0.png

Interactive plotting with Pandas-Bokeh (optional)

One of the cool things in Jupyter notebooks is that our plots need not be static. We can easily create plots that are interactive, allowing us to view data values by mousing over them, or click to enable/disable plotting of some data. There are several ways we can do this, but we’ll utilize the Pandas-Bokeh plotting backend, which allows us to create interactive plots with little additional effort.

To get started, we need to import Pandas-Bokeh and configure our notebook to use it for plotting out Pandas data.

import pandas_bokeh

pandas_bokeh.output_notebook()
pd.set_option("plotting.backend", "pandas_bokeh")
Loading BokehJS ...

In the cell above, we import Pandas-Bokeh, and the configure two options: (1) Setting the output to be displayed in a notebook rather than in a separate window, and (2) setting the plotting backend software to use Pandas-Bokeh rather than Matplotlib.

Now, we can consider an example plot similar to the one we started with, but with data for three days (September 29-October 1, 2019). Pandas-Bokeh expects a DataFrame as the source for the plot data, so we’ll need to create a time slice of the data DataFrame containing the desired date range before making the plot. Let’s generate the Pandas-Bokeh plot and the see what is different.

# Define data time slice
sept29_oct1_df = data.loc[data.index >= "201909290000"]

# Define start, end times
start_time = pd.to_datetime("201909290000")
end_time = pd.to_datetime("201910020000")

# Create plot
ax = sept29_oct1_df.plot(
    title="Helsinki-Vantaa temperatures",
    xlabel="Date",
    ylabel="Temperature [°F]",
    figsize=(800, 500),
    xlim=[start_time, end_time],
    ylim=[35.0, 60.0],
)

So now we have a similar plot to those generated previously, but when you move the mouse along the curve you can see the temperature values at each time. We can also hide any of the lines by clicking on them in the legend, as well as use the scroll wheel/trackpad to zoom.

But we did also have to make a few small changes to generate this plot:

  1. We need to use a DataFrame as the data source for the plot, rather than a pandas Series. Thus, data['TEMP'].plot() will not work with Pandas-Bokeh.

  2. The line color and plotting of points are not specified using the style keyword. Instead, the line colors could be specified using the color or colormap parameters. Plotting of the points is enabled using the plot_data_points parameter (see below). More information about formatting the lines can be found on the Pandas-Bokeh website.

  3. The figsize parameter used here is expecting values in pixels, not inches!

  4. We have not included a text label on the plot, as it may not be possible to do so with Pandas-Bokeh.

But otherwise, we are able to produce these cool interactive plots with minimal effort, and directly within our notebooks!

ax = sept29_oct1_df.plot(
    title="Helsinki-Vantaa temperatures",
    xlabel="Date",
    ylabel="Temperature [°F]",
    figsize=(800, 500),
    xlim=[start_time, end_time],
    ylim=[35.0, 60.0],
    plot_data_points=True,
)