A matplotlib Primer in Simple Words
Start creating data visualizations with Python and matplotlib, and have fun!
Hello matplotlib!
Matplotlib is one of the great open-source libraries for data visualization with Python. It works well with Jupyter Notebook, a great Python IDE. You can create complex, meaningful, beautiful, and even animated data visualizations with this library.
For newcomers who have just started learning Python, creating data visualizations with matplotlib may not be as straightforward or as easy as in spreadsheet software. But fear not! This primer will show you how to build data visualizations with Python and matplotlib in Jupyter Notebook and explain each step in simple words. After reading this, you’ll have solid ground to stand firmly on and build upon. And, I promise you, it will be fun!
Your first data visualization with matplotlib!
To understand the basic concepts and to be able to ‘stand firmly on solid ground’, let’s first learn about the components of a data viz (short for data visualization) with Python and matplotlib.
Once you’ve fired up your Jupyter Notebook, you’ll need to import the matplotlib library with the following code:
import matplotlib.pyplot as plt
The plot() function in the matplotlib library accepts lists, NumPy arrays, and pandas Series and DataFrames. NumPy and pandas are two indispensable libraries in data analysis with Python, but we won’t cover them in depth in this article.
We’ll need to use NumPy to prepare data for some vizes later, so let’s import NumPy as well with the additional line of code:
import matplotlib.pyplot as plt
import numpy as np
Now, with those two libraries imported, let’s try something simple and fun. In the next cell in your Jupyter Notebook, try the following code:
data = [3, 5, 6, 5, 6, 8, 7, 6, 8, 9]
plt.plot(data)
plt.show()
In the code above, data is a variable containing a list of 10 numbers. We call matplotlib’s plot function with plt.plot() and we supply the numbers in the list variable as data for plotting. Finally, we call matplotlib’s show() function to display the chart as an output in our Jupyter Notebook.
And the output of the code above looks like this:
From the resulting plot, you can see that there’s only one set of data and matplotlib’s plot function is smart enough to use those data as y values. As for the values on the x-axis, they are automatically generated as indices starting from 0 and ending with n-1 where n is the number of members in the list variable. The function’s assumption is very reasonable, don’t you think?
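If you want to check the automatically generated x values for yourself, the short sketch below inspects the line object that plot() returns. The names line, implicit_x, explicit_line and explicit_x are only illustrative; get_xdata() is a standard method of matplotlib’s line objects.

```python
import matplotlib.pyplot as plt
import numpy as np

data = [3, 5, 6, 5, 6, 8, 7, 6, 8, 9]

# plot() returns a list of line objects, one per plotted line.
(line,) = plt.plot(data)

# With only one sequence given, x defaults to the indices 0 .. n-1.
implicit_x = np.asarray(line.get_xdata()).tolist()
print(implicit_x)

# A NumPy array works the same way, and you can also pass x explicitly.
x = np.arange(1, 11)  # illustrative x values: 1, 2, ..., 10
(explicit_line,) = plt.plot(x, np.array(data))
explicit_x = np.asarray(explicit_line.get_xdata()).tolist()
print(explicit_x)

plt.show()
```

The first printed list runs from 0 to 9, matching the ten numbers in data; the second one shows the x values we supplied ourselves.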
You can see now that we have to write code to create a plot. It’s not dragging and dropping and selecting columns and cells of data from a ready-made table. For those who are used to creating vizes with spreadsheets and software with a GUI (graphical user interface), this may not be very straightforward. But it sure isn’t too difficult. And it’s fun! Read on!
The anatomy of a viz created with matplotlib
The simple plot above has three components, in matplotlib’s terminology:
- Figure: the space that contains everything in it. You can think of the figure as the white, blank space in the background.
- Axes: the plot, which consists of the x and y axes and the blue line.
- Subplot(s): the container of the axes. Having subplots as containers for the axes lets you create multiple data vizes as a set in one figure, which you’ll see (and create yourself) later.
Let’s make the meaning of those three terms above even clearer. In the next cell, try the following code:
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax4 = fig.add_subplot(2, 2, 4)
plt.show()
And the output of the code above looks like this:
If you scroll back up to study the code above again, you’ll see that we first created an empty figure (the background) called fig with matplotlib’s figure() function. Then we added two subplots to the figure to contain the axes with the add_subplot() function. Notice that we use fig.add_subplot() and not plt.add_subplot(). This is because add_subplot() is a method of the figure object that we have already created. The axes are stored in the variables ax1 and ax4, respectively.
The code above gives us a figure with two empty axes in it. If you imagine the figure as a big white rectangle in the background (as it has always been), the two axes are put at different corners of this rectangle, diagonally opposite to each other.
Why is it like that? What determines the positions of the subplots (and the axes in them)?
The numbers that we supply to the add_subplot() function determine their positions inside the figure. As you might have already guessed, the numbers (2, 2, 1) in ax1 = fig.add_subplot(2, 2, 1) tell the function to create an axes called ax1 inside a subplot at the first position (position 1, top left) of a 2 x 2 grid that spans the entire figure. In the same way, the code ax4 = fig.add_subplot(2, 2, 4) creates an axes inside a subplot at position #4 (bottom right) of the figure.
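One way to see the position numbering for yourself is to label each axes with the number used to create it. This is only a sketch: the positions dictionary and the index variable are illustrative names, and get_position() is used here simply to confirm where each axes ended up.

```python
import matplotlib.pyplot as plt

fig = plt.figure()

# Positions are numbered row by row: 1 and 2 on the top row, 3 and 4 below.
positions = {}
for index in range(1, 5):
    ax = fig.add_subplot(2, 2, index)
    # Write the position number in the middle of each axes.
    # transform=ax.transAxes means (0.5, 0.5) is the center of the axes.
    ax.text(0.5, 0.5, str(index), ha='center', va='center',
            fontsize=20, transform=ax.transAxes)
    # get_position() gives the bounding box in figure coordinates,
    # where (0, 0) is the bottom-left corner of the figure.
    positions[index] = ax.get_position()

# Position 1 sits to the left of and above position 4.
print(positions[1].x0 < positions[4].x0)
print(positions[1].y0 > positions[4].y0)

plt.show()
```

Both checks print True, which matches the ‘top left’ and ‘bottom right’ description of positions 1 and 4.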
Let’s complete the picture with the following code:
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax2 = fig.add_subplot(2, 2, 2)
ax3 = fig.add_subplot(2, 2, 3)
ax4 = fig.add_subplot(2, 2, 4)
plt.show()
The result of the code above looks like this:
We get the other two axes in two additional subplots according to the numbers indicating positions we supplied to the add_subplot() functions.
Creating data visualizations
Now, let’s add some more stuff to the code in the last section and create some charts in those empty axes with the following code:
np.random.seed(123456789)
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax1.plot(data)
ax2 = fig.add_subplot(2, 2, 2)
ax2.plot(np.random.randn(50).cumsum())
ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(np.sin(np.linspace(0,2*np.pi**np.random.randint(100),100)))
ax4 = fig.add_subplot(2, 2, 4)
ax4.hist(np.random.randn(500), bins=20)
plt.show()
The result of the code above looks like this:
In the code above, all the mock-up data come from NumPy’s randn() and randint() functions. The plot() function, as mentioned at the beginning of this article, gives you a simple line plot. There’s a new function here: hist(). You can easily create a histogram by calling the hist() function and supplying it with the data and the number of bins. In this case, there are 20 bins.
The first line of the code above is not necessary to produce the resulting viz. It’s just that if you try to create the viz yourself by following me and use the exact same number in the seed() function, you’ll get the exact same viz in each axes as shown in the picture above. This is because we use NumPy’s random number generator to create the datasets. If we set a seed number, it will generate the same sets of data whenever we call those random number generator functions.
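Here is a small sketch that checks both ideas: that a fixed seed reproduces the same data, and that hist() really splits the data into the number of bins you ask for. The variable names first, second, counts, edges and patches are just illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123456789)
first = np.random.randn(500)

np.random.seed(123456789)  # reset the generator to the same starting state
second = np.random.randn(500)

# Same seed, same sequence of "random" numbers.
print(np.array_equal(first, second))

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# hist() returns the count in each bin, the bin edges, and the drawn bars.
counts, edges, patches = ax.hist(first, bins=20)
print(len(counts), len(edges), int(counts.sum()))

plt.show()
```

The first line of output is True. The second shows 20 counts, 21 bin edges (one more than the number of bins), and a total of 500 data points.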
Customization
So far so good. But many things are still missing. From the viz above, we still don’t know what the charts are about, hence we don’t know what information they convey. This can even render the viz totally useless, no matter how beautiful it looks. We can give the viz more meaning by adding the following items:
- a title for the entire figure
- a title for each chart in the figure
- a title for each axis of each chart in the figure
Let’s add those items to our figure with the following code:
np.random.seed(123456789)
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax1.plot(data)
ax1.set_title('Subplot 1')
ax1.set_xlabel('s1_x')
ax1.set_ylabel('s1_y')
ax2 = fig.add_subplot(2, 2, 2)
ax2.plot(np.random.randn(50).cumsum())
ax2.set_title('Subplot 2')
ax2.set_xlabel('s2_x')
ax2.set_ylabel('s2_y')
ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(np.sin(np.linspace(0,2*np.pi**np.random.randint(100),100)))
ax3.set_title('Subplot 3')
ax3.set_xlabel('s3_x')
ax3.set_ylabel('s3_y')
ax4 = fig.add_subplot(2, 2, 4)
ax4.hist(np.random.randn(500), bins=20)
ax4.set_title('Subplot 4')
ax4.set_xlabel('s4_x')
ax4.set_ylabel('s4_y')
fig.suptitle('Hello Matplotlib!')
plt.show()
The result of the code above looks like this:
Oh no! Something went wrong here!
As you can see, we can add the figure title with the suptitle() function, and titles for each axes and their x and y axes with the set_title(), set_xlabel() and set_ylabel() functions. But they are overlapping! 😟
Fret not! We can fix this easily and quickly with the following code:
np.random.seed(123456789)
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax1.plot(data)
ax1.set_title('Subplot 1')
ax1.set_xlabel('s1_x')
ax1.set_ylabel('s1_y')
ax2 = fig.add_subplot(2, 2, 2)
ax2.plot(np.random.randn(50).cumsum())
ax2.set_title('Subplot 2')
ax2.set_xlabel('s2_x')
ax2.set_ylabel('s2_y')
ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(np.sin(np.linspace(0,2*np.pi**np.random.randint(100),100)))
ax3.set_title('Subplot 3')
ax3.set_xlabel('s3_x')
ax3.set_ylabel('s3_y')
ax4 = fig.add_subplot(2, 2, 4)
ax4.hist(np.random.randn(500), bins=20)
ax4.set_title('Subplot 4')
ax4.set_xlabel('s4_x')
ax4.set_ylabel('s4_y')
fig.suptitle('Hello Matplotlib!')
fig.tight_layout()
plt.show()
The result of the code above looks like this:
Voilà! The code above adds just one more line, which calls matplotlib’s tight_layout() function just before the show() function. The tight_layout() function, as you can see from the result, automatically adjusts the padding between subplots so that nothing overlaps and everything inside the figure is displayed neatly at the positions we set in the code.
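To see what tight_layout() actually changes, you can compare where the axes sit before and after the call. The sketch below is one way to do that, not the only one. It also builds the labelled grid in a loop with the axes’ set() method, a shortcut not used elsewhere in this article that sets the title and axis labels in a single call. The names before and after are illustrative.

```python
import matplotlib.pyplot as plt

fig = plt.figure()
axes = []
for index in range(1, 5):
    ax = fig.add_subplot(2, 2, index)
    # set() applies several properties at once; here it does the same job
    # as calling set_title(), set_xlabel() and set_ylabel() one by one.
    ax.set(title=f'Subplot {index}',
           xlabel=f's{index}_x',
           ylabel=f's{index}_y')
    axes.append(ax)
fig.suptitle('Hello Matplotlib!')

# Bounding boxes (left, bottom, width, height) before the adjustment.
before = [ax.get_position().bounds for ax in axes]
fig.tight_layout()
after = [ax.get_position().bounds for ax in axes]

# True means tight_layout() moved or resized the axes.
print(before != after)

plt.show()
```

If you print before and after themselves, you can see exactly how each axes was moved and resized to make room for the titles and labels.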
What about colors and styles? Let’s give our viz some more ‘flair’ 😎📈 with the following code:
np.random.seed(123456789)
fig = plt.figure()
ax1 = fig.add_subplot(2, 2, 1)
ax1.plot(data, 'b-*')
ax1.set_title('Subplot 1')
ax1.set_xlabel('s1_x')
ax1.set_ylabel('s1_y')
ax2 = fig.add_subplot(2, 2, 2)
ax2.plot(np.random.randn(50).cumsum(), 'k--')
ax2.set_title('Subplot 2')
ax2.set_xlabel('s2_x')
ax2.set_ylabel('s2_y')
ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(np.sin(np.linspace(0,2*np.pi**np.random.randint(100),100)), 'r')
ax3.set_title('Subplot 3')
ax3.set_xlabel('s3_x')
ax3.set_ylabel('s3_y')
ax4 = fig.add_subplot(2, 2, 4)
ax4.hist(np.random.randn(500), bins=20, color='g', alpha=0.3)
ax4.set_title('Subplot 4')
ax4.set_xlabel('s4_x')
ax4.set_ylabel('s4_y')
fig.suptitle('Hello Matplotlib!')
fig.tight_layout()
plt.show()
And the result of the code above looks like this:
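In the code above, we just add format strings to the plot functions, such as 'b-*' for color = blue, linestyle = - (a solid line), and marker = *. In the same way, 'k--' gives a black dashed line and 'r' a red line. For the histogram, color='g' makes the bars green and alpha=0.3 makes them partly transparent.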
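A format string is just a shorthand. The sketch below draws one line with the short form and another with the same settings spelled out as keyword arguments, then checks that both lines end up with the same style. The names short_line, long_line, shifted and same_style are illustrative.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

data = [3, 5, 6, 5, 6, 8, 7, 6, 8, 9]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)

# Short form: color, line style and marker packed into one format string.
(short_line,) = ax.plot(data, 'b-*')

# Long form: the same settings spelled out as keyword arguments.
shifted = [value + 1 for value in data]  # offset so both lines are visible
(long_line,) = ax.plot(shifted, color='blue', linestyle='-', marker='*')

# to_rgba() converts 'b' and 'blue' to the same (red, green, blue, alpha).
same_style = (
    to_rgba(short_line.get_color()) == to_rgba(long_line.get_color())
    and short_line.get_linestyle() == long_line.get_linestyle()
    and short_line.get_marker() == long_line.get_marker()
)
print(same_style)

plt.show()
```

The keyword form is longer, but it is easier to read and lets you use colors that have no one-letter code.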
There are myriad ways you can customize your vizes with matplotlib, and this article naturally can’t cover everything. The best place to start learning about this is the official matplotlib documentation website.
There you have it! Our figure is complete with different axes in all the subplots available and all of them are given meaningful titles in their suitable positions.
Creating a group of vizes in a figure with only a few lines of code
Because adding multiple subplots into a single figure is very common in creating vizes, matplotlib can help you reduce the work of adding each individual axes into each subplot with the subplots() function. For example, if you want to display multiple line plots in a figure, you can use the combination of Python’s for loop and matplotlib’s subplots() function as the following code shows:
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
# Draw one random-walk line plot in each of the four axes.
for i in range(2):
    for j in range(2):
        axes[i, j].plot(np.random.randn(50).cumsum(), 'k--')
# Remove the gaps between the subplots.
plt.subplots_adjust(wspace=0, hspace=0)
fig.suptitle('A group of line plots')
# Place the shared axis titles relative to the whole figure.
fig.text(0.5, 0.04, 'common x-axis', ha='center', va='center')
fig.text(0.06, 0.5, 'common y-axis', ha='center', va='center', rotation='vertical')
plt.show()
And the result of the code above looks like this:
The code above combines the power of Python’s for loop and matplotlib’s subplots() function to create a figure with four line plots, one in each subplot, in only a few lines of code. In this case we use matplotlib’s text() function to put the common x and y axis titles in the figure. If you experiment more with the arguments of the text() function, you’ll see that you can use it to annotate any point of the figure with any meaningful text you see fit.
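As a variation on the same idea, the sketch below loops over the grid with a single for loop and uses each axes’ own text() method to label the individual plots. The ‘plot 1’ to ‘plot 4’ labels and the number variable are only illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123456789)

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
print(axes.shape)  # the axes come back as a 2 x 2 NumPy array

# axes.flat walks through the grid row by row, so one loop is enough.
for number, ax in enumerate(axes.flat, start=1):
    ax.plot(np.random.randn(50).cumsum(), 'k--')
    # transform=ax.transAxes means (0.05, 0.9) is measured as a fraction
    # of the axes (near the top-left corner), not in data units.
    ax.text(0.05, 0.9, f'plot {number}', transform=ax.transAxes, va='top')

plt.subplots_adjust(wspace=0, hspace=0)
plt.show()
```

Whether you index with axes[i, j] or loop over axes.flat is a matter of taste; the flat version is handy when you don’t need the row and column numbers.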
At this point, you’ve got all the basics down and are ready to create and experiment with the different types of visualizations that matplotlib can help you with.
To conclude, although it’s not as straightforward and simple as in spreadsheet software, you can use Python and matplotlib to quickly create beautiful and meaningful data visualizations. And with all the customizations available to you, you have even more control over the appearance of your vizes.
Keep learning more about the library and all its functions and you can have more fun each time you use them to create a viz. Thank you for reading this. Keep on creating! 🧑‍🚀