Fundamentals of Matplotlib Library for Data Science
This article discusses Matplotlib, a plotting library in the Python data scientist’s toolbox and one that data scientists use very commonly.
Alongside Matplotlib, Pandas and NumPy are important parts of the data scientist’s toolbox.
Introduction to Matplotlib
Is it possible to know your data’s trend or pattern without visualization? In my view, the answer is definitely a “NO.” It is quite challenging to know the trend or pattern of the data without visualization.
Matplotlib is the quintessential graphics library for data visualization in Python. It can turn complex data into simple graphics, and it offers an extensive array of plotting functions for creating publication-quality static, interactive, and animated visualizations.
Born from the need for a more accessible and flexible tool to represent data graphically, Matplotlib has become the go-to library for many data scientists and researchers, valued for its versatility and ease of use.
Whether it’s simple bar charts, complex 3D plots, or layered statistical graphs, Matplotlib can translate data into informative visuals with just a few lines of code.
Rooted in the idea that understanding data should not be limited to numerical representations alone, this powerful tool invites us to explore and interpret the stories behind the numbers, turning abstract datasets into tangible insights.
Origin of Matplotlib
Matplotlib was created by John D. Hunter, who was a neurobiologist and needed a plotting library for his research. When it came to naming the library, the “Mat” in Matplotlib actually stands for “MATLAB”, and not “matrix” as might be commonly assumed. John Hunter was originally using MATLAB to visualize his data but found it expensive and not flexible enough for his needs, particularly in terms of integrating with web-based applications.
Thus, he started working on Matplotlib as a free and open-source alternative that would replicate MATLAB’s plotting capabilities. In MATLAB, plotting and visualization are central features, and Matplotlib was designed to bring a similar experience to Python. The goal was to provide a library that could produce complex plots with relatively simple code, which is why MATLAB’s plotting functionality heavily influenced Matplotlib’s design and API.
So, the name “Matplotlib” is a combination of “MATLAB” and “plotting library,” indicating its original purpose as a Python alternative for users familiar with MATLAB’s plotting capabilities.
Using Matplotlib for Data Visualization
This article walks through the code for visualizing data with Matplotlib by plotting line charts, scatter plots, and histograms. We will also look at the code to label these charts. Let’s see how:
Installing Matplotlib
One common way to install Matplotlib is with ‘pip’.
pip is a package management system used to install and manage software packages written in Python. When you are installing Matplotlib, pip is the tool that allows you to download and install the Matplotlib package from the Python Package Index (PyPI), which is a repository for Python packages.
When someone uses pip to install Matplotlib, they would typically enter a command like this into their command-line interface (CLI):
pip install matplotlib
This command tells pip to download the latest version of Matplotlib and all of its dependencies from PyPI and install them in the user’s Python environment.
pip can also be used to manage (i.e., upgrade or uninstall) packages and to install specific versions of a package. It’s a crucial tool for Python development and data analysis because it simplifies managing external libraries.
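After installation, one quick way to confirm that the package is visible to the current Python environment is to query its installed version. The sketch below assumes Python 3.8 or newer and uses the standard-library importlib.metadata module; the helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("matplotlib")
if version is None:
    print("matplotlib is not installed in this environment")
else:
    print("matplotlib version:", version)
```

If the message says the package is missing even though the install succeeded, pip most likely installed it into a different Python environment than the one running the script.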
Using pip3 Vs. pip for Installing Matplotlib
pip3 is a version of the ‘pip’ installer for Python 3. In systems where both Python 2.x and Python 3.x are installed and maintained separately, ‘pip3’ is used to install packages specifically for Python 3.
This distinction is made to avoid confusion and potential conflicts between Python 2 and Python 3 packages since they are not compatible with each other.
For example, on a system with both Python versions running:
pip install matplotlib
might install the package for Python 2 by default, whereas running:
pip3 install matplotlib
will ensure that the package is installed for Python 3.
Please note that Python 2 has reached the end of its life and is no longer supported, so on most current systems ‘pip’ refers to the Python 3 version of the tool. However, pip3 is still used in environments where both versions of Python are present, or simply to make it explicit that packages are being installed for Python 3.
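When it is unclear which interpreter ‘pip’ or ‘pip3’ is bound to, another option is to run pip as a module of a specific interpreter (the "python -m pip" form). The sketch below only builds and prints such a command for the interpreter that is running the script; it does not execute it. The variable names are illustrative.

```python
import sys

# Path of the interpreter running this script. Fall back to the plain name
# "python" in the rare case where the path cannot be determined.
interpreter = sys.executable or "python"

# Running pip as a module of that interpreter installs the package into
# that interpreter's environment, whatever "pip" or "pip3" point to.
pip_command = [interpreter, "-m", "pip", "install", "matplotlib"]

print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
print("Install command:", " ".join(pip_command))
```

Copying the printed command into the command line installs Matplotlib for exactly the interpreter shown.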
Importing Matplotlib for Charting
After installing Matplotlib, we need to import it before we can use it. The following code imports its pyplot module under the alias pyt (many tutorials use plt instead; any alias works):
import matplotlib.pyplot as pyt
Line Chart
The pyt.plot() function creates a line chart. Let’s see the code to plot a line chart of population by year:
import matplotlib.pyplot as pyt
population = [123, 243, 456, 690] # in millions
year = [1995, 1996, 1997, 1998]
pyt.plot(year, population) # year on x-axis and population on y-axis
pyt.show() # To show the plot
Scatter Plot
Similarly, there are other types of charts that we can create. For example, let’s try a scatter chart.
import matplotlib.pyplot as pyt
population = [123, 243, 456, 690] # in millions
year = [1995, 1996, 1997, 1998]
pyt.scatter(year, population)
pyt.grid(True) # To show grid in graphs
pyt.show()
It will show the scatter chart.
Histogram Plot
What about the histogram? This type of chart is a good way to visualize how the values of a single variable are distributed. So, let’s get started and create a histogram for some age data:
import matplotlib.pyplot as pyt
age = [20, 56, 67, 40, 89, 90, 23, 45, 68, 23, 11, 18]
pyt.hist(age, bins=3)
pyt.show()
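Histograms work by grouping the values into bins that cover equal-width ranges. The number of bins we define is the number of bars we get, although a bin that contains no values appears as a bar of zero height.
With bins=3, as in the code above, three bars are generated. That is how the histogram works in Matplotlib.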
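To see how the values are assigned to bins, we can inspect what pyt.hist() returns: the count for each bar, the bin edges, and the drawn bar objects. The sketch below selects the non-interactive "Agg" backend (an assumption, so that it runs without a display) and prints the range and count of each bin.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as pyt

age = [20, 56, 67, 40, 89, 90, 23, 45, 68, 23, 11, 18]

# hist() returns the bar heights, the bin edges, and the bar objects it drew.
counts, edges, bars = pyt.hist(age, bins=3)

# Each bin spans from one edge to the next; three bins need four edges.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {int(count)} values")
```

For this data, the range from 11 to 90 is split into three equal-width bins that hold 5, 3, and 4 values respectively.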
We can pass additional arguments to pyt.hist(). To learn more, try the following:
help(pyt.hist)
NOTE: Matplotlib uses 10 bins by default if we do not specify bins.
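The default can be checked directly. The sketch below assumes an unmodified Matplotlib configuration, where the "hist.bins" setting holds the default bin count, and again uses the non-interactive "Agg" backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as pyt

age = [20, 56, 67, 40, 89, 90, 23, 45, 68, 23, 11, 18]

# The configured default number of bins for hist().
print("Default bins setting:", matplotlib.rcParams["hist.bins"])

# Calling hist() without the bins argument falls back to that default.
counts, edges, _ = pyt.hist(age)
print("Number of bars drawn:", len(counts))
```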
Making Charts More Expressive
Now we know how to create visualizations of the data. Let’s see how we can make our charts more expressive.
We can add axis labels and a title to our chart before showing it, as shown below:
import matplotlib.pyplot as pyt
population = [123, 243, 456, 690] # in millions
year = [1995, 1996, 1997, 1998]
pyt.plot(year, population)
pyt.xlabel('year')
pyt.ylabel('population')
pyt.title('Graph showing population over year')
pyt.show()
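Beyond labels and a title, the same chart can be styled further. The sketch below is one possible combination, not a recommended style: it sets a line color, marker, and line style, adds a legend and a grid, and pins the x-axis ticks to the whole years. It uses the non-interactive "Agg" backend (an assumption, so that it runs without a display); with an interactive backend, pyt.show() would display the result.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as pyt

population = [123, 243, 456, 690]  # in millions
year = [1995, 1996, 1997, 1998]

pyt.figure()

# color, marker, and linestyle control how the line is drawn;
# label is the text that appears in the legend.
line, = pyt.plot(year, population, color="green", marker="o",
                 linestyle="--", label="Population (millions)")

pyt.xticks(year)  # one tick per year instead of fractional values
pyt.xlabel("Year")
pyt.ylabel("Population (millions)")
pyt.title("Population by year")
pyt.legend()      # show the label given to plot()
pyt.grid(True)    # show grid lines
```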
There are other customizations that we can apply to our visualizations as well. To learn more about Matplotlib, check the official documentation; to learn more about various visualizations, refer to the Data Visualization using Python article.
You can also check our guides for Pandas for data science and NumPy for data science.
Tavish lives in Hyderabad, India, and works as a result-oriented data scientist specializing in improving the major key performance business indicators.
He understands how data can be used for business excellence and is a focused learner who enjoys sharing knowledge.