Data Visualization With Matplotlib
Updated: Nov 29, 2021
“A picture speaks louder than words”
According to the World Economic Forum, the world produces 2.5 quintillion bytes of data every day. Have you ever wondered how that much data is managed and analyzed?
Data visualization is the representation of data in a graphical or pictorial format. Visualizations such as graphs, pie charts, histograms, and time series plots are used to find insights in both small-scale and large-scale data. If you want to learn how visualizations are made and create your own for your company’s data, this article introduces one tool for the job. Many techniques and libraries can turn statistical data into a picture. We are going to start with a Python library called Matplotlib.
matplotlib.pyplot is an easy-to-use and powerful Python module for plotting 2D visuals. It lets you create almost any visualization you can imagine. Matplotlib is open source, and other advanced visualization tools use it as their base library.
Matplotlib Interfaces: Matplotlib offers two interfaces for visualization: the MATLAB-like pyplot interface and the object-oriented interface.
Pyplot (MATLAB-like) Method:
It is easy to generate plots using the pyplot module provided by the matplotlib library. You just need to import the matplotlib.pyplot module. Its syntax and methodology are similar to MATLAB's.
plt.plot() - Plots a line chart. In place of ‘plot’, you can use other functions such as bar, barh, scatter, or hist to draw the respective plots.
plt.xlabel(), plt.ylabel() - Label the x-axis and y-axis respectively.
plt.xticks(), plt.yticks() - Set the tick positions and tick labels on the x-axis and y-axis respectively.
plt.legend() - Adds a legend that identifies each plotted series, which makes the plot easier to read.
plt.title() - Sets the title of the plot.
plt.show() - Displays the plot on the screen.
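As a rough illustration of how these functions fit together, the sketch below draws a scatter plot with the pyplot interface. The data values and tick labels are made up for the example; swapping plt.scatter for plt.plot or plt.bar should work with the same data.

```python
import matplotlib.pyplot as plt

# Hypothetical data: units sold on five days (illustrative values only)
days = [1, 2, 3, 4, 5]
units = [3, 5, 2, 6, 4]

# plt.scatter takes the place of plt.plot here
points = plt.scatter(days, units, color='blue', label='units sold')

plt.xlabel('Day')
plt.ylabel('Units')
# Replace the numeric tick positions with readable labels
plt.xticks(days, ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])
plt.title('Units sold per day')
plt.legend()
plt.show()
```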
Object-Oriented Method:
The object-oriented approach is recommended because it gives you more flexibility, control, and customization over your plots. It is especially useful when you need to draw multiple plots on one canvas.
Figure: You can consider it as a canvas or main container on which the plots will be displayed.
Axes: An Axes is a single plotting area with its own coordinate system. One figure may contain multiple Axes objects.
Axis: Imagine these as the number lines that set the limits and ticks of a plot. An Axes contains an x-axis and a y-axis, plus a z-axis in the case of a 3D plot.
To create a plot using Matplotlib’s object-oriented approach, let’s first create the figure and axes using subplots() from the pyplot module.
fig, ax = plt.subplots()
ax.grid(True)
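To make the Figure, Axes, and Axis terms more concrete, here is a minimal sketch of a complete object-oriented plot. The data and labels are invented for illustration; the point is that plotting and labelling are done through methods of the Axes object rather than through plt functions.

```python
import matplotlib.pyplot as plt

# One Figure (the canvas) holding one Axes (the plotting area)
fig, ax = plt.subplots()

# Illustrative data, not taken from a real dataset
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# Plotting and labelling are methods of the Axes object
ax.plot(x, y, marker='o', label='y = x squared')
ax.set_xlabel('x values')
ax.set_ylabel('y values')
ax.set_title('Object-oriented plot')
ax.grid(True)
ax.legend()

# The Figure keeps a list of its Axes, and each Axes owns its Axis objects
print(fig.axes)
print(ax.xaxis, ax.yaxis)

plt.show()
```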
Changing the figure size
You can change the size of the figure by passing the figsize argument to subplots() as a (width, height) tuple, measured in inches.
# Resize figure
fig, ax = plt.subplots(figsize = (10, 6))
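If you want to check or change the size after the figure exists, something like the following sketch should work. It relies on the Figure methods get_size_inches() and set_size_inches(); the sizes used are arbitrary.

```python
import matplotlib.pyplot as plt

# Create a figure that is 10 inches wide and 6 inches tall
fig, ax = plt.subplots(figsize=(10, 6))
width, height = fig.get_size_inches()
print(width, height)

# Resize the existing figure to 8 x 4 inches
fig.set_size_inches(8, 4)
print(fig.get_size_inches())
```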
Matplotlib’s object-oriented approach lets you include more than one plot in a figure by creating additional Axes objects.
# Figure with two plots
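One way such a two-plot figure might look is sketched below. The data is invented for illustration, and the one-row, two-column layout is only one of many possible arrangements.

```python
import matplotlib.pyplot as plt

# Illustrative data
x = [0, 1, 2, 3, 4]
squares = [v ** 2 for v in x]
cubes = [v ** 3 for v in x]

# One row and two columns of Axes inside a single Figure
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

ax1.plot(x, squares, color='blue', marker='o')
ax1.set_title('Squares')

ax2.plot(x, cubes, color='red', marker='o')
ax2.set_title('Cubes')

# Keep titles and tick labels from overlapping
fig.tight_layout()
plt.show()
```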
Matplotlib can save figures to a file using matplotlib.pyplot.savefig().
x = [0, 2, 4, 6]
y = [1, 3, 4, 8]
plt.plot(x, y)
plt.xlabel('x values')
plt.ylabel('y values')
plt.title('plotted x and y values')
plt.legend(['line 1'])
# save the figure
plt.savefig('plot.png', dpi=300, bbox_inches='tight')
plt.show()
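With the object-oriented interface, the Figure object has its own savefig() method. The sketch below writes the same figure as a PNG and as a PDF; the temporary directory and file names are placeholders chosen for this example, and the output format is inferred from the file extension.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 2, 4, 6], [1, 3, 4, 8])
ax.set_title('plotted x and y values')

# Placeholder output location for this sketch
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, 'plot.png')
pdf_path = os.path.join(out_dir, 'plot.pdf')

# The file extension selects the output format
fig.savefig(png_path, dpi=300, bbox_inches='tight')
fig.savefig(pdf_path, bbox_inches='tight')
print(png_path, pdf_path)
```

It is generally safer to save before calling plt.show(), because with interactive backends the figure may already be closed once the window is dismissed.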
Installation and Importing
Let’s start by installing and importing the library. To install Matplotlib on your local machine, open a terminal or command prompt and type the following commands:
python -m pip install -U pip
python -m pip install -U matplotlib
Alternatively, you can use the Anaconda distribution, which installs Python, Jupyter Notebook, and important libraries such as pandas, NumPy, and Matplotlib, or Google Colab, a free cloud-based version of Jupyter Notebook.
Once the library is installed, import it into your code to work with it, as shown below. Several of the examples in this article also use NumPy, so it is imported here as well.
import matplotlib.pyplot as plt
import numpy as np
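A quick way to confirm that the installation worked is to print the installed version, as in this small sketch. The version printed depends on your environment.

```python
import matplotlib

# Prints the installed Matplotlib version string
print(matplotlib.__version__)
```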
Now let’s see how to plot simple plots using matplotlib.
1. Line plot
It is a type of chart or graph that displays information as a series of data points called ‘markers’ connected by straight line segments. It can be plotted using the matplotlib.pyplot.plot() function.
The available marker styles are listed in the Matplotlib documentation.
sales1 = [1, 7, 10, 3, 7, 12, 8, 11, 19, 9, 8]
sales2 = [3, 7, 9, 5, 4, 7, 12, 7, 6, 12, 11]
# plotting line plots
line_chart1 = plt.plot(range(1, 12), sales1, color='blue', marker='o')
line_chart2 = plt.plot(range(1, 12), sales2, color='red', marker='o')
plt.title('Monthly sales of 2018 and 2019')
plt.ylabel('Sales')
plt.xlabel('Month')
plt.legend(['year 2018', 'year 2019'], loc='lower right')
plt.show()
2. Bar plot
A bar plot represents categorical data with rectangular bars whose lengths are proportional to the values they represent. There are different types of bar graphs, such as simple, joint (grouped), and stacked. It can be plotted using matplotlib.pyplot.bar().
i) Simple bar graph
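A simple bar graph could look like the following sketch. The marks are illustrative values for a single hypothetical student, and the category names are passed directly as the x positions.

```python
import matplotlib.pyplot as plt

# Hypothetical marks of one student in four tests
tests = ['Test-1', 'Test-2', 'Test-3', 'Test-4']
marks = [20, 29, 20, 40]

# One bar per category, with height equal to the mark
bars = plt.bar(tests, marks, color='orange', width=0.5)
plt.xlabel('Tests')
plt.ylabel('Marks')
plt.title('Marks of one student in different tests')
plt.show()
```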
ii) Joint bar graph
data = [[20, 29, 20, 40],
        [40, 33, 41, 19],
        [35, 28, 45, 39]]
label = ['Test-1', 'Test-2', 'Test-3', 'Test-4']
X = np.arange(len(label))
fig = plt.figure()
ax = fig.add_axes([0, 0, 1, 1])
plt.xlabel('Tests')
plt.ylabel('Marks')
plt.title('Comparing marks of three students in different tests')
# centre each tick label under the middle bar of its group
plt.xticks(X + 0.25, label)
# creating bar plots: one row of data per student, shifted by the bar width
ax.bar(X + 0.00, data[0], color='black', width=0.25, alpha=0.5, label='Student 1')
ax.bar(X + 0.25, data[1], color='orange', width=0.25, alpha=0.6, label='Student 2')
ax.bar(X + 0.50, data[2], color='red', width=0.25, alpha=0.5, label='Student 3')
plt.legend()
plt.show()
iii) Stacked bar graph
male = (20, 25, 35, 40, 37)
female = (22, 30, 35, 30, 25)
ind = np.arange(5)  # the x locations for the groups
plt.bar(ind, male, width=0.35, color='black', alpha=0.8, label='Male')
# bottom=male stacks the female bars on top of the male bars
plt.bar(ind, female, width=0.35, bottom=male, color='orange', label='Female')
plt.ylabel('Age (in Years)')
plt.title('Age of employees by group and gender')
plt.xticks(ind, ('G1', 'G2', 'G3', 'G4', 'G5'))
plt.yticks(np.arange(0, 81, 10))
plt.legend()
plt.show()
3. Histogram
A histogram displays numerical data by grouping it into “bins” of equal width and showing how many values fall into each bin. It approximates the distribution of a continuous variable. It can be plotted using matplotlib.pyplot.hist().
data = np.random.randn(1000)
plt.hist(data, bins=30, alpha=0.5, histtype='stepfilled', color='green', edgecolor='none')
plt.show()
# two histograms drawn on the same axes
series1 = np.random.randn(700, 1)
series2 = np.random.randn(500, 1)
# plotting first histogram
plt.hist(series1, label='series1', alpha=.8, edgecolor='black')
# plotting second histogram
plt.hist(series2, label='series2', alpha=0.7, edgecolor='black')
plt.legend()
# Showing the plot using plt.show()
plt.show()
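When two histograms are compared, it can help to give both the same bin edges so the bars line up. The sketch below is one way to do that; the seed, sample sizes, and the range of -4 to 4 are arbitrary choices for illustration. plt.hist() returns the counts per bin, the bin edges, and the drawn patches.

```python
import numpy as np
import matplotlib.pyplot as plt

# Reproducible illustrative samples
rng = np.random.default_rng(0)
series1 = rng.standard_normal(700)
series2 = rng.standard_normal(500)

# Shared bin edges: 20 equal-width bins from -4 to 4
bins = np.linspace(-4, 4, 21)

# Values outside the bin range are not counted
counts1, edges, _ = plt.hist(series1, bins=bins, alpha=0.6, label='series1')
counts2, _, _ = plt.hist(series2, bins=bins, alpha=0.6, label='series2')
plt.legend()
plt.show()
```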
4. Pie plot
The pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical data. It can be plotted using matplotlib.pyplot.pie()
labels = 'Football', 'Basketball', 'Cricket', 'Badminton'
people = [200, 100, 225, 120]
colors = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue']
explode = (0.1, 0, 0, 0)  # explode 1st slice
# Plot
plt.pie(people, explode=explode, labels=labels, colors=colors,
        autopct='%1.1f%%', shadow=True, startangle=140)
plt.axis('equal')
plt.show()
5. Box plot
A boxplot is the standard way of showing the distribution of data based on a five-number summary (minimum, first quartile, median, third quartile, and maximum). It is used in Exploratory Data Analysis to show the distribution of numerical data.
It can be plotted using matplotlib.pyplot.boxplot()
range1 = [82, 76, 24, 40, 67, 62, 75, 78, 71, 32, 98, 89, 78, 67, 72, 82, 87, 66, 56, 52]
range2 = [62, 5, 91, 25, 36, 32, 96, 95, 3, 90, 95, 32, 27, 55, 100, 15, 71, 11, 37, 21]
range3 = [23, 89, 12, 78, 72, 89, 25, 69, 68, 86, 19, 49, 15, 16, 16, 75, 65, 31, 25, 52]
range4 = [59, 73, 70, 16, 81, 61, 88, 98, 10, 87, 29, 72, 16, 23, 72, 88, 78, 99, 75, 30]
box_plot_data = [range1, range2, range3, range4]
box = plt.boxplot(box_plot_data, patch_artist=True)
# label the boxes; boxplot places them at x = 1, 2, 3, 4 by default
plt.xticks([1, 2, 3, 4], ['task1', 'task2', 'task3', 'task4'])
colors = ['orange', 'cyan', 'black', 'red']
# patch_artist=True makes each box a patch that can be filled with a color
for patch, color in zip(box['boxes'], colors):
    patch.set_facecolor(color)
plt.show()
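To see the numbers behind one of these boxes, the five-number summary can be computed directly with NumPy. The sketch below does this for the range1 data from the example above, using np.percentile with its default interpolation.

```python
import numpy as np

# Same values as range1 in the box plot example
range1 = [82, 76, 24, 40, 67, 62, 75, 78, 71, 32,
          98, 89, 78, 67, 72, 82, 87, 66, 56, 52]

# First quartile, median, and third quartile
q1, median, q3 = np.percentile(range1, [25, 50, 75])
minimum, maximum = min(range1), max(range1)

print('min:', minimum)
print('Q1:', q1)
print('median:', median)
print('Q3:', q3)
print('max:', maximum)
```

Note that by default Matplotlib draws the whiskers only as far as 1.5 times the interquartile range from the box and shows points beyond that as outliers, so the whisker ends do not always coincide with the minimum and maximum.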
6. Violin Plot
A violin plot describes the distribution of numeric data for one or more groups using density curves. It can be plotted using matplotlib.pyplot.violinplot()
np.random.seed(10)
collectn_1 = np.random.normal(100, 10, 200)
collectn_2 = np.random.normal(80, 30, 200)
collectn_3 = np.random.normal(90, 20, 200)
collectn_4 = np.random.normal(70, 25, 200)
# combine these different collections into a list
data_to_plot = [collectn_1, collectn_2, collectn_3, collectn_4]
# Create a figure instance
fig = plt.figure()
# Create an axes instance
ax = fig.add_axes([0, 0, 1, 1])
# Create the violinplot
bp = ax.violinplot(data_to_plot)
plt.show()
7. Heatmap
Heatmaps are a versatile and efficient way to show trends. A heatmap visualizes data through variations in color and is often used to analyze multivariate data. It can be plotted using the imshow() method of an Axes object.
vegetables = ["cucumber", "tomato", "lettuce", "asparagus",
              "potato", "wheat", "barley"]
farmers = ["Farmer Joe", "Upland Bros.", "Smith Gardening",
           "Agrifun", "Organiculture", "BioGoods Ltd.", "Cornylee Corp."]
harvest = np.array([[0.8, 2.4, 2.5, 3.9, 0.0, 4.0, 0.0],
                    [2.4, 0.0, 4.0, 1.0, 2.7, 0.0, 0.0],
                    [1.1, 2.4, 0.8, 4.3, 1.9, 4.4, 0.0],
                    [0.6, 0.0, 0.3, 0.0, 3.1, 0.0, 0.0],
                    [0.7, 1.7, 0.6, 2.6, 2.2, 6.2, 0.0],
                    [1.3, 1.2, 0.0, 0.0, 0.0, 3.2, 5.1],
                    [0.1, 2.0, 0.0, 1.4, 0.0, 1.9, 6.3]])
fig, ax = plt.subplots()
im = ax.imshow(harvest)
# We want to show all ticks...
ax.set_xticks(np.arange(len(farmers)))
ax.set_yticks(np.arange(len(vegetables)))
# ... and label them with the respective list entries
ax.set_xticklabels(farmers)
ax.set_yticklabels(vegetables)
# Rotate the tick labels and set their alignment.
plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
         rotation_mode="anchor")
# Loop over data dimensions and create text annotations.
for i in range(len(vegetables)):
    for j in range(len(farmers)):
        text = ax.text(j, i, harvest[i, j],
                       ha="center", va="center", color="w")
ax.set_title("Harvest of local farmers (in tons/year)")
fig.tight_layout()
plt.show()
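A heatmap is often easier to read with a colorbar that maps colors back to values. The sketch below adds one with fig.colorbar(); the 3 x 3 matrix is a small illustrative sample, and the 'viridis' colormap and label text are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Small illustrative matrix of values
values = np.array([[0.8, 2.4, 2.5],
                   [2.4, 0.0, 4.0],
                   [1.1, 2.4, 0.8]])

fig, ax = plt.subplots()
im = ax.imshow(values, cmap='viridis')

# The colorbar is drawn in its own Axes next to the heatmap
cbar = fig.colorbar(im, ax=ax)
cbar.set_label('Harvest (tons/year)')

plt.show()
```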
“The greatest value of a picture is when it forces us to notice what we never expected to see.” - John Tukey
We covered the syntax and overall structure of creating Matplotlib plots. We saw how the MATLAB-like syntax and the object-oriented approach are used to plot various numerical and categorical data. To explore further, see Matplotlib's official documentation. Happy Learning!