Sayali Deodikar

Data Visualization With Seaborn

Updated: Nov 29, 2021

Seaborn is a data visualization library built on Matplotlib. It offers a rich set of plotting functions and works closely with pandas, so it can take an entire dataset at once and refer to its columns by name. Matplotlib provides ways to create basic plots such as line plots, bar graphs, and pie charts, whereas seaborn makes it easy to create the statistical graphics used in data science and machine learning applications.


Univariate data contains only one variable, so univariate analysis does not deal with relationships or dependencies between variables. Bivariate analysis is slightly more complex because it deals with two variables; its purpose is to compare them and look for correlations. Seaborn can handle both univariate and bivariate data to create beautiful graphical illustrations.


Key features of seaborn include:

  • Built-in themes that make graphs more aesthetically appealing

  • Smooth plotting of time-series data

  • Works well with NumPy and Pandas


Because seaborn complements and extends Matplotlib, the learning curve is gentle: if you know Matplotlib, it is easy to get hands-on with seaborn. If you want to learn more about Matplotlib, check out our article on Matplotlib.
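One practical consequence: axes-level seaborn functions draw onto a Matplotlib Axes and return it, so the usual Matplotlib methods still work afterwards. A minimal sketch with a small hypothetical dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small hypothetical dataset
df = pd.DataFrame({"year": [2019, 2020, 2021], "sales": [10, 14, 9]})

fig, ax = plt.subplots()
# seaborn draws the bars into the given Axes and returns it
ax = sns.barplot(data=df, x="year", y="sales", ax=ax)

# Familiar Matplotlib calls customize the same Axes
ax.set_title("Sales per year")
ax.set_ylabel("Units sold")
fig.tight_layout()
```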


Getting started with seaborn


Let’s start working with the Seaborn library.

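Requirement: Python 3.6 or later (Python 2.7 support was dropped in seaborn 0.10, and newer seaborn releases require newer Python versions). The histplot function used below was added in seaborn 0.11.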


Installing Seaborn

If Python and pip are already installed on your system, install seaborn with this command:


pip install seaborn 

If you are using a Jupyter notebook, install seaborn from a notebook cell with this command:


!pip install seaborn 

Importing Seaborn

Import the library in your code using this command:


import seaborn as sns 

sns is the conventional alias for seaborn; it keeps function calls short.
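Several examples in this article also use Matplotlib (plt) and pandas (pd), which are conventionally imported alongside seaborn. A typical header, with a quick version check (the upgrade hint is only a suggestion), might look like this:

```python
import matplotlib
import matplotlib.pyplot as plt  # figures, titles, figsize, ...
import numpy as np
import pandas as pd              # loading and preparing tabular data
import seaborn as sns

print("seaborn", sns.__version__)
print("matplotlib", matplotlib.__version__)
print("pandas", pd.__version__)

# histplot, used later in this article, requires seaborn 0.11 or later
major, minor = (int(part) for part in sns.__version__.split(".")[:2])
if (major, minor) < (0, 11):
    print("Please upgrade: pip install --upgrade seaborn")
```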


Overview of seaborn plotting functions


Most of your interactions with seaborn will happen through a set of plotting functions.

Seaborn is hierarchically structured, with modules of functions that achieve similar visualization goals through different means. The modules are classified as relational, distributional, and categorical.




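Now let's create some plots. Seaborn's load_dataset() function fetches small example datasets, returned as pandas DataFrames, from an online repository (so an internet connection is needed the first time). We will use several of them below.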


1. Line plot

A line plot shows the relationship between two numeric variables, typically with an ordered x variable, as in a time series.

Syntax:


seaborn.lineplot(*, x=None, y=None, hue=None, size=None, style=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, dashes=True, markers=None, style_order=None, units=None, estimator='mean', ci=95, n_boot=1000, seed=None, sort=True, err_style='band', err_kws=None, legend='auto', ax=None, **kwargs)



flights = sns.load_dataset("flights")
may_flights = flights.query("month == 'May'")
sns.set_style("darkgrid")
sns.lineplot(data=may_flights, x="year", y="passengers")



2. Scatter Plot

A scatter plot displays the relationship between two numeric variables. Each observation is drawn as a point (a circle marker by default).


Syntax:


seaborn.scatterplot(*, x=None, y=None, hue=None, style=None, size=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=True, style_order=None, x_bins=None, y_bins=None, units=None, estimator=None, ci=95, n_boot=1000, alpha=None, x_jitter=None, y_jitter=None, legend='auto', ax=None, **kwargs)



tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size", palette="deep")

  • hue: the variable used to group the data; each group gets its own color.

  • palette: the colors used for the hue groups, given as a palette name (such as "deep"), a list, or a dictionary.
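A minimal sketch of how hue and palette work together, using a small hypothetical DataFrame: each distinct value of the hue column becomes a group whose color is taken from the named palette. (Per the scatterplot documentation, naming a categorical palette such as "deep" also makes a numeric hue column, like size in the tips example, be treated as categories rather than as a continuous color scale.)

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data with a three-level categorical column
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 5, 7, 6],
    "kind": ["a", "b", "c", "a", "b", "c"],
})

fig, ax = plt.subplots()
# hue groups the points by "kind"; palette supplies the group colors
sns.scatterplot(data=df, x="x", y="y", hue="kind", palette="deep", ax=ax)

# Count the distinct face colors used for the points
colors = ax.collections[0].get_facecolors()
n_colors = len(np.unique(colors, axis=0))
print(n_colors)
```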




3. Histplot

A histogram is a visualization tool that represents the distribution of one or more variables by counting the number of observations that fall within discrete bins.


Syntax:


seaborn.histplot(data=None, *, x=None, y=None, hue=None, weights=None, stat='count', bins='auto', binwidth=None, binrange=None, discrete=None, cumulative=False, common_bins=True, common_norm=True, multiple='layer', element='bars', fill=True, shrink=1, kde=False, kde_kws=None, line_kws=None, thresh=0, pthresh=None, pmax=None, cbar=False, cbar_ax=None, cbar_kws=None, palette=None, hue_order=None, hue_norm=None, color=None, log_scale=None, legend=True, ax=None, **kwargs)


penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm", binwidth=3)

  • binwidth: the width of each bin in data units (it overrides the bins parameter).




4. Kernel Density Estimate (kdeplot)

A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.


Syntax:


seaborn.kdeplot(x=None, *, y=None, shade=None, vertical=False, kernel=None, bw=None, gridsize=200, cut=3, clip=None, legend=True, cumulative=False, shade_lowest=None, cbar=False, cbar_ax=None, cbar_kws=None, ax=None, weights=None, hue=None, palette=None, hue_order=None, hue_norm=None, multiple='layer', common_norm=True, common_grid=False, levels=10, thresh=0.05, bw_method='scott', bw_adjust=1, log_scale=None, color=None, fill=None, data=None, data2=None, warn_singular=True, **kwargs)


iris = sns.load_dataset("iris")
sns.kdeplot(data=iris)



5. Strip plot

A strip plot is a scatter plot in which one axis is categorical, so it shows every individual observation within each category.


Syntax:


seaborn.stripplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, jitter=True, dodge=False, orient=None, color=None, palette=None, size=5, edgecolor='gray', linewidth=0, ax=None, **kwargs)


tips = sns.load_dataset("tips")
ax = sns.stripplot(x="day", y="total_bill", data=tips)
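By default stripplot adds a small random horizontal jitter so that points with the same value do not hide each other. A minimal sketch with a hypothetical two-category DataFrame comparing jitter=False with the default:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical measurements for two categories
df = pd.DataFrame({"group": ["a"] * 5 + ["b"] * 5,
                   "value": [1, 2, 2, 3, 4, 2, 3, 3, 5, 6]})

fig, (ax_plain, ax_jitter) = plt.subplots(1, 2, sharey=True)
# Without jitter, every point sits exactly on its category position (0 or 1)
sns.stripplot(data=df, x="group", y="value", jitter=False, ax=ax_plain)
# With the default jitter, points are spread horizontally to reduce overlap
sns.stripplot(data=df, x="group", y="value", ax=ax_jitter)

def x_positions(ax):
    """Collect the horizontal position of every drawn point."""
    return np.concatenate([c.get_offsets()[:, 0] for c in ax.collections])

print(np.unique(x_positions(ax_plain)))
```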



6. Boxplot

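A boxplot depicts groups of numerical data through their quartiles: the box spans the first to the third quartile with a line at the median, and the whiskers show the rest of the distribution, with points beyond them drawn individually as outliers.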


Syntax:


seaborn.boxplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, ax=None, **kwargs)



sns.set_theme(style="whitegrid")
iris = sns.load_dataset("iris")
ax = sns.boxplot(data=iris, orient="h", palette="Set2")

  • orient: the orientation of the plot, "v" (vertical) or "h" (horizontal).
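A rough sketch of the summary each box encodes, with the default whis=1.5: the box runs from the first quartile (Q1) to the third (Q3), the interquartile range is IQR = Q3 - Q1, and observations more than 1.5 * IQR beyond the box are drawn as outliers. The sample is made up, and Matplotlib's boxplot_stats helper is used only for comparison; seaborn's internals may differ between versions.

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# A small sample with one obvious outlier
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
# Points more than 1.5 * IQR beyond the box count as outliers (whis=1.5)
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < low_fence) | (data > high_fence)]
print(q1, median, q3, outliers)

# Matplotlib's helper computes the same summary for a box plot
stats = boxplot_stats(data, whis=1.5)[0]
print(stats["q1"], stats["med"], stats["q3"], stats["fliers"])
```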




7. Violin plot

A violin plot depicts distributions of numeric data for one or more groups using density curves. The width of each curve corresponds to the approximate frequency of data points in each region.


Syntax:

seaborn.violinplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, bw='scott', cut=2, scale='area', scale_hue=True, gridsize=100, width=0.8, inner='box', split=False, dodge=True, orient=None, linewidth=None, color=None, palette=None, saturation=0.75, ax=None, **kwargs)


sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", hue="smoker", data=tips, palette="muted")
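When hue has two levels, as with smoker in the example above, split=True draws each level as one half of the violin, which makes the two distributions easier to compare. A minimal sketch with synthetic data (the column names and values are assumptions):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two categories, each observed under two conditions
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "day": np.repeat(["Sat", "Sun"], 100),
    "condition": np.tile(["yes", "no"], 100),
    "value": rng.normal(20, 5, size=200),
})

fig, ax = plt.subplots()
# split=True draws one hue level on each side of the violin
sns.violinplot(data=df, x="day", y="value", hue="condition", split=True, ax=ax)
```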


8. Barplot

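A bar plot shows an estimate of central tendency for a numeric variable in each category: by default the height of each bar is the mean, and the error bar shows a 95% confidence interval around it.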


Syntax:


seaborn.barplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=np.mean, ci=95, n_boot=1000, units=None, seed=None, orient=None, color=None, palette=None, saturation=0.75, errcolor='.26', errwidth=None, capsize=None, dodge=True, ax=None, **kwargs)



sns.set_theme(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.barplot(x="day", y="total_bill", data=tips)
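A minimal sketch, with hypothetical numbers, that checks what the bar heights represent: by default each bar is the mean of its group, and the estimator parameter accepts another aggregation function such as np.sum.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two observations per category
df = pd.DataFrame({"day": ["Thu", "Thu", "Fri", "Fri"],
                   "total_bill": [10.0, 20.0, 15.0, 25.0]})

fig, (ax_mean, ax_sum) = plt.subplots(1, 2)
# Default: bar height = mean of each group
sns.barplot(data=df, x="day", y="total_bill", ax=ax_mean)
# A different aggregation can be passed through estimator
sns.barplot(data=df, x="day", y="total_bill", estimator=np.sum, ax=ax_sum)

mean_heights = [p.get_height() for p in ax_mean.patches]
sum_heights = [p.get_height() for p in ax_sum.patches]
print(mean_heights, sum_heights)
```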


9. Heatmap

Syntax:


seaborn.heatmap(data, *, vmin=None, vmax=None, cmap=None, center=None, robust=False, annot=None, fmt='.2g', annot_kws=None, linewidths=0, linecolor='white', cbar=True, cbar_kws=None, cbar_ax=None, square=False, xticklabels='auto', yticklabels='auto', mask=None, ax=None, **kwargs)


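A heatmap represents the values of a matrix as colors, which makes it a convenient way to display a correlation matrix. Here we use Churn_Modelling.csv as the dataset, load it into a pandas DataFrame, and plot the correlation matrix of its numeric columns.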


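import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.read_csv('Churn_Modelling.csv')
data.drop(['RowNumber', 'Surname'], axis=1, inplace=True)
# 0 where the balance is zero, 1 otherwise
data['IsBalance'] = data['Balance'].where(data['Balance'] == 0, 1)
# encode Gender numerically so it is included in the correlations
data['Gender'] = data['Gender'].map({'Male': 0, 'Female': 1})
# correlations are only defined for numeric columns
corr = data.select_dtypes(include='number').corr()
plt.figure(figsize=(10, 6))
sns.heatmap(corr, cmap='YlGnBu', annot=True)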
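Churn_Modelling.csv is not bundled with seaborn, so here is a self-contained sketch of the same idea on synthetic data (the columns a to e are made up). It also uses the mask parameter to hide the redundant upper triangle of the symmetric correlation matrix, and fixes the color range to [-1, 1] with vmin and vmax.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic numeric data standing in for a real dataset
rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["a", "b", "c", "d"])
df["e"] = df["a"] * 0.8 + rng.normal(scale=0.3, size=100)  # correlated with "a"

corr = df.corr()
# The matrix is symmetric, so mask the upper triangle (including the diagonal)
mask = np.triu(np.ones_like(corr, dtype=bool))

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, mask=mask, cmap="YlGnBu", annot=True, fmt=".2f",
            vmin=-1, vmax=1, square=True, ax=ax)
```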



Conclusion

These examples show how effective seaborn is for statistical plots compared with plain Matplotlib. In this article we covered where seaborn can be used, how its plotting functions are organized, and how to create simple plots. Now you can create your own visualizations using seaborn. For more information, explore the official seaborn documentation. Happy learning!

