Python-Seaborn

A high-level plotting library built on top of Matplotlib. Seaborn helps resolve two major problems faced by Matplotlib:

  • Default Matplotlib parameters
  • Working with data frames

Seaborn Quick Guide

Dataset

import seaborn as sns
print(sns.get_dataset_names())

Output:

[u'anscombe', u'attention', u'brain_networks', u'car_crashes', u'dots',
u'exercise', u'flights', u'fmri', u'gammas', u'iris', u'planets', u'tips',
u'titanic']

Figure Aesthetic

Seaborn splits the Matplotlib parameters into two groups:

  • Plot styles
  • Plot scale

Figure style

Seaborn provides five preset themes: darkgrid, whitegrid, dark, white, and ticks. The interface for selecting a style is set_style().
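As a small illustration of how the presets differ, the sketch below reads the parameter dictionary that axes_style() returns for each preset and compares whether the grid is enabled. The variable names are illustrative, and the Agg backend is an assumption made only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed here so no window is needed
import seaborn as sns

# axes_style(name) returns the rc parameters of a preset without applying them
preset_names = ["darkgrid", "whitegrid", "dark", "white", "ticks"]
grid_enabled = {name: bool(sns.axes_style(name)["axes.grid"]) for name in preset_names}

for name, has_grid in grid_enabled.items():
    print(f"{name:10s} grid: {has_grid}")

# set_style() applies a preset globally; rc overrides individual style parameters
sns.set_style("whitegrid", rc={"grid.linestyle": "--"})
current_style = sns.axes_style()  # no argument: the style currently in effect
```

Reading the dictionaries first is a convenient way to find out which keys can be overridden through the rc argument.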

Background

Darkgrid

It is the default one.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()  # apply the default Seaborn theme (darkgrid)
x = np.random.randint(0, 10, 100)
# distplot is deprecated since seaborn 0.11; histplot and displot are the suggested replacements
sns.distplot(x)
plt.show()

whitegrid

sns.set_style("whitegrid")
x = np.random.randint(0, 10, 100)
sns.distplot(x)
plt.show()

dark

sns.set_style("dark")
x = np.random.randint(0, 10, 100)
sns.distplot(x)
plt.show()

white


ticks


Axes

Remove axes spines

Call the despine() function to remove the top and right axes spines:

sns.set_style("ticks")
x = np.random.randint(0, 10, 100)
sns.distplot(x)
sns.despine()
plt.show()

You can also control which spines are removed by passing additional arguments to despine(); for example, left=True also removes the left spine.
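A minimal sketch of the extra arguments follows. It draws a plain Matplotlib histogram so that only the despine() call is under test; the data and variable names are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

sns.set_style("ticks")
fig, ax = plt.subplots()
ax.hist(np.random.randint(0, 10, 100))

# top and right are removed by default; left=True removes the left spine as well
sns.despine(ax=ax, left=True)

visible_spines = [side for side, spine in ax.spines.items() if spine.get_visible()]
print(visible_spines)
```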

Scaling plot elements

The scale of the plot elements can be controlled with the set_context() function. There are four preset contexts, listed here in order of relative size:

  • Paper
  • Notebook
  • Talk
  • Poster

By default, the context is set to notebook.
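To see what the contexts actually change, the sketch below compares the base font size that plotting_context() reports for each preset and then applies one of them. The choice of "talk" and the font_scale value are arbitrary examples.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import seaborn as sns

context_names = ["paper", "notebook", "talk", "poster"]

# plotting_context(name) returns the scaling parameters without applying them
font_sizes = {name: sns.plotting_context(name)["font.size"] for name in context_names}
for name, size in font_sizes.items():
    print(f"{name:9s} font.size = {size}")

# set_context() applies a preset; font_scale scales the font elements further
sns.set_context("talk", font_scale=1.2)
active_font_size = sns.plotting_context()["font.size"]
```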

Color Palette

Seaborn provides a function called color_palette(), which can be used to assign colors to plots and improve their appearance.

seaborn.color_palette(palette=None, n_colors=None, desat=None)

The function returns a list of RGB tuples. The following Seaborn palettes are readily available:

  • Deep
  • Muted
  • Bright
  • Pastel
  • Dark
  • Colorblind
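As an illustration of the three parameters, the sketch below requests four colors from the muted palette, once as-is and once desaturated. The palette name and the desat value are arbitrary choices for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import seaborn as sns

# Four colors from a named palette, returned as RGB tuples with values in [0, 1]
muted = sns.color_palette("muted", n_colors=4)

# desat reduces the saturation of every color by the given proportion
washed_out = sns.color_palette("muted", n_colors=4, desat=0.5)

for original, faded in zip(muted, washed_out):
    print(tuple(round(v, 2) for v in original), "->", tuple(round(v, 2) for v in faded))
```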

Qualitative Color Palettes

Qualitative, or categorical, palettes are best suited to plotting categorical data.

current_palette = sns.color_palette()
sns.palplot(current_palette)
plt.show()

Here, palplot() plots the array of colors horizontally.

Sequential Color Palettes

Sequential palettes are suited to data that ranges from relatively low values to relatively high values.

Sequential palettes are named after their dominant color with a trailing 's', for example "Greens" or "Blues". Passing such a name to color_palette() returns a sequential palette.

current_palette = sns.color_palette("Greens")
sns.palplot(current_palette)
plt.show()
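As a rough check of what makes a palette sequential, the sketch below requests five colors from "Greens" and tests that they run from light to dark. Using the sum of the RGB channels as a brightness measure is a simplification chosen for this example, not something Seaborn does itself.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import seaborn as sns

greens = sns.color_palette("Greens", 5)

# Crude brightness proxy: the sum of the red, green and blue channels
brightness = [sum(rgb) for rgb in greens]
is_light_to_dark = all(a > b for a, b in zip(brightness, brightness[1:]))
print("light to dark:", is_light_to_dark)

# Appending "_r" to a colormap name requests the reversed order
greens_reversed = sns.color_palette("Greens_r", 5)
reversed_brightness = [sum(rgb) for rgb in greens_reversed]
```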

Diverging Color Palette

Figures

Histograms, KDE, and densities

In Matplotlib, a histogram of each variable can be drawn with plt.hist:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Draw 2000 samples from a correlated 2-D normal distribution; shape is (2000, 2)
data = np.random.multivariate_normal(mean=[0, 0], cov=[[5, 2], [2, 2]], size=2000)
data = pd.DataFrame(data, columns=['x', 'y'])

for col in 'xy':
    # alpha sets the transparency, so the overlapping region stays visible
    plt.hist(data[col], alpha=0.5)
plt.show()

Rather than a histogram, we can get a smooth estimate of the distribution using a kernel density estimation, which Seaborn does with sns.kdeplot:

for col in 'xy':
    # fill=True shades the area under each curve (older releases used shade=True)
    sns.kdeplot(data[col], fill=True)
plt.show()
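To make the idea of a kernel density estimate concrete, the sketch below builds one directly with scipy.stats.gaussian_kde and checks that the estimated density integrates to roughly one. It is an illustration of the technique on synthetic data, not a description of how kdeplot is implemented, and the grid limits are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=500)

# Fit a Gaussian KDE with SciPy's default bandwidth rule
kde = stats.gaussian_kde(samples)

# Evaluate the smooth density estimate on a regular grid
grid = np.linspace(-5, 5, 401)
density = kde(grid)

# A density should integrate to about 1; approximate the integral with a Riemann sum
step = grid[1] - grid[0]
area = float(np.sum(density) * step)
print(f"approximate area under the KDE: {area:.3f}")
```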

Histograms and KDE can be combined using distplot:

for col in 'xy':
    sns.distplot(data[col])
plt.show()

If we pass both columns of the two-dimensional dataset to kdeplot, we get a two-dimensional visualization of the data:

# seaborn 0.11+ syntax; older releases accepted sns.kdeplot(data) for a two-column DataFrame
sns.kdeplot(data=data, x='x', y='y')

We can see the joint distribution and the marginal distributions together using sns.jointplot. For this plot, we’ll set the style to a white background:

with sns.axes_style('white'):
    sns.jointplot(x="x", y="y", data=data, kind='kde')

There are other parameters that can be passed to jointplot. For example, we can use a hexagonally based histogram instead:

with sns.axes_style('dark'):
    sns.jointplot(x="x", y="y", data=data, kind='hex')

Pairwise plots

When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. This is very useful for exploring correlations between multidimensional data, when you’d like to plot all pairs of values against each other.

We’ll demo this with the well-known Iris dataset, which lists measurements of petals and sepals of three iris species:

iris = sns.load_dataset('iris')
print(iris.head())

Visualizing the multidimensional relationships among the samples is as easy as calling sns.pairplot:

sns.pairplot(iris, hue='species')

The pair plot suggests that petal_width and petal_length are good candidates for classification, because the three species are clearly separated along these two features.

Categorical data

stripplot()

stripplot() is used when one of the variables under study is categorical. It draws a scatter plot in which the observations are grouped by category along one of the axes.

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

df = sns.load_dataset('iris')
# jitter=False keeps every point on its category line; recent seaborn releases jitter by default
sns.stripplot(x="species", y="petal_length", data=df, jitter=False)
plt.show()

In the above plot, we can clearly see the difference in petal_length between the species. The major problem with this scatter plot is that the points overlap. The jitter parameter handles this kind of scenario.

Jitter adds some random noise to the data. This parameter adjusts the positions along the categorical axis only.

sns.stripplot(x="species", y="petal_length", data=df, jitter=True)

The x positions of the points change but the y positions do not, so the petal_length of each point can be read with far less overlap.
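The effect of jitter can also be checked numerically. The sketch below plots the same synthetic data with and without jitter and reads the point coordinates back from the axes. The column names and the jitter amount of 0.3 are arbitrary, and reading positions through ax.collections relies on stripplot drawing its points as Matplotlib scatter collections.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 40),
    "value": rng.normal(size=120).round(1),
})

fig, (ax_flat, ax_jitter) = plt.subplots(1, 2, sharey=True)
sns.stripplot(x="group", y="value", data=df, jitter=False, ax=ax_flat)
sns.stripplot(x="group", y="value", data=df, jitter=0.3, ax=ax_jitter)

def x_positions(ax):
    """Collect the x coordinate of every plotted point on an axes."""
    return np.concatenate([np.asarray(c.get_offsets())[:, 0] for c in ax.collections])

flat_x = x_positions(ax_flat)
jitter_x = x_positions(ax_jitter)
print("distinct x positions without jitter:", len(np.unique(flat_x)))
print("distinct x positions with jitter:", len(np.unique(jitter_x)))
```

A numeric jitter value sets the width of the random offset, while jitter=True lets Seaborn pick a default amount.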

swarmplot()

Another alternative to jitter is the swarmplot() function. It positions each point of the scatter plot along the categorical axis so that the points do not overlap.

sns.swarmplot(x="species", y="petal_length", data=df)

Distribution of observations

boxplot()

sns.boxplot(x="species", y="petal_length", data=df)

violinplot()

Violin plots combine a box plot with a kernel density estimate, which makes the distribution of the data easier to analyze and understand.

Let us use the tips dataset to learn more about violin plots. This dataset contains information about the tips given by customers in a restaurant.

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

df = sns.load_dataset('tips')
sns.violinplot(x="day", y="total_bill", data=df)
plt.show()

The quartile and whisker values from the box plot are shown inside the violin. Because the violin plot uses a KDE, the wider portions of the violin indicate higher density and the narrow regions indicate relatively lower density. The interquartile range of the box plot and the high-density portion of the KDE generally fall in the same region of each category.
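The quartiles that the inner box encodes can be computed directly with pandas. The sketch below does this per category on synthetic data; the values are randomly generated stand-ins, not the real tips dataset, and the column names merely mirror it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Synthetic stand-in for the tips data: 30 made-up bills for each of four days
bills = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 30),
    "total_bill": rng.gamma(shape=4.0, scale=5.0, size=120).round(2),
})

# First quartile, median and third quartile of total_bill for each day
quartiles = (
    bills.groupby("day")["total_bill"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
quartiles.columns = ["q1", "median", "q3"]

# The interquartile range is the height of the box inside each violin
quartiles["iqr"] = quartiles["q3"] - quartiles["q1"]
print(quartiles)
```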

The above plot shows the distribution of total_bill on four days of the week. To see how the distribution also varies with sex, pass it as the hue variable:

sns.violinplot(x="day", y="total_bill", hue='sex', data=df)

Now we can compare the spending behavior of male and female customers. In this plot, the bills paid by men appear to be higher than those paid by women.

If the hue variable has only two classes, the plot can be made more compact by splitting each violin in two instead of drawing two violins for each day. Each half of the violin then corresponds to one class of the hue variable.

sns.violinplot(x="day", y="total_bill", hue='sex', data=df, split=True)

Matplotlib
