Sunday, 1 March 2020

My Matplotlib Cheatsheet (#2)

Previously, I wrote about some matplotlib methods I often use. In this post I will show two more.

Scatter plots with color for density

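Scatter plots with many points quickly become uninformative, because overlapping points merge into a single blob. An easy fix is to color each point by the local density of points around it, which can be estimated with gaussian_kde from scipy.stats. Suppose that we want to make a scatter plot of xs and ys in the subplot ax. Apart from the imports, coloring the points then takes just two extra lines, plus the c keyword in the call to scatter:
import numpy
import scipy.stats
xy = numpy.vstack([xs, ys])  # shape (2, N): one row per coordinate
z = scipy.stats.gaussian_kde(xy)(xy)  # estimated density at each point
ax.scatter(xs, ys, s=5, c=z)  # use the density as the color value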
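What does gaussian_kde(xy)(xy) compute? The first call fits a kernel density estimate to the points (it expects one row per coordinate, hence the vstack), and the second call evaluates that estimate at the same points. The sketch below, on made-up standard normal data, redoes the evaluation by hand as an average of Gaussian bumps centred on the data points, all sharing the kernel covariance stored in the fitted object's covariance attribute. Keep in mind that evaluating the estimate at all N points costs on the order of N² kernel evaluations, which adds up for very large samples.

```python
import numpy as np
import scipy.stats as sts

# Made-up data for illustration: 200 points from a 2D standard normal
rng = np.random.default_rng(0)
xs, ys = rng.normal(size=(2, 200))
xy = np.vstack([xs, ys])  # shape (2, 200), the layout gaussian_kde expects

# Fit the KDE and evaluate it at the data points themselves
kde = sts.gaussian_kde(xy)
z = kde(xy)

# The same numbers by hand: average of Gaussian bumps centred on each
# data point, using the bandwidth-scaled covariance of the fitted KDE
bumps = [sts.multivariate_normal.pdf(xy.T, mean=c, cov=kde.covariance)
         for c in xy.T]
z_manual = np.mean(bumps, axis=0)

print(np.allclose(z, z_manual))
```

So a point ends up bright when many other points lie within roughly one kernel width of it.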
Here's an example:



This is the code I used to make the figure, making use of inset_axes for a color bar:

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sts
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
## sample from a mixture of Gaussians
n, m, k = 2, 10, 10000
mus = sts.uniform.rvs(size=(m,n))
ps = sts.dirichlet.rvs(3*np.ones(m))[0]
xss = [sts.multivariate_normal.rvs(mean=mu, cov=0.02*np.eye(n), size=int(p*k))
       for p, mu in zip(ps, mus)]
xs = np.concatenate(xss).T
## make a figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,5), sharex=True, sharey=True)
## but remove some space between the panels
fig.subplots_adjust(wspace=0.05)
## make a simple scatter plot
ax1.scatter(xs[0], xs[1], s=5)
## use the density to color the points
z = sts.gaussian_kde(xs)(xs)
C = ax2.scatter(xs[0], xs[1], s=5, c=z)
## we can even add a colorbar to be overly precise
cx = inset_axes(ax2, width='5%', height='50%', loc='lower right', borderpad=0)
fig.colorbar(C, cax=cx, orientation='vertical')
## titles etcetera
ax1.set_title("a blue blob")
ax2.set_title("a glowing blob")
cx.set_ylabel("density")
fig.savefig("scatter-with-density.png", bbox_inches='tight', dpi=200)
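A common refinement, not part of the code above: scatter draws the points in the order they are given, so low-density points that happen to come late in the array can end up on top of the bright core. Sorting the points by density first makes the densest points be drawn last. This is a minimal sketch on toy data; the Agg backend (to run without a display) and the output file name are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sts

# Toy data in the same 2 x N layout as xs in the example above
rng = np.random.default_rng(1)
xs = rng.normal(size=(2, 2000))

z = sts.gaussian_kde(xs)(xs)
# Sort by density so the densest points are drawn last, i.e. on top
order = np.argsort(z)
xs_sorted, z_sorted = xs[:, order], z[order]

fig, ax = plt.subplots(figsize=(5, 5))
C = ax.scatter(xs_sorted[0], xs_sorted[1], s=5, c=z_sorted)
fig.colorbar(C, ax=ax, label="density")
fig.savefig("scatter-sorted-by-density.png", dpi=200)
```

The colors are unchanged by the sorting; only the drawing order differs.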


Recycling bins for multiple histograms

When you plot multiple histograms in one figure, each call to hist chooses its bins from the range of its own data, so the bin widths can be very different, which does not look good. I recently found a simple fix. The second value returned by hist is an array with the edges of the bins (one more edge than there are bins). This array can be passed directly to the next call of hist through the bins keyword. The following figure shows the result:



I used the following code to plot this figure:

import matplotlib.pyplot as plt
import scipy.stats as sts
## generate some data
p, k1, k2, n = 0.3, 5, 50, 1000
xs1 = sts.beta.rvs(a=p*k1, b=(1-p)*k1, size=n)
xs2 = sts.beta.rvs(a=p*k2, b=(1-p)*k2, size=n)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,4), sharex=True, sharey=True)
alpha, nbins = 0.7, 40
## make two histograms with the same number of bins
ax1.hist(xs1, bins=nbins, density=True, alpha=alpha, label="$x_1$")
ax1.hist(xs2, bins=nbins, density=True, alpha=alpha, label="$x_2$")
## the second return value of hist specifies the bin borders
_, bins, _ = ax2.hist(xs1, bins=nbins, density=True, alpha=alpha, label="$x_1$")
## and can be used instead of the number of bins
ax2.hist(xs2, bins=bins, density=True, alpha=alpha, label="$x_2$")
ax1.set_xlim(0,1)
ax1.set_title("not so nice")
ax2.set_title("that's better")
for ax in (ax1, ax2):
    ax.set_xlabel("$x_1$, $x_2$")
    ax.legend()
fig.savefig("recycling-bins.png", dpi=200)
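Recycling the bins of the first histogram works here because xs1 is the wider of the two samples. Values of the second sample outside the first sample's range would be left out, since hist does its counting with np.histogram, which ignores values beyond the outermost edges. An alternative, sketched below with NumPy only, is to compute one set of edges from both samples together with np.histogram_bin_edges. The seeded random generator is my own addition; the distributions mirror the example above.

```python
import numpy as np
import scipy.stats as sts

# Same kind of data as above; the seeded generator is an assumption for reproducibility
rng = np.random.default_rng(2)
p, k1, k2, n = 0.3, 5, 50, 1000
xs1 = sts.beta.rvs(a=p*k1, b=(1-p)*k1, size=n, random_state=rng)
xs2 = sts.beta.rvs(a=p*k2, b=(1-p)*k2, size=n, random_state=rng)
nbins = 40

# Edges from xs1 alone, which is what recycling the bins gives
edges1 = np.histogram_bin_edges(xs1, bins=nbins)
# number of xs2 values that these edges would leave out
dropped = np.sum((xs2 < edges1[0]) | (xs2 > edges1[-1]))

# Edges from both samples together: equal widths and full coverage
shared = np.histogram_bin_edges(np.concatenate([xs1, xs2]), bins=nbins)
counts1, _ = np.histogram(xs1, bins=shared)
counts2, _ = np.histogram(xs2, bins=shared)

print(len(shared))                   # 41 edges for 40 bins
print(counts1.sum(), counts2.sum())  # 1000 1000: every value is counted
print(dropped)
```

With matplotlib, the same array then goes into every call, for example ax.hist(xs2, bins=shared, density=True, alpha=alpha).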