matplotlib methods I often use.
In this post I will show two more.
Scatter plots with color for density
Scatter plots with many points quickly become uninformative, because overlapping markers hide where the points are concentrated. A very easy solution is to color the points according to their local density. This can be done with gaussian_kde from scipy.stats.
Suppose that we want to make a scatter plot from xs and ys in subplot ax. Then we just have to add two lines to color the points:
xy = numpy.vstack([xs, ys])
z = scipy.stats.gaussian_kde(xy)(xy)
ax.scatter(xs, ys, s=5, c=z)
Here's an example:
This is the code I used to make the figure, making use of inset_axes for a color bar:
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sts
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
## sample from a mixture of Gaussians
n, m, k = 2, 10, 10000
mus = sts.uniform.rvs(size=(m,n))
ps = sts.dirichlet.rvs(3*np.ones(m))[0]
xss = [sts.multivariate_normal.rvs(mean=mu, cov=0.02*np.eye(n), size=int(p*k))
       for p, mu in zip(ps, mus)]
xs = np.concatenate(xss).T
## make a figure with two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,5), sharex=True, sharey=True)
## but remove some space between the panels
fig.subplots_adjust(wspace=0.05)
## make a simple scatter plot
ax1.scatter(xs[0], xs[1], s=5)
## use the density to color the points
z = sts.gaussian_kde(xs)(xs)
C = ax2.scatter(xs[0], xs[1], s=5, c=z)
## we can even add a colorbar to be overly precise
cx = inset_axes(ax2, width='5%', height='50%', loc='lower right', borderpad=0)
fig.colorbar(C, cax=cx, orientation='vertical')
## titles etcetera
ax1.set_title("a blue blob")
ax2.set_title("a glowing blob")
cx.set_ylabel("density")
fig.savefig("scatter-with-density.png", bbox_inches='tight', dpi=200)
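A small variation that might be worth trying: points are drawn in the order they appear in the arrays, so a dense point can end up hidden behind a sparse one. Sorting the points by their density before plotting puts the densest points on top. The sketch below is not part of the code used for the figure above; the helper name sort_by_density and the synthetic data are my own assumptions for illustration.

```python
import numpy as np
import scipy.stats as sts

def sort_by_density(xs, ys):
    """Return xs, ys and their KDE density, ordered from sparse to dense.

    Hypothetical helper, not part of matplotlib or scipy.
    """
    xs, ys = np.asarray(xs), np.asarray(ys)
    xy = np.vstack([xs, ys])
    ## evaluate the kernel density estimate at the data points themselves
    z = sts.gaussian_kde(xy)(xy)
    ## indices that order the points from lowest to highest density
    order = np.argsort(z)
    return xs[order], ys[order], z[order]

## synthetic example data (not the data from the figure)
rng = np.random.default_rng(0)
xs = rng.normal(size=500)
ys = rng.normal(size=500)
xs_sorted, ys_sorted, z_sorted = sort_by_density(xs, ys)
```

The sorted arrays can then be passed to the scatter call as before, for instance ax.scatter(xs_sorted, ys_sorted, s=5, c=z_sorted).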
Recycling bins for multiple histograms
When you plot multiple histograms in one plot with the same number of bins, the bin widths can be very different, because each histogram spreads its bins over the range of its own data. This does not look so good. I recently found a simple method to fix this. The second value that the hist function returns specifies the boundaries of the bins used for the histogram. This value can be used directly in the next call to the hist function using the bins keyword.
The following figure shows the result:
I used the following code to plot this figure:
import matplotlib.pyplot as plt
import scipy.stats as sts
## generate some data
p, k1, k2, n = 0.3, 5, 50, 1000
xs1 = sts.beta.rvs(a=p*k1, b=(1-p)*k1, size=n)
xs2 = sts.beta.rvs(a=p*k2, b=(1-p)*k2, size=n)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10,4), sharex=True, sharey=True)
alpha, nbins = 0.7, 40
## make two histograms with the same number of bins
ax1.hist(xs1, bins=nbins, density=True, alpha=alpha, label="$x_1$")
ax1.hist(xs2, bins=nbins, density=True, alpha=alpha, label="$x_2$")
## the second return value of hist specifies the bin borders
_, bins, _ = ax2.hist(xs1, bins=nbins, density=True, alpha=alpha, label="$x_1$")
## and can be used instead of the number of bins
ax2.hist(xs2, bins=bins, density=True, alpha=alpha, label="$x_2$")
ax1.set_xlim(0,1)
ax1.set_title("not so nice")
ax2.set_title("that's better")
for ax in (ax1, ax2):
    ax.set_xlabel("$x_1$, $x_2$")
    ax.legend()
fig.savefig("recycling-bins.png", dpi=200)
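One thing to keep in mind when recycling bins: values of the second sample that fall outside the bin borders of the first sample are not counted. In the example above this hardly matters, because the first sample is the broader one. When it is not clear which sample is broader, a possible alternative is to compute the bin borders from the pooled data with numpy.histogram_bin_edges and pass them to both calls. The following is only a sketch under that assumption, with synthetic beta-distributed samples that mimic the ones above.

```python
import numpy as np

## synthetic samples, similar in shape to xs1 and xs2 above
rng = np.random.default_rng(1)
xs1 = rng.beta(1.5, 3.5, size=1000)
xs2 = rng.beta(15, 35, size=1000)

## 40 equal-width bins spanning the range of both samples together
edges = np.histogram_bin_edges(np.concatenate([xs1, xs2]), bins=40)

## with shared borders, every value of both samples lands in some bin
counts1, _ = np.histogram(xs1, bins=edges)
counts2, _ = np.histogram(xs2, bins=edges)
```

The array edges can be passed to both hist calls with the bins keyword, for instance ax.hist(xs1, bins=edges) followed by ax.hist(xs2, bins=edges).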