Interpolation: the DFInterpolator

Linear interpolation between gridded datapoints lies at the heart of much of what isochrones does. The custom DFInterpolator object manages this interpolation, implemented to optimize speed and convenience for large grids. A DFInterpolator is built on top of a pandas multi-indexed dataframe, and while designed with stellar model grids in mind, it can be used with any similarly structured data.

Let’s demonstrate with a small example of data on a 2-dimensional grid.

[1]:
import itertools
import numpy as np
import pandas as pd

x = np.arange(1, 4)
y = np.arange(1, 6)

index = pd.MultiIndex.from_product((x, y), names=['x', 'y'])
df = pd.DataFrame(index=index)

df['sum'] = [x + y for x, y in itertools.product(x, y)]
df['product'] = [x * y for x, y in itertools.product(x, y)]
df['power'] = [x**y for x, y in itertools.product(x, y)]

df
[1]:
sum product power
x y
1 1 2 1 1
2 3 2 1
3 4 3 1
4 5 4 1
5 6 5 1
2 1 3 2 2
2 4 4 4
3 5 6 8
4 6 8 16
5 7 10 32
3 1 4 3 3
2 5 6 9
3 6 9 27
4 7 12 81
5 8 15 243

The DFInterpolator is initialized with this dataframe and then can interpolate the values of the columns at any location within the grid defined by the multiindex.

[2]:
from isochrones.interp import DFInterpolator

interp = DFInterpolator(df)
interp([1.4, 2.1])
[2]:
array([3.5 , 2.94, 2.36])

Individual columns may also be accessed by name:

[3]:
interp([2.2, 4.6], ['product'])
[3]:
array([10.12])

This object is very similar to the linear interpolation objects available in scipy, but it is significantly faster for single interpolation evaluations:

[4]:
from scipy.interpolate import RegularGridInterpolator

nx, ny = len(x), len(y)
grid = np.reshape(df['sum'].values, (nx, ny))
scipy_interp = RegularGridInterpolator([x, y], grid)

# Values are the same
assert(scipy_interp([1.3, 2.2])==interp([1.3, 2.2], ['sum']))

# Timings are different
%timeit scipy_interp([1.3, 2.2])
%timeit interp([1.3, 2.2])
10000 loops, best of 3: 176 µs per loop
The slowest run took 7.10 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.71 µs per loop

The DFInterpolator is about 30x faster than the scipy regular grid interpolation, for a single point. However, for vectorized calculations, scipy is indeed faster:

[5]:
N = 10000
pts = [1.3 * np.ones(N), 2.2 * np.ones(N)]
%timeit scipy_interp(np.array(pts).T)
%timeit interp(pts, ['sum'])
The slowest run took 7.51 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 1.52 ms per loop
The slowest run took 30.75 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 15.1 ms per loop

However, the DFInterpolator has an additional advantage of being able to manage missing data—that is, the grid doesn’t have to be completely filled to construct the interpolator, as it does with scipy:

[6]:
df_missing = df.drop([(3, 3), (3, 4)])
df_missing
[6]:
sum product power
x y
1 1 2 1 1
2 3 2 1
3 4 3 1
4 5 4 1
5 6 5 1
2 1 3 2 2
2 4 4 4
3 5 6 8
4 6 8 16
5 7 10 32
3 1 4 3 3
2 5 6 9
5 8 15 243
[7]:
interp_missing = DFInterpolator(df_missing)
interp_missing([1.3, 2.2])
[7]:
array([3.5 , 2.86, 2.14])

However, if the grid cell that the requested point is in is adjacent to one of these missing points, the interpolation will return nans:

[8]:
interp_missing([2.3, 3])
[8]:
array([nan, nan, nan])

In other words, the interpolator can be constructed with an incomplete grid, but it does not fill values for the missing points.