The Fast Multipole Method and Radial Basis Function Interpolation

I collaborated with Felipe Cruz on the PetFMM code for simulation using the fast multipole method. The main idea is that a very simple serial implementation of FMM can be combined with optimization techniques from graph partitioning to produce an efficient and scalable parallel FMM. PetFMM completely reuses the serial code, and also uses templating to optimize the implementation. It templates over the structure used to describe source and targets, the neighborhood stencil, the expansion order, and the dimension. We are able to guide the optimization routine with good theoretical estimates of the communication and computation complexity. In fact, we hope to show that optimal communication complexity is achievable with PetFMM.

I also worked with Rio Yokota and Andreas Klöckner on a version of FMM which would autogenerate expansions given the kernel function. We believe this is the step that most severely limits the wide adoption of the method. The implementation uses sympy for symbolic manipulation and code generation, and PyCUDA for kernel execution and organization.

I also collaborated with Rio Yokota and Lorena Barba on the PetRBF code for parallel, radial basis function interpolation. PetRBF uses a scalable preconditioning strategy which uses small ASM blocks tailored to the function overlap. This code can interface with FMM libraries, or other fast evaluation schemes, to support large scale simulation.