August 26, 2010

Solving the nonlinear Schrödinger equation on GPU

PyOFTK can now perform the Split-Step Fourier algorithm on the GPU via the PyOFTK.ssfgpu() function. The first draft of the code do not take into accound the Raman response time. The implementation was made very easy with the pyfft python package done by Bogdan Opanchuk which is the Apple’s FFT implementation ported for PyCuda and PyOpenCL.

Here some performance results. The first chart shows the execution time for a Soliton propagation simulation on 1000 steps for different problem sizes (temporal resolution). The performance of ssfgpu() running on a NVIDIA GTX 260 was compared with the plain SSF version done in C and the numpy version running on a intel core @ 3.2Ghz. The second chart shows the overall speedup for the different problem sizes.

Like the underlying FFT gpu implementation, the ssfgpu() start to really scream on large problem. I manage to keep the data on the device memory between each step in order to avoid the transfer bottleneck so the performance scaling is essentially the same as pyfft.

PyOFTK.ssfgpu() is not avalaible yet. I will push it to the github repository soon.

Here is an example of similariton interaction computed using ssfgpu().

from numpy import *
from scipy import *
import matplotlib.pyplot as plt
from numpy.fft import fftshift
import PyOFTK
 
from pycuda.tools import make_default_context
import pycuda.driver as cuda
 
cuda.init()
context = make_default_context()
 
lambdaZero = 1.03e-6
T = 60.0
nt = pow(2,12)
dt = T/float(nt)
t = linspace(-T/2, T/2, nt)
C = 2.99792458e-4
nz = 4000
dz = 0.01
betap = array([0.0 , 0.0 ,0.024])
gamma = 0.065
 
# Gain profil
omega = PyOFTK.wspace(T,nt)
vs = fftshift(omega/(2*pi))
wavelength = (1/((vs/C)+1/(lambdaZero)))*1e9
alpha = array([-0.2])
gainBandwidth = 50.0
alphaVec = PyOFTK.gainProfil(wavelength, alpha[0], gainBandwidth)
 
dureePulse = 0.5
puissanceCrete = 2.0
u_ini_g = PyOFTK.gaussianPulse(t[0:nt/2],dureePulse,-7.5,puissanceCrete,1,0)
u_ini_d = PyOFTK.gaussianPulse(t[nt/2:nt],dureePulse,7.5,puissanceCrete,1,0)
u_ini = r_[u_ini_g, u_ini_d]
 
[u_out, i_arch] = PyOFTK.ssfgpuFull(u_ini, dt, dz, nz, alpha, betap, gamma,
context, 500, 1e-5)
 
context.pop()
 
plt.imshow(i_arch, aspect='normal')
plt.show()