August 26, 2010

Solving the nonlinear Schrödinger equation on GPU

PyOFTK can now perform the Split-Step Fourier algorithm on the GPU via the PyOFTK.ssfgpu() function. The first draft of the code do not take into accound the Raman response time. The implementation was made very easy with the pyfft python package done by Bogdan Opanchuk which is the Apple’s FFT implementation ported for PyCuda and PyOpenCL.

Here some performance results. The first chart shows the execution time for a Soliton propagation simulation on 1000 steps for different problem sizes (temporal resolution). The performance of ssfgpu() running on a NVIDIA GTX 260 was compared with the plain SSF version done in C and the numpy version running on a intel core @ 3.2Ghz. The second chart shows the overall speedup for the different problem sizes.

Like the underlying FFT gpu implementation, the ssfgpu() start to really scream on large problem. I manage to keep the data on the device memory between each step in order to avoid the transfer bottleneck so the performance scaling is essentially the same as pyfft.

PyOFTK.ssfgpu() is not avalaible yet. I will push it to the github repository soon.

Here is an example of similariton interaction computed using ssfgpu().

from numpy import *
from scipy import *
import matplotlib.pyplot as plt
from numpy.fft import fftshift
import PyOFTK
 
from pycuda.tools import make_default_context
import pycuda.driver as cuda
 
cuda.init()
context = make_default_context()
 
lambdaZero = 1.03e-6
T = 60.0
nt = pow(2,12)
dt = T/float(nt)
t = linspace(-T/2, T/2, nt)
C = 2.99792458e-4
nz = 4000
dz = 0.01
betap = array([0.0 , 0.0 ,0.024])
gamma = 0.065
 
# Gain profil
omega = PyOFTK.wspace(T,nt)
vs = fftshift(omega/(2*pi))
wavelength = (1/((vs/C)+1/(lambdaZero)))*1e9
alpha = array([-0.2])
gainBandwidth = 50.0
alphaVec = PyOFTK.gainProfil(wavelength, alpha[0], gainBandwidth)
 
dureePulse = 0.5
puissanceCrete = 2.0
u_ini_g = PyOFTK.gaussianPulse(t[0:nt/2],dureePulse,-7.5,puissanceCrete,1,0)
u_ini_d = PyOFTK.gaussianPulse(t[nt/2:nt],dureePulse,7.5,puissanceCrete,1,0)
u_ini = r_[u_ini_g, u_ini_d]
 
[u_out, i_arch] = PyOFTK.ssfgpuFull(u_ini, dt, dz, nz, alpha, betap, gamma,
context, 500, 1e-5)
 
context.pop()
 
plt.imshow(i_arch, aspect='normal')
plt.show()

June 18, 2010

githubGraph + ubigraph server = pure awesomeness

Example of a GitHub user’s social graph. This undirected graph of users which mutually follow each other, contains 7243 nodes and 14484 edges. The video shows the convergence of the Force-based algorithm used to visualize the graph.

httpvhd://www.youtube.com/watch?v=HAmfZ58C6Qo

April 15, 2010

Weekend project: compute a 68 gigapixels fractal image: Part II

So, as promised, here is the source code for the gigapixels fractal project: gmpano.tar.gz. The archive contain the following files:

mandelbrot.cpp: The actual C++ code for the computation
numpy.i and PyMandelbrot.i: The SWIG descriptions files used for the function wrapper (C++ to Python)
gigaMandelbrot.py: This one contain the Python class used for the image tile rendering
gmPano.py: The final script that putting it all together and write the final image in the Deep Zoom Format. You will need to write the xml file by hand :-( . You can use HDView SL or HDView for viewing the image.

You will also need SWIG and the following python package: numpy, matplotlib, python parallel.

Here is a example of compilation process on a 64-bits linux machine:
swig -c++ -python PyMandelbrot.i
g++ -c -O3 -fPIC -fomit-frame-pointer -funroll-loops -fstrict-aliasing -m64 mandelbrot.cpp -I/usr/include/python2.6
g++ -c -O3 -fPIC -fomit-frame-pointer -funroll-loops -fstrict-aliasing -m64 PyMandelbrot_wrap.cxx -I/usr/include/python2.6
g++ -shared mandelbrot.o PyMandelbrot_wrap.o -o _PyMandelbrot.so -lm -lstdc+

March 27, 2010

Weekend project: compute a 68 gigapixels fractal image

Here is a project I made in order to test HDView SL. I’m not a big fan of the Flash/Silverlight thing, but I have to admit that the Seadragon zooming technique include in Silverlight is pretty neat. Put it in Fullscreen and have fun !

Quick facts:
- The final image size is around 3.2 GB.
- A physical print @ 300 dpi would be around 72.75 feet x 72.75 feet ! Which is equivalent to a 8-story building
- It took around 8 hrs to compute on a 4 cores CPU @ 2.4 Ghz.
- The main loop of the code was done in C.
- The final multithreaded code was done in Python with the help of Parallel Python
- The function wrapper (C -> Python) was done with SWIG and numpy.i.
- I’ll publish the code in the next posts

March 2, 2010

Fractal & Python: Part II

After the Lorenz attractor, here is a little code for exploring the Mandelbrot Set. Of course, you could compute the same thing in CUDA with a 1000X speedup… but  it wouldn’t be pythonish as this !

from numpy import *
import matplotlib.pylab as pl
 
maxIteration = 128
z_min = -2-1j
z_max = 1+1j
 
# Set the image size here
imageSize = [512,512]
image = zeros(imageSize,int)
xValue = linspace(z_min.real, z_max.real, imageSize[0])
yValue = linspace(z_min.imag, z_max.imag, imageSize[1])
 
for i in arange(imageSize[0]):
	for k in arange(imageSize[1]):
		iteration = 0
		x = xValue[i]
		y = yValue[k]
		while (x*x+y*y < (2*2)) & (iteration < maxIteration):
			xtemp = x*x - y*y + xValue[i]
 			y = 2*x*y + yValue[k]
			x = xtemp
			iteration = iteration + 1
		if iteration == maxIteration:
			image[i,k] = 0
		else:
			image[i,k] = 255-iteration
 
pl.imshow(image)
pl.prism()
pl.show()