A GPU version of the pressure projection solver using OpenCL is implemented. Then it has been compared with CPU version which is accelerated with OpenMP. The GPU version shows a sensible reduction in time despite using a simple algorithm in the kernel. The nal code is plugged into a commercial uid simulator software. Dierent kinds of algorithms and data transfer methods have been investigated. Overlapping the computation and communication showed a more than 3 times speed-up versus the serial communication-computation pattern. Finally we exploit methods for partitioning data and writing kernels to use many of the bene ts of computation on a heterogeneous system. We ran all the simulations on a machine with an Intel core i7-2600 cpu and 16 GB main memory coupled with a GeForce GTX 560 Ti graphic processing unit on a windows OS.