Supercomputers with a performance of a trillion floating-point operations per second or more can be built in state-of-the-art MOS technologies. Such computers will have tens of thousands of processors interconnected by a network of bounded degree. Reducing the required data motion through a careful choice of data allocation and of computational and routing algorithms is critical for performance. Managing thousands of processors is practical only through programming languages with suitable abstractions.
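The effect of data allocation on data motion can be made concrete with a small count. The sketch below rests on assumptions of our own: a one-dimensional periodic array of n points, a three-point stencil, and n divisible by the processor count p. It counts how many operand reads cross a processor boundary under a block allocation and under a cyclic allocation. The functions `owner_block`, `owner_cyclic`, and `remote_reads` are hypothetical illustrations. They are not part of any Connection Machine or Fortran 8X interface.

```python
def owner_block(i, n, p):
    """Processor owning element i under a block allocation (assumes p divides n)."""
    return i // (n // p)


def owner_cyclic(i, n, p):
    """Processor owning element i under a cyclic (round-robin) allocation."""
    return i % p


def remote_reads(n, p, owner):
    """Count off-processor operand reads for the periodic three-point stencil
    u_new[i] = u[i-1] + u[i] + u[i+1], given an ownership function."""
    count = 0
    for i in range(n):
        me = owner(i, n, p)
        # u[i] itself is always local; only the two neighbours may be remote.
        for j in ((i - 1) % n, (i + 1) % n):
            if owner(j, n, p) != me:
                count += 1
    return count


if __name__ == "__main__":
    n, p = 1024, 16
    print("block: ", remote_reads(n, p, owner_block))
    print("cyclic:", remote_reads(n, p, owner_cyclic))
```

With n = 1024 and p = 16, the block allocation needs 32 remote reads, which is two per processor at the block edges. The cyclic allocation needs 2048, because every neighbour read crosses a processor boundary. For nearest-neighbour computations the block allocation is the better choice. Other computations may favour other allocations.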
We use the Connection Machine as a model architecture for future supercomputers, and Fortran 8X as an example of a language with some of the abstractions suitable for programming thousands of processors. Some communication primitives suitable for expressing structured scientific computations are discussed, and their performance benefits are illustrated. With thousands of processors engaged in the solution of a single scientific problem, several subtasks are often treated concurrently, in addition to each subtask being executed concurrently. Some issues in constructing scientific libraries for such environments are discussed. Concurrent algorithms and performance data for matrix multiplication and the Fast Fourier Transform are presented. Two applications whose data parallel implementations are described briefly are the solution of the compressible Navier-Stokes equations in three spatial dimensions by an explicit finite difference method, and the solution of a parabolic approximation of the Helmholtz equation by an implicit method. The Helmholtz equation models three-dimensional acoustic waves in the ocean.
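Matrix multiplication on a two-dimensional processor grid shows how nearest-neighbour communication primitives can structure a concurrent algorithm. The sketch below uses Cannon's algorithm, a well-known systolic scheme chosen here only for illustration. The matrix multiplication algorithms in the paper itself may differ. The sketch simulates a q-by-q processor grid in plain Python with one matrix element per processor. The helpers `cshift_rows` and `cshift_cols` are hypothetical stand-ins loosely modeled on Fortran's `CSHIFT` intrinsic with a per-row or per-column shift amount.

```python
def cshift_rows(grid, shifts):
    """Circularly shift row i of grid left by shifts[i]:
    result[i][j] = grid[i][(j + shifts[i]) % n]."""
    n = len(grid[0])
    return [[row[(j + s) % n] for j in range(n)] for row, s in zip(grid, shifts)]


def cshift_cols(grid, shifts):
    """Circularly shift column j of grid up by shifts[j]:
    result[i][j] = grid[(i + shifts[j]) % m][j]."""
    m, n = len(grid), len(grid[0])
    return [[grid[(i + shifts[j]) % m][j] for j in range(n)] for i in range(m)]


def cannon_matmul(a, b):
    """Cannon's algorithm for square q-by-q matrices on a simulated q-by-q
    processor grid, one element per processor (hypothetical illustration)."""
    q = len(a)
    # Initial alignment (skew): row i of A moves left by i and column j of B
    # moves up by j, so processor (i, j) holds A[i][k] and B[k][j] with
    # k = (i + j) % q.
    a = cshift_rows(a, list(range(q)))
    b = cshift_cols(b, list(range(q)))
    c = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Every processor multiplies its local operands (concurrently on a
        # real machine; sequentially in this simulation).
        for i in range(q):
            for j in range(q):
                c[i][j] += a[i][j] * b[i][j]
        # Unit nearest-neighbour shifts: A one step left, B one step up.
        a = cshift_rows(a, [1] * q)
        b = cshift_cols(b, [1] * q)
    return c


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Calling `cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. On a real machine each processor would hold a submatrix block, and the inner update would be a local block product. After the initial skew, every step moves data only between grid neighbours, which suits a network of bounded degree.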