129.tera_tf
SPEC MPI2007 Benchmark Description

Benchmark Name

129.tera_tf


Benchmark Author

Bertrand Meltz bertrand.meltz@cea.fr


Benchmark Program General Category

3D Eulerian hydrodynamics application

Benchmark Description

129.tera_tf is a 3D Eulerian hydrodynamics application:

  * 2nd order Godunov-type scheme, 3rd order remapping
  * requires only a Fortran 90 compiler and an MPI (1.2) implementation
  * uses mostly point-to-point messages, plus some reductions
  * uses non-blocking messages

The global domain is a cube with N cells in each direction, for a total of N^3 cells. To set up the problem, one needs to define the number of cells in each direction and the number of blocks in each direction. Each block corresponds to an MPI task. For each direction, the product of the number of cells per block and the number of blocks must equal the number of cells in that direction, as defined in the grid definition part of the input file. Although the global number of cells should be roughly the same in each direction (it can differ slightly), the number of blocks may differ between directions. For example, with 60 processors, the problem could be run with 240 cells per direction and 3x4x5 MPI blocks. Each block would then have size 80x60x48, which amounts to a little less than a quarter of a million cells per processor. All included datasets are defined to have between 200 and 240 thousand cells per processor.
The initial timestep must vary with the number of cells. I found that 0.14/N, where N is the number of cells in each direction, is a good estimate. This is a completely empirical formula, although it can be traced back to the CFL condition.
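
As a rough illustration of the decomposition rule and the empirical timestep formula above, here is a minimal Fortran 90 sketch. The standalone program and its variable names are hypothetical and are not part of the benchmark's actual input handling.

    program setup_check
       implicit none
       integer :: n_cells(3)           ! global cells per direction (240 in the example above)
       integer :: n_blocks(3)          ! MPI blocks per direction (3x4x5 in the example above)
       integer :: cells_per_block(3)
       integer :: i
       real    :: dt0

       n_cells  = (/ 240, 240, 240 /)
       n_blocks = (/ 3, 4, 5 /)

       do i = 1, 3
          if (mod(n_cells(i), n_blocks(i)) /= 0) then
             print *, 'direction', i, ': cells per block times blocks must equal the global cell count'
             stop
          end if
          cells_per_block(i) = n_cells(i) / n_blocks(i)
       end do

       print *, 'block size        :', cells_per_block          ! 80 x 60 x 48
       print *, 'cells per MPI task:', product(cells_per_block) ! about 230 thousand
       print *, 'MPI tasks needed  :', product(n_blocks)        ! 60

       ! Empirical initial timestep: 0.14 / number of cells in each direction
       dt0 = 0.14 / real(n_cells(1))
       print *, 'initial timestep  :', dt0
    end program setup_check

For the 60-processor example above, this reports a block size of 80x60x48, or about 230 thousand cells per task.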

SOME COMMENTS ON THE ALGORITHM


3D ARRAYS

For performance reasons, the global arrays are arranged so that the index for the working direction is always the first index of the arrays, which permits the whole 1D hydro pass to be performed in cache. This requires three copies of the 3D arrays. At the end of the remap phase (TF_PROJECTION), the global arrays are written back to the 3D array appropriate for the next sub-cycle.
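
As an illustration of this layout (a sketch only, with hypothetical array names rather than the benchmark's), the fragment below keeps one copy of a field per sweep direction, each with the working direction as the first, contiguous index, and copies the data into the next layout at the end of a sweep:

    program layout_demo
       implicit none
       integer, parameter :: nx = 80, ny = 60, nz = 48
       ! Hypothetical names: one copy of the field per sweep direction,
       ! with the working direction as the first (contiguous) index.
       real, allocatable :: rho_x(:,:,:)   ! (nx,ny,nz) layout for x sweeps
       real, allocatable :: rho_y(:,:,:)   ! (ny,nz,nx) layout for y sweeps
       integer :: i, j, k

       allocate (rho_x(nx,ny,nz), rho_y(ny,nz,nx))
       call random_number(rho_x)

       ! End of the x-direction remap: write the data back in the layout
       ! needed by the next (y-direction) sub-cycle, so that each column
       ! rho_y(:,k,i) used by the 1D hydro is contiguous and cache friendly.
       do k = 1, nz
          do j = 1, ny
             do i = 1, nx
                rho_y(j, k, i) = rho_x(i, j, k)
             end do
          end do
       end do
    end program layout_demo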

EXACT_RIEMANN_SOLVER

The "EXACT" version of the Riemann solver uses iterations to solve the problem. It has more branching, and uses more costly operations (divide, square root and power function).

MPI Usage

MPI usage is primarily MPI_SEND with corresponding MPI_IRECV and MPI_WAIT. Typical message sizes are 120000 or 60000. The code has been modified to run at rank counts that show generally increasing performance: if you ask for 35 ranks, 3 of the ranks will idle while the other 32 continue processing. The optimal rank counts are: 1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 54 60 64 72 75 80 90 96 100 108 120 125 128 144 150 160 180 192 200 216 240 250 256 384 480 512 576 720 768 960 1024 1152 1344 1536 1728 1920 2048 2112 2304 4096 8192
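
The sketch below shows the non-blocking-receive / blocking-send / wait pattern in a simple ring exchange. It is a minimal illustration with hypothetical buffer names and message length, not the benchmark's communication routine.

    program exchange_demo
       use mpi
       implicit none
       integer, parameter :: n = 15000              ! hypothetical message length
       double precision :: sendbuf(n), recvbuf(n)
       integer :: rank, nprocs, left, right, req, ierr
       integer :: status(MPI_STATUS_SIZE)

       call MPI_INIT(ierr)
       call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
       call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

       left  = mod(rank - 1 + nprocs, nprocs)
       right = mod(rank + 1, nprocs)
       sendbuf = dble(rank)

       ! Post the non-blocking receive first, then the blocking send,
       ! then wait for the receive to complete: the MPI_IRECV /
       ! MPI_SEND / MPI_WAIT pattern described above.
       call MPI_IRECV(recvbuf, n, MPI_DOUBLE_PRECISION, left,  0, MPI_COMM_WORLD, req, ierr)
       call MPI_SEND (sendbuf, n, MPI_DOUBLE_PRECISION, right, 0, MPI_COMM_WORLD, ierr)
       call MPI_WAIT (req, status, ierr)

       if (rank == 0) print *, 'rank 0 received', recvbuf(1), 'from rank', left
       call MPI_FINALIZE(ierr)
    end program exchange_demo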

Input Description

TF_input_file_description describes the contents of the input file TF_input, which defines the grid size.

Output Description

Two files are produced. The first, tera_tf.out, is a text file that summarizes the computations. The second, TF_plot_1D, contains a list of values computed by the program and can be used to generate a plot.

Programming Language

All Fortran 90

Known portability issues


Version and Licensing

*     TERA_TF Benchmark
*     Copyright (c) 2003
*
*  This software and all files related to it, is part of the CEA/DAM Benchmark 
*  suite. Permission to use it is hereby granted to the company to which 
*  it was provided. This software should not be used for any other purpose 
*  than benchmarking, evaluation, performance and functionality assessment.
*  The authors and CEA/DAM make no representations as to the suitability of this
*  software for any other purpose, and will not be liable for any 
*  damage resulting from its use.
*  It is provided "as is", without any expressed or implied warranty.
*

References


Last updated: February 5, 2007