To set up the problem, one needs to define the number of cells in each
direction and the number of blocks in each direction. Each block corresponds
to an MPI task. For each direction, the product of the number of cells per block
and the number of blocks must equal the number of cells in that direction, as defined
in the grid definition part of the input file. The global number of cells
should be roughly the same in each direction (it can differ slightly), but the number
of blocks may differ between directions. For example, with 60 processors, the problem could be run with
240 cells per direction and 3x4x5 MPI blocks. In that case, each block would have
size 80x60x48, which amounts to a little less than a quarter of a million cells per processor.
All included datasets are defined to have between 200 and 240 thousand cells per
processor.
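As an illustration, the following small Fortran 90 program (not part of the benchmark;
the sizes are simply those of the 60-processor example above) checks such a decomposition
and computes the resulting block size and cell count per processor:

    program check_decomposition
      implicit none
      integer, parameter :: ncells(3)  = (/ 240, 240, 240 /)  ! global cells per direction
      integer, parameter :: nblocks(3) = (/ 3, 4, 5 /)        ! MPI blocks per direction
      integer :: bsize(3), d
      do d = 1, 3
         if (mod(ncells(d), nblocks(d)) /= 0) stop 'cells not divisible by blocks'
         bsize(d) = ncells(d) / nblocks(d)                    ! cells per block in direction d
      end do
      print *, 'block size         :', bsize                  ! 80 60 48
      print *, 'cells per processor:', product(bsize)         ! 230400
      print *, 'MPI tasks          :', product(nblocks)       ! 60
    end program check_decomposition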
The initial timestep must be scaled with the number of cells. I found that
0.14/number_of_cells_in_each_direction is a good estimate. This is a purely
empirical formula, although it can be traced back to the CFL condition.
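For instance, under the assumptions of the example above (240 cells in each direction),
the estimate works out as follows; the values are illustrative, not prescriptive:

    program estimate_dt0
      implicit none
      integer, parameter          :: ncells_per_direction = 240
      double precision, parameter :: dt0 = 0.14d0 / ncells_per_direction
      print *, 'suggested initial timestep:', dt0   ! about 5.8e-4
    end program estimate_dt0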
SOME COMMENTS ON THE ALGORITHM
3D ARRAYS
For performance reasons, the global arrays are arranged so that the index for
the working direction is always the first index of the arrays, allowing the
whole 1D hydro sweep to be performed in cache. This requires keeping 3 copies of
the 3D arrays. At the end of the remap phase (TF_PROJECTION), the global arrays
are written back to the 3D array appropriate for the next sub-cycle.
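A minimal sketch of this layout, with assumed array names and shapes rather than the
benchmark's actual declarations, is given below; the working direction is the first,
fastest-varying index, and the write-back at the end of the remap transposes into the
copy used by the next sub-cycle:

    ! assumed names and shapes, for illustration only
    subroutine remap_writeback_x_to_y(nx, ny, nz, rho_x, rho_y)
      implicit none
      integer, intent(in) :: nx, ny, nz
      double precision, intent(in)  :: rho_x(nx, ny, nz)   ! x is the fast index: used for x sweeps
      double precision, intent(out) :: rho_y(ny, nz, nx)   ! y is the fast index: used for y sweeps
      integer :: i, j, k
      do k = 1, nz
         do j = 1, ny
            do i = 1, nx
               rho_y(j, k, i) = rho_x(i, j, k)   ! write back after the remap of the x sweep
            end do
         end do
      end do
    end subroutine remap_writeback_x_to_y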
EXACT_RIEMANN_SOLVER
The "EXACT" version of the Riemann solver uses iterations to
solve the problem. It has more branching, and uses more costly operations
(divide, square root and power function).
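The sketch below is a textbook-style iteration for the star-region pressure (following
Toro's formulation, not necessarily the routine used in the code); it shows where the
branching and the divide, square root and power operations come from:

    ! illustrative only: Newton iteration for the star pressure of an ideal-gas Riemann problem
    subroutine star_pressure(rhol, ul, pl, rhor, ur, pr, gamma, pstar)
      implicit none
      double precision, intent(in)  :: rhol, ul, pl, rhor, ur, pr, gamma
      double precision, intent(out) :: pstar
      double precision :: al, ar, p, pold, fl, fr, dfl, dfr
      integer :: it
      al = sqrt(gamma*pl/rhol)              ! left sound speed
      ar = sqrt(gamma*pr/rhor)              ! right sound speed
      p  = max(1.0d-6, 0.5d0*(pl + pr))     ! initial guess
      do it = 1, 50
         call branch(p, rhol, pl, al, gamma, fl, dfl)
         call branch(p, rhor, pr, ar, gamma, fr, dfr)
         pold = p
         p = p - (fl + fr + (ur - ul)) / (dfl + dfr)   ! Newton update
         if (p < 1.0d-6) p = 1.0d-6
         if (abs(p - pold)/(0.5d0*(p + pold)) < 1.0d-8) exit
      end do
      pstar = p
    contains
      subroutine branch(pp, rho, pk, a, g, f, df)
        double precision, intent(in)  :: pp, rho, pk, a, g
        double precision, intent(out) :: f, df
        double precision :: ak, bk
        if (pp > pk) then                   ! shock branch
           ak = 2.0d0/((g + 1.0d0)*rho)
           bk = (g - 1.0d0)/(g + 1.0d0)*pk
           f  = (pp - pk)*sqrt(ak/(pp + bk))
           df = sqrt(ak/(pp + bk))*(1.0d0 - 0.5d0*(pp - pk)/(pp + bk))
        else                                ! rarefaction branch
           f  = 2.0d0*a/(g - 1.0d0)*((pp/pk)**((g - 1.0d0)/(2.0d0*g)) - 1.0d0)
           df = (1.0d0/(rho*a))*(pp/pk)**(-(g + 1.0d0)/(2.0d0*g))
        end if
      end subroutine branch
    end subroutine star_pressure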
MPI Usage
MPI usage consists primarily of MPI_SEND, matched on the receiving side by MPI_IRECV and the corresponding MPI_WAIT.
Typical message sizes are 120000 or 60000.
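The communication pattern can be sketched as follows (the routine name and the single
neighbour pair are assumptions made for the illustration, not taken from the code):

    subroutine exchange_boundary(buf_send, buf_recv, n, left, right, comm)
      use mpi
      implicit none
      integer, intent(in) :: n, left, right, comm
      double precision, intent(in)    :: buf_send(n)
      double precision, intent(inout) :: buf_recv(n)
      integer :: req, ierr, status(MPI_STATUS_SIZE)
      ! post the non-blocking receive first, then the blocking send,
      ! and wait for the receive to complete before using buf_recv
      call MPI_IRECV(buf_recv, n, MPI_DOUBLE_PRECISION, left,  0, comm, req, ierr)
      call MPI_SEND (buf_send, n, MPI_DOUBLE_PRECISION, right, 0, comm, ierr)
      call MPI_WAIT (req, status, ierr)
    end subroutine exchange_boundary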
The code has been modified to run at rank counts that show generally increasing performance.
This means that if you ask for 35 ranks,
3 of the ranks will idle and 32 will carry on with the processing (see the sketch after the list below).
The optimal rank counts are:
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50
54 60 64 72 75 80 90 96 100 108 120 125 128
144 150 160 180 192 200 216 240 250 256 384 480 512 576 720 768
960 1024 1152 1344 1536 1728 1920 2048 2112 2304 4096 8192
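The behaviour described above can be sketched as follows (the function name is made up
for the illustration, and only part of the table is reproduced):

    integer function working_ranks(requested)
      implicit none
      integer, intent(in) :: requested
      ! subset of the supported rank counts listed above
      integer, parameter :: supported(10) = (/ 16, 18, 20, 24, 25, 27, 30, 32, 36, 40 /)
      integer :: i
      working_ranks = 1
      do i = 1, size(supported)
         if (supported(i) <= requested) working_ranks = supported(i)
      end do
    end function working_ranks   ! working_ranks(35) = 32, so 3 of 35 ranks would idle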
Input Description
TF_input_file_description describes the contents of the input file TF_input.
TF_input describes the grid size.
Output Description
Two files are produced. The first,
tera_tf.out, is a text file that summarizes the computations.
The second, TF_plot_1D, contains a list of values computed by the program and can be used to generate a plot.
Programming Language
All Fortran90
Known portability issues
Version and Licensing
* TERA_TF Benchmark
* Copyright (c) 2003
*
* This software and all files related to it, is part of the CEA/DAM Benchmark
* suite. Permission to use it is hereby granted to the company to which
* it was provided. This software should not be used for any other purpose
* than benchmarking, evaluation, performance and functionality assessment.
* The authors and CEA/DAM make no representations as to the suitability of this
* software for any other purpose, and will not be liable for any
* damage resulting from its use.
* It is provided "as is", without any expressed or implied warranty.
*
References
Last updated: February 5, 2007