549.fotonik3d_r
Ulf Andersson ulfa [at] pdc [dot] kth [dot] se
Computational Electromagnetics (CEM)
Fotonik3D computes the transmission coefficient of a photonic waveguide using the finite-difference time-domain (FDTD) method for the Maxwell equations. UPML for dielectric materials is used to terminate the computational domain.
The core of the FDTD method is second-order accurate central-difference approximations of the Faraday's and Ampere's laws. These central-differences are employed on a staggered Cartesian grid resulting in an explicit finite-difference method. The FDTD method is also referred to as the Yee scheme. It is the standard time-domain method within CEM.
The code consists of three steps, initialization, time-stepping and wrap-up. More than 99% of the time is spent in the time-stepping. Each time step is identical to all the others. The majority of the time is spent in five routines:
The FDTD-updates of the electric field during the time-stepping is done in the module material_mod while the FDTD-updates of the magnetic fields are done in the module update_mod.
The excitation of the code is a 2D (x and z) cross section of the computational domain. A precomputed Single TE mode is read from file and multiplied with a time dependent pulse:
Ex(:,y_index,:) = Ex(:,y_index,:) + pulse(t)*Single_TE_mode(:,:)
The computation of the excitation takes very little time.
The module power_mod performs the computation of the power flow.
During the initialization, two files containing a list of twinkles are read. A twinkles is one side of an FDTD-cell. These lists defines the two power planes through which we will compute the power flow. The input files also defines for which frequencies the power flow shall be computed.
During the time-stepping a DFT is computed for the perpendicular components (x and z) of the interpolated electric and magnetic fields at the midpoint of each twinkle in the power plane.
After time-stepping the power flow, i.e., Poynting's vector, is computed for each twinkle. Then the contribution from all the twinkles are summed for each frequency. This is written to an output file for both power planes. The transmission coefficient for each frequency can then be computed by a post-processing program. (In the real application we need to take more time steps in order to compute the power flow accurately.)
All input files are ASCII files.
yee.dat is the main input file. Inputs in this file must come in a specific order. For full details, see the source to 'init.F90'. Among the fields are:
power1.dat and power2.dat define the two power planes were the power flow shall be computed:
These files define the frequencies for which to compute the power flow. For details on these two files, see the source file 'power.F90'. Among the fields are:
OBJ.dat describes the photonic waveguide. This file starts with a short list of values for relative permittivity (epsilon_r). It then assigns one of these epsilon_r-values to each electric component of every cell.
PSI.dat contains the definition of the single TE mode used for the Plane Source excitation. This file defines the location (its y-value) of the Plane Source and contains a pointer to a file, TEwaveguide.m, containing the description of the single TE mode. This TE mode has been precomputed by another code. It contains (nx+1)*(nz+1) values.
SPEC® provides 4 workloads: test, train, refrate, and refspeed, with these characteristics:
The output ASCII-file, 'pscyee.out', contains the power values for each frequency for the two power planes. It also contains the values of Ec which can be used for normalization. The values are validated by comparing them to a SPEC-provided set of expected outputs.
Various progress information is written to standard output, which may be useful when debugging, especially if the benchmark is run directly from the command line. When run under the control of the SPEC tools, standard out is captured to <benchmark_name>.log. This file is not validated.
Fortran 95 + OpenMP
Some calculations generate 'subnormal' numbers (wikipedia) which may cause slower operation than normal numbers on some hardware platforms. On such platforms, performance may be improved if "flush to zero on underflow" (FTZ) is enabled. During SPEC's testing of Fotonik3d, the output validated correctly whether or not FTZ was enabled.
Verification errors with GCC -Ofast -march=native
It has been reported that with gfortran -Ofast -march=native verification errors may be seen, for example:
**************************************** *** Miscompare of pscyee.out; for details see /data2/johnh/out.v1.1.5/benchspec/CPU/549.fotonik3d_r/run/run_base_refrate_Ofastnative.0000/pscyee.out.mis 0646: -1.91273086037953E-17, -1.46491401919706E-15, -1.91273086057460E-17, -1.46491401919687E-15, ^ 0668: -1.91251317582607E-17, -1.42348205527085E-15, -1.91251317602571E-17, -1.42348205527068E-15, ^
The errors may occur with other compilers as well, depending on your particular compiler version, hardware platform, and optimization options.
The problem arises when a compiler chooses to vectorize a particular loop from power.F90 line number 369
369 do ifreq = 1, tmppower%nofreq 370 frequency(ifreq,ipower) = freq 371 freq = freq + freqstep 372 end do
Unfortunately, the vectorized loop produces slightly different values than are allowed by the SPEC-defined tolerances.
Workaround: You will need to specify optimization options that do not cause this loop to be vectorized. For example, on a particular platform studied in mid-2020 using GCC 10.2, these results were seen:
failed -Ofast -march=native OK -Ofast OK -O3 -march=native OK -Ofast -march=native -fno-tree-loop-vectorize OK -Ofast -march=native -fno-unsafe-math-optimization
If you apply one of the above workarounds in base, be sure to obey the same-for-all rule which requires that all benchmarks in a suite of a given language must use the same flags. For example, the sections below turn off unsafe math optimizations for all Fortran modules in the floating point rate and floating point speed benchmark suites:
default=base: OPTIMIZE = -Ofast -flto -march=native fprate,fpspeed=base: FOPTIMIZE = -fno-unsafe-math-optimizations
Ulf Andersson, Min Qiu, and Ziyang Zhang, Parallel Power Computation for Photonic Crystal Devices, Methods and Applications of Analysis, 07/2006; 13(2):149-156. DOI: 10.4310/MAA.2006.v13.n2.a3, PDF available from: www.researchgate.net/publication/228405514
Torleif Martin, Broadband Electromagnetic Scattering and Shielding Analysis using the Finite Difference Time Domain Method, Linköping 2001, ISBN 91-7219-914-8.
S.D. Gedney (1996), An anisotropic perfectly matched layer absorbing media for the truncation of FDTD latices, IEEE Transactions on Antennas and Propagation, Vol. 44 (12): pp. 1630-1639. Bibcode: 1996ITAP...44.1630G. DOI:10.1109/8.546249.
Taflove (ed.), Advances in Computational Electrodynamics, Sect. 5.4-5.9, 1998
A. Taflove and S. C. Hagness, Computational Electrodynamics: The Finite-Difference Time-Domain Method, 3rd ed., Norwood, MA: Artech House, 2005.
A. Taflove, A. Oskooi, and S. G. Johnson, eds., Advances in FDTD Computational Electrodynamics: Photonics and Nanotechnology. Norwood, MA: Artech House, 2013.
Copyright © 2017-2019 Standard Performance Evaluation Corporation (SPEC®)