% vmd -dispdev text -e myscript.vmd
Info) VMD for LINUXAMD64, version 1.9.3 (November 30, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/
Info) Email questions and bug reports to vmd@ks.uiuc.edu
Info) Please include this reference in published work using VMD:
Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 64 CPUs detected.
Info) CPU features: SSE2 AVX FMA
Info) Free system memory: 61GB (96%)
Info) No CUDA accelerator devices available.
Info) Dynamically loaded 2 plugins in directory:
Info) /Projects/vmd/pub/linux64/lib/vmd193/plugins/LINUXAMD64/molfile
vmd >
The GLX-based Pbuffer feature is normally available in standard graphics-enabled compilations of VMD for Linux/Unix. GLX off-screen rendering requires that an X server be running and that the DISPLAY environment variable be set to the correct server hostname and display. The GLX Pbuffer feature is most appropriate when running VMD on the user's own desktop workstation, where the same user typically controls the active windowing system.
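For example, if an X server is running locally on the workstation, the DISPLAY variable might be set as follows before launching VMD (Bourne/bash syntax shown; the display value ":0" is a typical default and may differ on a given system):

export DISPLAY=:0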
To use VMD with off-screen graphics, the -dispdev openglpbuffer flag is added to the VMD launch command, as shown below:
% vmd -dispdev openglpbuffer -e myscript.vmd
Info) VMD for LINUXAMD64, version 1.9.3 (November 30, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/
Info) Email questions and bug reports to vmd@ks.uiuc.edu
Info) Please include this reference in published work using VMD:
Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 40 CPUs detected.
Info) CPU features: SSE2 AVX AVX2 FMA
Info) Free system memory: 411GB (81%)
Info) Creating CUDA device pool and initializing hardware...
Info) Detected 3 available CUDA accelerators:
Info) [0] Quadro M6000 24GB 24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) [1] Quadro M6000 24GB 24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) [2] Quadro M6000 24GB 24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) OpenGL Pbuffer size: 4096x2400
Info) OpenGL renderer: Quadro M6000 24GB/PCIe/SSE2
Info) Features: MSAA(4) MDE CVA MTX NPOT PP PS GLSL(OVFGS)
Info) Full GLSL rendering mode is available.
Info) Textures: 2-D (16384x16384), 3-D (4096x4096x4096), Multitexture (4)
Info) Created GLX OpenGL Pbuffer for off-screen rendering
Info) Detected 3 available TachyonL/OptiX ray tracing accelerators
Info) Compiling 1 OptiX shaders on 3 target GPUs...
In cases where it is inconvenient or impossible to run a windowing system, such as on large-scale HPC systems, VMD supports EGL-based Pbuffer rendering through compilation with a special OpenGL runtime dispatch library. Because of this current need for special compilation, EGL-enabled versions of VMD are made available separately from conventional VMD builds, and in many cases the user may need to compile VMD from source, since EGL is often used in conjunction with MPI-enabled builds of VMD for parallel rendering, as outlined below.
% vmd -dispdev openglpbuffer -e myscript.vmd
Info) VMD for LINUXAMD64, version 1.9.3 (December 1, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/
Info) Email questions and bug reports to vmd@ks.uiuc.edu
Info) Please include this reference in published work using VMD:
Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 40 CPUs detected.
Info) CPU features: SSE2 AVX AVX2 FMA
Info) Free system memory: 411GB (81%)
Info) Creating CUDA device pool and initializing hardware...
Info) Detected 3 available CUDA accelerators:
Info) [0] Quadro M6000 24GB 24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) [1] Quadro M6000 24GB 24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) [2] Quadro M6000 24GB 24 SM_5.2 @ 1.11 GHz, 24GB RAM, AE2, ZCP
Info) EGL: node[0] bound to display[0], 3 displays total
Info) EGL version 1.4
Info) OpenGL Pbuffer size: 4096x2400
Info) OpenGL renderer: Quadro M6000 24GB/PCIe/SSE2
Info) Features: STENCIL MSAA(4) MDE CVA MTX NPOT PP PS GLSL(OVFGS)
Info) Full GLSL rendering mode is available.
Info) Textures: 2-D (16384x16384), 3-D (4096x4096x4096), Multitexture (4)
Info) Created EGL OpenGL Pbuffer for off-screen rendering
Info) Detected 3 available TachyonL/OptiX ray tracing accelerators
Info) Compiling 256 OptiX shaders on 3 target GPUs...
Both GLX- and EGL-based off-screen OpenGL Pbuffer rendering support all of the advanced OpenGL features used by VMD, such as programmable shading, multisample antialiasing, and 3-D texture mapping. One area where they behave differently from traditional windowed OpenGL is that they have a fixed maximum framebuffer resolution, which defaults to 4096x2400. The maximum framebuffer size can be increased beyond this resolution by setting the VMDSCRSIZE environment variable to the maximum framebuffer resolution that might be required during a VMD run. In Bourne/bash shells, this would be done with the command:

export VMDSCRSIZE="8192 4096"

In C-shell/tcsh shells, this would be done with the command:

setenv VMDSCRSIZE "8192 4096"
When compiled with MPI support and launched with the platform-dependent mpirun or site-specific launch commands (e.g., aprun, jsrun, or similar), VMD will automatically initialize MPI internally, and each parallel VMD instance will be assigned a unique MPI rank. During startup, a parallel launch of VMD will print hardware information about each of the participating compute nodes from node 0. When a parallel VMD run exits, all nodes are expected to call exit at the same time so that they shut down MPI together.
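As a generic sketch (the launcher name and its flags vary by MPI implementation and site; 'mpirun -np' is shown here only as a common convention), an MPI-enabled VMD build might be started on 8 ranks as follows:

% mpirun -np 8 vmd -dispdev text -e myscript.vmd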
Because MPI does not (yet) define a standardized binary interface, MPI support requires that VMD be compiled from source code on the target platform for each MPI implementation to be supported; for example, VMD would have to be compiled separately for MPICH, OpenMPI, and/or any other MPI version available on the system. This means that, unlike the usual approach taken by the VMD development team of providing binary VMD distributions for all mainstream computer and operating system platforms, providing pre-built binaries is not possible in the general case for MPI. Users wishing to use VMD with MPI must compile VMD from source code.
Some MPI implementations require special interactions with batch queueing systems or storage systems, and in such cases it is necessary to modify the standard VMD launcher scripts to perform any extra steps or to invoke platform-specific parallel launch commands. By modifying the VMD launch script, users can continue to use the familiar VMD launch syntax while gaining the benefits of parallel analysis with MPI. The standard VMD launch script has already been modified so that it automatically recognizes cases where VMD has been launched within the batch schedulers used on Cray XK and XC supercomputers such as NCSA Blue Waters, ORNL Titan, CSCS Piz Daint, and related systems, where the VMD executable must be launched using the 'aprun' or 'srun' utilities, depending on the scheduling system in use.
MPI-enabled builds of VMD can often also be run on login nodes or interactive visualization nodes that may not be managed by the batch scheduler and/or may not support MPI, so long as the MPI and other shared libraries used on the compute nodes are also available on the login or interactive visualization nodes. To run an MPI-enabled VMD build outside of MPI, i.e., without 'mpirun', the environment variable VMDNOMPI can be set, which prevents VMD from calling any MPI APIs during the run, allowing it to behave like a normal non-MPI build for convenience. In most cases, this makes it possible to use a single VMD build on all of the different compute node types on a system.
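For example, in a Bourne/bash shell, an MPI-enabled build might be run on a login node as follows (setting VMDNOMPI to 1 is illustrative; the documentation above only requires that the variable be set):

export VMDNOMPI=1
vmd -dispdev text -e myscript.vmd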
At present, the console handling used in VMD's interactive text interpreters conflicts with the behavior of some mainstream MPI implementations, so when VMD is run in parallel using MPI, the interactive console is disabled and VMD instead reads commands only from script files specified with the "-e" command line argument.
Aside from the special launch behavior and lack of the interactive text console, MPI runs of VMD support high performance graphics with full support for OpenGL via GLX or EGL, and ray tracing with Tachyon, OptiX, and OSPRay.
Here is an example session showing a VMD run performed on the CSCS Piz Daint Cray XC50 supercomputer with NVIDIA Tesla P100 GPU accelerators:
stonej@daint103> srun -C gpu -n 256 --ntasks-per-node=1 \
/users/stonej/local/bin/vmd193 -dispdev text -e rendermovie.tcl
on daint103
srun: job 50274 queued and waiting for resources
srun: job 50274 has been allocated resources
Info) VMD for CRAY_XC, version 1.9.3 (December 15, 2016)
Info) http://www.ks.uiuc.edu/Research/vmd/
Info) Email questions and bug reports to vmd@ks.uiuc.edu
Info) Please include this reference in published work using VMD:
Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Creating CUDA device pool and initializing hardware...
Info) Initializing parallel VMD instances via MPI...
Info) Found 256 VMD MPI nodes containing a total of 6144 CPUs and 256 GPUs:
Info)    0: 24 CPUs, 60.8GB (96%) free mem, 1 GPUs, Name: nid03072
Info)    1: 24 CPUs, 60.8GB (96%) free mem, 1 GPUs, Name: nid03073
Info)    2: 24 CPUs, 60.8GB (96%) free mem, 1 GPUs, Name: nid03074
[...example output omitted...]
Info)  253: 24 CPUs, 60.9GB (96%) free mem, 1 GPUs, Name: nid03375
Info)  254: 24 CPUs, 60.9GB (96%) free mem, 1 GPUs, Name: nid03376
Info)  255: 24 CPUs, 60.9GB (96%) free mem, 1 GPUs, Name: nid03377
The parallel allgather command allows VMD analysis scripts to gather results from all of the nodes, returning the complete set of per-node results in a new Tcl list. Here is a simple example procedure that demonstrates this by gathering all of the MPI node hostnames and printing them. Note that to avoid redundant output, the script performs output only on node rank 0:
proc testgather { } {
  set noderank [parallel noderank]

  # only print messages on node 0
  if {$noderank == 0} {
    puts "Testing parallel gather..."
  }

  # Do a parallel gather of all node names
  set datalist [parallel allgather [parallel nodename]]

  # only print messages on node 0
  if {$noderank == 0} {
    puts "datalist length: [llength $datalist]"
    puts "datalist: $datalist"
  }
}
The parallel allreduce command allows VMD to compute a parallel reduction across all MPI ranks, returning the final result to all nodes. Each rank contributes one input to the reduction. The user must provide a Tcl proc that performs the appropriate reduction operation for a pair of data items, resulting in a single item. This approach allows arbitrarily complex reductions on arbitrary data to be devised by the user. The VMD reduction implementation calls the user provided routine in parallel on pairs of arguments exchanged between ranks, with each such call producing a single reduced output value. VMD performs successive parallel reduction operations until it computes the final reduced value that is returned to all ranks.
The example below returns the sum of all of the MPI node ranks:
proc sumreduction { a b } {
  return [expr $a + $b]
}

proc testreduction {} {
  set noderank [parallel noderank]

  # only print messages on node 0
  if {$noderank == 0} {
    puts "Testing parallel reductions..."
  }

  parallel allreduce sumreduction $noderank
}
VMD can easily perform parallel rendering of trajectories or other kinds of movies with relatively simple scripting based on the parallel commands above. Try running this simple script in an MPI-based VMD session, which uses the individual MPI node ranks to render one frame per node. Be sure to replace ``somedir'' with your own directory:
set noderank [parallel noderank]
puts "node $noderank is running ..."
parallel barrier

mol new /somedir/alanin.pdb waitfor all
puts "node $noderank has loaded data"
parallel barrier

rotate y by [expr $noderank * 20]
render TachyonInternal test_node_$noderank.tga
puts "node $noderank has rendered a frame"
parallel barrier

quit
A much more sophisticated (but incomplete) example below shows how the parallel for command can be used, along with a user-defined procedure, to perform larger-scale parallel rendering with dynamic load balancing, passing parameters into the user-defined procedure and triggering the VMD movie maker plugin and any user-defined per-frame callback that may be active therein. The userdata parameter shown here communicates the information that the user-defined worker procedure needs to interpret the meaning of incoming work indices and take appropriate action. The userdata parameter thereby enables the user-provided procedure to avoid relying on global variables or hard-coded implementation details.
proc render_one_frame { frameno userdata } {
  # retrieve user data for the rendering workers
  set formatstr [lindex $userdata 0]
  set dir [lindex $userdata 1]
  set renderer [lindex $userdata 2]

  # Set frame, triggering user-defined movie
  # callbacks to update the molecular scene
  # prior to rendering of the frame
  set ::MovieMaker::userframe $frameno

  # Regenerate molecular geometry if not up to date
  display update

  # generate output filename, and render the frame
  set fname [format $formatstr $frameno]
  render $renderer $dir$fname
}

proc render_movie { dir formatstr framecount renderer } {
  set userdata {}
  lappend userdata $formatstr
  lappend userdata $dir
  lappend userdata $renderer

  set lastframe [expr $framecount - 1]
  parallel for 0 $lastframe render_one_frame $userdata
}
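As a hypothetical invocation of the incomplete example above (the directory, filename format string, frame count, and renderer choice are illustrative assumptions, not part of the example itself), the movie might be rendered with:

# render 300 frames named frame.00000.tga through frame.00299.tga
# into /somedir/ using the built-in Tachyon renderer
render_movie /somedir/ frame.%05d.tga 300 TachyonInternal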