Re: question to run namd2 with multiple CPUs

From: Renfro, Michael (Renfro_at_tntech.edu)
Date: Thu Mar 05 2020 - 10:25:21 CST

If we’re limiting discussion to the original goal of 1 node, 16 cores, and no GPUs, would the multicore instructions at [1] be simpler? That would just be:

  namd2 +p16 pc.conf

using the Linux-x86_64-multicore binary from [2].

[1] https://www.ks.uiuc.edu/Research/namd/2.9/ug/node79.html
[2] https://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD
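For a batch job, the same launch line can be built from the allocation rather than hard-coding 16. This is only a minimal sketch: the helper name namd_multicore_cmd is made up for illustration, the default binary name namd2 assumes the multicore build is on the PATH, and SLURM_CPUS_PER_TASK is only set by Slurm when the job requests --cpus-per-task. The helper prints the command instead of running it, so it can be checked first.

```shell
#!/bin/bash
# Hypothetical helper (not part of NAMD): print the launch line for the
# Linux-x86_64-multicore build on a single node.
#   $1 = number of cores, $2 = NAMD config file, $3 = namd2 path (optional)
namd_multicore_cmd() {
    printf '%s +p%s %s\n' "${3:-namd2}" "$1" "$2"
}

# Inside a Slurm job that asked for --cpus-per-task, use that value;
# otherwise fall back to the 16 cores discussed in this thread.
namd_multicore_cmd "${SLURM_CPUS_PER_TASK:-16}" pc.conf
```

Once the printed line looks right, the same command can go directly into the job script, redirected to pc.log as in the original submission.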

> On Mar 2, 2020, at 3:30 PM, Josh Vermaas <joshua.vermaas_at_gmail.com> wrote:
>
> Hi Tao,
>
> If I compile NAMD appropriately (MPI backend), I find that I can just use the usual mpirun to run NAMD, and it's a bunch less tricky to set up than charmrun. The exception is that if you want to run on GPUs, you can't have an MPI backend and still get good performance, which is why that option is disabled. Here is what I used to compile on CPU-based systems:
>
> tar -zxf NAMD_2.13_Source.tar.gz
> cd NAMD_2.13_Source
> tar -xf charm-6.8.2.tar
> cd charm-6.8.2
> #Now start the charmc build.
> #This is interactive, and lets you pick and choose options. For MPI, you want to let it use the mpicc it finds in the path
> #./build
> #The equivalent options are here:
> ./build charm++ mpi-linux-x86_64 mpicxx -j8
> cd ..
>
> #Download fftw/tcl libraries and move them. This is optional, but you have to change the arch files appropriately if you have your libraries in a different location than is the default. I always do.
> wget http://www.ks.uiuc.edu/Research/namd/libraries/fftw-linux-x86_64.tar.gz
> tar xzf fftw-linux-x86_64.tar.gz
> mv linux-x86_64 fftw
> wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64.tar.gz
> wget http://www.ks.uiuc.edu/Research/namd/libraries/tcl8.5.9-linux-x86_64-threaded.tar.gz
> tar xzf tcl8.5.9-linux-x86_64.tar.gz
> tar xzf tcl8.5.9-linux-x86_64-threaded.tar.gz
> mv tcl8.5.9-linux-x86_64 tcl
> mv tcl8.5.9-linux-x86_64-threaded tcl-threaded
>
> ./config Linux-x86_64-g++ --charm-arch mpi-linux-x86_64-mpicxx
> #Compile NAMD.
> cd Linux-x86_64-g++
> make -j8
> #Now all the binaries should be built. Move them to wherever you want!
>
> -Josh
>
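With a binary built against the MPI backend as in the steps above, the launch is a plain mpirun with an explicit rank count. A minimal sketch follows, assuming one MPI rank per core. The helper name namd_mpi_cmd and the default ./namd2 path are placeholders rather than anything from the thread, and SLURM_NTASKS is only set inside a Slurm job. The helper prints the command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper (not part of NAMD): print the mpirun launch line for
# an MPI-backend namd2 build.
#   $1 = number of MPI ranks, $2 = NAMD config file, $3 = namd2 path (optional)
namd_mpi_cmd() {
    printf 'mpirun -np %s %s %s\n' "$1" "${3:-./namd2}" "$2"
}

# Use the task count Slurm reports, or 16 when run outside a job.
namd_mpi_cmd "${SLURM_NTASKS:-16}" pc.conf
```

Passing -np explicitly avoids relying on whatever default rank count the local mpirun picks.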
>
> On Mon, Mar 2, 2020 at 4:08 PM Yu, Tao <tao.yu.1_at_und.edu> wrote:
> Josh,
>
> Thank you so much!
>
> After running module load for mpi and openmpi, we tried:
> /share/apps/namd/namd2.13/charmrun +p16 ++mpiexec ++remote-shell mpirun /share/apps/namd/namd2.13/namd2 pc.conf
> but it still did not work.
> Here is the error we got in the output file.
> Info: *****************************
> Info: Reading from binary file 300.restart.coor
> Info:
> Info: Entering startup at 8.05077 s, 222.184 MB of memory in use
> Info: Startup phase 0 took 0.000228167 s, 222.184 MB of memory in use
> [0] wc[0] status 9 wc[i].opcode 0
> [0] Stack Traceback:
> [0:0] [0x145023d]
> [0:1] [0x143c4fd]
> [0:2] [0x1462fb2]
> [0:3] [0x1458f82]
> [0:4] [0x53a5e0]
> [0:5] [0xcfa97e]
> [0:6] [0xed3ca0]
> [0:7] [0xd77f12]
> [0:8] [0xd77581]
> [0:9] [0x127f077]
> [0:10] [0x1459735]
> [0:11] [0xea5fcf]
> [0:12] TclInvokeStringCommand+0x88 [0x14b5338]
> [0:13] [0x14b7f57]
> [0:14] [0x14b9372]
> [0:15] [0x14b9b96]
> [0:16] [0x151bd41]
> [0:17] [0x151befe]
> [0:18] [0xe99ad0]
> [0:19] [0x517421]
> [0:20] __libc_start_main+0xf5 [0x2aaaabbc8545]
> [0:21] _ZNSt8ios_base4InitD1Ev+0x61 [0x40fc69]
>
>
> I think the issue is that our local cluster does not have the correct environment to use mpiexec.
> Any thoughts on this?
>
> Meanwhile, our cluster manager is planning to compile the source code. My question is: after compiling the source code, should I use mpirun or charmrun to run namd2?
>
>
>
> Best,
>
> Tao
> From: Josh Vermaas <joshua.vermaas_at_gmail.com>
> Sent: Friday, February 28, 2020 4:29 PM
> To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>; Yu, Tao <tao.yu.1_at_und.edu>
> Subject: Re: namd-l: question to run namd2 with multiple CPUs
>
> Hi Tao,
>
> The default mpirun with no arguments will only start 1 mpi rank, so that's at least part of the problem. The other part of the problem is that the smp builds don't actually use mpi at all, and because of that they need to be started with charmrun. See https://www.ks.uiuc.edu/Research/namd/2.13/ug/node98.html for more information.
>
> I *think* what you will settle on is something like this:
> /share/apps/namd/namd2.13/charmrun +p16 ++mpiexec ++remote-shell mpirun /share/apps/namd/namd2.13/namd2 pc.conf
> Josh
>
> On Fri, Feb 28, 2020, 5:17 PM Yu, Tao <tao.yu.1_at_und.edu> wrote:
> Hi,
>
> Our university just installed the Linux-x86_64-ibverbs-smp build (InfiniBand plus shared memory, no MPI needed) on our local cluster. It runs without problems on a single core. But when I started testing with 16 cores, the output indicated it was still using "1" CPU.
>
> I tried submitting the job both with and without mpirun, but the result is the same: only 1 CPU was used.
>
> Please give me some help here.
>
> The script I was using is attached below.
>
> #!/bin/bash
> #####Number of nodes
> #SBATCH --nodes=1
> #SBATCH --partition=talon
> #SBATCH --ntasks-per-node=16
> #SBATCH --workdir=.
> #####SBATCH -o slurm_run_%j_output.txt
> #####SBATCH -e slurm_run_%j_error.txt
> #SBATCH -s
> #SBATCH --time=00:10:00
>
> cd $SLURM_SUBMIT_DIR
> srun -n $SLURM_NTASKS hostname | sort -u > $SLURM_JOB_ID.hosts
>
> module load intel/mpi/64
>
> mpirun /share/apps/namd/namd2.13/namd2 pc.conf > pc.log
> **************************************************************************
>
> The output indicating that "1" CPU was in use:
>
> ENERGY: 0 482.9967 1279.0637 1052.3094 78.2784 -71938.7727 7732.6846 0.0000 0.0000 11869.2504 -49444.1894 298.3828 -61313.4398 -49399.0775 298.3828 79.3838 81.2849 193787.4555 79.3838 81.2849
>
> OPENING EXTENDED SYSTEM TRAJECTORY FILE
> LDB: ============= START OF LOAD BALANCING ============== 2.06838
> LDB: ============== END OF LOAD BALANCING =============== 2.06856
> LDB: =============== DONE WITH MIGRATION ================ 2.06971
> LDB: ============= START OF LOAD BALANCING ============== 8.05131
> LDB: ============== END OF LOAD BALANCING =============== 8.05246
> LDB: =============== DONE WITH MIGRATION ================ 8.05289
> Info: Initial time: 1 CPUs 0.150798 s/step 0.872672 days/ns 236.934 MB memory
> LDB: ============= START OF LOAD BALANCING ============== 9.55101
> LDB: ============== END OF LOAD BALANCING =============== 9.55108
> LDB: =============== DONE WITH MIGRATION ================ 9.5515
> LDB: ============= START OF LOAD BALANCING ============== 15.3206
> LDB: ============== END OF LOAD BALANCING =============== 15.3217
> LDB: =============== DONE WITH MIGRATION ================ 15.3221
> Info: Initial time: 1 CPUs 0.144713 s/step 0.837461 days/ns 237.516 MB memory
> LDB: ============= START OF LOAD BALANCING ============== 22.4754
> LDB: ============== END OF LOAD BALANCING =============== 22.4766
> LDB: =============== DONE WITH MIGRATION ================ 22.477
> Info: Initial time: 1 CPUs 0.143573 s/step 0.830862 days/ns 237.516 MB memory
>
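The "Info: Initial time: 1 CPUs" lines in the log above are a quick way to confirm how many cores NAMD actually picked up after changing the launch command. A small sketch for pulling that number out of a log follows. The function name namd_cpu_count is made up for illustration, and it assumes the log is the raw NAMD output (no "> " quote prefixes) with the field layout shown in this thread.

```shell
#!/bin/bash
# Hypothetical helper: print the CPU count from the first
# "Info: Initial time: N CPUs ..." line of a NAMD log.
# Reads the file(s) given as arguments, or stdin when none are given.
namd_cpu_count() {
    awk '/^Info: Initial time:/ { print $4; exit }' "$@"
}

# Example using the first timing line from the log in this thread.
printf 'Info: Initial time: 1 CPUs 0.150798 s/step 0.872672 days/ns 236.934 MB memory\n' | namd_cpu_count
```

On a real run this would be called as namd_cpu_count pc.log; a result of 16 would mean the 16-core launch is taking effect.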

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:08 CST