From: Josh Vermaas (joshua.vermaas_at_gmail.com)
Date: Fri Feb 28 2020 - 16:29:09 CST
Hi Tao,
The default mpirun with no arguments will only start 1 MPI rank, so that's
at least part of the problem. The other part is that the SMP builds don't
actually use MPI at all, so they need to be started with charmrun instead.
See https://www.ks.uiuc.edu/Research/namd/2.13/ug/node98.html for more
information.
I *think* what you will settle on is something like this:
    /share/apps/namd/namd2.13/charmrun +p16 ++mpiexec ++remote-shell mpirun \
        /share/apps/namd/namd2.13/namd2 pc.conf
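To make it easier to adapt, here is a rough dry-run sketch of how that launch line could be assembled inside your Slurm script. The variable names (NAMD_DIR, NPROCS, CONF) and the build_launch_cmd helper are made up for illustration, the fallback of 16 just mirrors your --ntasks-per-node setting, and I have not tested this on your cluster, so treat it as a starting point rather than a known-good recipe. It only prints the command so you can inspect it before running anything.

```shell
#!/bin/bash
# Dry-run sketch: assemble the charmrun launch line and print it.
# NAMD_DIR, NPROCS, CONF and build_launch_cmd are illustrative names only.

NAMD_DIR=/share/apps/namd/namd2.13   # install path taken from the original job script
NPROCS=${SLURM_NTASKS:-16}           # Slurm sets SLURM_NTASKS inside a job; 16 is an assumed fallback
CONF=pc.conf                         # NAMD configuration file from the original job script

# Compose the command suggested above as a single string.
build_launch_cmd() {
    echo "$NAMD_DIR/charmrun +p$NPROCS ++mpiexec ++remote-shell mpirun $NAMD_DIR/namd2 $CONF"
}

LAUNCH_CMD=$(build_launch_cmd)

# Print instead of executing, so the line can be checked first.
echo "$LAUNCH_CMD"
```

Once the printed line looks right, replacing the final echo with `$LAUNCH_CMD > pc.log` should launch the run. The "Info: Initial time: ... CPUs" line in pc.log should then report more than 1 CPU if the processes actually started.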
Josh
On Fri, Feb 28, 2020, 5:17 PM Yu, Tao <tao.yu.1_at_und.edu> wrote:
> Hi,
>
> Our university just installed the Linux-x86_64-ibverbs-smp
> <https://www.ks.uiuc.edu/Development/Download/download.cgi?UserID=&AccessCode=&ArchiveID=1546>
> (InfiniBand plus shared memory, no MPI needed) on our local cluster.
> Running on a single core works without problems. But when I started testing
> with 16 cores, the output indicated it was still using "1" CPU.
>
> I tried submitting the job both with and without mpirun, but the result was
> the same: only 1 CPU was used.
>
> Please give me some help here.
>
> The script I was using is attached below:
>
> #!/bin/bash
> #####Number of nodes
> #SBATCH --nodes=1
> #SBATCH --partition=talon
> #SBATCH --ntasks-per-node=16
> #SBATCH --workdir=.
> #####SBATCH -o slurm_run_%j_output.txt
> #####SBATCH -e slurm_run_%j_error.txt
> #SBATCH -s
> #SBATCH --time=00:10:00
>
> cd $SLURM_SUBMIT_DIR
> srun -n $SLURM_NTASKS hostname | sort -u > $SLURM_JOB_ID.hosts
>
> module load intel/mpi/64
>
> mpirun /share/apps/namd/namd2.13/namd2 pc.conf > pc.log
> **************************************************************************
>
> The output, which indicates that only "1" CPU was in use:
>
> ENERGY: 0 482.9967 1279.0637 1052.3094
> 78.2784 -71938.7727 7732.6846 0.0000 0.0000
> 11869.2504 -49444.1894 298.3828 -61313.4398
> -49399.0775 298.3828 79.3838 81.2849
> 193787.4555 79.3838 81.2849
>
> OPENING EXTENDED SYSTEM TRAJECTORY FILE
> LDB: ============= START OF LOAD BALANCING ============== 2.06838
> LDB: ============== END OF LOAD BALANCING =============== 2.06856
> LDB: =============== DONE WITH MIGRATION ================ 2.06971
> LDB: ============= START OF LOAD BALANCING ============== 8.05131
> LDB: ============== END OF LOAD BALANCING =============== 8.05246
> LDB: =============== DONE WITH MIGRATION ================ 8.05289
> Info: Initial time: 1 CPUs 0.150798 s/step 0.872672 days/ns 236.934 MB memory
> LDB: ============= START OF LOAD BALANCING ============== 9.55101
> LDB: ============== END OF LOAD BALANCING =============== 9.55108
> LDB: =============== DONE WITH MIGRATION ================ 9.5515
> LDB: ============= START OF LOAD BALANCING ============== 15.3206
> LDB: ============== END OF LOAD BALANCING =============== 15.3217
> LDB: =============== DONE WITH MIGRATION ================ 15.3221
> Info: Initial time: 1 CPUs 0.144713 s/step 0.837461 days/ns 237.516 MB memory
> LDB: ============= START OF LOAD BALANCING ============== 22.4754
> LDB: ============== END OF LOAD BALANCING =============== 22.4766
> LDB: =============== DONE WITH MIGRATION ================ 22.477
> Info: Initial time: 1 CPUs 0.143573 s/step 0.830862 days/ns 237.516 MB memory
>
>