From: Pang, Yui Tik (andrewpang_at_gatech.edu)
Date: Wed Jan 15 2020 - 13:58:47 CST
Hi Alexey,
I am not familiar with sbatch, but it seems you are using a special script, "ompi", to launch your MPI jobs. I would suggest contacting your cluster support team to figure out the correct way to launch NAMD on that specific cluster. Given that the build you are using is verbs-smp, I believe this paragraph, copied from "notes.txt", will be most helpful for you and your cluster support:
For MPI-based SMP builds one would specify any mpiexec options needed
for the required number of processes and pass +ppn to the NAMD binary as:
mpiexec -n 4 namd2 +ppn 3 <configfile>
MPI-based SMP builds have worse performance than verbs or ibverbs and
are not recommended, particularly for GPU-accelerated builds.
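As a starting point for your support team, a minimal Slurm batch script for an MPI-based SMP build might look like the sketch below. The task counts, module name, and file names are placeholders I made up; your cluster will have its own:

#!/bin/bash
#SBATCH --ntasks=4             # number of MPI ranks (placeholder)
#SBATCH --cpus-per-task=7      # cores per rank: +ppn worker threads plus one communication thread
#SBATCH --time=0-22:30:00

module load namd               # assumed module name; adjust for your cluster

# One core per rank is left for the communication thread, so +ppn is cpus-per-task minus 1.
mpiexec -n ${SLURM_NTASKS} namd2 +ppn $((SLURM_CPUS_PER_TASK - 1)) <configfile>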
Hope it helps!
Best,
Andrew
________________________________
From: owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu> on behalf of Alexei Rossokhin <alrossokhin_at_REMOVE_yahoo.com>
Sent: Wednesday, January 15, 2020 5:44 AM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: Re: namd-l: log file
Hi Andrew,
Thank you for your attempt to help me.
> Did you see a single "Info: Running on ..." line, or multiple instances? If you see multiple, it is very likely that you have accidentally started multiple instances of NAMD.
No, I don't have such a line in my log files.
With "Running on ..." I have the following messages:
Charm++> Running on 1 unique compute nodes (28-way SMP).
Charm++> cpu topology info is gathered in 0.019 seconds.
Charm++> Running on 1 unique compute nodes (28-way SMP).
Charm++> Running on 1 unique compute nodes (28-way SMP).
Info: NAMD 2.11 for Linux-x86_64-verbs-smp
Below is the command line I use to start the NAMD calculation:
sbatch -n 16 --time=0-22:30:00 ompi --bind-to none namd2 <configfile>
Is it right?
Thank you.
Alexey
On Wednesday, January 15, 2020, 3:15:37 AM GMT+3, Pang, Yui Tik <andrewpang_at_gatech.edu> wrote:
Hi Alexey,
We will need more information to help you out. Did you see a single "Info: Running on ..." line, or multiple instances? If you see multiple, it is very likely that you have accidentally started multiple instances of NAMD.
Please be aware that NAMD builds for different platforms should be launched in different ways. For example, if you are using the Linux-x86_64-multicore build, you should launch NAMD with:
namd2 +p<procs> <configfile>
However, if the NAMD build you have is Linux-x86_64-ibverbs, you should launch it with:
charmrun namd2 ++local +p<procs> <configfile>
If you are using Linux-x86_64-verbs instead, the command to use is:
charmrun +p<procs> ++mpiexec namd2 <configfile>
(All the above examples assume you are running NAMD on a single Linux node.)
Please refer to the "notes.txt" that comes with the NAMD binaries for details. If you are unsure which NAMD build you have, you should be able to find it by looking at the first "Info:" line of your log file. In my case, it reads: "Info: NAMD 2.11 for Linux-x86_64-ibverbs".
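For example, something along these lines should print the build line (assuming your log file is named run.log; substitute your own file name):

grep -m 1 "Info: NAMD" run.log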
Let us know if you still have difficulties.
Andrew
________________________________
From: owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu> on behalf of Alexei Rossokhin <alrossokhin_at_REMOVE_yahoo.com>
Sent: Tuesday, January 14, 2020 11:10 AM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: Re: namd-l: log file
Hi,
thanks for the quick response.
I have the following line in my log file:
Info: Running on 1 processors, 1 nodes, 1 physical nodes.
and I am not trying to launch multiple copies.
Alexey
On Tuesday, January 14, 2020, 6:24:27 PM GMT+3, Giacomo Fiorin <giacomo.fiorin_at_gmail.com> wrote:
I usually get this when I have multiple NAMD instances writing to the same file. Can you check how you launched it to make sure that it's consistent with what you want? Look for this line in the output:
Info: Running on XX processors, YY nodes, ZZ physical nodes.
If you are explicitly running multiple copies with +replicas, use the +stdout flag as well.
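As a rough sketch (the file names here are only placeholders), a 4-replica run launched with something like

mpirun -np 16 namd2 +replicas 4 job0.conf +stdout job0.%d.log

writes each copy's output to its own file, with %d replaced by the replica index, instead of interleaving everything in one log.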
Giacomo
On Tue, Jan 14, 2020 at 9:39 AM Alexei Rossokhin <alrossokhin_at_remove_yahoo.com> wrote:
Dear NAMD experts,
can anybody explain to me why, during the minimization/annealing process, I get such jumps in the step number in my log file (for example, see below: 34200, 45920, 43560, 33800, 43880)? Thank you in advance.
Alexey
PRESSURE: 34200 -131.805 -3.38526 -181.954 -114.61 -17.7996 60.7462 -131.747 0.842376 119.959
GPRESSURE: 34200 -77.9873 37.8937 -173.265 -145.494 6.57063 67.6804 -141.456 25.7605 147.873
ENERGY: 34200 4172.4443 15131.3118 23859.0366 488.5060 -496028.8436 35114.8672 1045.4922 0.0000 33364.0599 -382853.1256 112.7096 -416217.1855 -382641.2713 112.7096 -9.8818 25.4854 1375535.7696 -9.8818 25.4854
PRESSURE: 45920 -92.5175 -16.0369 -41.5325 -83.8353 -152.168 28.1227 -33.9996 28.2141 151.485
GPRESSURE: 45920 -91.4706 20.6981 -30.7003 -84.405 -91.3729 41.7199 -36.1052 11.5266 201.678
PRESSAVG: 45920 0.875517 3.18486 -120.187 -57.3614 -46.7854 -5.78555 -104.491 -3.40833 13.9777
GPRESSAVG: 45920 0.380058 -1.93937 -119.743 -54.8274 -50.7971 -5.32708 -103.372 -4.36064 18.9772
TIMING: 45920 CPU: 71613.8, 2.38445/step Wall: 71590.9, 2.3836/step, 0.185391 hours remaining, 747.523438 MB of memory in use.
ENERGY: 45920 4580.4501 16499.6773 24046.8464 529.9106 -491740.4083 33722.1165 1058.4929 0.0000 39177.6548 -372125.2597 132.3489 -411302.9145 -371873.7401 132.4044 -31.0671 6.2783 1367961.6321 -10.6441 -10.4799
PRESSURE: 43560 96.8541 -39.7641 -25.2649 5.00658 318.426 -63.9513 0.783388 -93.6069 -11.7887
GPRESSURE: 43560 157.321 -23.8416 -68.6609 30.0552 339.915 -16.382 17.7373 -129.45 8.88396
PRESSAVG: 43560 -47.9912 -34.4189 28.8564 -27.2364 84.094 -61.7098 30.6092 -50.0621 -28.549
GPRESSAVG: 43560 -43.7836 -29.9769 31.0341 -28.0596 82.7295 -58.5285 32.8886 -54.4655 -32.2422
TIMING: 43560 CPU: 71610.6, 1.92252/step Wall: 71590, 1.92194/step, 0.128129 hours remaining, 746.316406 MB of memory in use.
ENERGY: 43560 4541.2762 16094.5270 23995.0947 520.0559 -492124.8618 33756.3212 1090.0505 0.0000 38174.0456 -373953.4907 128.9586 -412127.5363 -373709.6920 128.9583 134.4972 168.7067 1368951.1044 2.5179 2.2346
PRESSURE: 33800 -79.6388 132.96 160.817 204.468 -85.1759 23.1872 140.642 38.0828 42.6884
GPRESSURE: 33800 -47.9999 126.61 186.276 204.676 -43.6118 14.7897 122.071 91.1083 78.5532
PRESSAVG: 33800 24.3857 141.494 33.4736 188.615 -43.053 64.7807 6.54744 121.856 13.4976
GPRESSAVG: 33800 24.3734 147.377 32.1721 186.81 -43.3077 67.3887 10.6156 119.282 14.0064
TIMING: 33800 CPU: 71621.1, 2.37152/step Wall: 71595.6, 2.37067/step, 0.263408 hours remaining, 731.367188 MB of memory in use.
ENERGY: 33800 4120.2594 15040.2501 23814.4334 493.7457 -496069.9001 34946.4178 1063.4852 0.0000 33246.9564 -383344.3523 112.3140 -416591.3087 -383135.0758 112.0731 -40.7088 -4.3528 1371890.3038 -1.7232 -1.6426
PRESSURE: 43880 131.555 -126.426 97.3805 -65.9241 81.7644 139.926 -4.57359 -68.5444 -43.4502
GPRESSURE: 43880 181.451 -122.94 102.786 -58.7188 109.383 115.383 10.557 -64.6083 -17.0218
PRESSAVG: 43880 3.52469 -101.27 36.8295 -13.2392 -11.7467 121.355 25.0006 -36.616 -18.0008
GPRESSAVG: 43880 5.11272 -101.453 33.2715 -12.502 -10.8696 124.847 25.92 -41.2756 -21.1324
TIMING: 43880 CPU: 71621.2, 1.92101/step Wall: 71598.1, 1.92032/step, 0.27738 hours remaining, 745.691406 MB of memory in use.
ENERGY: 43880 4504.5678 16140.9938 24031.9612 519.6740 -492895.7751 34264.549
--
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Research collaborator, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin