4/5 Images running on Climbing Image NEB
-
- Newbie
- Posts: 4
- Joined: Sat Feb 25, 2023 8:57 pm
4/5 Images running on Climbing Image NEB
Hello,
I am trying to run a Climbing Image Nudged Elastic Band tutorial that I found on GitHub: https://github.com/drinwater/Nudged-Ela ... d-Tutorial.
Here is what I did:
0. made directories 00-06
1. I copied the POSCAR and OUTCAR files from 00 and 06 of the example run into my new folder's 00 and 06 directories
2. copied the INCAR and KPOINTS files, and created the POTCAR file by concatenating the Pt, N, and O POTCARs (cat Pt N O > POTCAR)
3. used VTST's nebmake.pl script to produce the POSCARs for 01-05 and checked them to make sure there is no overlap (a rough sketch of these setup commands follows this list)
4. edited the number of cores (I am running on a 40-core, 4-GPU node)
5. I then ran my slurm script (script included in the zip file).
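For reference, here is a rough sketch of steps 0-3, assuming the standard VTST workflow; the paths to the example run and to the element POTCARs are placeholders and only illustrate what I did:
Code:
# rough sketch of the setup, assuming the VTST scripts are in PATH
mkdir -p 00 01 02 03 04 05 06
cp ../example_run/00/POSCAR ../example_run/00/OUTCAR 00/   # initial endpoint
cp ../example_run/06/POSCAR ../example_run/06/OUTCAR 06/   # final endpoint
cp ../example_run/INCAR ../example_run/KPOINTS .
cat Pt/POTCAR N/POTCAR O/POTCAR > POTCAR                   # order must match POSCAR
nebmake.pl 00/POSCAR 06/POSCAR 5                           # interpolated POSCARs for 01-05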
When I run the code, only directories 01, 02, 03, and 04 indicate they are running (i.e., they have files besides POSCAR in them). I made sure my NCORE was an integer multiple of IMAGES (NCORE = 40, IMAGES = 5), so I'm not entirely sure what is going on here.
Another issue is that the vasprun.xml file alone is 12 MB for each folder. Would sharing my INCAR be enough? That's the only file I changed compared to the tutorial in the assets/NEB run/run1/ folder. I have attached my slurm file, INCAR, and the output of nebef.pl for now, but can add anything else.
I appreciate any help here. Thanks!
-
- Newbie
- Posts: 4
- Joined: Sat Feb 25, 2023 8:57 pm
Re: 4/5 Images running on Climbing Image NEB
Is there anyone who has experienced this or could help me diagnose this?
To reiterate, I have folders 00, 01, 02, 03, 04, 05, and 06.
00 and 06 are my end points, while 01, 02, 03, 04, and 05 are my images. Folders 01-04 show files besides POSCAR in them, whereas 05 only has POSCAR.
Thanks again!
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Re: 4/5 Images running on Climbing Image NEB
Please revisit the parallel setup of the calculation. It appears that you are mixing several different parallelization options (MPI + OpenMP + OpenACC), and this causes the observed behavior.
Specifically, according to your standard output you run on 4 MPI ranks with 10 OpenMP threads, but in the INCAR file you set
Code:
IMAGES = 5
NCORE = 40
NPAR = 8
First, you should never set both NCORE and NPAR; in fact, VASP will overrule that choice. For GPUs, NCORE is set to 1 and then by default all the remainder goes to band parallelization, i.e., NPAR. So for your specific case you need neither of these flags. Second, because you use only 4 MPI ranks, you cannot effectively parallelize over 5 images. This leads to the behavior you see: the MPI ranks start calculating the first 4 images, and only after all of them are done does 1 rank deal with the single remaining image. Until the first 4 images are done with their first electronic self-consistency, there is no output in the directory of the 5th image.
So for your specific setup there are two reasonable choices: you can either run with 40 MPI ranks, where 8 ranks always share a single image, or you can change your NEB run to 4 or 8 images so that you can run with 4 MPI ranks and 10 OpenMP threads. If you can deal with a different number of images, the latter case is probably more efficient.
You may also check out the NEB tutorial in the wiki for more information.
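As a minimal sketch of the second option (4 images, 4 MPI ranks with 10 OpenMP threads each), assuming a Slurm + OpenMPI setup; the executable path is a placeholder, and the images would of course have to be regenerated with nebmake.pl for 4 images:
Code:
# INCAR: set IMAGES = 4 and remove both NCORE and NPAR
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=10
#SBATCH --gres=gpu:4
export OMP_NUM_THREADS=10
mpirun -np 4 /path/to/vasp_std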
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 4
- Joined: Sat Feb 25, 2023 8:57 pm
Re: 4/5 Images running on Climbing Image NEB
Hi Dr. Schlipf,
Thank you for your reply and help!
I indeed made a kerfuffle there...
I found that by changing the number of tasks in my slurm file (--ntasks-per-node) to the number of images, things began to work. (It seems that even though the job would run the first 4 images, it would never get to the final image in the fifth folder before ending.)
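For concreteness, a minimal sketch of the change I mean, assuming IMAGES = 5 and leaving the rest of the script untouched:
Code:
#SBATCH --ntasks-per-node=5   # one MPI rank per NEB image (IMAGES = 5)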
My system has only 3 nodes, each with 4 CPUs and 4 V100 GPUs per node. There is no high-speed connection between the nodes, so effectively I can use at most 1 node of 40 cores and 4 GPUs per job.
I tried to allocate 4 GPUs to the job using the following lines in my slurm file:
Code:
#SBATCH --gres=gpu:4
#SBATCH --constraint=v100s
but it seems like the other 3 GPUs don't get utilized:
Code:
[gpu-v100s-01:2477212] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[gpu-v100s-01:2477212] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
After looking through the wiki, I wasn't able to figure out how I would use more than one GPU for the same job. Is there another resource you could please point me towards?
Thanks,
Myles
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Re: 4/5 Images running on Climbing Image NEB
In principle, 4 MPI ranks with 4 GPUs on 1 node should work. I will inquire whether it would work if the number of ranks does not match the number of GPUs, or whether the deactivation of NCCL also leads to some of the GPUs not being used.
Martin Schlipf
VASP developer
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Re: 4/5 Images running on Climbing Image NEB
I can confirm that this setup should work. Please make sure that the environment variable CUDA_VISIBLE_DEVICES is set when you run VASP. For 4 GPUs and OpenMPI this could be done with:
Code:
mpirun -np 4 -x CUDA_VISIBLE_DEVICES=0,1,2,3 /path/to/vasp/executable
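On a single node, the same could presumably also be achieved by exporting the variable in the job script before the mpirun call (a sketch, not tested here):
Code:
export CUDA_VISIBLE_DEVICES=0,1,2,3
mpirun -np 4 /path/to/vasp/executable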
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 4
- Joined: Sat Feb 25, 2023 8:57 pm
Re: 4/5 Images running on Climbing Image NEB
Good afternoon Dr. Schlipf,
Thank you for your suggestion and help with this!
After changing:
Code:
/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/comm_libs/mpi/bin/mpirun /home/myless/VASP/vasp.6.3.2/bin/vasp_std
to:
Code:
/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/comm_libs/mpi/bin/mpirun -np 3 -x CUDA_VISIBLE_DEVICES=0,1,2 /home/myless/VASP/vasp.6.3.2/bin/vasp_std
Entire slurm file for reference:
Code:
#!/bin/bash
#
#SBATCH --job-name=V_3gp
#SBATCH --output=std-out
#SBATCH --ntasks-per-node=3
#SBATCH --nodes=1
#SBATCH --gres=gpu:3
#SBATCH --constraint=v100s
#SBATCH --time=1-05:00:00
#SBATCH -p regular
cd "$SLURM_SUBMIT_DIR"
/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/comm_libs/mpi/bin/mpirun -np 3 -x CUDA_VISIBLE_DEVICES=0,1,2 /home/myless/VASP/vasp.6.3.2/bin/vasp_std
exit
I still get the following from std-out:
Code:
btl_base_warn_component_unused to 0.
--------------------------------------------------------------------------
----------------------------------------------------
OOO PPPP EEEEE N N M M PPPP
O O P P E NN N MM MM P P
O O PPPP EEEEE N N N M M M PPPP -- VERSION
O O P E N NN M M P
OOO P EEEEE N N M M P
----------------------------------------------------
running 3 mpi-ranks, with 10 threads/rank
each image running on 1 cores
distrk: each k-point on 1 cores, 1 groups
distr: one band on 1 cores, 1 groups
OpenACC runtime initialized ... 3 GPUs detected
vasp.6.3.2 27Jun22 (build Mar 8 2023 11:59:18) complex
POSCAR found type information on POSCAR V
01/POSCAR found : 1 types and 53 ions
scaLAPACK will be used selectively (only on CPU)
-----------------------------------------------------------------------------
| |
| ----> ADVICE to this user running VASP <---- |
| |
| You have a (more or less) 'large supercell' and for larger cells it |
| might be more efficient to use real-space projection operators. |
| Therefore, try LREAL= Auto in the INCAR file. |
| Mind: For very accurate calculation, you might also keep the |
| reciprocal projection scheme (i.e. LREAL=.FALSE.). |
| |
-----------------------------------------------------------------------------
LDA part: xc-table for Pade appr. of Perdew
POSCAR found type information on POSCAR V
00/POSCAR found : 1 types and 53 ions
POSCAR found type information on POSCAR V
04/POSCAR found : 1 types and 53 ions
Jacobian: 17.34300086515033
POSCAR found type information on POSCAR V
00/POSCAR found : 1 types and 53 ions
POSCAR found type information on POSCAR V
04/POSCAR found : 1 types and 53 ions
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
[gpu-v100s-01:3001370] 2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics
[gpu-v100s-01:3001370] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
entering main loop
N E dE d eps ncg rms rms(c)
DAV: 1 0.834099289090E+04 0.83410E+04 -0.29610E+05 8660 0.127E+03
DAV: 2 0.231803422156E+02 -0.83178E+04 -0.81113E+04 8260 0.533E+02
DAV: 3 -0.515463820131E+03 -0.53864E+03 -0.50927E+03 11270 0.135E+02
DAV: 4 -0.545139401809E+03 -0.29676E+02 -0.28658E+02 10256 0.235E+01
I also compared the compute times in the OUTCAR files and got 2643.359 seconds when using the new command-line arguments for mpirun, versus 2675.369 seconds without them. That seems within the realm of standard error? Unsure as of now.
Thanks,
Myles
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Re: 4/5 Images running on Climbing Image NEB
Well, it seems like you are using 3 GPUs now; at least that is what the output says. If this is as fast as before, that means one of two things: either you ran on multiple GPUs before, or your system cannot be accelerated much because the limiting factor lies elsewhere. If you want to figure out what is going on, I would recommend compiling VASP with profiling support and then comparing the OUTCARs of the two runs.
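Independently of the full profiling output, the per-ionic-step timings that are always written to the OUTCAR can be compared quickly; a sketch, with the two run directories as placeholders:
Code:
# the LOOP+ lines give the time spent per ionic step
grep "LOOP+" run_with_flag/01/OUTCAR
grep "LOOP+" run_without_flag/01/OUTCAR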
One more piece of advice: if you want to find the best possible setup, it is often advisable to reduce the number of steps (NSW or NELM). Then you don't need to wait nearly an hour to get feedback. You can check this in your output, but I expect that every iteration takes about the same time, so optimizing the performance can be done on a subset of the steps.
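As an illustration only (the specific values are assumptions, not recommendations), a benchmark-style INCAR could cap the run like this while keeping everything else identical:
Code:
# limit the run for quick timing tests
NSW = 1
NELM = 20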
Martin Schlipf
VASP developer