Parallelization-related instabilities
Posted: Thu May 03, 2018 8:03 pm
Dear all,
In the last few weeks I have been experiencing some instabilities in VASP (version 5.4.4) related to the choice of the parallelization tags, in particular NCORE/NPAR and KPAR.
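For clarity, this is the kind of block I mean in the INCAR (a minimal sketch; the values are placeholders taken from the examples below, and one normally sets either NCORE or NPAR, since they describe the same split from two sides):

    NPAR = 8     ! number of bands treated in parallel
    KPAR = 1     ! number of k-point groups working in parallel

with NCORE = total MPI ranks / (KPAR x NPAR) as the resulting number of cores per band, so, as far as I understand, the tags are not independent of the job size.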
The clearest issues I have encountered are:
1 - the SCF loop aborts before starting, right after the NELMDL steps, or hangs, always reporting this error: "EDWAV: internal error, the gradient is not orthogonal". The usual answer is that either the geometry is unrealistic or the compiler optimization is too aggressive. I was able to get rid of this message with only one combination of NPAR, number of CPUs, and number of nodes. For example, with the same INCAR (NPAR = 8, KPAR = 1), 3x48 cores does not work, but 6x48 does (tested on two different clusters); see the rank arithmetic sketched after this list;
2 - even when the simulation runs, some results are unphysical and DO depend on the size of the system. For example, while relaxing a monolayer of NaCl with converged cutoff and k-points, the 6x4 supercell works smoothly, while the 9x4 one, with the same INCAR, develops forces that tend to pull the two atomic species apart (tested even with a very small POTIM). This behavior is fixed by a different setting of NPAR (or NCORE);
3 - to extract the DOS I restart a previously converged calculation non-self-consistently with a denser k-mesh (a typical setup is sketched after this list). In a few cases, not related to the size of the supercell, the code aborts with an out-of-memory error OR the AECCAR0 file is not written properly (NaN at the grid points and in the calculated augmentation charge).
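For reference, here is the rank arithmetic for case 1, assuming the total MPI ranks factor as KPAR x NPAR x NCORE as described above:

    3 nodes x 48 cores = 144 ranks  ->  NCORE = 144 / (1 x 8) = 18
    6 nodes x 48 cores = 288 ranks  ->  NCORE = 288 / (1 x 8) = 36

Both cases divide evenly, so a plain divisibility problem does not seem to explain why only the larger job runs.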
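For case 3, my non-SCF DOS runs use tags along these lines (a sketch of the typical setup; the NEDOS value is just an example, and LAECHG is only shown because it is what produces the AECCAR0 file mentioned above):

    ICHARG = 11      ! read the converged CHGCAR and keep the charge density fixed
    ISMEAR = -5      ! tetrahedron method with Bloechl corrections, suited for DOS
    LORBIT = 11      ! write the projected DOS
    NEDOS  = 2000    ! example value: finer energy grid for the DOS
    LAECHG = .TRUE.  ! write the all-electron charge files AECCAR0/AECCAR2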
What worries me is that I cannot find a rationale behind this behavior, and in the end I am forced to throw sets of values into the INCAR blindly until the code works. I am sure this is partly due to my limited experience with VASP; however, I wanted to report it in case some of this information affects other users or reflects a real issue. I did not attach the input files because there would be too many, but I have them if needed.
Best regards,
Aldo