electronic convergence issue in large(r) systems

Problems running VASP: crashes, internal errors, "wrong" results.


chrstphr
Newbie
Posts: 7
Joined: Sat Feb 17, 2018 5:08 pm

electronic convergence issue in large(r) systems

#1 Post by chrstphr » Fri Nov 06, 2020 6:45 pm

Hello everyone,


We're trying to compute Nb(001) surfaces. First, we checked how many free atomic layers are needed to obtain a reasonably relaxed surface layer profile. To that end, a thick symmetric slab (blue in the plot below) was relaxed, and we found that keeping half of this slab with the bottom 4 layers fixed (orange in the plot below) yields acceptable relaxation profiles:
(attached plot: relax.PNG — layer relaxation profiles of the thick symmetric slab vs. the half slab with the bottom 4 layers fixed)
All these calculations were carried out with slabs only 1x1 conventional unit cells in lateral size and converged electronically just fine, yielding the reasonable relaxation profiles visualized above.
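For illustration, the thin-slab geometry is of the following form (a minimal POSCAR sketch, not our actual input: the layer count, the lattice constant of 3.30 Å, and the vacuum are only indicative; selective dynamics flags pin the bottom 4 layers):

    Nb(001) 1x1 slab, bottom 4 layers fixed (illustrative values only)
    1.0
       3.30  0.00  0.00
       0.00  3.30  0.00
       0.00  0.00 30.00
    Nb
    8
    Selective dynamics
    Direct
      0.00 0.00 0.000  F F F  ! fixed, bulk-like layers
      0.50 0.50 0.055  F F F
      0.00 0.00 0.110  F F F
      0.50 0.50 0.165  F F F
      0.00 0.00 0.220  T T T  ! free layers
      0.50 0.50 0.275  T T T
      0.00 0.00 0.330  T T T
      0.50 0.50 0.385  T T T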

However, once the thinner slab is replicated into a 3x3 lateral supercell and a hydrogen atom is added on a tetrahedral interstitial site, a very persistent electronic convergence problem arises. Extensive tweaking of the electronic minimization (e.g. changing ALGO, NELMDL, playing with the mixing parameters, and much more) did not get us past it. Following the flowchart at slide 22, we finally checked with ALGO=Normal again and found the convergence issue to persist, as can be seen in the attached report1.zip. We then continued along the flowchart and repeated the calculation with ICHARG=12, which did not converge either (see the attached report2.zip). We therefore filed this bug report, as the ionic configuration seems perfectly reasonable. Note that while we used vasp.6.1.1 with OpenMP support for all these calculations, we effectively only used the MPI parallelization capabilities of the binary.
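For orientation, the electronic part of our INCAR was of this form (the exact settings are in the attached reports; the values below are only indicative):

    ALGO   = Normal   ! blocked Davidson, as per the flowchart check
    NELM   = 200      ! allow plenty of electronic steps
    EDIFF  = 1E-6
    ISMEAR = 1        ! Methfessel-Paxton smearing, appropriate for a metal
    SIGMA  = 0.2
    ! second diagnostic run, non-selfconsistent with the superposition of atomic charge densities:
    ! ICHARG = 12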

Any help would be greatly appreciated and additional information will be happily provided if needed.

Thank you in advance and best regards.

ferenc_karsai
Global Moderator
Posts: 460
Joined: Mon Nov 04, 2019 12:44 pm

Re: electronic convergence issue in large(r) systems

#2 Post by ferenc_karsai » Thu Nov 19, 2020 3:34 pm

I've rerun your calculation (report1) with your INCAR file and I can't see any problems on my side. The calculation converges slowly but surely to a reasonable energy, while in your case the energies are totally wrong.
Please redo the calculation without OpenMP support (compile without it) and don't set KPAR or NPAR. These three settings and most likely the POTCAR file were the only parameters that differed.
You also didn't send the POTCAR file you were using. Please upload that one as well.
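For the rerun, something along these lines should do (a sketch; the rank count is just an example):

    # pure MPI run, no OpenMP threading
    export OMP_NUM_THREADS=1
    mpirun -np 64 vasp_std

and remove the KPAR and NPAR lines from the INCAR so that the defaults are used.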

chrstphr
Newbie
Posts: 7
Joined: Sat Feb 17, 2018 5:08 pm

Re: electronic convergence issue in large(r) systems

#3 Post by chrstphr » Mon Nov 30, 2020 10:09 am

Thank you very much for your answer.
ferenc_karsai wrote: "Please redo the calculation without OpenMP support (compile without it) and don't set KPAR or NPAR."
Actually, a coworker of mine originally ran into issues with these calculations, and I did some troubleshooting, eventually swapping out basically all parameters and even the binary, going from vasp.5.4.4.pl2 (without OpenMP support) to vasp.6.1.1 (with OpenMP support). Because of this, I don't expect the above to fix things, although I'll definitely compile another binary without OpenMP support and have a go at it, just to be sure. More generally, do you think there might be another technical issue at play here? For example, would you recommend a certain Intel compiler version? Also, is it safe to keep the default optimization levels and only add -x<highest compatible CPU features>? That is how I built the binaries used here, without any issues apart from the one reported above.
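Concretely, the builds essentially used the shipped Intel makefile.include with only the target-architecture switch appended, roughly like this (a sketch; -xCORE-AVX2 stands in for whatever the highest compatible ISA flag is on the given machine):

    FC     = mpiifort
    FFLAGS = -assume byterecl -w -xCORE-AVX2   # template defaults plus the ISA switch
    OFLAG  = -O2                               # default optimization level kept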
ferenc_karsai wrote: "You also didn't send the POTCAR file you were using. Please upload that one as well."
Sorry, I thought posting POTCARs was prohibited due to the license, but I realize now it's even asked for in the forum posting guidelines. Please find the corresponding POTCAR file attached.

ferenc_karsai
Global Moderator
Posts: 460
Joined: Mon Nov 04, 2019 12:44 pm

Re: electronic convergence issue in large(r) systems

#4 Post by ferenc_karsai » Tue Dec 01, 2020 1:02 pm

I have now checked, and you used the same POTCAR file as I do.

The most likely problem is in your toolchain.
It's really hard to help you blindly.
Maybe some suggestions:
-) Try compiling without scaLAPACK (remove -DscaLAPACK from CPP_OPTIONS; see the sketch after this list). If that works, you know the problem is an improper installation of scaLAPACK.
-) Try a different compiler (gfortran) and a different MPI (OpenMPI).
-) As you suggested, you can also try disabling compiler optimization, but I doubt that will help.
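Roughly, in makefile.include (a sketch; keep the other flags as in your arch template):

    CPP_OPTIONS = -DHOST=\"LinuxIFC\" -DMPI -DMPI_BLOCK=8000 \
                  -DCACHE_SIZE=4000 -Davoidalloc -Dvasp6
                  # -DscaLAPACK removed from the list above
    SCALAPACK   =    # also drop the scaLAPACK library from the link line

If your version supports it, LSCALAPACK=.FALSE. in the INCAR should also switch scaLAPACK off at runtime, which is a quick test without recompiling.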

ferenc_karsai
Global Moderator
Posts: 460
Joined: Mon Nov 04, 2019 12:44 pm

Re: electronic convergence issue in large(r) systems

#5 Post by ferenc_karsai » Tue Dec 01, 2020 1:04 pm

I recommend Intel 19.x; I ran the tests with Intel 19.1.0.166 and Open MPI 4.0.4.

chrstphr
Newbie
Posts: 7
Joined: Sat Feb 17, 2018 5:08 pm

Re: electronic convergence issue in large(r) systems

#6 Post by chrstphr » Wed Dec 02, 2020 9:47 pm

Thank you very much for your suggestions. I'll try building new binaries along those lines ASAP. Best regards!

chrstphr
Newbie
Posts: 7
Joined: Sat Feb 17, 2018 5:08 pm

Re: electronic convergence issue in large(r) systems

#7 Post by chrstphr » Tue Dec 08, 2020 7:14 pm

I have now built a couple of new binaries (all with Intel Parallel Studio and Intel MPI 2018), and your suggestion to compile without scaLAPACK was spot on. A calculation started from the provided example input now converges perfectly fine, without any issue whatsoever. :)

I'll also notify the HPC administrators about this, because I confirmed that the VASP binaries they provide via the module system use scaLAPACK as well and suffer from the exact same issue when run with the provided input.

Thank you very much!
