[VASP 6.2 intel 2020.0] FeAl_333_RPAFORCE fails with 12 tasks with Fatal error in PMPI_Waitall:

thibautvery
Newbie
Posts: 8
Joined: Fri Mar 24, 2017 4:02 pm

#1 Post by thibautvery » Thu Mar 04, 2021 9:41 am

Hello,

I am running the FeAl_333_RPAFORCE test with a build of VASP 6.2 compiled with Intel Parallel Studio 2020.0 (as recommended on the wiki).
It runs smoothly with up to 10 MPI tasks.
With 12 tasks, an error occurs just after the end of the first geometry step:

    Abort(17) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Waitall: See the MPI_ERROR field in MPI_Status for the error code

The same test runs fine with a build compiled with Intel Parallel Studio 2018.5.

I attached the stdout file and OUTCAR for both runs, the makefile.include file (the same for both compilers) and the output of DDT for the stacktrace.

Do you know if there is a workaround for the problem?

Thibaut Véry

merzuk.kaltak
Administrator
Posts: 282
Joined: Mon Sep 24, 2018 9:39 am

#2 Post by merzuk.kaltak » Thu Mar 04, 2021 12:42 pm

Dear Thibaut,

This might be an Intel MPI problem. Is it possible to change the Intel MPI version while keeping the same compiler?
Alternatively, you may try one of the other compiler toolchains listed on our wiki.

Also, the tests in the test suite have been validated for 1, 2, 3, 4, 6 and 8 MPI ranks, as mentioned here. In addition, the test you are running exercises an undocumented feature of VASP that is still in development. As such, it is never run by "make test" or "make test_all". We are still working on the RPA forces to make them more stable and reliable.

with regards,
Merzuk

andreas.singraber
Global Moderator
Posts: 236
Joined: Mon Apr 26, 2021 7:40 am

#3 Post by andreas.singraber » Wed Sep 08, 2021 7:35 am

Dear Thibaut,

We came across the same error messages and similar behaviour in another calculation and figured out that there is a compiler bug in Intel versions up to 2021.2 that affects non-blocking broadcasts (MPI_Ibcast) as used in VASP. I was also able to write a simple reproducer that triggers the error.

Now, I am not entirely sure that your error has the same origin as the one we found, but it is very likely. In that case the fix we found may also work for you: try upgrading your Intel compiler to the latest version, 2021.3, which resolves the MPI_Ibcast issues. However, there is one downside: with 2021.3 there are problems with the OpenMP version of VASP (see here), so it may be better to build without OpenMP support.

Also, the next version of VASP will allow setting a preprocessor flag to work around this compiler bug for Intel compiler versions < 2021.3.
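
As a rough illustration of the version cutoff mentioned above, the following Python sketch flags Intel compiler versions older than 2021.3. The names `parse_version`, `may_be_affected` and `FIXED_IN` are hypothetical and not part of VASP or the Intel tools. The check only encodes the statement made in this thread; it cannot tell whether a particular build actually hits the bug (2018.5, for instance, was reported to work earlier in this thread even though it is below the cutoff).

```python
# Hypothetical helper: encodes only the "< 2021.3" cutoff stated in this thread.
FIXED_IN = (2021, 3)  # first Intel version reported here to resolve the MPI_Ibcast issue


def parse_version(text):
    """Turn a dotted version string such as '2021.2' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def may_be_affected(version):
    """Return True if the version is older than the reported fix (an assumption, not a diagnosis)."""
    return parse_version(version) < FIXED_IN


if __name__ == "__main__":
    for v in ("2020.0", "2021.2", "2021.3"):
        print(v, "may be affected" if may_be_affected(v) else "at or past the reported fix")
```

A result of True only means the version is below the cutoff quoted here; whether the preprocessor workaround or an upgrade is needed still has to be confirmed by running the failing calculation.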

If you are able to test 2021.3, please let us know if it fixed your problem as well!

All the best,

Andreas Singraber
