
VASP2WANNIER90 UNK file bug

Posted: Thu May 20, 2021 10:59 am
by chengcheng_xiao1
The UNK files generated by VASP (v6.2.1) include all bands, although the bands listed in `exclude_bands` should be left out.

Those bands should be excluded, as is done by the `pw2wannier90` interface (to the PWscf code) provided with the Wannier90 distribution (see: https://github.com/wannier-developers/w ... 0.f90#L595).

I'm using the new interface the old way (with a standalone `.win` file) and am not sure whether this is the desired behavior, since the `.amn` and `.mmn` files all contain the correct number of bands.
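One way to check which case applies to a given run is to read the band count stored in the header of a UNK file and compare it with the number of bands expected after exclusion. The following is a minimal Python sketch, not part of VASP or Wannier90. It assumes the files were written unformatted (not with `wvfn_formatted`), with 4-byte Fortran record markers and 4-byte integers, and that the first record holds `ngx, ngy, ngz, ik, nbnd` as described in the Wannier90 user guide. The function name `read_unk_header` is made up for illustration.

```python
import struct


def read_unk_header(path):
    """Return (ngx, ngy, ngz, ik, nbnd) from the first record of a UNK file.

    Assumption: unformatted sequential Fortran file with 4-byte record
    markers, so the header is marker(20) + five int32 values + marker(20).
    """
    with open(path, "rb") as fh:
        raw = fh.read(28)
    if len(raw) < 28:
        raise ValueError("file too short to hold an unformatted UNK header")
    # Try both byte orders; the record markers must both equal 20 bytes.
    for order in ("<", ">"):
        head, ngx, ngy, ngz, ik, nbnd, tail = struct.unpack(order + "7i", raw)
        if head == 20 and tail == 20:
            return (ngx, ngy, ngz, ik, nbnd)
    raise ValueError("header does not match the assumed unformatted layout")
```

If the last value returned equals the total number of bands of the run rather than the total minus the number of excluded bands, the file still contains the excluded bands.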

Note that prior to VASP v6.2.0, the UNK files were generated correctly using the old interface.

This behavior should be easy to correct by passing `EXCLUDE_BAND` to the `GET_WAVE_FUNCTIONS` subroutine in `mlwf.F` (as the old interface does).

Re: VASP2WANNIER90 UNK file bug

Posted: Thu May 20, 2021 3:35 pm
by andreas.singraber
Thank you for your report and suggestions, we are looking into this...

Re: VASP2WANNIER90 UNK file bug

Posted: Fri May 28, 2021 7:57 am
by henrique_miranda
Yes, indeed a bug was introduced in the last changes related to mlwf.F
The fix is indeed to pass the `exclude_bands` array to `get_wave_functions` so that the information about the excluded bands is not written to the UNK* files.

Thanks a lot for pointing it out!
The fix will be integrated with the next release.
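To make the intended behavior concrete: given an `exclude_bands` specification, only the remaining bands should end up in the UNK* files. The sketch below is illustrative Python, not the code in `mlwf.F`. It assumes the specification is a comma- or space-separated list of 1-based band indices and inclusive ranges such as `1-4, 7`, and the helper names `parse_band_ranges` and `kept_bands` are made up for this example.

```python
def parse_band_ranges(spec):
    """Expand a string such as "1-4, 7" into a sorted list of band indices."""
    bands = set()
    for part in spec.replace(",", " ").split():
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"descending range: {part!r}")
            bands.update(range(lo, hi + 1))  # ranges are inclusive
        else:
            bands.add(int(part))
    return sorted(bands)


def kept_bands(nbands, excluded):
    """Return the 1-based indices of the bands that remain after exclusion."""
    excluded = set(excluded)
    return [b for b in range(1, nbands + 1) if b not in excluded]
```

For example, `kept_bands(8, parse_band_ranges("1-4, 7"))` returns `[5, 6, 8]`, so a correctly written UNK file for that hypothetical setup would hold three bands per k-point.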

Re: VASP2WANNIER90 UNK file bug

Posted: Wed Jun 02, 2021 2:32 pm
by jbackman
Another problem with the UNK files is that they are not written if no projections are specified. This might not be a bug, but there are cases where one would like to write the AMN and UNK files in separate runs, for example when the UNK files are very large and disk space is a concern, or when the calculation of the AMN is very slow, as reported here: (forum/viewtopic.php?f=4&t=18069).

For me a temporary solution was to comment out the following line:
`IF ((P_MLWF%PROJ_MODE==UNKNOWN_MODE).AND.(.NOT.MY_LWANNIER90_RUN)) RETURN`
in the mlwf.F file.

Re: VASP2WANNIER90 UNK file bug

Posted: Wed Mar 16, 2022 4:09 pm
by joel_eaves
Has this writing error been fixed in the newest release of VASP, v6.3.0? I could not find mention of it in the patch notes, but I might have missed it.

Thanks,
Peyton Cline
Postdoctoral Associate
Dr. Joel Eaves Group
Department of Chemistry
University of Colorado Boulder

Re: VASP2WANNIER90 UNK file bug

Posted: Fri Apr 01, 2022 5:37 pm
by joel_eaves
henrique_miranda wrote: Fri May 28, 2021 7:57 am
Yes, indeed a bug was introduced in the last changes related to mlwf.F
The fix is indeed to pass the `exclude_bands` array to `get_wave_functions` so that the information about the excluded bands is not written to the UNK* files.

Thanks a lot for pointing it out!
The fix will be integrated with the next release.

Has this bug been fixed in VASP 6.3.0?

Thanks,
Peyton Cline
Postdoctoral Associate
Dr. Joel Eaves Group
University of Colorado Boulder
Department of Chemistry

Re: VASP2WANNIER90 UNK file bug

Posted: Mon Apr 04, 2022 7:38 am
by henrique_miranda
Yes, this bug has been fixed in vasp 6.3.0.
I added it to our list of known bugs:
https://www.vasp.at/wiki/index.php/Known_issues
If you still encounter some issues please let me know.

Re: VASP2WANNIER90 UNK file bug

Posted: Wed Jun 29, 2022 11:47 pm
by joel_eaves
henrique_miranda wrote: Mon Apr 04, 2022 7:38 am
Yes, this bug has been fixed in vasp 6.3.0.
I added it to our list of known bugs:
https://www.vasp.at/wiki/index.php/Known_issues
If you still encounter some issues please let me know.

Dear Henrique,

I have recently begun to do more Wannier-conversion runs using VASP 6.3.1 and Wannier90 v3.1.0. Things seem to be working correctly, for the most part; however, the 'exclude_bands' flag is causing some very strange behavior. It seems the UNK* files are being written correctly at least, but I've noticed that if I have 'exclude_bands' in the wannier90.win file or alternatively in the INCAR via the WANNIER90_WIN flag, I can receive 1 of 3 possible errors for KPAR = 1, and some mixture of these same errors for KPAR > 1. I will only discuss the KPAR = 1 results here. I've attached files within a ZIP folder to the message.

To start, I'm considering a bulk primitive CdS wurtzite unit cell, which has 4 atoms and 36 electrons. I am using the default cutoff for these calculations, and NBANDS = 48 to ensure a well-converged valence band structure. I also want to mention at this point that the errors occur regardless of whether the cores are standard cores (~4.2 GB memory per core) or high-memory cores (~42 GB memory per core). Therefore, I do not believe the problem is related to the total available memory, since this system is so small.

(1) The first error I receive seems to occur if I run the vasp-to-wannier interface on multiple cores, say 6-12 cores, but where the core count is less than NBANDS. The error message, which is printed in an e.* file and repeated multiple times depending on the core count, says something like

[smem0301:75240:0:75240] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x1178ee78)

See attached file called 'e.6cores'. What's strange is this error does not cause VASP to crash, and no error message prints to the screen. The calculation stalls but continues running until the wall-time is reached, or until I cancel the calculation manually. The error also happens after the UNK* files are written if LWRITE_UNK = T, or after the *amn file is written if LWRITE_UNK = F. So it seems that all wannier90 files are written, at the very least, but it is still concerning this error happens at all. I do not recall seeing anything like this when I used to run VASP 5.4.4 linked with Wannier90 v1.2.

(2) The second error I can receive seems to occur if I run the vasp-to-wannier interface on a single core (again, either standard or high memory). Error messages print to the screen and to an e.* file, followed by a termination of my VASP process. Again, this happens after the wannier90 files are written, either after the UNK* files if LWRITE_UNK = T or after the *amn file if LWRITE_UNK = F. The error that prints to the screen states

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 53651 RUNNING AT smem0201
= KILLED BY SIGNAL: 6 (Aborted)
===================================================================================

See screen.1core for an example. The information in the e.* file is much more complicated than in the previous case. See the attached file called e.1core. At the head of this e.1core file, the error reads

*** Error in `/projects/rocl5502/programs/summit_vasp.6.3.1_O2_intel20/bin/vasp_std': corrupted size vs. prev_size: 0x0000000020b25c10 ***

followed by a backtrace and memory map.

(3) The third error I can receive seems to occur when I use numcores = NBANDS = 48. This error looks a lot like the previous error for 1 core and causes a full crash, yet interestingly, the messages I see are distinct from either of the previous cases. First, the messages printed to the screen involve several "KILLED BY SIGNAL: 9 (Killed)" statuses and a few "KILLED BY SIGNAL: 6 (Aborted)" statuses. In my past experience, signal 9 messages generally mean the calculation failed due to memory demands. However, that doesn't make much sense to me here, since I ran this calculation on a full high-memory node, which has 2 TB across all 48 cores. See screen.48cores for an example. Lastly, the e.* file gives a different message than previously, namely

*** Error in `/projects/rocl5502/programs/summit_vasp.6.3.1_O2_intel20/bin/vasp_std': free(): invalid next size (normal): 0x000000000ddf67d0 ***

several times (differing only in the last few characters), followed by several backtraces and memory maps. See e.48cores for an example.

Any help would be appreciated. I've tried multiple compilations of VASP, including -O2 and -O0 compilations, with and without the -xHOST flag, and I am using Intel 20.2, MKL 20.2, and IMPI 19.8 in all versions. Something of note: on my -O0 compilation without -xHOST, I did use 'gdb' on a single-core run to see if I could backtrace where exactly these issues happen in the code, but I could not figure out where to go from there. I could not find any of the addressed functions within either the VASP or Wannier90 codes, and the first message (#0) after the backtrace seems to point to some issue in the C-standard library, I believe. See output.gdb for more details.

Thanks for your help,
Peyton Cline
Postdoctoral Associate
Prof. Joel Eaves' Group
University of Colorado Boulder
Department of Chemistry

Re: VASP2WANNIER90 UNK file bug

Posted: Wed Jul 06, 2022 4:01 pm
by henrique_miranda
Dear Peyton,

Sorry that you are experiencing this issue and thank you for the detailed bug report.
I would like to try to reproduce this issue locally and for that I would need to have the POSCAR, POTCAR, KPOINTS and INCAR files.
Could you please share them?