Hi all,
I am Liu Jiyuan; I asked about this memory leak in the Q&A session of the VASP workshop.
The job ran on 2x A30 GPUs attached to two Xeon Gold 6326 sockets with 256 GB of memory. Memory usage exceeded the total available memory once the calculation passed 6700 steps. VASP was compiled with nvhpc 22.7, CUDA 11.7, and VTST. Open MPI 4.1.4 was compiled with nvc and nvfortran, with CUDA-aware support.
Thanks!
memory leak: AIMD with openmpi 4.1.4 on GPUs
-
- Newbie
- Posts: 4
- Joined: Tue Mar 29, 2022 6:44 am
-
- Global Moderator
- Posts: 491
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: memory leak: AIMD with openmpi 4.1.4 on GPUs
Hi Liu,
Could you try running the same calculation using OMP_NUM_THREADS=1 and check if the problem persists?
Recently we had a report about a similar issue in this thread:
https://www.vasp.at/forum/viewtopic.php?f=3&t=18493
We are still looking into it, but knowing whether OMP_NUM_THREADS=1 alleviates the problem would help us a great deal in narrowing down the possible causes.
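For readers who want to repeat the suggested test, here is a minimal sketch of one way to pin the OpenMP thread count for the run. It only assembles the launch command and environment. The executable name `vasp_std`, the rank count of 2 (one MPI rank per GPU), and the helper name `build_vasp_launch` are assumptions for illustration and are not taken from the original post. The `-x` option is Open MPI's way of exporting an environment variable to the ranks.

```python
import os
import shlex


def build_vasp_launch(n_ranks, omp_threads=1, vasp_cmd="vasp_std", base_env=None):
    """Return (command, environment) for an Open MPI launch with a fixed
    OpenMP thread count. Hypothetical helper, not part of VASP or Open MPI."""
    # Copy the environment so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(omp_threads)
    # "-x OMP_NUM_THREADS" asks mpirun to forward the variable to every rank.
    cmd = ["mpirun", "-np", str(n_ranks), "-x", "OMP_NUM_THREADS", vasp_cmd]
    return cmd, env


if __name__ == "__main__":
    # Assumption: 2 ranks, one per A30 GPU.
    cmd, env = build_vasp_launch(n_ranks=2, omp_threads=1)
    print("OMP_NUM_THREADS=" + env["OMP_NUM_THREADS"], shlex.join(cmd))
    # To actually start the job, one could pass both to subprocess.run:
    #   subprocess.run(cmd, env=env, check=True)
```

On a cluster, the same effect is usually achieved by exporting OMP_NUM_THREADS=1 in the job script before the mpirun line.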
-
- Newbie
- Posts: 4
- Joined: Tue Mar 29, 2022 6:44 am
Re: memory leak: AIMD with openmpi 4.1.4 on GPUs
Hi Henrique,
OMP_NUM_THREADS=1 works! The memory usage is greatly reduced.
OUTCAR summary for OMP_NUM_THREADS=16, ionic steps 0 to 6000:
Total CPU time used (sec): 96743.969
User time (sec): 94038.925
System time (sec): 2705.043
Elapsed time (sec): 70288.818
Maximum memory used (kb): 131238960.
Average memory used (kb): N/A
Minor page faults: 102568499
Major page faults: 5194
Voluntary context switches: 59524318
OUTCAR summary for OMP_NUM_THREADS=1, ionic steps 6001 to 12000 (continuation run):
Total CPU time used (sec): 83684.477
User time (sec): 83524.712
System time (sec): 159.766
Elapsed time (sec): 83831.101
Maximum memory used (kb): 16510832.
Average memory used (kb): N/A
Minor page faults: 18466879
Major page faults: 4622
Voluntary context switches: 846170
The real memory usage is much higher than the recorded value, but the magnitude makes sense.
Thanks.
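To put the two OUTCAR summaries side by side, the following is a rough sketch that parses the timing and memory lines quoted above and compares the peak memory of the two runs. The numbers are copied from this post. The function names are hypothetical, and the conversion assumes that 1 kb in the OUTCAR summary means 1024 bytes. As noted above, the recorded peak may understate the real usage.

```python
import re

# Summary lines copied from the two OUTCAR files quoted in this post.
OUTCAR_16_THREADS = """\
Total CPU time used (sec): 96743.969
Elapsed time (sec): 70288.818
Maximum memory used (kb): 131238960.
Average memory used (kb): N/A
"""

OUTCAR_1_THREAD = """\
Total CPU time used (sec): 83684.477
Elapsed time (sec): 83831.101
Maximum memory used (kb): 16510832.
Average memory used (kb): N/A
"""

# Matches lines of the form "<label> (sec|kb): <number>"; "N/A" lines are skipped.
SUMMARY_LINE = re.compile(r"^\s*(.+?)\s*\((sec|kb)\):\s*([0-9.]+)\s*$", re.M)


def read_summary(text):
    """Map each summary label to its numeric value."""
    return {m.group(1): float(m.group(3)) for m in SUMMARY_LINE.finditer(text)}


def kb_to_gib(kb):
    """Convert kb to GiB, assuming 1 kb = 1024 bytes."""
    return kb / 1024**2


s16 = read_summary(OUTCAR_16_THREADS)
s1 = read_summary(OUTCAR_1_THREAD)

peak16 = kb_to_gib(s16["Maximum memory used"])
peak1 = kb_to_gib(s1["Maximum memory used"])
ratio = s16["Maximum memory used"] / s1["Maximum memory used"]

print(f"16 threads: peak {peak16:.1f} GiB, elapsed {s16['Elapsed time']:.0f} s")
print(f" 1 thread : peak {peak1:.1f} GiB, elapsed {s1['Elapsed time']:.0f} s")
print(f"peak memory ratio (16 threads / 1 thread): {ratio:.2f}")
```

With these figures, the recorded peak drops by roughly a factor of eight with a single thread, while the elapsed time for 6000 steps increases by about 19%.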