Memory requirement not met
- Newbie
- Posts: 28
- Joined: Fri Dec 18, 2020 1:23 pm
Memory requirement not met
NKDIM*NBANDS*NRPLWV*16
Please elaborate on what NRPLWV is. I keep getting "memory requirement not met" even though I increased the memory from 6 GB to 10 GB per core, with 220 cores in total. I am attaching the necessary files below.
PFA....
- Global Moderator
- Posts: 236
- Joined: Mon Apr 26, 2021 7:40 am
Re: Memory requirement not met
Hello!
Please be more informative when you ask questions in the forum. I guess you mean the formula here in the Wiki. I have to admit that this page is hard to interpret; I will put it on the agenda so it gets improved. To my knowledge, NRPLWV is an estimate of the number of plane waves, but I do not know exactly and will try to get the right answer from my colleagues.
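If I understand the formula correctly, it simply counts the bytes needed to store the plane-wave coefficients of all orbitals, with the factor 16 being the size of one double-precision complex number in bytes. A minimal sketch, with placeholder values (none of these numbers come from your run):
Code:
# Sketch of the Wiki estimate NKDIM*NBANDS*NRPLWV*16, read as the storage
# for the plane-wave coefficients (16 bytes per complex double).
# All values below are placeholders for illustration only.
nkdim = 13         # number of k-points
nbands = 500       # number of bands (NBANDS in the OUTCAR)
nrplwv = 100000    # plane waves per orbital (assumed: maximum over k-points)

bytes_total = nkdim * nbands * nrplwv * 16
print(f"Estimated wavefunction storage: {bytes_total / 1024**3:.1f} GiB")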
However, instead of relying on this formula, I would strongly recommend doing some simple memory scaling tests for your system by hand. Start with a less demanding setup with fewer k-points and a smaller energy cutoff. Measure the memory usage and slowly increase the k-points and the cutoff until you can extract some scaling behavior. Then you can make an estimate for the actual system requirements.
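To illustrate the kind of extrapolation I mean, a simple fit could look like this; the measured values below are invented placeholders, not real data:
Code:
# Fit measured memory usage against the number of irreducible k-points and
# extrapolate to the planned production run. Measurements are placeholders.
import numpy as np

kpoints = np.array([1, 2, 4, 8])          # irreducible k-points of test runs
mem_gb  = np.array([96, 192, 383, 767])   # measured memory in GB (invented)

slope, offset = np.polyfit(kpoints, mem_gb, 1)   # assume linear scaling
target_kpoints = 13                              # k-points of the planned run
print(f"Estimated memory: {slope * target_kpoints + offset:.0f} GB")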
Are you sure you will really need ENCUT = 625 and a 1 x 5 x 5 k-point mesh for this system? Have you done convergence tests, and was no simpler setup possible?
Also, can you make sure that you really have 10 GB available for each core on each node? I can see from your SLURM output that two different node names were used, are these identical nodes? I can see that you asked for 10 GB by setting --mem-per-cpu=10000M in your submit script, but are you sure that this is not silently ignored? Or maybe you shared the nodes with other people?
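A quick way to check is to print the relevant environment variables from inside the job itself; SLURM normally exports these to the job environment. A minimal sketch:
Code:
# Print what SLURM actually granted; run this from inside the job script.
import os

for var in ("SLURM_MEM_PER_CPU", "SLURM_MEM_PER_NODE",
            "SLURM_JOB_NODELIST", "SLURM_NTASKS"):
    print(var, "=", os.environ.get(var, "<not set>"))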
Best,
Andreas Singraber
- Newbie
- Posts: 28
- Joined: Fri Dec 18, 2020 1:23 pm
Re: Memory requirement not met
Hi,
I first tested the plane-wave cutoff by increasing it from the initial 550 eV:

ENCUT (eV)    Total energy (eV)
550           -2079.090364
575           -2222.557333
600           -2222.753423
625           -2227.900555
650           -2227.945189
After settling on a plane-wave cutoff of 625 eV, I started varying the k-point mesh from 1x2x2 and 1x3x3 up to 1x7x7, monitoring the total energy of the system. Finally I will check the convergence criteria.
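For reference, the change in energy between successive cutoff steps, computed directly from the table above:
Code:
# Energy differences between successive ENCUT values from the table above.
encut  = [550, 575, 600, 625, 650]                    # eV
energy = [-2079.090364, -2222.557333, -2222.753423,
          -2227.900555, -2227.945189]                 # eV

for e0, e1, de in zip(encut[:-1], encut[1:],
                      (b - a for a, b in zip(energy, energy[1:]))):
    print(f"{e0} -> {e1} eV: dE = {de:+.6f} eV")
# The last step (625 -> 650 eV) changes the energy by only ~0.045 eV,
# which is why 625 eV was chosen.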
Initially, for the plane-wave cutoff tests with a 1x1x1 k-point mesh, there was no error: the job ran on 220 cores with a memory request of 7000 MB. Later I increased the memory to larger values.
I will check whether the memory requirement request can be skipped, but when I discussed it once I was told that the wait time would be long until the requirement is met. I will confirm this.
Thank you.
- Global Moderator
- Posts: 236
- Joined: Mon Apr 26, 2021 7:40 am
Re: Memory requirement not met
Hello!
Ok, I tried to estimate the memory consumption, and I believe that the VASP run should fit on 220 cores if each core gets 6 GB of memory. This is the result of a test run on a single core:
Code:
VASP std binary
Single core run
KPOINTS: 1x1x1
ENCUT = 625 ===> 95.8 GB total memory consumption

Now, the memory consumption should scale linearly with the number of k-points. In your OUTCAR file you can find the line
Code:
Found 13 irreducible k-points:

Hence, the planned run with the KPOINTS 1x5x5 setting should require 13 x 95.8 GB, i.e. an approximate total of 1246 GB of memory. If that can be evenly split across 220 cores, you will need ~5.7 GB per core. Of course there will be some overhead if you run this in parallel, but that should be manageable.
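The same estimate as explicit arithmetic:
Code:
# Memory estimate from the single-core test: one k-point needed 95.8 GB,
# and the 1x5x5 mesh yields 13 irreducible k-points.
per_kpoint_gb = 95.8
irreducible_kpoints = 13
cores = 220

total_gb = per_kpoint_gb * irreducible_kpoints   # 1245.4, ~1246 GB as quoted
print(f"Total:    {total_gb:.0f} GB")            # (95.8 GB is itself rounded)
print(f"Per core: {total_gb / cores:.1f} GB")    # ~5.7 GB per core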
-----------
Some additional comments:
1.) Your SLURM output mentions
Code:
...
SLURM_CPUS_ON_NODE = 1
...

which sounds incorrect, as you have allocated 26 nodes (see SLURM_JOB_NODELIST) with 220 cores in total. Can you please find out what kind of nodes you are running this job on (number and type of CPUs used, how much memory is installed)?
2.) You should make use of parallelization via the NCORE tag once you have found a setup that works. However, do not use parallelization via KPAR, as it will multiply the memory demand!
3.) Is there a specific reason why you turned scaLAPACK off (LSCALAPACK = .FALSE.)?
All the best,
Andreas Singraber
- Newbie
- Posts: 28
- Joined: Fri Dec 18, 2020 1:23 pm
Re: Memory requirement not met
Hello!
Thanks for the detailed answer!
1] The cluster has 7200 cores and 28800 GB of memory, which makes 4 GB per core (see the check after this list). It has 150 compute nodes with 48 cores each. I can request whole nodes, but such a request has a long wait time, so instead I request the required number of cores and the cluster provides them depending on availability across the nodes.
2]..
3] No, there is no specific reason for keeping scaLAPACK off. How does it affect the computation?
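For clarity, the arithmetic behind the figures in 1]:
Code:
# Sanity check of the cluster figures quoted above.
nodes, cores_per_node = 150, 48
total_mem_gb = 28800

total_cores = nodes * cores_per_node                    # 150 * 48 = 7200
print(f"{total_cores} cores")
print(f"{total_mem_gb / total_cores:.1f} GB per core")  # 4.0 GB per core
# Note: with only 4 GB installed per core, a request for 10 GB per core can
# only be met by leaving cores idle, which may explain the long wait times.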