optimization of nodes and cores
-
- Newbie
- Posts: 34
- Joined: Wed Aug 03, 2022 10:42 am
optimization of nodes and cores
Hello,
I am using HSE06 tags for a system of 160 atoms. My setup on the computing machine is 24 nodes with 24 cores per node.
I am using a 1 x 4 x 2 k-point mesh and ISYM=0. I was wondering if there is a better machine setting to get results faster? (With my settings, a run with only the Gamma k point and NSW = 100 took 2 days, and a run with NSW = 1 and the 1 x 4 x 2 mesh took around 1 day.)
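For reference, the relevant parts of the input look roughly like this (standard HSE06 tags assumed; everything else, e.g. ENCUT and the relaxation tags, is omitted here):

INCAR (sketch):
LHFCALC  = .TRUE.   ! hybrid functional
HFSCREEN = 0.2      ! HSE06 screening parameter
ISYM     = 0        ! symmetry switched off
NSW      = 100      ! ionic steps

KPOINTS (1 x 4 x 2 Gamma-centred mesh):
Regular 1 x 4 x 2 mesh
0
Gamma
1 4 2
0 0 0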
Best regards,
Asiyeh
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Re: optimization of nodes and cores
The computational cost of hybrid functionals like HSE increases quadratically with the number of k points. The Gamma-only version can reduce the computational cost further by making some arrays real instead of complex. Hence, the time difference between a Gamma-only calculation and a 1 x 4 x 2 mesh does not seem unreasonable to me. Whether you need the denser mesh depends on the system and needs to be carefully tested.
Here are a couple of ideas you can use to improve the speed of your calculation:
- For hybrid functionals in particular there is the flag NKRED to reduce the inner second summation over the mesh.
- If your system has symmetry, not setting ISYM=0 can reduce the number of k points in the outer summation.
- You should also carefully review the documentation of the parallelization and see if you can optimize something further. If you did not set any parallelization flags (NCORE or KPAR) yet, this may be a good place to start (a rough sketch covering NKRED and these flags follows this list).
- If you have access to GPU nodes, you could compile VASP with OpenACC support which tends to perform quite well for hybrid functionals.
- If you intend to run longer runs (NSW >> 1) with this setup, you may explore machine learning. You could learn a reasonably good force field at small system sizes and may then only need a few steps for this big system.
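To make the flags above concrete, here is a minimal INCAR sketch; the numbers are placeholders that must be adapted to your cell, mesh, and machine, not recommendations:

NKREDY = 2    ! use every 2nd k point along b in the inner (Fock-exchange) summation
NKREDZ = 2    ! same along c; the corresponding mesh dimensions must be divisible by these
KPAR   = 4    ! split the MPI ranks into 4 groups, each handling a subset of the k points
NCORE  = 4    ! number of cores working on one orbital, typically a divisor of the cores per node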
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 34
- Joined: Wed Aug 03, 2022 10:42 am
Re: optimization of nodes and cores
Thank you for your kind response; I will try ML.
Best,
Asiyeh
-
- Newbie
- Posts: 34
- Joined: Wed Aug 03, 2022 10:42 am
Re: optimization of nodes and cores
martin.schlipf wrote: ↑Fri Jun 02, 2023 1:41 pm
If you have access to GPU nodes, you could compile VASP with OpenACC support which tends to perform quite well for hybrid functionals.
In the case of GPUs, I need to know: does VASP support AMD GPUs?
Best regards,
Asiyeh
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Re: optimization of nodes and cores
AMD GPUs are not officially supported because their compiler does not support a sufficient OpenACC standard at this stage. In principle, gfortran claims to support a sufficient OpenACC standard, but we have not verified that VASP runs on AMD GPUs with it. We are in contact with the compiler developers and are trying to resolve this issue.
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 34
- Joined: Wed Aug 03, 2022 10:42 am
Re: optimization of nodes and cores
Thank you so much for the clarification.
Best regards,
Asiyeh
-
- Newbie
- Posts: 34
- Joined: Wed Aug 03, 2022 10:42 am
Re: optimization of nodes and cores
Hello,
I am trying your advice about ML. I used the example test in the package for ML_MgO_defect (although, as far as I saw, no defect was included in the POSCAR). I only changed ISIF from 3 to 2, as I just want to relax the atomic positions. My question is about the temperature I should use: in this example T is 300 K. Since I will use the final CONTCAR positions for the DFT calculations, I do not know whether 0 K or room temperature is the better choice. I tried both, and I see completely different final positions for my impurity, even after DFT calculations on the ML CONTCAR results.
Would you please let me know if you have any ideas?
Best regards,
Asiyeh
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Re: optimization of nodes and cores
It depends on what you want to do: if you want to run an ab-initio MD simulation, then you want to run the training at about the same temperature as the production calculation. You could even use a temperature range if you want to vary the temperature afterwards. Note that in this case you do not expect to get the same positions for different temperatures, because the initial velocities and the impact of the thermostat are different.
If you only want to relax the big structure but want to learn a force field from a smaller structure, you should choose the temperature such that the forces are comparable. A general property of ML is that it works well if the space where it is applied is similar to the space where it is learned. So you want to see similar local configurations of the atoms in the training and in the production run. Note that it is possible to do this more systematically as well: you learn a force field and then check that the predicted forces agree well with the exact calculation. The things you need to consider for the ML run are documented on this site.
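As a minimal sketch of such an on-the-fly training run on the small cell (tag names for recent VASP versions; older releases use ML_ISTART instead of ML_MODE, and all numbers are placeholders to adapt to your system):

ML_LMLFF = .TRUE.    ! enable the machine-learned force field machinery
ML_MODE  = train     ! on-the-fly training during the MD run
IBRION   = 0         ! molecular dynamics
MDALGO   = 2         ! Nose-Hoover thermostat (one possible choice)
SMASS    = 0         ! thermostat mass parameter (placeholder)
TEBEG    = 100       ! start temperature in K; choose so the forces resemble production
TEEND    = 300       ! end temperature in K; a range covers several conditions
NSW      = 5000      ! number of MD steps for training (placeholder)
POTIM    = 1.5       ! time step in fs (placeholder)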
Martin Schlipf
VASP developer
-
- Newbie
- Posts: 34
- Joined: Wed Aug 03, 2022 10:42 am
Re: optimization of nodes and cores
Thank you for your explanation.
Best regards,
Asiyeh