Efficiency of Simultaneous Multithreading on a Cluster
Posted: Wed Oct 18, 2023 10:14 am
Hi,
I'm just starting to use VASP on a cluster rather than on a workstation, and have a question concerning SMT, AMD's version of hyperthreading. On the Wiki I found a warning stating that hyperthreading is not beneficial to VASP's performance. I don't quite understand whether this statement applies to my situation:
I'm running calculations with 448 bands. I found online that the ideal number of cores is NBANDS/4, so 112 in my case. VSC5 has 64-core nodes with AMD EPYC 7713 CPUs, which have 2 threads per core. Instinctively I'd use one node and run on 112 hardware threads with "mpirun --use-hwthread-cpus -np 112 vasp_std".
Is this more or less efficient than just using the 64 physical cores without utilising SMT (running with -np 64)? I'll try benchmarking it myself over the next couple of days, but I'd like to understand what the warning in the wiki really means ^^
And maybe since you're here: I also saw on the wiki that NCORE = 4 is good for systems with around 100 atoms. Should I also set KPAR to a value other than 1 for my situation? According to a post I found, KPAR should equal the number of nodes used, as long as that number divides the number of k-points evenly. I'll work with one node at a time for now, so is the default KPAR ideal?
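To make the comparison concrete, here is the arithmetic behind the two runs as a small Python sketch. The numbers are the ones from above; the function name and the dictionary keys are made up for illustration, and it only checks how the counts divide, it says nothing about actual performance.

```python
# Values taken from the post above.
NBANDS = 448
PHYSICAL_CORES = 64     # cores per node
THREADS_PER_CORE = 2    # SMT threads per core
NCORE = 4

def describe_run(n_ranks):
    """Summarise how a given MPI rank count relates to bands and cores.

    Hypothetical helper for illustration only; not part of VASP or MPI.
    """
    return {
        "ranks": n_ranks,
        # Average number of bands each MPI rank would be responsible for.
        "bands_per_rank": NBANDS / n_ranks,
        # Whether the rank count splits evenly into groups of NCORE.
        "divisible_by_ncore": n_ranks % NCORE == 0,
        # More ranks than physical cores means some cores run two ranks.
        "needs_smt": n_ranks > PHYSICAL_CORES,
        # Whether one node has enough hardware threads at all.
        "fits_on_node": n_ranks <= PHYSICAL_CORES * THREADS_PER_CORE,
    }

for n in (112, 64):
    print(describe_run(n))
```

So -np 112 gives 4 bands per rank but needs SMT, while -np 64 gives 7 bands per rank on physical cores only. Both rank counts are divisible by NCORE = 4.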
Cheers and thanks for the help,
Max