How to improve parallelised calculation ?
Posted: Sun Apr 25, 2010 8:41 pm
by vasp16888
Dear VASP users,
I am running VASP 5.2 on SUSE Linux with an InfiniBand network card (20 Gb/s).
My question is: if I want to run on 2 or more nodes (each node has 12 processors), how should I set NPAR, LPLANE and the other parallelisation parameters?
Thanks a lot in advance:)
Posted: Mon Apr 26, 2010 6:44 am
by vasp16888
Can somebody give me some tips? Thanks a lot :)
Posted: Mon Apr 26, 2010 1:01 pm
by alex
Nobody will answer faster if you shout. This part of the forum is run by volunteers.
As for your question: you have to test it yourself. The best settings depend on the number of atoms, the cutoff, the number of k-points, the speed of your memory interface, the speed of your CPU and so on. I hope you get the idea...
Start with NPAR = 1 and remove LPLANE from the INCAR.
Hth
alex
Posted: Mon Apr 26, 2010 6:51 pm
by vasp16888
First, I have to say sorry: I am a newcomer to the VASP forum and there is a lot to learn here. Thank you for your suggestions (all of them).
Second, regarding the factors in your reply (number of atoms, cutoff, number of k-points, speed of the memory interface, CPU speed): I was already aware of them and tested some of them before posting. But thanks anyway.
Third, I read the user guide, which recommends:
LPLANE = .TRUE.
NPAR = number of nodes.
LSCALU = .FALSE.
NSIM = 4
but the parallel speed-up I get with these settings is not noticeable.
My cluster uses InfiniBand (20 Gb/s) and all the hardware is up to date. I just want to know how to handle large systems efficiently (I have tested many times without success).
If time permits, any suggestion would be greatly appreciated. Sorry to bother you :)
Hui
[Edited Mon Apr 26 2010, 08:53 PM]
Posted: Tue Apr 27, 2010 1:46 pm
by Danny
The only way to do this is the hard way.
Step one: find a well-behaved calculation with ~100 atoms that runs for ~5-20 h on a single CPU.
Run it on multiple nodes/CPUs once with LPLANE = .TRUE. and once with LPLANE = .FALSE.
You should see a clear difference in run time (sometimes 50%). (Note: both runs must use exactly the same CPU configuration.)
Then choose the faster LPLANE value.
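A minimal sketch of the comparison in step one, assuming you read the wall-clock time of each run from its output yourself. `compare_lplane` is a hypothetical helper, and the timings in the example are placeholders, not measurements.

```python
def compare_lplane(time_true: float, time_false: float) -> tuple[str, float]:
    """Return the faster LPLANE value and how much slower the other run was (%).

    Both timings must come from runs with exactly the same node/core layout.
    """
    if time_true <= 0 or time_false <= 0:
        raise ValueError("timings must be positive")
    if time_true <= time_false:
        return ".TRUE.", 100.0 * (time_false - time_true) / time_true
    return ".FALSE.", 100.0 * (time_true - time_false) / time_false


# Placeholder wall-clock times in hours (made up, for illustration only)
best, penalty = compare_lplane(time_true=1.0, time_false=1.5)
print(f"LPLANE = {best}; the other setting was {penalty:.0f}% slower")
```

The slowdown is measured relative to the faster run, which matches how a "50% difference" is usually quoted.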
Step two: NSIM and NPAR. These two parameters seem to be coupled, and their behaviour appears to be quite system-dependent.
Choose a set of NPAR and NSIM values (choose wisely, so that they have some physical meaning with respect to your CPUs and nodes),
loop over all combinations with your test calculation (i.e. #NPAR × #NSIM calculations) and look for a trend.
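The NPAR × NSIM loop can be scripted. This sketch only builds the parallelisation tags for each combination; the run naming (`npar{N}_nsim{M}`) is a hypothetical convention, and merging each fragment into the rest of your INCAR and submitting the jobs is left to your own setup.

```python
from itertools import product


def parallel_incar_grid(npar_values, nsim_values, lplane=True):
    """Map a run name to the NPAR/NSIM/LPLANE tags for every NPAR x NSIM pair.

    Only the parallelisation tags are produced; all physics tags of the
    test calculation must stay identical across the runs.
    """
    lplane_tag = ".TRUE." if lplane else ".FALSE."
    grid = {}
    for npar, nsim in product(npar_values, nsim_values):
        name = f"npar{npar}_nsim{nsim}"  # hypothetical run-directory name
        grid[name] = f"NPAR = {npar}\nNSIM = {nsim}\nLPLANE = {lplane_tag}\n"
    return grid


runs = parallel_incar_grid([1, 2, 4], [1, 4, 8])
for name, fragment in runs.items():
    print(name, fragment.replace("\n", "; "))
```

LPLANE is held fixed here on the assumption that its best value was already chosen in step one.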
The default values are NSIM = 1 and NPAR = #CPUs.
The machine I currently run on needs NSIM = 8-16 and NPAR = #nodes/2
(a former machine needed NPAR=#cores, while another one needed NPAR=#nodes)
It takes time but it is worth it.
Cheers
Danny
Posted: Wed Apr 28, 2010 3:14 am
by vasp16888
Hi Danny,
I am a little confused about the terms node, CPU, core and processor.
As I understand it, for instance: we have 6 nodes connected by InfiniBand, each node has 2 CPUs on the motherboard and each CPU has 6 cores, so each node has 12 cores. I think processor = CPU.
Please correct me if I am wrong.
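The counting in this post, written out as a small sketch. The class and field names are hypothetical, and "CPU" here means one physical processor (socket), as in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hardware layout: nodes contain CPUs (sockets), CPUs contain cores."""
    nodes: int
    cpus_per_node: int
    cores_per_cpu: int

    @property
    def cores_per_node(self) -> int:
        return self.cpus_per_node * self.cores_per_cpu

    @property
    def total_cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    @property
    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node


machine = Cluster(nodes=6, cpus_per_node=2, cores_per_cpu=6)  # the whole cluster
job = Cluster(nodes=2, cpus_per_node=2, cores_per_cpu=6)      # a 2-node run
print(machine.cores_per_node, machine.total_cores)  # 12 72
print(job.total_cpus, job.total_cores)              # 4 24
```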
Thanks:)
Yours sincerely:
Hui
Posted: Wed Apr 28, 2010 9:14 am
by Danny
Yes, you are right. Present-day hardware terminology is confusing, since node, CPU and core are often used interchangeably.
In the VASP manual, "node" actually means core. I try to use only:
1) node = node
2) CPU/core/processor = the smallest unit that does the calculation, i.e. in your case I would call the 12 cores 12 CPUs (I know this is technically wrong)
For your 2-node case (= 24 cores = 4 CPUs) I would suggest trying
NPAR=1 (each band on the entire system)
NPAR=2 (one band per node)
NPAR=4 (one band per CPU)
NPAR=24 (one band per core)
NPAR=8 & NPAR=12 (one band per 3 and per 2 cores, respectively)
each combined with NSIM = 1, 2, 4, 6, 12, 24
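Where these descriptions come from: NPAR sets the number of groups the cores are split into, each group working on one band at a time, so each group has (total cores) / NPAR cores. A small sketch for the 24-core case; it assumes, as this scan does, that NPAR divides the core count evenly, and the function name is hypothetical.

```python
TOTAL_CORES = 2 * 12  # 2 nodes x 12 cores, as in this thread
NPAR_VALUES = [1, 2, 4, 8, 12, 24]
NSIM_VALUES = [1, 2, 4, 6, 12, 24]


def cores_per_band_group(total_cores: int, npar: int) -> int:
    """Number of cores that share the work on one band for a given NPAR."""
    if npar < 1 or total_cores % npar:
        raise ValueError(f"NPAR={npar} does not split {total_cores} cores evenly")
    return total_cores // npar


for npar in NPAR_VALUES:
    print(f"NPAR={npar:2d}: {cores_per_band_group(TOTAL_CORES, npar):2d} cores per band")

print("runs in the full scan:", len(NPAR_VALUES) * len(NSIM_VALUES))  # 36
```

NPAR = 4 gives 6 cores per band, i.e. one six-core CPU, which is the "one band per CPU" case in the list.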
Danny
Posted: Thu Apr 29, 2010 1:11 am
by vasp16888
Thanks, Danny, that's clearer now :)
Are these the combinations you talked about last time:
NPAR=1, NSIM=1
NPAR=2, NSIM=2
NPAR=4, NSIM=4
NPAR=8, NSIM=6
NPAR=12, NSIM=12
NPAR=24, NSIM=24
If so, I think it is relatively easy.
But if they are not paired in order, it will be 36 combinations, which is really a lot of work for our limited computing resources :(
Thanks in advance
Posted: Thu Apr 29, 2010 7:40 am
by alex
Hi there again,
Some hints:
You've got a fast machine with a fast network, so start with NPAR = 1 and NSIM = 1 for the number of tasks you are most likely to use most often.
Next: NPAR = # of tasks / 4, NSIM unchanged.
Then: NPAR = # of tasks, NSIM unchanged.
Then take the two fastest and optimize NPAR further. I'd guess you'll end up at NPAR = 2 or 4.
Then vary NSIM; same game.
Gotcha. Hth
alex
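alex's recipe, sketched as a staged search. `run_time(npar, nsim)` is a hypothetical callable you would supply (for example, one that runs the test job and returns its wall time). The rule used for "optimize NPAR further" (try the divisors of the task count that lie between the two fastest coarse values) is an interpretation, not something stated in the post, and the timing model in the example is a toy, not VASP data.

```python
from typing import Callable, Iterable


def divisors(n: int) -> list[int]:
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def staged_search(
    tasks: int,
    run_time: Callable[[int, int], float],
    nsim_values: Iterable[int] = (1, 2, 4, 8, 16),
) -> tuple[int, int]:
    """Coarse NPAR scan, NPAR refinement, then NSIM scan; returns (NPAR, NSIM)."""
    timings: dict[tuple[int, int], float] = {}

    def measure(npar: int, nsim: int) -> float:
        # Cache results so no (NPAR, NSIM) pair is run twice.
        if (npar, nsim) not in timings:
            timings[(npar, nsim)] = run_time(npar, nsim)
        return timings[(npar, nsim)]

    # Stage 1: NPAR = 1, tasks/4 and tasks, all with NSIM = 1.
    coarse = sorted({1, max(1, tasks // 4), tasks})
    fastest_two = sorted(sorted(coarse, key=lambda p: measure(p, 1))[:2])
    low, high = fastest_two[0], fastest_two[-1]

    # Stage 2 (interpretation): divisors of `tasks` between the two fastest.
    candidates = [p for p in divisors(tasks) if low <= p <= high]
    best_npar = min(candidates, key=lambda p: measure(p, 1))

    # Stage 3: vary NSIM at the best NPAR ("same game").
    best_nsim = min(nsim_values, key=lambda s: measure(best_npar, s))
    return best_npar, best_nsim


# Toy timing model (made up): fastest at NPAR = 4, NSIM = 4.
def toy_run_time(npar: int, nsim: int) -> float:
    return 10 + abs(npar - 4) + abs(nsim - 4)


print(staged_search(24, toy_run_time))  # (4, 4)
```

With 24 tasks and this toy model the search needs 10 distinct runs, which is the point of the staged approach compared with a full grid.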
Posted: Fri Apr 30, 2010 8:33 am
by Danny
Nope, I'm afraid it's the full 36. Then again, if you have a job that takes 10 h on 1 core, the same job on 24 cores might take 30 to 60 minutes, so you will need < 36 hours on 24 cores, i.e. < 1000 CPU hours. That is still reasonable, given that you will probably face relaxations that take twice that time for a single calculation. Plus, if you can gain 25-30% compared to the normal settings, those 1000 hours are recovered quite quickly ;-)
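Danny's cost estimate, written out. It uses his figures (36 runs, 0.5-1 h each on 24 cores) and interprets the "25-30% gain" as that fraction of CPU time saved on later production runs, measured at the normal settings; the helper names are hypothetical.

```python
def scan_cost(n_runs: int, wall_hours_per_run: float, cores: int) -> float:
    """Total CPU hours (cores x wall-clock hours) spent on a parameter scan."""
    return n_runs * wall_hours_per_run * cores


def break_even_cpu_hours(scan_cpu_hours: float, saving_fraction: float) -> float:
    """Production CPU hours (at the old settings) after which the scan has paid for itself."""
    if not 0 < saving_fraction < 1:
        raise ValueError("saving_fraction must be between 0 and 1")
    return scan_cpu_hours / saving_fraction


best_case = scan_cost(36, 0.5, 24)   # 432.0 CPU hours
worst_case = scan_cost(36, 1.0, 24)  # 864.0 CPU hours, i.e. < 1000
print(best_case, worst_case)
print(break_even_cpu_hours(worst_case, 0.25))  # 3456.0
```

So even in the worst case, a 25% saving recovers the scan after roughly 3500 CPU hours of production work.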
Danny
Posted: Sat May 01, 2010 2:46 am
by vasp16888
OK, I'm going to do it :)
Posted: Mon May 03, 2010 11:03 pm
by vasp16888
Posted: Mon May 03, 2010 11:10 pm
by vasp16888
I tested NPAR, NSIM and LPLANE, which may improve the efficiency. The results are posted in a new thread:
http://cms.mpi.univie.ac.at/vasp-forum/ ... php?4.7257
Please take a look; I have some questions about the test results there and would appreciate your suggestions. Thanks in advance.