Total energy unreasonably different with parallel NPAR
Posted: Mon Sep 14, 2015 9:23 am
Dear all,
I am running a repeated-slab calculation on a single node with 12 cores. First I used the default value of NPAR. The self-consistent calculation converges, but the total energy is positive:
N E dE d eps ncg rms rms(c)
DAV: 1 0.697842165012E+03 0.69784E+03 -0.60118E+04 2016 0.837E+02
DAV: 2 0.231956316158E+03 -0.46589E+03 -0.43967E+03 2376 0.140E+02
DAV: 3 0.200190143989E+03 -0.31766E+02 -0.29509E+02 2436 0.310E+01
DAV: 4 0.199405108860E+03 -0.78504E+00 -0.73560E+00 2376 0.371E+00
DAV: 5 0.199388295250E+03 -0.16814E-01 -0.15968E-01 2100 0.372E-01 0.171E+02
DAV: 6 0.193222117805E+04 0.17328E+04 -0.10575E+04 2208 0.184E+02 0.114E+02
DAV: 7 0.306973187688E+04 0.11375E+04 -0.37124E+03 2076 0.116E+02 0.872E+01
DAV: 8 0.396531468939E+04 0.89558E+03 -0.20813E+03 3024 0.943E+01 0.244E+01
DAV: 9 0.399119740779E+04 0.25883E+02 -0.10042E+03 2796 0.736E+01 0.229E+01
DAV: 10 0.399343140005E+04 0.22340E+01 -0.19082E+02 2580 0.322E+01 0.202E+01
DAV: 11 0.397970519245E+04 -0.13726E+02 -0.17259E+02 2688 0.421E+01 0.180E+01
DAV: 12 0.401856874830E+04 0.38864E+02 -0.44255E+01 3036 0.225E+01 0.511E+00
DAV: 13 0.401846134440E+04 -0.10740E+00 -0.14256E+01 2400 0.909E+00 0.347E+00
DAV: 14 0.401922519746E+04 0.76385E+00 -0.21566E+00 2808 0.409E+00 0.109E+00
DAV: 15 0.401930512549E+04 0.79928E-01 -0.53074E-01 2400 0.207E+00 0.422E-01
DAV: 16 0.401935305226E+04 0.47927E-01 -0.10427E-01 2388 0.955E-01 0.281E-01
DAV: 17 0.401937332021E+04 0.20268E-01 -0.80278E-02 2808 0.948E-01 0.404E-01
DAV: 18 0.401938892756E+04 0.15607E-01 -0.39520E-02 2472 0.637E-01 0.177E-01
DAV: 19 0.401939131595E+04 0.23884E-02 -0.20370E-02 2508 0.406E-01 0.113E-01
DAV: 20 0.401939123928E+04 -0.76662E-04 -0.41487E-03 2868 0.192E-01 0.118E-01
DAV: 21 0.401939139125E+04 0.15197E-03 -0.87410E-04 2364 0.961E-02 0.109E-01
DAV: 22 0.401939161995E+04 0.22870E-03 -0.72381E-04 2256 0.669E-02 0.100E-01
DAV: 23 0.401939170716E+04 0.87205E-04 -0.14303E-04 1260 0.368E-02
1 F= 0.40193917E+04 E0= 0.40193917E+04 d E =0.000000E+00
The number of bands in this run is NBANDS=144:
k-points NKPTS = 8 k-points in BZ NKDIM = 8 number of bands NBANDS= 144
However, with identical input except for NPAR=4, the self-consistent calculation converges with a negative total energy:
N E dE d eps ncg rms rms(c)
DAV: 1 0.105163862033E+04 0.10516E+04 -0.67221E+04 2176 0.886E+02
DAV: 2 -0.107529395773E+02 -0.10624E+04 -0.10284E+04 2520 0.235E+02
DAV: 3 -0.875016376226E+02 -0.76749E+02 -0.75871E+02 2496 0.618E+01
DAV: 4 -0.893402286864E+02 -0.18386E+01 -0.18289E+01 2600 0.913E+00
DAV: 5 -0.893782348763E+02 -0.38006E-01 -0.37943E-01 2552 0.130E+00 0.724E+00
DAV: 6 -0.861988495185E+02 0.31794E+01 -0.64072E+00 2424 0.103E+01 0.353E+00
DAV: 7 -0.859051346963E+02 0.29371E+00 -0.30370E+00 2448 0.440E+00 0.162E+00
DAV: 8 -0.858878435512E+02 0.17291E-01 -0.48658E-01 2456 0.290E+00 0.630E-01
DAV: 9 -0.858835325671E+02 0.43110E-02 -0.14669E-01 2464 0.157E+00 0.233E-01
DAV: 10 -0.858827057887E+02 0.82678E-03 -0.42983E-02 2456 0.562E-01 0.148E-01
DAV: 11 -0.858795801960E+02 0.31256E-02 -0.40763E-03 2568 0.327E-01 0.595E-02
DAV: 12 -0.858783060030E+02 0.12742E-02 -0.27255E-03 2704 0.265E-01 0.559E-02
DAV: 13 -0.858774514252E+02 0.85458E-03 -0.22847E-03 3176 0.292E-01 0.286E-02
DAV: 14 -0.858775450251E+02 -0.93600E-04 -0.27668E-03 2384 0.122E-01 0.260E-02
DAV: 15 -0.858774491900E+02 0.95835E-04 -0.22038E-04 1984 0.558E-02
1 F= -.85877449E+02 E0= -.85877334E+02 d E =-.230281E-03
The number of bands in this run is 136, which is 8 fewer than in the default-NPAR run:
k-points NKPTS = 8 k-points in BZ NKDIM = 8 number of bands NBANDS= 136
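These are the corresponding lines from the two OUTCAR files (the NPAR=4 run first, with 4 band groups, then the default-NPAR run, with 12 band groups):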
executed on LinuxIFC date 2015.09.14 15:33:28
running on 12 total cores
distrk: each k-point on 12 cores, 1 groups
distr: one band on NCORES_PER_BAND= 3 cores, 4 groups
--------------------------------------------------------------------------------------------------------
executed on LinuxIFC date 2015.09.12 15:59:35
running on 12 total cores
distrk: each k-point on 12 cores, 1 groups
distr: one band on NCORES_PER_BAND= 1 cores, 12 groups
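The difference of 8 bands is at least consistent with NBANDS being rounded up to a multiple of the number of band groups (4 and 12 in the OUTCAR lines above). Below is a minimal Python sketch of that arithmetic. It assumes the requested band count was 136 (an assumption; the requested value does not appear in the output shown here) and that this rounding rule is what applies; the function name is purely illustrative.

```python
def round_up_to_multiple(n_bands, n_groups):
    """Return the smallest multiple of n_groups that is >= n_bands."""
    # Ceiling division written with integer arithmetic only.
    return -(-n_bands // n_groups) * n_groups


# Assumed requested band count (hypothetical, not taken from the output).
requested = 136

# Band-group counts reported in the two OUTCAR excerpts.
for groups in (4, 12):
    print(groups, "groups ->", round_up_to_multiple(requested, groups), "bands")
```

If this is what happens, it would account for the different NBANDS values (136 and 144), but not for the positive total energy.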
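I think there is something wrong with the total energy of the default-NPAR run. The result is extremely strange: the energy jumps from about 199 eV at step 5 to about 1932 eV at step 6 and settles near 4019 eV, whereas the NPAR=4 run settles at about -85.88 eV.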
Best
Gang