VASP_GPU Performance Issue

burakgurlek
Jr. Member
Posts: 51
Joined: Thu Apr 06, 2023 12:25 pm

#1 Post by burakgurlek » Tue Oct 24, 2023 1:12 pm

Dear all,

I wanted to compare the performance of a GPU run against a CPU run. I tried a single SCF loop, but I saw no improvement with the GPU: it was 3 times slower than the CPU run (which is calculated as NBANDS*KPOINTS/8/NCORE), despite NSIM=128 and 1 MPI rank per GPU. I have probably made a mistake somewhere and would be grateful for your help. The simulation files are attached.

Regards,
Burak

marie-therese.huebsch
Full Member
Posts: 211
Joined: Tue Jan 19, 2021 12:01 am

Re: VASP_GPU Performance Issue

#2 Post by marie-therese.huebsch » Mon Nov 06, 2023 11:26 am

Hi,

Thank you for running performance tests and sharing them here!

It is true that, in terms of time to solution, the CPU version is faster than the GPU version by roughly a factor of 3 for this particular calculation.

Code:

OUTCAR_CPU:     LOOP+:  cpu time    670.8752: real time    672.0153
OUTCAR_GPU:     LOOP+:  cpu time   1941.4516: real time   1905.4101
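
For anyone who wants to repeat this comparison on their own runs, here is a minimal Python sketch that pulls the timing lines out of OUTCAR text and forms the GPU/CPU wall-time ratio. The helper name `timings` and the regular expression are illustrative assumptions, not part of VASP. The two sample strings are copied from the lines above, without the `OUTCAR_CPU:`/`OUTCAR_GPU:` prefixes that were added for this post.

```python
import re

# Matches OUTCAR timing lines of the form
#   "     LOOP+:  cpu time    670.8752: real time    672.0153"
# (assumed layout, based on the excerpts quoted in this thread).
TIMING = re.compile(
    r"^\s*(\w+\+?):\s+cpu time\s+([\d.]+):\s+real time\s+([\d.]+)", re.M
)

def timings(text):
    """Return {label: [(cpu_seconds, real_seconds), ...]} for all timing lines."""
    out = {}
    for label, cpu, real in TIMING.findall(text):
        out.setdefault(label, []).append((float(cpu), float(real)))
    return out

# Sample lines taken from the two OUTCAR files discussed above.
cpu_text = "     LOOP+:  cpu time    670.8752: real time    672.0153\n"
gpu_text = "     LOOP+:  cpu time   1941.4516: real time   1905.4101\n"

cpu_real = timings(cpu_text)["LOOP+"][0][1]
gpu_real = timings(gpu_text)["LOOP+"][0][1]

# Ratio > 1 means the GPU run needed more wall time than the CPU run.
ratio = gpu_real / cpu_real
print(f"GPU/CPU wall-time ratio: {ratio:.2f}")
```

With real files, one would read each OUTCAR with `open(path).read()` and pass the contents to `timings`. The same dictionary then also contains the `LOOP` and `FORVDW` entries.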
There are a couple of aspects to note:
  • Most of the time is lost when computing the VdW forces.

    Code:

    OUTCAR_CPU:    FORVDW:  cpu time     13.9918: real time     14.0008
    OUTCAR_GPU:    FORVDW:  cpu time    959.5451: real time    951.2572
    That is because this part of the code has not been ported to GPU. Thus, at this point, we cannot recommend running calculations that include VdW forces on GPU.
  • The GPU run takes 2 more iteration steps to reach convergence.

    Code:

      1 OUTCAR_CPU:      LOOP:  cpu time     31.0964: real time     31.3084
      2 OUTCAR_CPU:      LOOP:  cpu time     37.0171: real time     37.1010
      3 OUTCAR_CPU:      LOOP:  cpu time     36.1667: real time     36.2138
      4 OUTCAR_CPU:      LOOP:  cpu time     35.8931: real time     35.9381
      5 OUTCAR_CPU:      LOOP:  cpu time     36.2547: real time     36.3063
      6 OUTCAR_CPU:      LOOP:  cpu time     29.3839: real time     29.4306
      7 OUTCAR_CPU:      LOOP:  cpu time     32.8986: real time     32.9410
      8 OUTCAR_CPU:      LOOP:  cpu time     28.9229: real time     28.9653
      9 OUTCAR_CPU:      LOOP:  cpu time     32.5930: real time     32.6356
     10 OUTCAR_CPU:      LOOP:  cpu time     33.8198: real time     33.8665
     11 OUTCAR_CPU:      LOOP:  cpu time     32.8474: real time     32.8873
     12 OUTCAR_CPU:      LOOP:  cpu time     34.5196: real time     34.5708
     13 OUTCAR_CPU:      LOOP:  cpu time     40.0197: real time     40.0728
     14 OUTCAR_CPU:      LOOP:  cpu time     34.0037: real time     34.0559
     15 OUTCAR_CPU:      LOOP:  cpu time     33.3978: real time     33.4419
     16 OUTCAR_CPU:      LOOP:  cpu time     35.9634: real time     36.0134
     17 OUTCAR_CPU:      LOOP:  cpu time     38.1054: real time     38.1565
     18 OUTCAR_CPU:      LOOP:  cpu time     25.3977: real time     25.4350
     19 OUTCAR_CPU:     LOOP+:  cpu time    670.8752: real time    672.0153
     

    Code:

      1 OUTCAR_GPU:      LOOP:  cpu time     40.1964: real time     39.7315
      2 OUTCAR_GPU:      LOOP:  cpu time     46.5786: real time     46.2402
      3 OUTCAR_GPU:      LOOP:  cpu time     48.1026: real time     47.7735
      4 OUTCAR_GPU:      LOOP:  cpu time     54.6248: real time     54.3239
      5 OUTCAR_GPU:      LOOP:  cpu time     57.1046: real time     55.8257
      6 OUTCAR_GPU:      LOOP:  cpu time     38.0119: real time     39.7700
      7 OUTCAR_GPU:      LOOP:  cpu time     42.5045: real time     41.0562
      8 OUTCAR_GPU:      LOOP:  cpu time     37.5500: real time     36.0945
      9 OUTCAR_GPU:      LOOP:  cpu time     42.1764: real time     40.6554
     10 OUTCAR_GPU:      LOOP:  cpu time     44.2746: real time     42.7324
     11 OUTCAR_GPU:      LOOP:  cpu time     42.5493: real time     41.0302
     12 OUTCAR_GPU:      LOOP:  cpu time     46.4575: real time     44.9322
     13 OUTCAR_GPU:      LOOP:  cpu time     50.7778: real time     49.2683
     14 OUTCAR_GPU:      LOOP:  cpu time     43.5063: real time     41.9760
     15 OUTCAR_GPU:      LOOP:  cpu time     42.6561: real time     41.1782
     16 OUTCAR_GPU:      LOOP:  cpu time     45.6093: real time     44.1545
     17 OUTCAR_GPU:      LOOP:  cpu time     48.8931: real time     47.4580
     18 OUTCAR_GPU:      LOOP:  cpu time     33.3592: real time     31.8699
     19 OUTCAR_GPU:      LOOP:  cpu time     35.0750: real time     33.5765
     20 OUTCAR_GPU:      LOOP:  cpu time     29.2735: real time     28.8527
     21 OUTCAR_GPU:     LOOP+:  cpu time   1941.4516: real time   1905.4101
     
    This could just as well be the other way around. So, there is no fundamental conclusion we can draw from this observation.
  • Time to solution vs. power per iteration step: I understand the interest in comparing time to solution, but one can alternatively judge the performance by the time per iteration step. If we do that and subtract the contribution from the VdW forces, the GPU run still takes about 25% more time per iteration step. Additionally, you could consider the power consumption and the availability of resources. Depending on your hardware, the GPU may have the upper hand after all (for calculations without VdW forces) when considering power per iteration step.
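
The per-step comparison in the last point can be checked roughly with a few lines of Python. This is only a sketch under stated assumptions: it uses the real-time totals quoted above, it assumes that the FORVDW time is included in LOOP+, and it divides by the number of LOOP entries (18 for CPU, 20 for GPU). The dictionary layout and the function name `per_step_without_vdw` are made up for illustration. The exact percentage depends on which timers are compared, so the result should be read as an estimate of the same order as the figure above, not as a replacement for it.

```python
# Real-time values (seconds) copied from the OUTCAR excerpts in this post.
runs = {
    "CPU": {"loop_plus": 672.0153, "forvdw": 14.0008, "n_steps": 18},
    "GPU": {"loop_plus": 1905.4101, "forvdw": 951.2572, "n_steps": 20},
}

def per_step_without_vdw(run):
    """Average wall time per iteration after removing the VdW force time.

    Assumes FORVDW is part of LOOP+ (an assumption of this sketch).
    """
    return (run["loop_plus"] - run["forvdw"]) / run["n_steps"]

per_step = {name: per_step_without_vdw(run) for name, run in runs.items()}

# Share of the total loop time spent in the VdW force routine.
vdw_share = {name: run["forvdw"] / run["loop_plus"] for name, run in runs.items()}

# Values > 1 mean the GPU run is slower per iteration step.
slowdown = per_step["GPU"] / per_step["CPU"]

for name in runs:
    print(f"{name}: {per_step[name]:.1f} s/step without VdW, "
          f"VdW share of LOOP+ = {vdw_share[name]:.1%}")
print(f"GPU/CPU time per step (VdW removed): {slowdown:.2f}")
```

Dividing by the number of steps removes the effect of the two extra iterations in the GPU run, which is why this measure is less sensitive to convergence differences than the plain time to solution.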
I hope these comments are helpful.
Cheers,
Marie-Therese