accelerating the ML_FF MD run
-
- Newbie
- Posts: 43
- Joined: Tue Nov 19, 2019 4:15 am
accelerating the ML_FF MD run
Hi,
I trained an ML force field and tested it, and the results are good when checked against the experimental density.
Subsequently, I ran the ML_FF on a much larger system with 1500 atoms, on 96 cores parallelized over 3 nodes.
However, I saw that the calculation is slower than a classical MD calculation in LAMMPS.
So my question is: how can I accelerate my MD calculation? Or can the trained FF be used in a third-party MD code such as LAMMPS or GROMACS to achieve a faster calculation?
Thank you for your attention.
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: accelerating the ML_FF MD run
I'm not quite sure what you want, but let me try to answer what I understand.
Classical force fields will always be much faster than machine-learned force fields, since they are much simpler, but of course they are significantly less accurate for most problems.
Are you using the fast force field? That is, did you refit with ML_MODE=REFIT? Without it you are 20-100 times slower.
We have no LAMMPS or GROMACS implementation, but the timings of our code should be similar to those of most machine-learning codes implemented in LAMMPS.
-
- Newbie
- Posts: 43
- Joined: Tue Nov 19, 2019 4:15 am
Re: accelerating the ML_FF MD run
Thank you.
I did not use ML_MODE=REFIT.
I used ML_MODE=TRAIN and obtained an ML_FFN with good accuracy; the real RMSEs of the energy and forces are low.
I copied the ML_FFN to ML_FF and ran the test on a large system with ML_MODE=RUN.
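For reference, the file handling was roughly this (a sketch, using the standard VASP file names):

# after the ML_MODE=TRAIN run finished:
cp ML_FFN ML_FF    # ML_FFN is the force field written by training; ML_MODE=RUN reads ML_FF
# then start the large-system MD with ML_MODE=RUN in the INCAR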
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: accelerating the ML_FF MD run
OK, please use ML_MODE=REFIT, for two reasons:
1) The refit is done with SVD, which leads to more accurate force fields.
2) The fast prediction method can then be used.
You will see that the acceleration is huge.
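A minimal INCAR fragment for the refit step might look like this (a sketch; it assumes ML_LMLFF is already enabled as in your training runs, and that the ML_ABN from training has been copied to ML_AB):

ML_LMLFF = .TRUE.   # enable the machine-learning force-field module
ML_MODE  = REFIT    # refit from the ML_AB training data; writes a new ML_FFN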
-
- Newbie
- Posts: 43
- Joined: Tue Nov 19, 2019 4:15 am
Re: accelerating the ML_FF MD run
Thank you for your helpful advice.
I tried to use ML_MODE=REFIT in the same setup, after copying my ML_ABN to ML_AB.
But I found that the "Total memory consumption" reported in ML_LOGFILE is 7712.4 MB, which is much larger than the 96.7 MB of the ML_MODE=RUN case, so my calculation stopped because the memory was exhausted.
How do I solve this problem?
In addition, with ML_MODE=REFIT and ML_OUTBLOCK=100 I get the error "ML_INTERFACE: ERROR: ML_OUTBLOCK must be 1 for ML_ISTART not equal 2".
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: accelerating the ML_FF MD run
ML_MODE=REFIT and ML_MODE=RUN are two entirely different things, so it is pointless to compare their memory consumption.
After all fitting methods, ML_MODE=RUN is the way to use the force field for production runs. So you would run ML_MODE=REFIT once, and after that run ML_MODE=RUN for many steps.
ML_MODE=TRAIN performs on-the-fly training and refits many times with changing local reference functions and training structures. This results in a force field (ML_FFN) that can provide error predictions, but it can only be run with the slow version of ML_MODE=RUN and may not contain the fitting parameters one wants for the final force field.
After that, one should refit the force field, possibly with different parameters. There are two options:
ML_MODE=REFITBAYESIAN
ML_MODE=REFIT
REFITBAYESIAN uses the same fitting method (Bayesian linear regression) as ML_MODE=TRAIN; it needs less memory, but it has lower accuracy, and the resulting force field can only be used with the slow execution method in production runs. This fitting problem is very badly conditioned, and hence we do not recommend it.
Hence the optimal solution is to refit with ML_MODE=REFIT. This method uses the full design matrix (please read wiki/index.php/Machine_learning_force_field:_Theory), which needs significantly more memory than Bayesian linear regression, but the SVD it employs is the method of choice for solving such linear problems and should always be used.
Now, what can you do about the memory:
1) First of all, check that you have compiled with shared-memory MPI ("-Duse_shmem" in the precompiler options). Without it, only small calculations are possible.
2) The largest amount of memory is usually needed for the design matrix. This matrix is distributed linearly over the compute ranks, so by increasing the number of compute nodes the required memory per node should shrink.
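Putting the whole recipe together, the two-step workflow would look roughly like this (a sketch, assuming the standard VASP file names; ML_LMLFF = .TRUE. is set in both INCARs):

# step 1: refit (reads ML_AB, writes the fast force field ML_FFN)
cp ML_ABN ML_AB
# INCAR: ML_MODE = REFIT
#        ML_OUTBLOCK = 1   (values > 1 are only allowed in prediction-only runs, per the error message above)

# step 2: production MD (reads ML_FF)
cp ML_FFN ML_FF
# INCAR: ML_MODE = RUN
#        NSW = <number of MD steps>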
-
- Newbie
- Posts: 43
- Joined: Tue Nov 19, 2019 4:15 am
Re: accelerating the ML_FF MD run
Yes, it works well now, thank you.