BUG in vasp.6.3.0: internal error in: radial.F
Posted: Fri Mar 11, 2022 3:39 pm
by Yicheng
Dear developer,
I ran on-the-fly machine learning in VASP.6.3.0 and found that the calculation terminated at ionic step 3568 and reported a bug. I encountered this problem in two different calculations.
Here is one of the task files:
bugreport.rar
Here is the bug:
bug.png
Re: BUG in vasp.6.3.0: internal error in: radial.F
Posted: Mon Mar 14, 2022 9:38 am
by ferenc_karsai
At first glance I see it is in the 3569th step, which is an ab-initio step.
Hence I don't see an obvious bug in machine learning.
Your force fits are very inaccurate.
I looked at your input settings and saw that you set 8000K as the target temperature. Do you really need such a high temperature?
Of course you encounter more configurations at high temperature, but it also introduces more noise.
I need to look further into the ab initio parts.
Could you please also upload the OUTCAR and CONTCAR files? I would like to have the CONTCAR to quickly check what the structure looks like at the 3568th step.
I think it will be very hard for me to reproduce the problem on the current size, because it would be computationally very demanding.
If I don't find anything suspicious in the OUTCAR files then we would need to reduce the problem size.
Have you tried the calculation also on smaller cells?
Could you also post your other calculation where this happens?
Re: BUG in vasp.6.3.0: internal error in: radial.F
Posted: Mon Mar 14, 2022 1:02 pm
by Yicheng
ferenc_karsai wrote: ↑Mon Mar 14, 2022 9:38 am
I need to look further into the ab initio parts.
Could you please also upload the OUTCAR and CONTCAR files? I would like to have the CONTCAR to quickly check what the structure looks like at the 3568th step.
I think it will be very hard for me to reproduce the problem on the current size, because it would be computationally very demanding.
If I don't find anything suspicious in the OUTCAR files then we would need to reduce the problem size.
Have you tried the calculation also on smaller cells?
Could you also post your other calculation where this happens?
Dear Moderator,
Thank you for your help.
I've noticed this bug appearing in many different tasks over the last few days (same system but different volumes). The OUTCAR file of the previous task is too big to upload. Here is another task with the same error; the attachments contain more complete input and output files.
As you said, all errors occur at the ab initio step. 8000K should not be the cause, as I successfully ran these input files on the same cluster using vasp.5.4 a long time ago. The only difference is that I am now using vasp.6.3.0 and have added machine learning parameters to the INCAR file.
Here are the input and output files:
bugreport.tar.bz2
OUTCAR.tar.bz2
Re: BUG in vasp.6.3.0: internal error in: radial.F
Posted: Mon Mar 14, 2022 1:28 pm
by Yicheng
Here are files from another task with the same error:
bugreport2.tar.bz2
OUTCAR.part1.rar
OUTCAR.part2.rar
Almost all tasks for this system report this bug, but calculations on another system I am studying do not have this problem.
Re: BUG in vasp.6.3.0: internal error in: radial.F
Posted: Tue Mar 22, 2022 8:39 am
by ferenc_karsai
I reran the last structure (CONTCAR and also the next structure by selecting NSW=0 and NSW=1) but I could not get the error. So the problem is most likely not the structure itself. Together with Georg Kresse we looked at your input and saw that you set MAXMIX=30. For machine learning calculations it is very important that one does not set MAXMIX>0. We've now even put this information onto the VASP wiki:
wiki/index.php/Machine_learning_force_f ... ns:_Basics
and
wiki/index.php/MAXMIX
The information on the wiki contains:
"Do not set MAXMIX>0 when using MLFF. During machine learning, the first principles calculations are often bypassed for hundreds or even thousands of ionic steps, and the ions might move considerably between first principles calculations. In these cases using MAXMIX will very often lead to electronic divergence or strange errors during the self-consistency cycle."
So in your calculation you set MAXMIX=30, but very often you have more than 30 steps. Also, after a few hundred to a thousand steps the number of ab-initio calculations drastically decreases. The problem with MAXMIX is that it continues counting even between ionic steps and restarts the mixer after 30 electronic steps. Most likely in your case it unluckily cuts in at the 4th electronic step of the 3569th ionic iteration. You can also see that after that step the electronic calculation starts to diverge strongly.
So maybe you could try running without MAXMIX.
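A quick way to catch this before submitting a job is to scan the INCAR for a positive MAXMIX. The following is only a minimal sketch: the function names and the example INCAR text are made up for illustration (they are not from the uploaded files), and the parsing assumes the usual "TAG = value" layout with "#" or "!" comments.

```python
import re


def find_maxmix(incar_text):
    """Return the integer MAXMIX value set in INCAR text, or None if absent."""
    value = None
    for raw_line in incar_text.splitlines():
        # Drop trailing comments introduced by '#' or '!'.
        line = re.split(r"[#!]", raw_line, maxsplit=1)[0]
        # A line may hold several 'TAG = value' statements separated by ';'.
        for statement in line.split(";"):
            match = re.match(r"\s*MAXMIX\s*=\s*(-?\d+)", statement, re.IGNORECASE)
            if match:
                value = int(match.group(1))  # the last occurrence wins in this sketch
    return value


def maxmix_is_risky_for_mlff(incar_text):
    """True if MAXMIX is set to a positive value (discouraged with MLFF)."""
    value = find_maxmix(incar_text)
    return value is not None and value > 0


# Illustrative INCAR fragment (hypothetical, not the poster's actual file).
example_incar = """\
ML_LMLFF = .TRUE.   ! on-the-fly learning switched on (illustrative)
MAXMIX = 30         # the setting discussed in this thread
"""

print(maxmix_is_risky_for_mlff(example_incar))
```

If the check returns True for a machine-learning run, removing the MAXMIX line from the INCAR is the change suggested above.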
Another big problem in your calculation is that the quality of your force field is terrible. In particular the real error of the forces (the 4th column in "grep ERR ML_LOGFILE") is too large and growing; usually it should be around 0.1 eV/Ang. I'm not sure if removing MAXMIX will cure this. We generally see that magnetic structures are very hard to learn. The high temperature you have will also induce larger energy differences, which will lead to larger standard deviations. This will increase the error of the force field. I would definitely set a lower temperature, maybe at most 500K above the expected melting point.
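To follow the force error over a long run, the output of "grep ERR ML_LOGFILE" can be post-processed with a few lines of Python. This is a rough sketch under stated assumptions: data lines start with the token ERR, the 2nd column is taken to be the step index (an assumption), the 4th column is the force error as described above, and header lines start with "#". The sample text and its numbers are invented purely to show the mechanics.

```python
def force_errors(logfile_text):
    """Collect (step, force_error) pairs from ERR lines of ML_LOGFILE text."""
    pairs = []
    for line in logfile_text.splitlines():
        fields = line.split()
        # Keep only data lines: first token 'ERR' and at least 4 columns.
        if len(fields) < 4 or fields[0] != "ERR":
            continue
        try:
            # Column 2: step index (assumed); column 4: force error.
            pairs.append((int(fields[1]), float(fields[3])))
        except ValueError:
            continue  # skip anything that is not numeric
    return pairs


def flag_large_errors(pairs, threshold=0.1):
    """Return the pairs whose force error exceeds the threshold (eV/Ang)."""
    return [(step, err) for step, err in pairs if err > threshold]


# Invented sample; the header wording and all values are hypothetical.
sample = """\
# ERR  nstep  rmse_energy  rmse_force  rmse_stress
ERR  10  2.0E-03  9.0E-02  1.5E+00
ERR  20  3.0E-03  2.5E-01  2.0E+00
ERR  30  4.0E-03  4.0E-01  2.5E+00
"""

print(flag_large_errors(force_errors(sample)))
```

For a real run, read the file with open("ML_LOGFILE").read() and pass the text to force_errors; the 0.1 eV/Ang default threshold is the rough guideline quoted above, not a hard limit.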
Re: BUG in vasp.6.3.0: internal error in: radial.F
Posted: Wed Mar 23, 2022 8:38 am
by Yicheng
ferenc_karsai wrote: ↑Tue Mar 22, 2022 8:39 am
I reran the last structure (CONTCAR and also the next structure by selecting NSW=0 and NSW=1) but I could not get the error. So the problem is most likely not the structure itself. Together with Georg Kresse we looked at your input and saw that you set MAXMIX=30. For machine learning calculations it is very important that one does not set MAXMIX>0. We've now even put this information onto the VASP wiki:
wiki/index.php/Machine_learning_force_f ... ns:_Basics
and
wiki/index.php/MAXMIX
Thank you very much for your detailed answer!
I will try a new test based on your suggestion.