It's not exactly the same problem.
In the other post the user ran out of memory immediately because scaLAPACK was not employed. The code is practically unusable without scaLAPACK for realistic systems, since otherwise each processor needs to hold the entire design matrix, which is a huge object. With scaLAPACK the design matrix is distributed over the processors, so the memory needed per processor decreases linearly with the number of processors.
In our current case Gerbrand doesn't run out of memory, but the maximum number of local reference configurations (ML_MB) is reached. The default is ML_MB=1500. This is usually enough for simple to moderately difficult systems, but for complex systems, or for training data collected under different conditions (e.g. Si in its different phases), it is easily exceeded. In that case, simply increase this number.
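For illustration, the relevant INCAR line would look something like this (the value 3000 is only an example, not a recommendation for any particular system; choose it according to how many local reference configurations your training data actually needs):

ML_MB = 3000    ! maximum number of local reference configurations per atom type (default 1500)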
ML_MB sets the column dimension of the design matrix for each atom type.
The row dimension is ML_MCONF. ML_MCONF contains the whole set of training structures (this part is exportable to other ML methods), while ML_MB contains the local reference configurations for a specific atom type (this part is specific to kernel ridge regression). So the size of the design matrix that will be allocated is ML_MB*ML_MCONF*number_of_atom_types. Again, with scaLAPACK this array is shared by all processors, so the more processors one uses, the smaller this array gets per processor.
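To make the scaling a bit more concrete, here is a rough back-of-the-envelope sketch in Python based only on the formula above; the 8 bytes per element, the atom-type count, and the rank counts are my own assumptions for illustration, and the estimate printed at the top of the ML_LOGFILE is the authoritative number (it also covers more arrays than just this one):

BYTES_PER_ELEMENT = 8  # assuming double precision

def design_matrix_bytes(ml_mb, ml_mconf, ntypes):
    # one ML_MB x ML_MCONF block per atom type, per the formula above
    return ml_mb * ml_mconf * ntypes * BYTES_PER_ELEMENT

total = design_matrix_bytes(ml_mb=1500, ml_mconf=1500, ntypes=2)
for nranks in (1, 16, 64):
    # with scaLAPACK the array is distributed over all MPI ranks
    print(f"{nranks:3d} ranks: {total / nranks / 1024**2:8.2f} MiB per rank")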
The beginning of the ML_LOGFILE (wiki/index.php/ML_LOGFILE) contains information on the estimated memory.
Most of the large arrays, like the design matrix, are statically allocated at the beginning of the code. Why? Because we use shared-memory MPI. When we implemented shared memory using System V, we saw that reallocations of shared-memory segments led to completely irregular crashes. Shared memory is used for many important arrays, so we ended up using static memory allocations.