IBM p575 with xlf90 compiler, OS = AIX 5.3

I have a parallel version of VASP (4.6) compiled with MPI, but for every system I have tested it dies with a segmentation fault after 5 or 6 iterations. I am fairly sure this is not a memory issue: I have tried increasing the available memory, and the failure is systematic. For example, for the ZnO benchmark I get:
running on 8 nodes
distr: one band on 1 nodes, 8 groups
vasp.4.6.21 23Feb03 complex
POSCAR found : 2 types and 4 ions
LDA part: xc-table for Ceperly-Alder, standard interpolation
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
FFT: planning ... 1
reading WAVECAR
entering main loop
N E dE d eps ncg rms rms(c)
DAV: 1 0.212310080706E+03 0.21231E+03 -0.13908E+04 11760 0.228E+03
DAV: 2 0.159873211399E+01 -0.21071E+03 -0.19908E+03 11000 0.364E+02
DAV: 3 -0.209552998554E+02 -0.22554E+02 -0.22370E+02 11768 0.116E+02
DAV: 4 -0.213289625180E+02 -0.37366E+00 -0.37335E+00 11984 0.159E+01
DAV: 5 -0.213312742778E+02 -0.23118E-02 -0.23117E-02 11864 0.123E+00 0.105E+01
DAV: 6 -0.220796307926E+02 -0.74836E+00 -0.34184E+01 16960 0.167E+02
This looks normal enough, and the only warnings I see in OUTCAR are about the wrap-around errors, but at this point the run stops with only the message:
ERROR: 0031-250 task 0: Segmentation fault
ERROR: 0031-250 task 6: Terminated
ERROR: 0031-250 task 1: Terminated
ERROR: 0031-250 task 2: Terminated
ERROR: 0031-250 task 3: Terminated
ERROR: 0031-250 task 4: Terminated
ERROR: 0031-250 task 5: Terminated
ERROR: 0031-250 task 7: Terminated

It behaves very similarly for a couple of other systems, managing only 5 or 6 iterations.
I have tried the obvious things (increasing memory, decreasing the number of k-points to see if it makes a difference), but I think there is probably something at compilation time that needs fixing.

Hope someone can comment or suggest something to try!

Makefile follows:
.SUFFIXES: .inc .f .F

# all CPP processed fortran files have the extension .f

# fortran compiler and linker

# C-preprocessor define any of the flags given below
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# CACHE_SIZE 5001 for SP3 and Power 3
# 32768 for 550,590,3CT
# 8001 595/397 quad word systems
CPP_ = /usr/ccs/lib/cpp -P
OFLAG = -O3 -qarch=auto -qipa -q32
OBJ_HIGH = none
OBJ_NOOPT = none
DEBUG = -g
INLINE = $(OFLAG) -Q+dfro1,+dfro2,+dfq1,+dfq2,+fun,+expw,+cpw,+CORLSD,+GCOR,+cpwsp

# just in case of testing the f77 fft routines
FFLAGS_F77= -qautodbl=dblpad -qdpc=e -O3 -qarch=auto

# options for linking
# the following option increases the size of the data frame
LINK = -Wl,-bD:1000000000 -qipa

FFT3D = fftmpi.o fftmpi_map.o fft3dlib.o
# fortran linker for mpi:


# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)

CPP = $(CPP_) -DNGZhalf -DMPI -Dessl -DHOST=\"IBMBF\" -DMPI_BLOCK=500\
-Dkind8 -DCACHE_SIZE=0 -Davoidalloc -DIFC -DPGF90 \
$*.F >$*$(SUFFIX)
# libraries for mpi

#LIB = -L../vasp.4.lib -ldmy
#./vasp.4.lib/lapack_double.o ../vasp.4.lib/linpack_double.o
LIB = -L../vasp.4.lib -ldmy ../vasp.4.lib/lapack_double.o ../vasp.4.lib/linpack_double.o \
-lesslsmp -L/usr/lpp/ppe.poe/lib/ -lmpi

# general rules and compile lines

BASIC= symmetry.o symlib.o lattlib.o random.o

SOURCE= base.o mpi.o smart_allocate.o xml.o \
constant.o jacobi.o main_mpi.o scala.o \
asa.o lattice.o poscar.o ini.o setex.o radial.o \
pseudo.o mgrid.o mkpoints.o wave.o wave_mpi.o $(BASIC) \
nonl.o nonlr.o dfast.o choleski2.o \
mix.o charge.o xcgrad.o xcspin.o potex1.o potex2.o \
metagga.o constrmag.o pot.o cl_shift.o force.o dos.o elf.o \
tet.o hamil.o steep.o \
chain.o dyna.o relativistic.o LDApU.o sphpro.o paw.o us.o \
ebs.o wavpre.o wavpre_noio.o broyden.o \
dynbr.o rmm-diis.o reader.o writer.o tutor.o xml_writer.o \
brent.o stufak.o fileio.o opergrid.o stepver.o \
dipol.o xclib.o chgloc.o subrot.o optreal.o davidson.o \
edtest.o electron.o shm.o pardens.o paircorrection.o \
optics.o constr_cell_relax.o stm.o finite_diff.o \
elpol.o setlocalpp.o

vasp: $(SOURCE) $(FFT3D) $(INC) main.o
rm -f vasp
$(FCL) -o vasp $(LINK) main.o $(SOURCE) $(FFT3D) $(LIB)
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
$(FCL) -o makeparam $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
$(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
$(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB)
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
$(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)
kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
$(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

-rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
$(FC) $(FFLAGS) $(INLINE) $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
$(FC) $(FFLAGS)$(DEBUG) $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one strucuture is changed then touch_dep must be called
# with the corresponding name of the structure
base.o: base.F
mgrid.o: mgrid.F
constant.o: constant.F
lattice.o: lattice.F
setex.o: setex.F
pseudo.o: pseudo.F
poscar.o: poscar.F
mkpoints.o: mkpoints.F
wave.o: wave.F
nonl.o: nonl.F
nonlr.o: nonlr.F
fftw3.o: fftw3.f

$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
$(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
$(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules

#$(FC) $(FFLAGS) $(INCS) -qoptimize=2 -O2 -c $*$(SUFFIX)
radial.o: radial.F
$(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

nonl.o: nonl.F
$(FC) $(FFLAGS) $(INCS) -O -c $*$(SUFFIX)

paw.o: paw.F
$(FC) $(FFLAGS) $(INCS) -O1 -c $*$(SUFFIX)

pseudo.o: pseudo.F
$(FC) $(FFLAGS) $(INCS) -O1 -c $*$(SUFFIX)

vasp.4.6.21 23Feb03 complex
executed on IBMBF date 2007.10.17 15:41:09
running on 8 nodes
distr: one band on 1 nodes, 8 groups


VRHFIN =Zn: d10 p2
LEXCH = 91
EATOM = 1489.7187 eV, 109.4912 Ry
LULTRA = T use ultrasoft PP ?
IUNSCR = 1 unscreen: 0-lin 1-nonlin 2-no
RPACOR = 1.300 partial core radius
POMASS = 65.390; ZVAL = 12.000 mass and valenz
RCORE = 2.650 outmost cutoff radius
RWIGS = 2.650; RWIGS = 1.402 wigner-seitz radius (au A)
ENMAX = 209.545; ENMIN = 157.159 eV
EAUG = 346.344

RCLOC = 1.828 cutoff for local pot
LCOR = T correct aug charges
RMAX = 3.185 core radius for proj-oper
QCUT = -3.924; QGAM = 7.849 optimization parameters

2 .000 7 2.150 23 2.650
2 .000 7 2.150 23 2.650
0 .000 15 2.360 23 2.360
0 .000 15 2.360 23 2.360
1 -.200 15 2.360 23 2.650
1 .000 15 2.360 23 2.650
3 .000 7 .000 0 .000
local pseudopotential read in
partial core-charges read in
atomic valenz-charges read in
non local Contribution for L= 2 read in
real space projection operators read in
non local Contribution for L= 2 read in
real space projection operators read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 1 read in
real space projection operators read in
non local Contribution for L= 1 read in
real space projection operators read in
augmentation charges read in

number of l-projection operators is LMAX = 6
number of lm-projection operators is LMMAX = 18

VRHFIN =O: s2p4
LEXCH = 91
EATOM = 429.1268 eV, 31.5399 Ry
LULTRA = T use ultrasoft PP ?
IUNSCR = 0 unscreen: 0-lin 1-nonlin 2-no
RPACOR = .000 partial core radius
POMASS = 16.000; ZVAL = 6.000 mass and valenz
RCORE = 1.550 outmost cutoff radius
RWIGS = 1.400; RWIGS = .741 wigner-seitz radius (au A)
ENMAX = 395.994; ENMIN = 296.995 eV
EAUG = 700.000

ICORE = 2 local potential
LCOR = T correct aug charges
RMAX = 2.317 core radius for proj-oper
QCUT = -5.395; QGAM = 10.790 optimization parameters

0 .000 15 1.130 23 1.400
0 .000 15 1.130 23 1.400
1 .000 15 1.130 23 1.550
1 .000 15 1.130 23 1.550
2 .000 7 1.550 7 1.550
local pseudopotential read in
atomic valenz-charges read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 0 read in
real space projection operators read in
non local Contribution for L= 1 read in
real space projection operators read in
non local Contribution for L= 1 read in
real space projection operators read in
augmentation charges read in

number of l-projection operators is LMAX = 4
number of lm-projection operators is LMMAX = 8

US Zn :
energy of atom 1 EATOM=-1489.7187
kinetic energy error for atom= 0.0017 (will be added to EATOM!!)
US O :
energy of atom 2 EATOM= -429.1268
kinetic energy error for atom= 0.0618 (will be added to EATOM!!)

EXHCAR: internal setup
exchange correlation table for LEXCH = 7
RHO(1)= 0.500 N(1) = 2000
RHO(2)= 100.500 N(2) = 4000

POSCAR: ZnO: P63mc
positions in direct lattice
No initial velocities read in


ion position nearest neighbor table
1 0.333 0.667 0.000- 3 2.03 4 2.06 4 2.06 4 2.06 2 3.33 2 3.33 2 3.33 2 3.33
2 3.33 2 3.33
2 0.667 0.333 0.500- 4 2.03 3 2.06 3 2.06 3 2.06 1 3.33 1 3.33 1 3.33 1 3.33
1 3.33 1 3.33
3 0.333 0.667 0.375- 1 2.03 2 2.06 2 2.06 2 2.06
4 0.667 0.333 0.875- 2 2.03 1 2.06 1 2.06 1 2.06

LATTYP: Found a hexagonal cell.
ALAT = 3.3750134340
C/A-ratio = 1.6022936222

Lattice vectors:

A1 = ( 1.6875000000, -2.9228512500, 0.0000000000)
A2 = ( 1.6875000000, 2.9228512500, 0.0000000000)
A3 = ( 0.0000000000, 0.0000000000, 5.4077625000)
Subroutine PRICEL returns:
Original cell was already a primitive cell.

Analysis of symmetry for initial positions (statically):

Routine SETGRP: Setting up the symmetry group for a
hexagonal supercell.

Subroutine GETGRP returns: Found 4 space group operations
(whereof 2 operations were pure point group operations)
out of a pool of 24 trial point group operations.

The static configuration has the point symmetry C_1h.
The point group associated with its full space group is C_2v.

Analysis of symmetry for dynamics (positions and initial velocities):

Subroutine DYNSYM returns: Found 4 space group operations
(whereof 2 operations were pure point group operations)
out of a pool of 4 trial space group operations
(whereof 2 operations were pure point group operations)
and found also 1 'primitive' translations

The dynamic configuration has the point symmetry C_1h.
The point group associated with its full space group is C_2v.

KPOINTS: ZnO P63mc k_scan

Automatic generation of k-mesh.

Subroutine IBZKPT returns following result:

Found 216 irreducible k-points:

Please check whether -Dessl conflicts with the additional link to the vasp.4.lib version of lapack_double: the calling sequence (argument list) of some of the LAPACK routines included in ESSL deviates from the standard calling sequence used by mkl-lapack and lapack_double.
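One way to test this is sketched below. It is only a suggestion built from the CPP and LIB lines of the posted makefile, not a verified fix. Variant A assumes that removing -Dessl makes VASP use the standard LAPACK calling sequence, which is what lapack_double.o expects. Variant B (commented out) keeps -Dessl and drops lapack_double.o instead; since ESSL provides only a subset of LAPACK, that variant may fail at link time with unresolved symbols.

```make
# Variant A (assumption): keep lapack_double.o, remove -Dessl from the CPP line
# so that the LAPACK calls in VASP match the routines that are actually linked.
CPP = $(CPP_) -DNGZhalf -DMPI -DHOST=\"IBMBF\" -DMPI_BLOCK=500 \
      -Dkind8 -DCACHE_SIZE=0 -Davoidalloc -DIFC -DPGF90 \
      $*.F >$*$(SUFFIX)

# LIB unchanged from the posted makefile: LAPACK comes from lapack_double.o,
# BLAS from ESSL.
LIB = -L../vasp.4.lib -ldmy ../vasp.4.lib/lapack_double.o ../vasp.4.lib/linpack_double.o \
      -lesslsmp -L/usr/lpp/ppe.poe/lib/ -lmpi

# Variant B (assumption, untested): keep -Dessl in CPP as posted and take the
# LAPACK routines from ESSL by dropping lapack_double.o from LIB.
#LIB = -L../vasp.4.lib -ldmy ../vasp.4.lib/linpack_double.o -lesslsmp -L/usr/lpp/ppe.poe/lib/ -lmpi
```

Because the change affects the preprocessed sources, the old *.f, *.o and *.mod files need to be removed (the `rm -f *.g *.f *.o *.L *.mod ; touch *.F` line in the posted makefile does this) before rebuilding, otherwise stale objects compiled with -Dessl will still be linked.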
