
7.12 pw.x crashes in parallel execution with an obscure message related to MPI errors

Random crashes due to MPI errors have often been reported, typically on Linux PC clusters. We cannot rule out the possibility that bugs in QUANTUM ESPRESSO cause such behavior, but we are quite confident that the most likely explanation is a hardware problem (defective RAM, for instance) or a software bug (in the MPI libraries, the compiler, or the operating system).

Debugging parallel code can be difficult, but you should at least verify whether your problem is reproducible on different architectures, software configurations, and input data sets, and whether some particular condition triggers the bug. If the crash is not reproducible and no such condition can be found, the odds are that the problem is not in QUANTUM ESPRESSO. You may still report your problem, but consider that reports like "it crashes with...(obscure MPI error)" contain 0 bits of information and are likely to get 0 bits of answers.
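A reproducibility check of this kind can be scripted. The following is only a minimal sketch, not part of QUANTUM ESPRESSO: the helper names run_trials and classify are hypothetical, and the launch command (MPI launcher, number of processes, input file) is whatever you normally use on your machine. It runs the same command several times and reports whether it never fails, always fails, or fails intermittently.

```python
import subprocess
import sys


def run_trials(command, repeats=3, timeout=None):
    """Run `command` (a list of strings) `repeats` times; return the exit codes.

    A run that exceeds `timeout` seconds is recorded as None (treated as a failure).
    """
    codes = []
    for _ in range(repeats):
        try:
            result = subprocess.run(
                command,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            codes.append(result.returncode)
        except subprocess.TimeoutExpired:
            codes.append(None)
    return codes


def classify(codes):
    """Summarize a list of exit codes; anything other than 0 counts as a failure."""
    failures = sum(1 for code in codes if code != 0)
    if failures == 0:
        return "never fails"
    if failures == len(codes):
        return "always fails"
    return "fails intermittently"


# Only runs when a command is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    exit_codes = run_trials(sys.argv[1:], repeats=5)
    print("exit codes:", exit_codes)
    print("verdict:", classify(exit_codes))
```

Saved as, say, repro.py (a hypothetical file name), it would be invoked as "python repro.py" followed by your usual launch line. Repeating the test with a different number of processes, a different MPI library, or a different input file, and including the outcome in your report, gives readers considerably more than 0 bits to work with.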

Concerning MPI libraries in particular, useful information can be found on Axel Kohlmeyer's web site:
http://www.theochem.rub.de/~axel.kohlmeyer/cpmd-linux.html, and in the following message by Javier Antonio Montoya:
http://www.democritos.it/pipermail/pw_forum/2008-April/008818.html


Paolo Giannozzi 2009-10-01