A scalable, integral-direct, distributed-data parallel algorithm for the four-index transformation is presented. The algorithm was implemented in the context of second-order Møller-Plesset (MP2) energy evaluation, yet it is easily adapted to other electron correlation methods in which only MO integrals with two indices in the virtual orbital space are required. The major computational steps of the MP2 energy calculation are the evaluation of the two-electron integrals, which scales as O(N^4), and their transformation into the MO basis, which scales as O(ON^4), where N is the number of basis functions and O the number of occupied orbitals. The associated maximal communication costs scale as O(n_Σ O^2 V N), where V and n_Σ denote the number of virtual orbitals and the number of symmetry-unique shells, respectively. The largest local and global memory requirements are O(N^2) for the MO coefficients and O(OVN) for the three-quarter transformed integrals, respectively. Several aspects of the implementation, such as symmetry treatment, integral prescreening, and the distribution of data and computational tasks, are discussed. The parallel efficiency of the algorithm is demonstrated by calculations on the phenanthrene molecule, with 762 primitive Gaussians contracted to 412 basis functions. The calculations were performed on an IBM SP2 with 48 nodes. The measured wall clock time on 48 nodes is less than 15 min for this calculation, and the speedup relative to single-node execution is estimated at 527. This superlinear speedup is a result of exploiting both the compute power and the aggregate memory of the parallel computer. The latter reduces the number of passes through the AO integral list, and hence the operation count of the calculation. The test calculations also show that the evaluation of the two-electron integrals dominates the calculation, despite the higher scaling of the transformation step.
1997. Vol. 95, no 1-2, 13-34 p.
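
The scaling figures quoted in the abstract can be illustrated with a minimal serial sketch of a quarter-wise four-index transformation that produces only integrals (ia|jb), with two occupied (i, j) and two virtual (a, b) indices. This is not the distributed, integral-direct algorithm of the paper: it assumes a dense AO integral array held in memory, with no symmetry, prescreening, or multiple passes. The function names (`contract_first`, `transform_oovv`, `direct_element`, `step_costs`), the nested-list data layout, and the order in which the indices are transformed are all assumptions made for illustration.

```python
import itertools


def contract_first(t, c):
    """Return u[q][r][s][p] = sum_m c[m][p] * t[m][q][r][s].

    The first index of t is transformed with c, and the new index p is
    moved to the last position, so four successive calls transform all
    four indices and restore the original index order.
    """
    nm, nq, nr, ns = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    n_new = len(c[0])
    return [[[[sum(c[m][p] * t[m][q][r][s] for m in range(nm))
               for p in range(n_new)]
              for s in range(ns)]
             for r in range(nr)]
            for q in range(nq)]


def transform_oovv(ao, c_occ, c_virt):
    """Transform AO integrals ao[mu][nu][lam][sig] to mo[i][a][j][b].

    c_occ[mu][i] and c_virt[mu][a] hold the occupied and virtual MO
    coefficients. Each call below is one quarter transformation.
    """
    t = contract_first(ao, c_occ)    # indices: nu, lam, sig, i
    t = contract_first(t, c_virt)    # indices: lam, sig, i, a
    t = contract_first(t, c_occ)     # indices: sig, i, a, j
    return contract_first(t, c_virt)  # indices: i, a, j, b


def direct_element(ao, c_occ, c_virt, i, a, j, b):
    """Reference value of (ia|jb) from the untransformed four-fold sum."""
    n = len(ao)
    return sum(c_occ[m][i] * c_virt[nu][a] * c_occ[lam][j] * c_virt[sig][b]
               * ao[m][nu][lam][sig]
               for m, nu, lam, sig in itertools.product(range(n), repeat=4))


def step_costs(n, o, v):
    """Multiply-add counts of the four dense quarter transformations."""
    return [o * n**4, o * v * n**3, o * o * v * n**2, o * o * v * v * n]
```

In this dense sketch the first quarter transformation costs O·N^4 multiply-adds and dominates the later steps, which is consistent with the O(ON^4) transformation cost stated in the abstract. Transforming one index at a time avoids the far more expensive four-fold sum evaluated separately for every (ia|jb) in `direct_element`, which is used here only as a correctness check.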