Publications of Torsten Hoefler
Torsten Hoefler:

 Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks

(presented in Chemnitz, Germany, Apr. 2005)
TU Chemnitz Best Student Award, 2005

Abstract

The MPI barrier collective operation, part of the MPI-1.1 standard, is extremely important for all parallel applications that use it. The latency of this operation adds to the application run time and cannot be overlapped with computation. Thus, overall MPI performance can suffer from unsatisfactory barrier latency. The main goals of this work are to lower the barrier latency for InfiniBand™ networks by analyzing well-known barrier algorithms with regard to their suitability for InfiniBand™ networks, to enhance the barrier operation by using standard InfiniBand™ operations as much as possible, and to design a constant-time barrier for InfiniBand™ with special hardware support. This partition into three main steps is retained throughout the thesis. The first part evaluates publicly known models and proposes a new, more accurate model (LoP) for InfiniBand™. All barrier algorithms are evaluated within the well-known LogP model and this new model. Two new algorithms that promise better performance have been developed. The hardware section proposes a constant-time barrier integrated into InfiniBand™ as well as a cheap separate barrier network. All results have been implemented inside the Open MPI framework. This work led to three new Open MPI collective modules. The first implements different barrier algorithms, which are dynamically benchmarked and selected during the startup phase to maximize performance. The second offers a special barrier implementation for InfiniBand™ with RDMA and performs up to 40% better than the best previously published solution. The third offers a constant-time barrier in a separate network, leveraging commodity components, with a latency of only 2.5 µs. Each component has its own strengths and can be used to enhance barrier performance significantly.
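For readers unfamiliar with software barrier algorithms, the following is a minimal sketch of one well-known example from the literature, the dissemination barrier. The abstract does not name the algorithms evaluated in the thesis, so this choice is an assumption made for illustration, and the function names `dissemination_rounds` and `simulate` are hypothetical. The sketch only simulates the communication pattern in plain Python; it is not MPI or Open MPI code.

```python
def dissemination_rounds(num_ranks):
    """Return the dissemination-barrier schedule as a list of rounds.

    Each round is a list of (sender, receiver) pairs. In round k, rank r
    signals rank (r + 2**k) mod num_ranks.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be positive")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    num_rounds = (num_ranks - 1).bit_length()
    schedule = []
    for k in range(num_rounds):
        distance = 1 << k
        schedule.append(
            [(r, (r + distance) % num_ranks) for r in range(num_ranks)]
        )
    return schedule


def simulate(num_ranks):
    """Return, for each rank, the set of ranks it knows have arrived."""
    known = [{r} for r in range(num_ranks)]
    for pairs in dissemination_rounds(num_ranks):
        # Messages in a round carry the state from the start of that round.
        snapshot = [set(s) for s in known]
        for sender, receiver in pairs:
            known[receiver] |= snapshot[sender]
    return known


if __name__ == "__main__":
    for p in (2, 5, 8):
        rounds = len(dissemination_rounds(p))
        done = all(s == set(range(p)) for s in simulate(p))
        print(f"P={p}: rounds={rounds}, all ranks synchronized={done}")
```

Under these assumptions, every rank learns that all others have arrived after ceil(log2 P) rounds, which is why latency models such as LogP are useful for comparing this pattern against alternatives.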


BibTeX

@mastersthesis{hoefler-thesis-05,
  author={Torsten Hoefler},
  title={{Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks}},
  school={Technical University of Chemnitz},
  year={2005},
  month={Apr.},
  location={Chemnitz, Germany},
  source={http://www.unixer.de/~htor/publications/},
}

