Sergey Blagodurov, Simon Fraser University
The Contention Aware Scheduling in HPC Cluster Environment
Contention for shared resources in High-Performance Computing (HPC) clusters occurs when jobs execute concurrently on the same multicore node (contending for allocated CPU time, shared caches, the memory bus, memory controllers, etc.) and when jobs concurrently access cluster interconnects as their processes exchange data. In a virtualized environment, the cluster scheduler also has to use the cluster network to migrate job virtual machines across nodes. Contention for shared cluster resources severely degrades workload performance and stability and hence must be addressed. State-of-the-art HPC cluster schedulers, however, are not contention-aware. The goal of this work is the design, implementation and evaluation of an HPC scheduling framework that is contention-aware.
Leslie Groer, SciNet High-Performance Computing Consortium, University of Toronto
Evolution in the ATLAS Distributed Computing Operation, Management and Tools
The ATLAS experiment at the Large Hadron Collider (LHC) facility in Geneva, Switzerland, started collecting high-energy proton-proton collision data in 2010. The computing platforms supporting the analysis of the data have been in development and production for several more years as part of the greater Worldwide LHC Computing Grid and ATLAS Software and Distributed Computing efforts. As the run has progressed, there has been significant development and improvement in the various tools underpinning the enormous undertaking of recording, storing and transporting the petabytes of data produced each year, and of managing the computing efforts of thousands of researchers across the globe as they have ramped up their analyses with the collection of real data. Aspects of these tools and their evolution will be presented, especially in the context of operating the Canadian ATLAS computing facilities.
General: Faculty, staff, students
Thursday, May 3rd
2:20 – 3:30 p.m.
Martin Siegert, WestGrid Site Lead, Simon Fraser University