St. Petersburg, ICCS 2003

Linear Algebra Computation on a Model Grid Platform

Carlo Manuali [email protected]
Centro d’Ateneo per i Servizi Informatici (C.A.S.I.), University of Perugia, Italy

in collaboration with: Loriano Storchi, Osvaldo Gervasi, Giuseppe Vitillaro, Antonio Laganà, Francesco Tarantelli

Carlo Manuali – ICCS 2003 Linear Algebra Computation Benchmarks on a Model Grid Platform

Summary

1. Objective
   Customization of Globus Software Toolkit 2 for a Grid infrastructure based on Beowulf clusters
2. Contents
   a) The platform topology
   b) The multilevel process communication strategy
   c) Globus, MPI
   d) Communication Benchmarks, Computational Tests

The Grid Computing

Grid Computing

The Globus Toolkit 2

http://www.globus.org

A model computing Grid

✓ Centralized installation of the Globus software into an NFS shared directory
✓ Implementation of Globus and MPICH-G2 for Grid management
✓ Modification of the LAM/MPI broadcast implementation

Figure labels: LAN 100 Mb ATM WAN 16 Mb

• A dedicated node, called front-end, for each cluster
• NIS, NFS and automount services
• /usr/local is exported via NFS
• All nodes access Globus in /usr/local/globus

An MDS Centralized Installation

• MDS information in the custom directory /usr/local/globus/etc/nodes
• Customization of the SXXgris command:
    export sysconfdir=/usr/local/globus/etc/nodes/`hostname`
• grid-info-site-policy.conf
    policydata: (&(Mds-Service-hn=*.IP_domain)(Mds-Service-port=2135))
• grid-info-resource-register.conf
    reghn: GIIS-server.IP_domain
    hn: GRIS-server.IP_domain

MPICH-G2 and LAM-MPI

• LAM/MPI (version 6.5.6)
• Compilation of Globus Resource Management SDK with the mpi flavor
• “defines” which are missing from the LAM/MPI include file mpi.h
• The $GLOBUS_GRAM_JOB_MANAGER_MPIRUN variable points to a script (mpigrun) which replaces the standard mpirun command

Topology-aware functions: broadcast models

MPICH-G2 topology levels:
    lv0  TCP-WAN
    lv1  TCP-LAN
    lv2  TCP-Intra machine
    lv3  v-MPI

• LAM/MPI provides for communication via TCP/IP among nodes in a dedicated network, or via shared memory
• Two point-to-point communication levels:
  - inter-cluster communication (lv0)
  - intra-cluster communication (lv3)

• Comparison of three different broadcast methods:
  (i) The broadcast operation provided by MPICH-G2
  (ii) An optimized topology-aware broadcast of our own implementation
  (iii) A non-topology-aware broadcast

A typical broadcast:
• HPC: p0 - p7
• GRID: p8 - p15
• GIZA: p16 - p23

At local level:
• LAM/MPI uses non-blocking (asynchronous) send operations
• We opted for blocking (synchronous) send operations

T = basic transmission time step
• Asynchronous broadcast is completed in 6T
• Our version takes 3T

Broadcast tests

• ~39 Mb of data
• T = 3.4s
• Link speed between HPC or GRID and GIZA = 0.8 Mb/s

Bcast_Time = WAN_inter-cluster_Bcast_T + Local_intra-cluster_Bcast_T
WAN_inter-cluster_Bcast_T = ~49s
(a) Local_intra-cluster_Bcast_T = 3T = ~10s (optimized broadcast)
(b) Local_intra-cluster_Bcast_T = 6T = ~20s (LAM/MPI broadcast)

• Comparison with a non-topology-aware broadcast

Figure panels 1, 2, 3: in the last one, the dominance of the long-distance transfers is evident

• 7 intra-cluster data transfers
• 16 inter-cluster data transfers

Linear Algebra Benchmarks

• BLAS and LAPACK at the local level
• BLACS on top of MPICH-G2 and PBLAS
• ScaLAPACK

• Tests have been run with PDGEMM (a PBLAS routine)
• Effective speed of 2.5 Gflops (20000 by 20000)
• 70% performance deterioration when exchanging rows with columns

Plot: MFLOPS versus N
• Speed varies from 2.52 Gflops to 2.55 Gflops for block sizes from 64 (red line) to 256 (blue line)
• Top speed reached for a block size of 160 (green line)

Conclusion

• Two Globus communication levels using MPI on a model Grid made up of three workstation clusters
• Comparison of the two-level implementation of broadcast with a binary tree
• Parallel linear algebra kernels to exploit the two communication levels
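
The broadcast figures quoted in the talk can be checked with a small sketch. It counts intra- and inter-cluster transfers for a simple rank-ordered doubling broadcast over the 24 processes (HPC p0-p7, GRID p8-p15, GIZA p16-p23) and evaluates Bcast_Time = WAN_inter-cluster_Bcast_T + Local_intra-cluster_Bcast_T for the 3T and 6T cases. The doubling schedule, the choice of rank 0 as root and all function names are assumptions made for illustration; the slides do not state which non-topology-aware algorithm was actually measured.

```python
# Rank-to-cluster layout taken from the slides.
CLUSTERS = {
    "HPC": range(0, 8),
    "GRID": range(8, 16),
    "GIZA": range(16, 24),
}


def cluster_of(rank):
    """Return the name of the cluster that owns the given rank."""
    for name, ranks in CLUSTERS.items():
        if rank in ranks:
            return name
    raise ValueError(f"rank {rank} is not in any cluster")


def doubling_broadcast(nprocs):
    """Rounds of a doubling broadcast rooted at rank 0 (assumed schedule).

    In round k, every rank r < 2**k already holds the data and sends it
    to rank r + 2**k, if that rank exists. Topology is ignored.
    """
    rounds = []
    step = 1
    while step < nprocs:
        rounds.append(
            [(src, src + step) for src in range(step) if src + step < nprocs]
        )
        step *= 2
    return rounds


def count_transfers(rounds):
    """Count (intra-cluster, inter-cluster) point-to-point transfers."""
    intra = inter = 0
    for transfers in rounds:
        for src, dst in transfers:
            if cluster_of(src) == cluster_of(dst):
                intra += 1
            else:
                inter += 1
    return intra, inter


def total_time(wan_seconds, local_steps, step_seconds):
    """Bcast_Time = WAN inter-cluster time + local steps * T."""
    return wan_seconds + local_steps * step_seconds


if __name__ == "__main__":
    print("transfers (intra, inter):", count_transfers(doubling_broadcast(24)))
    print("rounds inside one 8-node cluster:", len(doubling_broadcast(8)))
    print("total, 3T local:", total_time(49, 3, 3.4))
    print("total, 6T local:", total_time(49, 6, 3.4))
```

Under these assumptions the topology-unaware schedule produces 7 intra-cluster and 16 inter-cluster transfers, and a broadcast inside one 8-node cluster needs 3 rounds. With T = 3.4s and a WAN phase of about 49s, the totals come out at roughly 59s for the 3T case and 69s for the 6T case.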