Convergence of Parallel Architectures


Recap of Lecture 1
Parallel computer architecture is driven by familiar technological and economic forces
– application/platform cycle, but focused on the most demanding applications
– hardware/software learning curve
More attractive than ever because the best building block, the microprocessor, is also the fastest building block
History of microprocessor architecture is parallelism
– translates area and density into performance
The future is higher levels of parallelism
– Parallel architecture concepts apply at many levels
– Communication also on an exponential curve
=> Quantitative engineering approach
[Figure: application/platform cycle with the labels New Applications, More Performance, Speedup]

History
Parallel architectures were tied closely to programming models
– Divergent architectures, with no predictable pattern of growth
– Mid-80s renaissance
[Figure: Application Software and System Software layered over divergent architectures: SIMD, Message Passing, Shared Memory, Dataflow, Systolic Arrays]

Plan for Today
Look at major programming models
– Where did they come from?
– The 80s architectural renaissance!
– What do they provide?
– How have they converged?
Extract general structure and fundamental issues
Reexamine traditional camps from a new perspective (next week)
[Figure: SIMD, Message Passing, Shared Memory, Dataflow, and Systolic Arrays converging on a Generic Architecture]

Programming Model
Conceptualization of the machine that the programmer uses in coding applications
– How parts cooperate and coordinate their activities
– Specifies communication and synchronization operations
Multiprogramming
– no communication or synchronization at program level
Shared address space
– like a bulletin board
Message passing
– like letters or phone calls, explicit point to point
Data parallel
– more regimented, global actions on data
– implemented with shared address space or message passing

Shared Memory => Shared Address Space
Bottom-up engineering factors
Programming concepts
Why it's attractive

Adding Processing Capacity
Memory capacity increased by adding modules
I/O increased by adding controllers and devices
Add processors for processing!
– For higher-throughput multiprogramming, or parallel programs

Historical Development
Mainframe approach
– Motivated by multiprogramming
– Extends crossbar used for memory and I/O
– Processor cost-limited => crossbar
– Bandwidth scales with p
– High incremental cost: use multistage instead
Minicomputer approach
– Almost all microprocessor systems have a bus
– Motivated by multiprogramming, transaction processing (TP)
– Used heavily for parallel computing
– Called symmetric multiprocessor (SMP)
– Latency larger than for uniprocessor
– Bus is bandwidth bottleneck: caching is key, which raises the coherence problem
– Low incremental cost
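The cost and bandwidth trade-off above can be made concrete with a rough count. The sketch below is only illustrative: it assumes that "multistage" means an omega-style network built from 2x2 switches, that the processor count p is a power of two, and that a bus divides one fixed bandwidth evenly among processors. The function names are hypothetical.

```python
def crossbar_crosspoints(p, m=None):
    """Crosspoints in a crossbar joining p processors to m memory modules."""
    m = p if m is None else m
    return p * m  # grows quadratically when m == p


def multistage_switches(p):
    """2x2 switches in an omega-style multistage network (assumed topology)."""
    stages = p.bit_length() - 1
    if p < 2 or p != 1 << stages:
        raise ValueError("p must be a power of two and at least 2")
    return (p // 2) * stages  # (p/2) switches per stage, log2(p) stages


def bus_bandwidth_per_processor(total_bandwidth, p):
    """Share of a single bus available to each of p processors (even split assumed)."""
    return total_bandwidth / p


if __name__ == "__main__":
    for p in (4, 16, 64):
        print(p, crossbar_crosspoints(p), multistage_switches(p),
              bus_bandwidth_per_processor(1000.0, p))
```

Under these assumptions the crossbar count grows as p squared while the multistage count grows as (p/2) log2 p, and the per-processor share of the bus shrinks as 1/p.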

Shared Physical Memory
Any processor can directly reference any memory location
Any I/O controller can reference any memory
Operating system can run on any processor, or all
– OS uses shared memory to coordinate
Communication occurs implicitly as a result of loads and stores
What about application processes?

Shared Virtual Address Space
Process = address space plus thread of control
Virtual-to-physical mapping can be established so that processes share portions of address space
– User-kernel or multiple processes
Multiple threads of control on one address space
– Popular approach to structuring OSs
– Now standard application capability (e.g. POSIX threads)
Writes to shared address visible to other threads
– Natural extension of uniprocessor model
– conventional memory operations for communication
– special atomic operations for synchronization (also loads/stores)
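A minimal sketch of this model, using Python threads as a stand-in for POSIX threads: communication is an ordinary store followed by an ordinary load, while a lock and an event play the role of the special synchronization operations. The names `run_demo`, `mailbox`, and `counter` are illustrative only and are not taken from the lecture.

```python
import threading


def run_demo(num_workers=4, increments=1000):
    # One address space: every thread sees the same `shared` dictionary.
    shared = {"counter": 0, "mailbox": None}
    lock = threading.Lock()      # stands in for an atomic synchronization operation
    ready = threading.Event()    # signals that the mailbox has been written
    received = []

    def producer():
        shared["mailbox"] = "hello"   # communication is an ordinary store
        ready.set()

    def consumer():
        ready.wait()                  # synchronization: wait for the store
        received.append(shared["mailbox"])   # ordinary load

    def worker():
        for _ in range(increments):
            with lock:                # the read-modify-write must be atomic
                shared["counter"] += 1

    threads = [threading.Thread(target=consumer),
               threading.Thread(target=producer)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received[0], shared["counter"]
```

Without the lock, the increments from different threads could interleave and lose updates, which is why synchronization is listed separately from plain loads and stores.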

Structured Shared Address Space
Ad hoc parallelism used in system code
Most parallel applications have structured SAS
Same program on each processor
– shared variable X means the same thing to each thread
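The "same program on each processor" structure might look like the sketch below, in which every thread runs the same function and differs only in its processor id. The cyclic data distribution and the names `spmd_sum`, `partial`, and `pid` are assumptions made for the example.

```python
import threading


def spmd_sum(data, nprocs=4):
    # Shared variables: `partial` and `total` mean the same thing to every thread.
    partial = [0] * nprocs
    total = [0]
    barrier = threading.Barrier(nprocs)

    def program(pid):
        # Every thread executes this same program; only pid differs.
        partial[pid] = sum(data[pid::nprocs])   # cyclic slice owned by this pid
        barrier.wait()   # all partial sums are written before anyone reads them
        if pid == 0:
            total[0] = sum(partial)

    threads = [threading.Thread(target=program, args=(p,)) for p in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

The barrier is what makes the shared array safe to read: thread 0 only sums `partial` after every thread has stored its entry.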

Engineering: Intel Pentium Pro Quad
– All coherence and multiprocessing glue in processor module
– Highly integrated, targeted at high volume
– Low latency and low bandwidth

Engineering: SUN Enterprise
Proc + mem card, I/O card
– 16 cards of either type
– All memory accessed over bus, so symmetric
– Higher bandwidth, higher latency bus

Scaling Up
– Problem is interconnect: cost (crossbar) or bandwidth (bus)
– Dance hall: bandwidth still scalable, but lower cost than crossbar; latencies to memory uniform, but uniformly large
– Distributed memory or non-uniform memory access (NUMA): construct shared address space out of simple message transactions across a general-purpose network (e.g. read-request, read-response)
– Caching shared (particularly nonlocal) data?
[Figure: dance-hall organization vs. distributed-memory organization; processors (P) with caches ($) and memory modules (M) connected by a network]
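One way to picture a shared address space built from message transactions is the toy model below. It assumes a simple block distribution, in which a global address splits into a home node and a local offset, and it counts two messages per remote load (read-request and read-response). The classes `Node` and `Machine` are hypothetical and do not model any particular machine.

```python
class Node:
    def __init__(self, node_id, words):
        self.node_id = node_id
        self.memory = [0] * words    # this node's local memory

    def handle(self, message):
        # The home node services a read-request with a read-response.
        kind, requester, offset = message
        if kind != "read-request":
            raise ValueError("unexpected message kind: " + kind)
        return ("read-response", self.node_id, self.memory[offset])


class Machine:
    def __init__(self, num_nodes, words_per_node):
        self.words_per_node = words_per_node
        self.nodes = [Node(i, words_per_node) for i in range(num_nodes)]
        self.messages = 0            # network messages sent so far

    def home(self, global_addr):
        # Assumed block distribution: global address -> (node, local offset).
        return divmod(global_addr, self.words_per_node)

    def load(self, requester, global_addr):
        home_id, offset = self.home(global_addr)
        if home_id == requester:
            return self.nodes[home_id].memory[offset]   # local, no network traffic
        reply = self.nodes[home_id].handle(("read-request", requester, offset))
        self.messages += 2           # one request plus one response
        return reply[2]
```

A load issued by node 0 for an address homed on node 2 costs two messages, while the same load issued by node 2 costs none, which is the non-uniformity in NUMA.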

Engineering: Cray T3E
– Scale up to 1024 processors, 480 MB/s links
– Memory controller generates request message for non-local references
– No hardware mechanism for coherence (SGI Origin etc. provide this)

[Figure: SIMD, Message Passing, Shared Memory, Dataflow, and Systolic Arrays converging on a Generic Architecture: nodes with processor (P), cache ($), and memory (M) connected by a network]

Message Passing Architectures
Complete computer as building block, including I/O
– Communication via explicit I/O operations
Programming model
– direct access only to private address space (local memory)
– communication via explicit messages (send/receive)
High-level block diagram
– Communication integration? Mem, I/O, LAN, Cluster
– Easier to build and scale than SAS
Programming model more removed from basic hardware operations
– Library or OS intervention
[Figure: nodes, each with processor (P), cache ($), and memory (M), connected by a network]

Message-Passing Abstraction
– Send specifies buffer to be transmitted and receiving process
– Recv specifies sending process and application storage to receive into
– Memory-to-memory copy, but need to name processes
– Optional tag on send and matching rule on receive
– User process names local data and entities in process/tag space too
– In simplest form, the send/recv match achieves a pairwise synchronization event (other variants exist too)
– Many overheads: copying, buffer management, protection
[Figure: process P executes Send X, Q, t from address X in its local address space; process Q executes Receive Y, P, t into address Y in its local address space; the two operations are matched]
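The matching rule can be sketched as a small single-threaded model: a send names the destination process and a tag, a receive names the source process and a tag, and data is copied from one buffer to the other only when both agree. The class `MessageSystem` and its method signatures are assumptions made for the example, not the interface of any real message-passing library.

```python
class MessageSystem:
    def __init__(self):
        # destination process -> list of (source, tag, payload) not yet received
        self.pending = {}

    def send(self, source, dest, tag, buffer):
        # Copy the buffer so later changes by the sender do not affect the message.
        self.pending.setdefault(dest, []).append((source, tag, list(buffer)))

    def recv(self, dest, source, tag, out):
        """Copy the first message matching (source, tag) into `out`.

        Returns True on a match, False if no matching message is pending.
        """
        queue = self.pending.get(dest, [])
        for i, (src, msg_tag, payload) in enumerate(queue):
            if src == source and msg_tag == tag:
                del queue[i]
                out[:] = payload     # memory-to-memory copy into receiver storage
                return True
        return False


if __name__ == "__main__":
    system = MessageSystem()
    x = [1, 2, 3]                    # buffer X in process P
    y = []                           # storage Y in process Q
    system.send("P", "Q", tag=7, buffer=x)
    print(system.recv("Q", "P", tag=7, out=y), y)
```

A real implementation would typically block in `recv` until a match arrives; returning False here keeps the sketch deterministic, and the copy in `send` hints at the buffer-management overhead listed above.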

Evolution of Message-Passing Machines
Early machines: FIFO on each link
– HW close to programming model
– synchronous ops
– topology central (hypercube algorithms)
Caltech Cosmic Cube (Seitz, CACM Jan 85)

Diminishing Role of Topology
Shift to general links
– DMA, enabling non-blocking ops; messages buffered by system at destination until recv
– Store-and-forward routing
Diminishing role of topology
– Any-to-any pipelined routing
– Node-network interface dominates communication time
– Simplifies programming
– Allows richer design space: grids vs. hypercubes
Store-and-forward: H × (T0 + n/B) vs. pipelined: T0 + H·Δ + n/B (H = hops, n = message size, B = link bandwidth, Δ = per-hop delay)
Intel iPSC/1 -> iPSC/2 -> iPSC/860
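The two latency expressions on this slide can be compared numerically. The sketch below reads T0 as a fixed per-message overhead, n/B as the transfer time, and adds an explicit per-hop delay for the pipelined case. That per-hop parameter (`hop_delay`) and all the numbers in the example are assumptions chosen for illustration, not measurements.

```python
def store_and_forward_time(hops, t0, nbytes, bandwidth):
    """Every hop receives the whole message before forwarding it."""
    return hops * (t0 + nbytes / bandwidth)


def pipelined_time(hops, t0, nbytes, bandwidth, hop_delay):
    """The message streams through the network; each hop adds only a small delay."""
    return t0 + hops * hop_delay + nbytes / bandwidth


if __name__ == "__main__":
    # Made-up parameters: 10 us overhead, 1000-byte message,
    # 100 MB/s links, 0.3 us per hop.
    for hops in (1, 4, 16):
        sf = store_and_forward_time(hops, 10e-6, 1000, 100e6)
        pl = pipelined_time(hops, 10e-6, 1000, 100e6, 0.3e-6)
        print(hops, sf, pl)
```

With these numbers the store-and-forward time grows in proportion to the hop count, while the pipelined time barely changes, which is the sense in which topology matters less.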

Example: Intel Paragon

Building on the mainstream: IBM SP-2
Made out of essentially complete RS6000 workstations
Network interface integrated in I/O bus (bandwidth limited by I/O bus)

Berkeley NOW
100 Sun Ultra2 workstations
Intelligent network interface
– proc + mem
Myrinet network
– 160 MB/s per link
– 300 ns per hop

Toward Architectural Convergence
Evolution and role of software have blurred boundary
– Send/recv supported on SAS machines via buffers
– Can construct global address space on MP (GA -> P | LA)
– Page-based (or finer-grained) shared virtual memory
Hardware organization converging too
– Tighter NI integration even for MP (low latency, high bandwidth)
– Hardware SAS passes messages
Even clusters of workstations/SMPs are parallel systems
– Emergence of fast system area networks (SAN)
Programming models distinct, but organizations converging
– Nodes connected by general network and communication assists
– Implementations also converging, at least in high-end machines
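As an illustration of send/recv layered on a shared address space, the sketch below implements a channel as a buffer in shared memory guarded by a condition variable. It uses Python threads in place of processors, and the class name `SharedBufferChannel` is hypothetical.

```python
import threading
from collections import deque


class SharedBufferChannel:
    """Send/recv built from a buffer that lives in shared memory."""

    def __init__(self):
        self._buffer = deque()
        self._cond = threading.Condition()

    def send(self, message):
        with self._cond:
            self._buffer.append(message)   # ordinary store into the shared buffer
            self._cond.notify()            # wake a receiver that may be waiting

    def recv(self):
        with self._cond:
            while not self._buffer:        # block until a message is available
                self._cond.wait()
            return self._buffer.popleft()


def demo(count=3):
    channel = SharedBufferChannel()
    received = []

    def receiver():
        for _ in range(count):
            received.append(channel.recv())

    t = threading.Thread(target=receiver)
    t.start()
    for i in range(count):
        channel.send(i * i)
    t.join()
    return received
```

The programming model seen by the caller is message passing, while the mechanism underneath is loads, stores, and synchronization on shared memory, which is one direction of the convergence described above.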