Presentation published 9 years ago by user Оксана Каманина
1 The Current State, Trends, and Future of Supercomputing. Jack Dongarra, University of Tennessee / Oak Ridge National Laboratory. 11/4/2014
2 Overview: Computational Science; High Performance Computing; Projections for the Future
3 Simulation: The Third Pillar of Science. Traditional scientific and engineering paradigm: 1) Do theory or paper design. 2) Perform experiments or build the system. Limitations: too difficult (build large wind tunnels), too expensive (build a throw-away passenger jet), too slow (wait for climate or galactic evolution), too dangerous (weapons, drug design, climate experimentation). Computational science paradigm: 3) Use high performance computer systems to simulate the phenomenon, based on known physical laws and efficient numerical methods.
4 Computational Science Fuses Three Distinct Elements (diagram)
5 Look at the Fastest Computers: Supercomputing Matters. Essential for scientific discovery; critical for national security; a fundamental contributor to the economy and competitiveness through use in engineering and manufacturing. Supercomputers are the tool for solving the most challenging problems through simulation.
6 H. Meuer, H. Simon, E. Strohmaier, & JD. A listing of the 500 most powerful computers in the world. Yardstick: Rmax from LINPACK, solving Ax=b for a dense problem. Updated twice a year: at SCxy in the States in November, and at a meeting in Germany in June. All data available from top500.org.
7 Top50 Supercomputers: Russia
8 Top100 Supercomputers: China
9 Top100 Supercomputers: India
10 Performance Development (chart: a laptop today matches the supercomputers of about 6-8 years ago)
11 Gflop/s to Tflop/s to Pflop/s to Eflop/s. For one fixed workload: Cray 2, 1 Gflop/s, O(1) thread, ~1000 years; ASCI Red, 1 Tflop/s, O(10^3) threads, ~1 year; RoadRunner, 1.1 Pflop/s, O(10^6) threads, ~8 hours; a future system at 1 Eflop/s, O(10^9) threads, ~1 minute.
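The four time labels are mutually consistent for a single fixed workload. Taking roughly 2.9×10^19 floating-point operations (an assumed figure, chosen so the petaflop-class machine needs about 8 hours) as that workload, a quick check:

```python
# Time for each machine to finish one fixed workload.  The workload size
# (~2.9e19 flops = 8 hours at 1 Pflop/s) is an assumption chosen to match
# the slide's annotations; the machine rates are from the slide.
WORK = 8 * 3600 * 1e15  # flops

machines = [("Cray 2", 1e9), ("ASCI Red", 1e12),
            ("RoadRunner", 1e15), ("exaflop system", 1e18)]
for name, rate in machines:
    seconds = WORK / rate
    print(f"{name:15s} {seconds:12.0f} s  (~{seconds / 3.156e7:.2g} years)")
```

At these rates the gigaflop machine needs roughly a millennium, the teraflop machine about a year, the petaflop machine 8 hours, and the exaflop machine about half a minute, in line with the slide's order-of-magnitude labels.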
12 Processors Used in Supercomputers: Intel 71%, AMD 13%, IBM 7%
13 How Are the Processors Connected? (chart: interconnect family share, in percent)
14 Efficiency
15 Countries / System Share (pie chart: 58%, 9%, 5%, 4%, 3%, 2%, 1%)
16 Russian Top500 Systems (rank, site, computer):
- 35: Joint Supercomputer Center, HP Cluster 3000, Xeon 3 GHz, InfiniBand
- Moscow State University, SKIF/T-Platforms T60, Intel quad-core 3 GHz, InfiniBand
- Kurchatov Institute Moscow, HP Cluster 3000, Xeon 2.33 GHz, InfiniBand
- Moscow State University, IBM Blue Gene/P Solution
- Ufa State Aviation Technical University, IBM HS21 Cluster, Xeon quad-core 2.33 GHz, InfiniBand
- Vyatsky State University, HP Cluster 3000, Xeon 2.33 GHz, InfiniBand
- Roshydromet, SGI Altix ICE 8200, Xeon quad-core 2.83 GHz
- Siberian National University, IBM HS21 Cluster, Xeon quad-core 2.33 GHz, InfiniBand
17 Customer Segments
18 Industrial Use of Supercomputers. Of the 500 fastest supercomputers worldwide, industrial use is > 60%: aerospace, automotive, biology, CFD, database, defense, digital content creation, digital media, electronics, energy, environment, finance, gaming, geophysics, image processing/rendering, information processing service, information service, life science, media, medicine, pharmaceutics, research, retail, semiconductor, telecom, weather and climate research, weather forecasting.
19 Distribution of the Top500 (Rmax in Tflop/s vs. rank): 2 systems > 1 Pflop/s; 19 systems > 100 Tflop/s; 51 systems > 50 Tflop/s; 119 systems > 25 Tflop/s; from 1.1 Pflop/s at the top down to 12.6 Tflop/s; 8 Russian supercomputers on the list; 267 systems replaced since the last list; ORNL and UT systems marked.
20 32nd List: The TOP10
22 LANL Roadrunner: A Petascale System in 2008. Connected Unit cluster: 192 Opteron nodes (180 with 2 dual-Cell blades connected with 4 PCIe x8 links). 13,000 Cell HPC chips delivering 1.33 PetaFlop/s (from Cell); 7,000 dual-core Opterons, 122,000 cores; 17 clusters; 2nd-stage InfiniBand 4x DDR interconnect (18 sets of 12 links to 8 switches). Based on the 100 Gflop/s (DP) Cell chip. Hybrid design (2 kinds of chips and 3 kinds of cores); programming required at 3 levels.
23 ORNL's Newest System: Jaguar XT5 (Office of Science). The systems will be combined after acceptance of the new XT5 upgrade; each system will be linked to the file system through 4x-DDR InfiniBand. Jaguar combined totals: peak performance 1,645 TF; 181,504 AMD Opteron cores; 10,750 TB of disk space; plus system memory, disk bandwidth, and interconnect bandwidth split across the XT5 and XT4 partitions. Center: 40,000 ft² (~3,700 m²); upgrading power to 15 MW; deploying a 6,600-ton chiller plant; tripling UPS and generator capability.
24 HPC System at the University of Tennessee's National Institute for Computational Sciences. Housed at ORNL, operated for the NSF, named Kraken. Today: Cray XT5 (608 TF) + Cray XT4 (167 TF). XT5: 16,512 sockets, 66,048 cores. XT4: 4,512 sockets, 18,048 cores. Number 15 on the Top500. Later in 2009: upgrading to 1 Pflop/s.
25 ORNL/UTK Computer Power Cost Projections. Over the next 5 years ORNL/UTK will deploy 2 large petascale systems. Using 15 MW today; by 2012, close to 50 MW!! Power costs close to $10M per year today (cost estimates based on $0.07 per kWh), rising past $20M and toward $30M. Power becomes the architectural driver for future large systems.
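The dollar figures follow directly from the quoted $0.07 per kWh; a minimal sketch of the arithmetic:

```python
# Annual electricity cost for a machine room drawing a constant load,
# at the slide's flat rate of $0.07 per kWh.
RATE_PER_KWH = 0.07
HOURS_PER_YEAR = 8760

def annual_cost(megawatts):
    kwh_per_year = megawatts * 1000 * HOURS_PER_YEAR  # MW -> kW, times hours
    return kwh_per_year * RATE_PER_KWH

print(f"15 MW (today):   ${annual_cost(15):>12,.0f}/year")
print(f"50 MW (by 2012): ${annual_cost(50):>12,.0f}/year")
```

15 MW comes to about $9.2M per year ("close to $10M today"), and 50 MW to about $30.7M, matching the "> $30M" projection.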
26 Power is an Industry-Wide Problem. "Hiding in Plain Sight, Google Seeks More Power," by John Markoff, NYT, June 14, 2006: Google plant in The Dalles, Oregon. Google facilities leverage hydroelectric power at old aluminum plants. Microsoft and Yahoo are building big data centers upstream in Wenatchee and Quincy, Wash., to keep up with Google, which means they need cheap electricity and readily accessible data networking. Microsoft Quincy, Wash.: 470,000 sq ft, 47 MW!
27 Something's Happening Here… In the old days, each year processors would become faster. Today the clock speed is fixed or getting slower, yet things are still doubling roughly every two years: Moore's Law reinterpreted, with the number of cores doubling instead. From K. Olukotun, L. Hammond, H. Sutter, and B. Smith. A hardware issue just became a software problem.
28 Moore's Law Reinterpreted. The number of cores per chip doubles every two years, while clock speed decreases (not increases). Need to deal with systems with millions of concurrent threads; future generations will have billions of threads! Need to be able to easily replace inter-chip parallelism with intra-chip parallelism. The number of threads of execution doubles every two years.
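Compounding that doubling rate from 2008-era concurrency shows how quickly millions of threads become billions. The 122,000-core baseline is RoadRunner's figure from an earlier slide; the projection itself is only illustrative:

```python
# Threads of execution if parallelism doubles every two years, starting
# from RoadRunner-scale concurrency (~122,000 cores) in 2008.
# Illustrative only: assumes a constant two-year doubling cadence.
BASE_YEAR, BASE_THREADS = 2008, 122_000

def threads_in(year):
    doublings = (year - BASE_YEAR) // 2
    return BASE_THREADS * 2 ** doublings

for year in (2008, 2018, 2028, 2038):
    print(year, f"{threads_in(year):>15,}")
```

At this cadence the count passes a million in the mid-2010s and a billion in the mid-2030s.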
29 Power Cost of Frequency. Power ∝ Voltage² × Frequency (V²F). Since Frequency ∝ Voltage, Power ∝ Frequency³.
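This cube law is the argument for multicore: two cores at half the frequency match one core's throughput at a quarter of the dynamic power, assuming perfect parallel scaling. A toy calculation:

```python
# Dynamic power ~ V^2 * f, and to first order V scales with f, so power ~ f^3.
# Compare one core at frequency f with two cores at f/2 (perfect scaling assumed).
def power(freq, cores=1):
    return cores * freq ** 3   # arbitrary units

def throughput(freq, cores=1):
    return cores * freq        # arbitrary units

f = 1.0
print("1 core  at f:  ", throughput(f, 1), power(f, 1))
print("2 cores at f/2:", throughput(f / 2, 2), power(f / 2, 2))
```

Same throughput, one quarter of the power: this is why core counts climb while clock rates do not.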
31 Today's Multicores: 98% of Top500 systems are based on multicore. Sun Niagara2 (8 cores), Intel Polaris (80 cores), IBM BG/P (4 cores), AMD Opteron (4 cores), IBM Cell (9 cores), Intel Clovertown (4 cores), SiCortex (6 cores). 282 systems use quad-core, 204 use dual-core, 3 use nona-core.
32 Cores per Socket: 4 cores, 67%; 2 cores, 31%; 9 cores, 7 systems; single core, 4 systems.
33 What's Next? Possible chip designs: all large cores; mixed large and small cores; all small cores; many small cores; many floating-point cores with SRAM; plus 3D stacked memory. Different classes of chips for different markets: home, games/graphics, business, scientific.
34 And then there's the GPGPUs: NVIDIA's Tesla T10P. T10P chip: 240 cores at 1.5 GHz; peak 1 Tflop/s in 32-bit floating point, 100 Gflop/s in 64-bit floating point. S1070 board: 4 T10P devices, 700 watts. GTX: T10P at 1.3 GHz; peak 864 Gflop/s in 32-bit floating point, 86.4 Gflop/s in 64-bit floating point.
35 Intel's Larrabee Chip. Many x86 IA cores, scalable to Tflop/s. New cache architecture. New vector instruction set: vector memory operations, conditionals, integer and floating-point arithmetic. New vector processing unit with wide SIMD.
36 Architecture of Interest: a manycore chip composed of hybrid cores: some general purpose, some graphics, some floating point.
37 Architecture of Interest: a board composed of multiple chips sharing memory.
38 Architecture of Interest: a rack composed of multiple boards.
39 Architecture of Interest: a room full of these racks. Think millions of cores.
40 Moore's Law Reinterpreted. The number of cores per chip doubles every two years, while clock speed decreases (not increases). Need to deal with systems with millions of concurrent threads; future generations will have billions of threads! Need to rethink the design of our software: a very disruptive technology. The number of threads of execution doubles every two years.
41 Five Important Features to Consider When Computing at Scale:
1. Effective use of many-core and hybrid architectures: dynamic data-driven execution; block data layout.
2. Exploiting mixed precision in the algorithms: single precision is 2x faster than double precision, and 10x with GPGPUs.
3. Self-adapting / auto-tuning of software: too hard to do by hand.
4. Fault-tolerant algorithms: with millions of cores, things will fail.
5. Communication-avoiding algorithms: for dense computations, from O(n log p) to O(log p) communications; s-step GMRES computes (x, Ax, A^2x, …, A^sx).
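The mixed-precision point can be illustrated with iterative refinement: do the expensive solve in fast single precision, then recover double-precision accuracy with cheap residual corrections. A NumPy sketch, illustrative only (a real implementation would factor once in single precision and reuse the LU factors instead of calling solve repeatedly):

```python
import numpy as np

def mixed_precision_solve(A, b, refinements=5):
    """Solve Ax = b: single-precision solves, double-precision residuals."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(refinements):
        r = b - A @ x                                   # residual in double
        c = np.linalg.solve(A32, r.astype(np.float32))  # correction in single
        x += c.astype(np.float64)
    return x

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # double-precision-level residual
```

For well-conditioned systems the refined solution reaches full double-precision accuracy even though all the O(n³) work ran in single precision, which is where the 2x (or 10x on GPUs) speed advantage lives.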
42 Exascale Computing. Exascale systems (10^18 Flop/s) are likely feasible by 2017±. Millions of processing elements (cores or mini-cores), with chips perhaps as dense as 1,000 cores per socket; clock rates will grow more slowly. 3D packaging likely; large-scale optics-based interconnects. PBs of aggregate memory; > 10,000s of I/O channels to exabytes of secondary storage, with disk-bandwidth-to-storage ratios not optimal for HPC use. Hardware- and software-based fault management. Achievable performance per watt will likely be the primary measure of progress.
43 Conclusions. Moore's Law reinterpreted: the number of cores per chip doubles every two years while clock speed stays roughly stable, so threads of execution double every two years, heading toward 100M cores. Need to deal with systems with millions of concurrent threads; future generations will have billions of threads! MPI and programming languages from the 60s will not make it. Power is limiting clock-rate growth; power becomes the architectural driver for exascale systems.
44 Conclusions. For the last decade or more, the research investment strategy has been overwhelmingly biased in favor of hardware. This strategy needs to be rebalanced: barriers to progress are increasingly on the software side. Moreover, the return on investment is more favorable for software: hardware has a half-life measured in years, while software has a half-life measured in decades. The high performance ecosystem is out of balance across hardware, OS, compilers, software, algorithms, and applications. There is no Moore's Law for software, algorithms, and applications.
45 Collaborators / Support. Top500: Hans Meuer, Prometeus; Erich Strohmaier, LBNL/NERSC; Horst Simon, LBNL/NERSC.
46 Weather and Economic Loss. 40% of the $14T U.S. economy is impacted by weather and climate. We now over-warn by a factor of 3; the average over-warning is 200 miles, or $200M per event ($1M in economic loss to evacuate each 1 mile of coastline). Improved forecasts save lives and resources. Source: Kelvin Droegemeier, Oklahoma.