# The Architecture for Discovery in a Parallel Universe

**Diane Bryant**, Vice President, Intel Corporation; General Manager, Datacenter & Connected Systems Group

## Uncharted Territory on Path to Discovery In Science and Engineering





# HPC: Not an Optional Investment







## **FINANCIAL** ANALYSES



DIGITAL CONTENT **CREATION** 

# To Compute, You Must Have the Right Architecture: It's the Law

"The speedup of a program using multiple processors in parallel computing is limited by the sequential fraction of the program." Gene Amdahl



Number of Processors
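As a rough illustration of the law quoted on this slide, the sketch below evaluates the usual Amdahl bound, speedup = 1 / (s + (1 - s) / N), where s is the sequential fraction and N is the number of processors. The function name and the 5% sequential fraction are illustrative assumptions, not figures from the presentation.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    serial_fraction: share of the run time that cannot be parallelized (0..1).
    processors: number of processors applied to the parallel share.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workload with a 5% sequential fraction (an assumption).
for n in (1, 8, 64, 512):
    print(f"{n:4d} processors -> speedup bound {amdahl_speedup(0.05, n):.2f}")
```

With a 5% sequential fraction the bound can never exceed 1 / 0.05 = 20, however many processors are added, which is why the sequential fraction matters so much as core counts grow.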

## Xeon: Most Commonly Used Parallel Processor



- Parallel, fast serial
- Multicore + vector
- 2X cadence through Haswell
- Leadership today and tomorrow

**Parallel Features From** Intel<sup>®</sup> Xeon<sup>®</sup> E5 Processors Make It Ideal For Most HPC Applications




\* Theoretical acceleration of a highly parallel processor over an Intel® Xeon® parallel processor (<1: Intel® Xeon® faster)












Application algorithm improvements are increasing the number of highly parallel applications









# Highly Parallel Applications and Processors



**Optimized for Highly Parallel**

- Many core
- Wider SIMD16 vector instructions
- Up to 8X increase in theoretical performance
- Designed for reliability in large systems

It's the highly vectorizable **applications that benefit from highly parallel architecture**.








## Programming on **CPU and Coprocessor**

Unlike accelerators, optimizations for Intel<sup>®</sup> Xeon Phi<sup>™</sup> and Intel<sup>®</sup> Xeon<sup>®</sup> products share the same languages, directives, libraries, and tools. "Unmatched Productivity"

## **OpenMP\* TR**

Open, standard, supports diverse hardware. Intel will support the OpenMP TR for targeting extensions in January 2013!







…pport Intel Xeon Phi … based products allows for transparent prog…



# Introducing the Intel® Xeon Phi™ Coprocessor Family



## Intel<sup>®</sup> Xeon Phi<sup>™</sup> Coprocessor 3100 Family

**Outstanding Parallel Computing Solution**

- Available first half of 2013
- More than 1,000 Gigaflops DP (peak)
- 6GB GDDR5 memory at 240 GB/s
- Active and passive form factors at 300W TDP
- Less than \$2,000

## Intel<sup>®</sup> Xeon Phi<sup>™</sup> Coprocessor 5100 Family

Highly Parallel Computing Solution that is **Optimized for High Density Environments**

- General availability Jan 28, 2013
- Up to 1010 Gigaflops DP (peak)
- 8GB GDDR5 memory at 320 GB/s
- Passive form factor at 225W TDP
- \$2,649 RCP

# Myth busting – >100x Improvement in Performance

## Intel<sup>®</sup> Xeon Running Serial Code







## Intel<sup>®</sup> Xeon Phi<sup>™</sup> Parallelized Code









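    !$OMP PARALLEL DO PRIVATE(j,k,offset)
    do i = 1, 20
       ! each iteration works on its own 128-element section of fa and fb;
       ! offset is private so that threads do not race on it
       offset = i*128
       do j = 1, 5000000
    !dir$ vector aligned
          do k = 1, 128
             fa(k+offset) = a * fa(k+offset) + fb(k+offset)
          end do
       end do
    end do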
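To put the loop nest above in perspective, here is a minimal sketch that counts its floating-point operations, assuming the outer bound reads 20 and counting one multiply plus one add per element update. The helper names and the sample run time are hypothetical; the slide gives no measured time for this kernel.

```python
def kernel_flops(outer: int = 20, repeats: int = 5_000_000, vector_len: int = 128) -> int:
    """Floating-point operations performed by the loop nest.

    Each innermost iteration does one multiply and one add
    (fa = a * fa + fb), hence the factor of 2.
    """
    return outer * repeats * vector_len * 2


def gflops_per_second(flops: int, seconds: float) -> float:
    """Convert an operation count and a wall-clock time into GFLOP/s."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flops / seconds / 1e9


total = kernel_flops()
print(f"total operations: {total:,}")

# Hypothetical run time of 10 seconds, used only to show the conversion.
print(f"rate at 10 s: {gflops_per_second(total, 10.0):.2f} GFLOP/s")
```

Comparing a rate obtained this way against a processor's peak is one way to sanity-check very large speedup claims: much of a claimed gain can simply reflect a serial, unvectorized baseline.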

## Same Code Improves Xeon Performance!





## Synthetic Benchmark (Intel<sup>®</sup> MKL) Measured on the TACC<sup>+</sup> Stampede Cluster<sup>3</sup>



Coprocessor results: Benchmark run 100% on coprocessor, no help from Intel® Xeon® processor host (aka native)

### Notes

1. Intel<sup>®</sup> Xeon<sup>®</sup> Processor E5-2680 used for all: SGEMM Matrix = 12800 x 12800, DGEMM Matrix 10752 x 10752, SMP Linpack Matrix 26000 x 26000
2. Intel® Xeon Phi™ coprocessor SE10P (ECC on) with "Gold" SW stack: SGEMM Matrix = 12800 x 12800, DGEMM Matrix 12800 x 12800, SMP Linpack Matrix 26872 x 28672
3. Average single-node results from measurements across a set of nodes from the TACC+ Stampede\* Cluster

\+ Texas Advanced Computing Center (TACC) at the University of Texas at Austin

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Intel Measured on TACC cluster results as of October 25, 2012 Configuration Details: Please reference slide speaker notes. For more information go to http://www.intel.com/performance

## **Application Performance:** Intel<sup>®</sup> Xeon Phi<sup>™</sup> Coprocessor

- **Finite Element Analysis:** Sandia National Labs MiniFE, up to 17X
- **Embree Raytracing:** Intel Labs Raytracing
- **Seismic:** Acceleware 8th Order Isotropic Variable Velocity, up to 2.05X
- **Molecular Dynamics:** Los Alamos Molecular Dynamics
- **Physics:** Jefferson Lab Lattice QCD, up to 2.7X

\* Xeon = Intel<sup>®</sup> Xeon<sup>®</sup> processor;

\* Xeon Phi = Intel<sup>®</sup> Xeon Phi<sup>™</sup> coprocessor

### Notes

1. 2S Intel® Xeon® processor X5690 vs. 2S Xeon\* + 1 Intel® Xeon Phi™ coprocessor (preproduction HW/SW)
2. 2S Intel<sup>®</sup> Xeon<sup>®</sup> processor E5-2687 vs. 1 Intel<sup>®</sup> Xeon Phi<sup>™</sup> coprocessor (preproduction HW/SW) (960 versions of improved workload)
3. 2S Intel® Xeon® processor E5-2680 vs. 1 Intel® Xeon Phi™ coprocessor (preproduction HW/SW)
4. 4-node cluster, each node with 2S Intel® Xeon® processor E5-2867 (comparison is cluster performance with and without 1 preproduction Intel® Xeon Phi™ coprocessor per node)
5. Includes additional FLOPS from transcendental function unit

Source: Intel measured results as of October 17, 2012. Configuration details: please reference slide speaker notes. For more information go to http://www.intel.com/performance







- **BlackScholes SP:** up to 7X
- **Monte Carlo SP:** up to 10.75X

# **Discovery and Innovation**

## Efficiency

Streamline bringing new ideas to light



## Programmability to Enable Scientific Discovery





# Welcome!



## **Bob Galush** Vice President, System x, IBM



## **Dr. Daniel Duffy** Lead Systems Engineer, NASA Center for Climate Simulation (NCCS), NASA Goddard Space Flight Center (GSFC)






## **Paul Santeler** VP & GM, Hyperscale Business Unit / ISS, Hewlett Packard Company



## **Lincoln Wallen** Chief Technology Officer, DreamWorks Animation






## **Brian Payne** Executive Director, PowerEdge Server Marketing, Dell Inc.



## **Jay Boisseau, Ph.D.** Director, Texas Advanced Computing Center





## Developing Today on Intel<sup>®</sup> Xeon Phi<sup>™</sup> Coprocessors

- Bright Computing
- NESA
- Platform Computing
- Ohio Supercomputer Center
- CD-adapco
- CEA (energie atomique, energies alternatives)
- MSC Software<sup>®</sup> (Technology Partner)
- CAPS
- PEPPHER
- Altair
- T-Systems
- CSCS (Centro Svizzero di Calcolo Scientifico, Swiss National Supercomputing Centre)
- The University of Tokyo (東京大学)









# **Top 500 Highlights**





## Intel<sup>®</sup> Xeon<sup>®</sup> processor:

- 379 systems
- 91% of new listings
- Intel<sup>®</sup> Xeon<sup>®</sup> processor 2600 family: fastest growing CPU on list

## Intel<sup>®</sup> Xeon Phi<sup>™</sup> coprocessor:

- 7 systems listed!
- 2.6 Petaflops: #7, TACC Stampede
- Outstanding efficiency up to 75%
- ...and...

Other brands and names are the property of their respective owners. Source: www.top500.org



## Supercomputer Solutions

Moving HPC Forward

## WORLD RECORD! "Beacon" at NICS

- Intel® Xeon® + Intel® Xeon Phi™ cluster
- Most power efficient on the list
- 2.449 GigaFLOPS / Watt
- 70.1% efficiency








# Where to Learn More Today



# Two new Intel<sup>®</sup> Xeon Phi<sup>™</sup> coprocessor families provide:

**Performance and Performance/Watt** for highly parallel HPC workloads, with cores, threads, wide SIMD, caches, and memory bandwidth

# While maintaining the advantages of Intel Architecture

- General purpose programming environment
- Advanced power management technology

First products shipping now. General availability January 2013.





# Parallelism is Your Path to the Future

Intel is, more than ever, your roadmap.






# **Risk Factors**

The above statements and any others in this document that refer to plans and expectations for the fourth quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Words such as "anticipates," "expects," "intends," "plans," "believes," "seeks," "estimates," "may," "will," "should" and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel's actual results, and variances from Intel's current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could cause actual results to differ materially from the company's expectations. Demand could be different from Intel's expectations due to factors including changes in business and economic conditions, including supply constraints and other disruptions affecting customers; customer acceptance of Intel's and competitors' products; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers. Uncertainty in global economic and financial conditions poses a risk that consumers and businesses may defer purchases in response to negative financial events, which could negatively affect product demand and other related matters. Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. 
Revenue and the gross margin percentage are affected by the timing of Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings and introductions, marketing programs and pricing pressures and Intel's response to such actions; and Intel's ability to respond quickly to technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; start-up costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; product manufacturing quality/yields; and impairments of long-lived assets, including manufacturing, assembly/test and intangible assets. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Expenses, particularly certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's products and the level of revenue and profits. Intel's results could be affected by the timing of closing of acquisitions and divestitures. 
Intel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and regulatory matters described in Intel's SEC reports. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel's ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. A detailed discussion of these and other factors that could affect Intel's results is included in Intel's SEC filings, including the company's most recent Form 10-Q, Form 10-K and earnings release.

Rev. 10/16/12



# Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.


Copyright <sup>©</sup> 2012, Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Xeon Phi, Xeon Phi logo, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries. \*Other names and brands may be claimed as the property of others. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: Learn About Intel® Processor Numbers

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

intel.com/software/prod

Notice revision #20110804