

## Research with Impact

#### Joseph Schütz, Vice President, Corporate Technology Group

#### Corporate Technology Group Strategic Objectives

Conduct world-class research for all Intel Platforms and Products

Deliver innovative technologies from concept to product adoption

Collaborate with industry via standards, alliances and evangelism

Engage worldwide for the best research and technology



Research locations (world map): Seattle, Hillsboro, Berkeley, Santa Clara, Guadalajara, Pittsburgh, Chandler, Braunschweig, Barcelona, Moscow, Beijing, St. Petersburg, Haifa, Shanghai, Bangalore

#### Nearly 900 researchers

#### 15 locations worldwide

#### Innovative research models



#### 2007 Research Focus Areas

#### **Tera-Scale Computing**


Scalable computing for the future

#### **Energy-Efficient Platforms**



Workload-driven, optimal energy usage

#### Wireless



Mobile Broadband and Advanced Wireless Platforms

#### Carry Small, Live Large



Ultra-mobile platforms and usage models



#### **Tera-Scale Platforms**

Research spanning circuits to workloads for future, highly scalable platforms (2009+)

Partitioned, adaptable, reliable, trusted, scalable architecture





Over 60 Research Projects Related to Tera-Scale

## IA Tera-Scale First Silicon in 2008

Bringing IA Programmability and Parallelism to High Throughput Computing



- Highly parallel, IA programmable architecture in development
- Ease of scaling for software ecosystem
- Array of enhanced IA cores
  - Scientific Computing, RMS, Visualization, Financial Analytics & Health applications
- Teraflops of performance



# Tera-Scale Research in the EU

- Significant projects in Braunschweig
  - Accelerating MC research through FPGA emulation
- Broad spectrum of projects in Barcelona
  - Based on the interaction between micro-architecture and compilers





#### Led by Sebastian Steibl



### FPGAs speed CPU Arch. Research

- Full system FPGA emulation of a Tera-scale computing system including
  - Cores, Interconnect, Caches, Memory, I/O, Operating System
- Fills the gap between simulators and test chips such as the Teraflops Research Processor (aka "Polaris") for architectural experiments:

|           | Turn around time | Simulation speed | Months of resources |
|-----------|------------------|------------------|---------------------|
| Simulator | Hours            | KIPS             | 1s-10s              |
| FPGA      | Days             | MIPS             | 10s-100s            |
| Silicon   | Months - years   | GIPS             | 100s-1000s          |

- Fully cycle accurate system
- Leverage full benefit of HW/SW co-design
  - HW can be adjusted incorporating programmer's feedback
- Performance: cycle-accurate emulation at roughly 1/1000 to 1/100 of a silicon product's clock frequency (4-10 MHz)
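To make the speed gap in the table concrete, here is a minimal Python sketch that estimates how long one workload would take on each platform. The rates are only the orders of magnitude named in the table (KIPS, MIPS, GIPS) rounded to single values, and the workload size of ten billion instructions is a hypothetical figure chosen for illustration; neither comes from measured data.

```python
# Representative instruction rates taken from the orders of magnitude in the
# table (KIPS, MIPS, GIPS). The exact values are assumptions for illustration.
RATES_IPS = {
    "Simulator": 1e3,  # thousands of instructions per second
    "FPGA": 1e6,       # millions of instructions per second
    "Silicon": 1e9,    # billions of instructions per second
}


def run_time_seconds(instructions: float, platform: str) -> float:
    """Wall-clock seconds needed to execute `instructions` on `platform`."""
    return instructions / RATES_IPS[platform]


def humanize(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    workload = 1e10  # hypothetical workload: ten billion instructions
    for name in RATES_IPS:
        print(f"{name:9s} {humanize(run_time_seconds(workload, name))}")
```

Under these assumptions the same workload moves from months on a simulator to hours on an FPGA, which is the range where booting an operating system and running real applications becomes practical.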



### **Previous Work in FPGA Emulation**





- First Intel<sup>®</sup> Architecture research FPGA prototype, disclosed in 2006
- Motivation
  - Studies of tera-scale multi-core platforms
  - Emulation of HW, OS and application SW
  - Fast turn-around cycles are key for research
- Key Features
  - Fully IA32 compliant research core
  - Fits into standard mainboard
  - FPGA enables fast turn around times for new designs
  - High emulation speed capable of running OS and application workloads



#### Next Generation Tera-scale FPGA prototyping

- Next generation FPGA research platform
- Intel<sup>®</sup> designed research emulation system to focus on tera-scale computing research
  - Commercial emulators are designed for validation of silicon designs
  - Designed for prototyping new architectural concepts
  - Optimized for emulating a larger number of cores (up to 64)
- Evaluate using an Operating System and real applications
- Full real-time visibility of CPU internal signals and activities



## A Versatile, Larger, Scalable System

- A tera-scale CPU prototype is built from multiple node cards in multiple cases
- Each node card is designed to emulate different tera-scale building blocks
  - Cores
  - Memory controllers
  - Hardware accelerators
- Rack-mounted node cards with cable-based interconnect for high-speed links
  - Designed to emulate different topologies like ring, mesh and torus interconnects
- Backplane for system control + debug
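The three interconnect topologies named above differ only in which nodes are directly linked. The following Python sketch shows that difference as neighbor functions. It is an illustration of the topologies themselves, not of how the node cards are actually cabled; the two-dimensional layout, the grid sizes, and all function names are assumptions.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def ring_neighbors(i: int, n: int) -> List[int]:
    """Neighbors of node i in a ring of n nodes (assumes n >= 3)."""
    return [(i - 1) % n, (i + 1) % n]


def mesh_neighbors(x: int, y: int, w: int, h: int) -> List[Node]:
    """Neighbors in a w-by-h 2D mesh: nodes on the edge have fewer links."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(cx, cy) for cx, cy in candidates if 0 <= cx < w and 0 <= cy < h]


def torus_neighbors(x: int, y: int, w: int, h: int) -> List[Node]:
    """Neighbors in a w-by-h 2D torus: links wrap around (assumes w, h >= 3)."""
    return [
        ((x - 1) % w, y),
        ((x + 1) % w, y),
        (x, (y - 1) % h),
        (x, (y + 1) % h),
    ]


if __name__ == "__main__":
    # A corner node has two links in a mesh but four in a torus.
    print(mesh_neighbors(0, 0, 4, 4))
    print(torus_neighbors(0, 0, 4, 4))
```

With a cable-based interconnect, switching between these topologies amounts to changing which pairs of cards are connected, which is what the neighbor functions describe.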







#### Research in Barcelona: A Broad Variety of Tera-Scale Topics

Leveraging the synergy between micro-architecture and compilers



#### Main Research Vectors







increase reliability

## Speculative Multi-Threading Approach

- Main idea
  - Support for speculative threads
    - Threads that can either commit or be squashed atomically, based on runtime conditions
- This has huge implications
  - No need to be conservative
    - TLP can be exploited even if independence cannot be proved
  - Dependent sections of code can be fully parallelized
    - Through value speculation
- Approach
  - Hardware/software to check for correct speculations
    - Transactional memory for CMP
    - Speculative buffer for SMT
    - Extra code in the binary inserted by the compiler or the VM
  - Static parallelization
    - Compiler for parallelization
    - User library for thread management
  - Dynamic parallelization
    - Thalia co-designed VM for parallelization
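The commit-or-squash idea with value speculation can be sketched in a few lines of Python. This is a sequential toy model under stated assumptions, not the hardware or compiler mechanism described above: the class `SpeculativeTask`, the functions `consumer` and `run`, and the dictionary standing in for memory are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict


class SpeculativeTask:
    """Runs a body on a predicted input and buffers its writes."""

    def __init__(self, predicted_input: int,
                 body: Callable[[int, Dict[str, int]], None]) -> None:
        self.predicted_input = predicted_input
        self.body = body
        self.buffer: Dict[str, int] = {}  # speculative writes, not yet visible

    def execute(self) -> None:
        self.buffer.clear()
        self.body(self.predicted_input, self.buffer)

    def commit_or_squash(self, actual_input: int,
                         memory: Dict[str, int]) -> bool:
        """Commit all buffered writes if the prediction held, else squash."""
        if actual_input == self.predicted_input:
            memory.update(self.buffer)  # commit: writes become visible together
            return True
        self.buffer.clear()             # squash: discard every speculative write
        return False


def consumer(value: int, out: Dict[str, int]) -> None:
    # Dependent section: needs `value`, which earlier code produces.
    out["result"] = value * 2


def run(producer_value: int, prediction: int) -> Dict[str, int]:
    memory: Dict[str, int] = {}
    task = SpeculativeTask(prediction, consumer)
    task.execute()                    # in a real system this overlaps the producer
    memory["value"] = producer_value  # the producer finishes
    if not task.commit_or_squash(producer_value, memory):
        consumer(producer_value, memory)  # fall back to non-speculative execution
    return memory
```

Calling `run(5, 5)` commits the speculative result, while `run(5, 4)` squashes it and re-executes; both end with the same memory contents, which is the correctness property the checking hardware or inserted code has to guarantee.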



### **Preliminary Results for SMT**





#### Thread-Affinity Aware Cache







#### **Multi-Core Transition Accelerating**


> "We notified customers we're pulling in both the desktop and server (launch) of the first quad-core processors into the fourth quarter of this year from the first half of 2007"

> "The UltraSPARC T1 processor with CoolThreads technology is the highest-throughput and most eco-responsible processor ever created."





\*Third party marks and brands are the property of their respective owners

### **Design Challenges**

- Complex memory hierarchy
- Sophisticated on-die fabrics
- Explicit thread support
- Fixed function acceleration



Lacking The Quantitative Tools...



### The Waiting Game



Designing 2010 Processors Today Must Anticipate Future Applications





### Suite Repository and Working Group



#### **Professor Kai Li**, Charles Fitzmorris Professor, Department of Computer Science

#### **Professor J. P. Singh**, Department of Computer Science





## **Closing Comments**

- The conversion to Tera-Scale is accelerating
- This is the biggest change since the inception of the microprocessor
- This is creating very significant challenges and opportunities
- The first IA Tera-Scale Prototype in 2008
- Two key Intel research sites in the EU contributing to the Tera-Scale agenda
- Please join us in pushing research forward in this area

