### "Single-chip Cloud Computer": An experimental many-core processor from Intel Labs



Justin Rattner
Chief Technology Officer
Intel Corporation

## Evolving User Experiences

- Single-Core Era: textual, multi-purpose, productive
- Multi-Core Era: visual, mobile, entertaining
- Many-Core Era: immersive, social, perceptive







## Performance Scaling Challenges

- Energy Efficiency
- Design Complexity
- Programming Strategy
- Emerging Applications

## Cloud Computing Today

#### Cloud datacenters:

- 1000s of networked computers
- Millions of threads & petabytes of data

### Opportunity:

- Lower power, higher density via integration
- Greater efficiency and better programmability



Example: Intel's Open Cirrus testbed, Intel Labs Pittsburgh



## Single-chip Cloud Computer (SCC)



- Experimental many-core CPU on 45 nm Hi-K metal-gate silicon
- 48 IA-compatible cores, the most ever built on a single chip
- Network of 2-core nodes mimics cloud computing at chip level
- Fine-grained power management scales from 25 W to 125 W
- Supports proven, highly parallel "scale-out" programming models

## Inside the SCC

24 tiles, 24 routers, 48 IA cores

Dual-core SCC tile



- 2D mesh network with 256 GB/s bisection bandwidth
- 4 integrated DDR3 memory controllers (64 GB addressable)
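
To make the tile arithmetic concrete, here is a minimal sketch of the layout: 24 tiles in a 6×4 mesh (per the ISSCC abstract), two cores per tile. The row-major numbering and the Manhattan-distance hop count are assumptions for illustration, not the chip's documented addressing or routing scheme.

```python
# Toy model of the SCC layout: 24 tiles in a 6x4 mesh, 2 cores per tile.
# Row-major tile numbering and the hop metric are illustrative assumptions.
MESH_COLS, MESH_ROWS, CORES_PER_TILE = 6, 4, 2
NUM_CORES = MESH_COLS * MESH_ROWS * CORES_PER_TILE  # 48


def tile_of_core(core_id: int) -> int:
    """Map a core id (0..47) to the tile (0..23) that hosts it."""
    if not 0 <= core_id < NUM_CORES:
        raise ValueError(f"core id out of range: {core_id}")
    return core_id // CORES_PER_TILE


def tile_coords(tile_id: int) -> tuple[int, int]:
    """Return the (x, y) mesh position of a tile under row-major numbering."""
    return tile_id % MESH_COLS, tile_id // MESH_COLS


def hops(core_a: int, core_b: int) -> int:
    """Router-to-router hops between two cores (Manhattan distance)."""
    ax, ay = tile_coords(tile_of_core(core_a))
    bx, by = tile_coords(tile_of_core(core_b))
    return abs(ax - bx) + abs(ay - by)
```

Under these assumptions, two cores on the same tile are zero hops apart, and opposite corners of the mesh are (6 − 1) + (4 − 1) = 8 hops apart.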

## New Data-Sharing Options

The SCC eliminates significant complexity & power by removing hardware cache coherency



Enables exploration of more scalable alternatives:

- Ultra-low latency HW-accelerated message passing
- Software-managed, page-level memory coherency
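
The message-passing alternative can be illustrated with a small software model. This is a sketch only: the class and method names are hypothetical, and it does not represent the SCC's hardware interface or any Intel library. It shows the key property, that data moves by explicit send and receive rather than through coherent shared caches.

```python
from collections import deque


class MessagePassingChip:
    """Toy model: each core owns a private inbox; nothing is shared implicitly."""

    def __init__(self, num_cores: int = 48):
        self.inboxes = [deque() for _ in range(num_cores)]

    def send(self, src: int, dst: int, payload) -> None:
        # Snapshot the payload as immutable bytes so later writes by the
        # sender are never visible to the receiver (no coherency needed).
        self.inboxes[dst].append((src, bytes(payload)))

    def recv(self, dst: int):
        """Return (src, payload) for the oldest pending message, or None."""
        inbox = self.inboxes[dst]
        return inbox.popleft() if inbox else None


chip = MessagePassingChip()
buffer = bytearray(b"hi")
chip.send(src=0, dst=47, payload=buffer)
buffer[0] = ord("X")          # sender keeps mutating its own copy
message = chip.recv(dst=47)   # receiver still sees the data as sent
```

Because the receiver only ever sees what was explicitly sent, no hardware mechanism has to keep the two cores' views of `buffer` in sync.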

## Improving Energy Efficiency

Fine-grain, software-controlled power management



#### 8 voltage and 28 frequency islands

- Each tile can run at a different frequency
- 6 banks of four tiles can run at different voltages
- Also independent V&F control for I/O network & MCs
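
The tile portion of the island scheme (6 banks × 4 tiles = 24 tiles) can be modeled as below. This is a hedged sketch: the class name, the consecutive-id bank grouping, and the numeric values in the usage lines are placeholders, not Intel's control interface or real operating points. The remaining islands (8 − 6 voltage, 28 − 24 frequency) cover non-tile logic and are not modeled here.

```python
TILES, TILES_PER_BANK = 24, 4
BANKS = TILES // TILES_PER_BANK  # 6 tile voltage banks


class TilePowerState:
    """Per-tile frequency and per-bank voltage for the 24 tiles (toy model)."""

    def __init__(self, mhz: int, volts: float):
        self.tile_mhz = [mhz] * TILES
        self.bank_volts = [volts] * BANKS

    @staticmethod
    def bank_of(tile: int) -> int:
        # Assumption: consecutive tile ids share a bank; real grouping may differ.
        return tile // TILES_PER_BANK

    def tiles_in_bank(self, bank: int) -> list[int]:
        return [t for t in range(TILES) if self.bank_of(t) == bank]

    def set_tile_frequency(self, tile: int, mhz: int) -> None:
        """Frequency is per tile: only this tile changes."""
        self.tile_mhz[tile] = mhz

    def set_bank_voltage(self, bank: int, volts: float) -> None:
        """Voltage is per bank: all four tiles in the bank are affected."""
        self.bank_volts[bank] = volts


# Placeholder values, chosen only to exercise the model.
state = TilePowerState(mhz=500, volts=1.0)
state.set_tile_frequency(tile=5, mhz=250)
state.set_bank_voltage(bank=1, volts=0.8)
```

The asymmetry is the point: software can slow a single idle tile, but lowering voltage is a decision made for a group of four tiles at once.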

## A Platform for SW Innovation

- Planning underway to share the SCC platform
- Dozens of partners within 6 months, more over time
- Already working with several partners:
  - Microsoft, ETH Zurich, UC Berkeley and University of Illinois



"We're very excited about Intel's SCC. In the Barrelfish project we are designing OS architectures for future multi-core and many-core systems. The chip's memory system and message passing support are a great fit for us, and it's an ideal vehicle for us to test and validate our ideas."

- Prof. Timothy Roscoe, ETH Zurich



"The upcoming Single-chip Cloud Computer is of great interest to application developers and tools researchers. The availability of the hardware will greatly accelerate our development of applications and tools for massively parallel computing platforms."

- Prof. Wen-Mei Hwu, University of Illinois, UPCRC@Illinois co-director

Learn more at www.intel.com/go/terascale

## SCC Summary: Meeting the Scalability Challenges

### Energy Efficiency

- Dynamic voltage/frequency scaling
- 1/3 power reduction for core-core I/O

### Design Complexity

- Array of small IA-based tiles could lead to more agile, flexible designs

### Programming Models

- Message-passing, shared virtual memory, map-reduce, and actors

### Application Development

- Working with Microsoft & others for academic, industry innovation

## SCC Demo Showcase

- Financial Analytics w/ shared virtual memory
- Microsoft Visual Studio
- Advanced Power Management
- JavaScript Physics Modeling
- HPC Parallel Workloads
- Hadoop Web Search








## Extending Tera-scale Research





## 2006 Many-core Prototype "Teraflops Research Processor"

Many simple FP cores

Validated tiled-design concept

Tested HW limits of a mesh network

Sleep capabilities at core and circuit level

Lightweight message passing

Limited programmability for basic benchmarks

Primarily a circuit experiment

## 2009 Many-core Prototype "Single-chip Cloud Computer"

Many fully-functional IA cores

Prototypes a tiled-design microprocessor

Improved mesh with 3x performance/watt

Dynamic voltage & frequency scaling

Message passing & controlled memory sharing

Full programmability for application research

Circuit & software research vehicle

## Meeting Performance Demands



### **ISSCC Abstract**

"A 48-Core IA-32 Message Passing Processor with DVFS in 45nm CMOS"

#### **Abstract:**

A 567 mm² processor in 45nm CMOS integrates 48 IA-32 cores and 4 DDR3 channels in a 6×4 2D-mesh network. Cores communicate through message passing using 384KB of on-die shared memory. Fine grain power management takes advantage of 8 voltage and 28 frequency islands to allow independent DVFS of cores and mesh. As performance scales, the processor dissipates between 25W and 125W.
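
As a rough worked figure, if the 384 KB of on-die shared memory is divided evenly across the 24 tiles and their 2 cores each (an assumption; the abstract does not state the split), the shares would be:

$$
\frac{384\ \text{KB}}{24\ \text{tiles}} = 16\ \text{KB per tile},
\qquad
\frac{16\ \text{KB}}{2\ \text{cores}} = 8\ \text{KB per core}.
$$

Likewise, the stated power envelope spans a ratio of $125\ \text{W} / 25\ \text{W} = 5$ between the highest and lowest operating points.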

## Intel Labs Braunschweig

#### Design:

- IA core and the Message Passing solution
- Memory Controllers

#### Validation:

- Logic validation of the complete chip
- FPGA Emulation for pre-silicon SW prototyping

#### Platform:

- Test bed system validation and bring up platform
- Software:
  - A Linux OS
  - Platform firmware, operational SW & drivers







## Intel Labs Bangalore



- Circuit and physical design of the 45nm IA core using synthesis and custom designs
- Design and implementation of the on-die 2-D mesh network and the digital logic of the DDR3-Memory Controller
- Logic and performance verification of the prototype processor

