

#### Tera-scale software research: Programming 10s-100s of cores

#### Jerry Bautista, PhD, Co-director, Tera-scale Computing Research, Intel Corporation



© 2007 Intel Corporation

## What is Tera-scale?







## **Example emerging Application**



Many emerging apps are parallel and "model-based". Example: modeled body, modeled motion, modeled lighting.

Intel Developer FORUM

See Justin's keynote on "Virtual World" Thursday



#### Tera-scale Computing: A New Drive for Parallel Programming



# Growing need for new parallel programming tools and capabilities





## Joint Hardware & Software R&D









## Taking Parallel Programming Mainstream

Parallel programming has been used for decades in HPC, but it:

- Requires special expertise and is not easily automated
- Requires parallel languages or language extensions
- Must co-exist with legacy code



Parallel languages or parallel language extensions need to:

- Extract parallelism hiding within applications
- Express parallelism via programming constructs
- Exploit parallelism on multi-core platforms





## Types of Parallelism

- Task: multiple independent activities, which may or may not share data



- Data: one or few tasks operating on a large amount of data
  - Will review Ct later in presentation
  - Subject of demo and class (TCRS003)
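The two kinds of parallelism above can be contrasted in a few lines. This is a minimal sketch using Python's standard `concurrent.futures` module, not Ct or any Intel tool; the function names (`load_config`, `build_index`, `brighten`) are hypothetical stand-ins chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def load_config():
    # Hypothetical independent activity #1
    return {"mode": "demo"}


def build_index(words):
    # Hypothetical independent activity #2
    return {word: position for position, word in enumerate(words)}


def run_task_parallel(words):
    """Task parallelism: different, independent functions run concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        config_future = pool.submit(load_config)
        index_future = pool.submit(build_index, words)
        return config_future.result(), index_future.result()


def brighten(pixel):
    # The single operation applied to every element of the data set
    return min(pixel + 40, 255)


def run_data_parallel(pixels, workers=4):
    """Data parallelism: one function applied across a large collection."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten, pixels))
```

The threads here show the structure of each style rather than a speedup: in standard CPython builds the global interpreter lock limits gains for CPU-bound work like `brighten`.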









### Common Parallel Programming Models

| Programming model        | Example application today        | Example language |
|--------------------------|----------------------------------|------------------|
| Message passing          | HPC-type apps on a cluster       | MPI              |
| Shared memory or threads | General parallel tasks, database | OpenMP           |
| Data parallel            | Mathematical simulations         | MATLAB           |
| Streaming                | Graphics shaders                 | Cg               |
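To make the first two rows of the table concrete, here is a small sketch that computes the same sum in a shared-memory style and in a message-passing style. It uses only Python's standard `threading` and `queue` modules as stand-ins; it is not MPI or OpenMP, and the function names are assumptions made for this example.

```python
import queue
import threading


def sum_shared_memory(chunks):
    """Shared-memory style: workers update one shared total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)      # private work, no synchronization needed
        with lock:                # shared state must be protected
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_message_passing(chunks):
    """Message-passing style: workers share nothing and send results as messages."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))     # the only communication is an explicit message

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(inbox.get() for _ in chunks)
```

In the first version correctness depends on every writer remembering to take the lock; in the second, the programmer instead has to design the messages.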





## Diverse Programming Environment

Historically, new languages often emerge and compete.

Adoption can be slow and success hard to predict.



Lesson: No one language will be a silver bullet that solves the parallel programming challenges





## Research to Help Enable Parallelism

- STM (Software Transactional Memory)
  - The shared-memory model is dominant, but lock synchronization becomes problematic in highly parallel environments
- Ct (C for throughput computing)
  - Provides parallel extensions to C and C++ so programmers need not manage threads in data-parallel execution models
- Exo (Accelerator Exoskeleton)
  - Brings the IA "look and feel" of common development environments to parallel, *heterogeneous* environments





## Unlocking Parallelism in a Shared Memory Environment

Must carefully control how multiple threads access shared memory

Today we "lock" memory for one thread at a time.

- Other threads must wait, reducing multi-core benefit
- Locking code scales poorly and must be reworked for more threads
- Can cause critical software deadlocks and errors
- We need to fix this...
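The deadlock risk mentioned above typically arises when two threads take the same locks in opposite orders. The following is a minimal sketch, assuming a hypothetical `Account` class, of the usual manual workaround: always acquire locks in one fixed global order. It illustrates how much care lock-based code demands rather than recommending a design.

```python
import threading


class Account:
    def __init__(self, account_id, balance):
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()  # one lock per shared object


def transfer(src, dst, amount):
    if src is dst:
        return  # nothing to move, and taking the same lock twice would hang
    # Acquire locks in a fixed order (by account_id). If one thread locked
    # src-then-dst while another locked dst-then-src, each could hold one
    # lock and wait forever for the other.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_transfers(rounds=1000):
    a, b = Account(1, 1000), Account(2, 1000)

    def move(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    # Two threads transfer in opposite directions at the same time.
    threads = [
        threading.Thread(target=move, args=(a, b)),
        threading.Thread(target=move, args=(b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

The ordering rule is a convention that every caller must follow by hand, which is one reason locking code is hard to scale and maintain.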





Examples: 2003 Northeast blackout; Mars rover problem. (Image courtesy NASA/JPL-Caltech)







\*Other names and brands may be claimed as the property of others.

## STM: From Research to Reality

Ensures correct parallel memory access without locks



- Greater performance
- Easier to program
- Scales with hardware
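The idea behind transactional memory can be sketched with a toy optimistic scheme: a transaction buffers its writes, records the version of everything it reads, and commits only if none of those versions changed; otherwise it re-runs. The code below is an illustrative assumption, not the Intel C++ STM compiler or its API. The names `TVar`, `Transaction`, and `atomically` are hypothetical, and this toy runtime still uses one short internal lock to make reads and commits atomic, even though application code never takes a lock itself.

```python
import threading

_runtime_lock = threading.Lock()  # internal to the toy runtime, held only briefly


class TVar:
    """A transactional variable: a value plus a version number."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _runtime_lock:
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value  # not visible to others until commit


def atomically(body):
    """Run body(tx) until it commits without conflict; return body's result."""
    while True:
        tx = Transaction()
        result = body(tx)
        with _runtime_lock:
            unchanged = all(t.version == seen for t, seen in tx.reads.items())
            if unchanged:
                for tvar, value in tx.writes.items():
                    tvar.value = value
                    tvar.version += 1
                return result
        # Conflict: another transaction committed first, so re-run the body.


def run_counter_demo(threads=4, increments=500):
    counter = TVar(0)

    def increment(tx):
        tx.write(counter, tx.read(counter) + 1)

    def worker():
        for _ in range(increments):
            atomically(increment)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.value
```

The application code in `increment` contains no locks and no lock ordering rules; conflicts are detected and retried by the runtime. Because a body may run more than once, it should avoid side effects other than transactional reads and writes.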

#### Programmers can try it!

#### Available starting today at Whatif.intel.com: Intel® C++ STM Compiler Prototype Edition





### Ct: Nested Data Parallel Programming

Ct adds *new parallel* data structures and operators to C/C++, but uses *existing, unmodified* C/C++ compilers
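These slides do not show Ct syntax, so the following is only a sketch of what nested data parallelism generally means: operating on an irregular, nested collection (here, the rows of a sparse matrix) through flat element-wise and per-segment operations. It is written in plain Python, not Ct, and the names `segmented_sum` and `sparse_matvec` are assumptions made for this example.

```python
from itertools import accumulate


def segmented_sum(values, lengths):
    """Sum each segment of a flat list; lengths gives the size of every segment."""
    offsets = [0] + list(accumulate(lengths))
    # Every segment is reduced independently, so segments could run in parallel.
    return [sum(values[start:end]) for start, end in zip(offsets, offsets[1:])]


def sparse_matvec(row_lengths, col_index, nonzeros, x):
    """Multiply a sparse matrix, stored as flat arrays of nonzeros, by vector x."""
    # Element-wise step: each product is independent of all the others.
    products = [value * x[col] for value, col in zip(nonzeros, col_index)]
    # Nested step: one reduction per row, with rows of differing lengths.
    return segmented_sum(products, row_lengths)


# The matrix [[1, 0, 2], [0, 3, 0], [4, 5, 6]] stored by its nonzero entries.
row_lengths = [2, 1, 3]
col_index = [0, 2, 1, 0, 1, 2]
nonzeros = [1, 2, 3, 4, 5, 6]
result = sparse_matvec(row_lengths, col_index, nonzeros, [1, 2, 3])
```

Neither function mentions threads or locks, and no step writes to data that another step reads concurrently, which is what lets a runtime parallelize such operators without introducing data races.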






## Ct Motivation and Vision

- Make parallel programming easier now:
  - Extend *deterministic* parallel programming models
    - I.e., data races are not possible
  - Express complex behaviors through simple operators
  - Present a simple and predictable performance model
- Provide a forward-scaling programming model that maximizes ISV ROI for new code creation
  - "Future-proof" apps from increasing core count and inevitable ISA evolution

See today's post by Anwar Ghuloum at blogs.intel.com/research for more info





### Programming for Heterogeneous Cores

- Multi-core brings the opportunity to integrate fixed function accelerators with IA cores
- Implementation difficult and very HW specific







### Programming Accelerators Today

#### SW DEVELOPMENT SW EXECUTION






### Exo - Accelerator Exoskeleton Model

The exoskeleton makes accelerators appear like a part of the processor






## Demonstrations

1. Video enhancement application (de-interlacing)



2. Similar demo with a financial analytics application





## **Convergence of Many Parallel Apps**







## Independent IDC Market Analysis

- Spending on computing continues to grow
- New usage models are emerging
- IDC working with Intel and others to understand this space
  - "IDC believes that new use cases for computing are emerging which will drive significant growth...Our research has shown that highly parallel applications and multi-core processors are key drivers enabling this emerging trend."

-Matt Eastwood, Group Vice President, IDC Enterprise Platform Research

Full report to be published in October





## Summary

- The key challenges to parallel programming
  - Programmability and scalability
  - Interoperability and integration of heterogeneous execution blocks
- Parallel programming is the key to unlocking the full potential of the Tera-scale platforms
  - Mainstream programmers enabled through broad set of development and optimization tools.
  - Technologies developed to extend well-known programming environments (Ct, Exo and STM)
  - Tools and technologies are already being deployed:
    - TBB at threadingbuildingblocks.org
    - STM on Whatif.intel.com
- Model-based computing applications are compelling with market projections supporting potential for broad adoption





## More information on Tera-scale...



#### Tera-scale Computing Research Chalk Talk

Chair: Jerry Bautista, Co-Director, Tera-scale Computing Research

TCRC001: Wed, Sept. 19, 4:40 - 5:30, Chalk Talk Room

#### Other Tera-scale Sessions (see IDF guide for full info)

- TCRS001 Energy Management Innovations for Future Multi-Core Processors
- TCRS002 Intelligent On-chip Interconnects: The 80-core Prototype and Beyond
- TCRS003 Data Parallel Programming for Tera-scale with Ct
- TCRS004 Modeling Reality: Ray Traced Graphics and other Apps of the Future
- TCRS005 Silicon Photonics: Enabling Terabit Data Pipes
- QATS003 Accelerator Exoskeleton: IA Look-n-Feel for Heterogeneous Cores

#### Tech & Research Pavilion Demos:

80-core Processor, Multi-core emulator, Ct, Log-based Architectures, Ray Tracing

Learn more at www.intel.com/go/terascale and blogs.intel.com/research



