Tera set to ship new MTA system

How does it address high-performance parallel systems' two major problems?

By Stephanie Steenbergen

December 1997

San Francisco (December 1, 1997) -- Later this month Tera Computer Company expects to deliver the first commercially available computer system based on its new multithreaded processor architecture. According to Tera, the system, called the MTA (Multithreaded Architecture), will scale from 16 to 256 processors.

Burton Smith, Tera's chief scientist, says Tera will deliver its first system to the University of California, San Diego (UCSD) thanks to a $1.9 million, 18-month DARPA (Defense Advanced Research Projects Agency) grant -- coupled with $4.2 million in NSF (National Science Foundation) money.

Multithreaded architecture processors, unlike those based on vector or data-caching architectures, switch contexts in less time than it would take to store and reload the contents of their architectural registers.

According to Tera, high-performance parallel computers currently face two major problems: scalability and programmability. Major third-party software applications do not scale beyond a few processors. And because they do not scale, application performance is less than thrilling -- usually less than a gigaflop, no matter how many processors are in the system.

Smith claims Tera's MTA system can successfully solve these problems -- first, because the memory is shared rather than distributed among the processors; and second, because MTA minimizes the time that processors spend waiting by using multiple "threads," and MTA's compilers distribute tasks to take advantage of the threads.
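The second point, keeping a processor busy by switching among threads while memory references are outstanding, can be sketched with a toy model. The function names, the cycle counts, and the assumption that switching threads costs nothing are illustrative simplifications, not figures or code from Tera.

```python
def processor_utilization(threads, work_cycles, latency_cycles):
    """Fraction of cycles spent on useful work in a simplified model.

    Assumed model (not Tera's): each thread alternates between
    `work_cycles` of computation and one memory reference that takes
    `latency_cycles` to return. While one thread waits, the processor
    runs another ready thread, and the switch itself is free.
    """
    if threads < 1 or work_cycles < 1 or latency_cycles < 0:
        raise ValueError("threads and work_cycles must be >= 1, latency_cycles >= 0")
    period = work_cycles + latency_cycles   # one work-then-wait round per thread
    busy = threads * work_cycles            # useful cycles all threads can supply
    return min(1.0, busy / period)


def threads_to_hide_latency(work_cycles, latency_cycles):
    """Smallest thread count that keeps the processor fully busy in this model."""
    period = work_cycles + latency_cycles
    return -(-period // work_cycles)        # ceiling division


if __name__ == "__main__":
    # Hypothetical numbers: 1 cycle of work per 99-cycle memory wait.
    for n in (1, 50, 100):
        print(n, "threads ->", processor_utilization(n, 1, 99))
    print("threads needed:", threads_to_hide_latency(1, 99))
```

In this model a single thread leaves the processor idle almost all of the time, and utilization rises in proportion to the number of ready threads until the wait is fully covered.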

What affects performance?
How a processor deals with memory and synchronization latency largely determines its performance. The two alternatives to Tera's multithreaded architecture processors are vector processors and data-caching processors.

Vector processors tolerate memory latency using vector load operations. The independence of the individual items in the vector, known as vector parallelism, is what lets the processor tolerate the latency.

Smith likens memory latency to a disease. "[Vector parallelism] is ibuprofen for latency disease," he says.

In contrast to vector processors, data caches try to avoid memory latency by keeping copies of data close to the processor. This works only up to a point. The approach cannot scale indefinitely because it relies on both data reuse and unit-stride memory access patterns.
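The dependence on data reuse and unit-stride access can be illustrated with a small simulation of a direct-mapped cache. The cache geometry (64 lines of 8 words) and the function name are assumptions chosen for illustration; they do not describe any particular machine.

```python
def hit_rate(addresses, num_lines=64, words_per_line=8):
    """Hit rate of a toy direct-mapped cache over a sequence of word addresses."""
    tags = [None] * num_lines          # memory block currently held by each line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // words_per_line  # memory block containing this word
        slot = block % num_lines        # the one cache line that block may occupy
        if tags[slot] == block:
            hits += 1
        else:
            tags[slot] = block          # miss: fetch the whole block
        total += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    n = 4096
    unit_stride = range(n)               # a[0], a[1], a[2], ...
    long_stride = range(0, n * 8, 8)     # a[0], a[8], a[16], ...
    reuse = list(range(256)) * 16        # small working set visited 16 times
    print("unit stride:", hit_rate(unit_stride))
    print("stride of 8:", hit_rate(long_stride))
    print("with reuse: ", hit_rate(reuse))
```

With these settings the unit-stride scan hits on 7 of every 8 accesses, because each fetched block supplies the next seven words. The stride-8 scan touches a new block on every access and never hits. The small working set misses only on its first pass.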

The folks at UCSD and San Diego Supercomputer Center (SDSC) have spent a fair amount of time testing Tera's MTA. "The jury's still out," says Allan Snavely, a senior programmer analyst at SDSC. "Initially, we've got some pretty good results," he says.

The MTA prototype is running at 245 MHz, and the production model will run at 333 MHz. Tera's MTA has been tested against Cray's T90. "[MTA] is better on some codes and worse on others," says Snavely. "It's really kicking butt on integer sort, and it's approximately the same on conjugate gradient."

What are integer sort and conjugate gradient? "In a nutshell," says Snavely, "integer sort and conjugate gradient are two of the NAS2.3-serial benchmarks. This is a suite of standard programs used to evaluate the performance of the computer hardware and the sophistication of the compiler."
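For readers unfamiliar with the two kernels, the sketch below shows generic textbook versions of each: ranking integer keys with a counting sort, and solving a small symmetric positive-definite system by conjugate gradient. These are illustrations only, not the NAS benchmark source; the function names, tolerances, and the pure-Python style are assumptions.

```python
def rank_keys(keys, max_key):
    """Return each key's position in sorted order, using a counting sort."""
    counts = [0] * (max_key + 1)
    for k in keys:
        counts[k] += 1                 # data-dependent, scattered update
    starts = [0] * (max_key + 1)       # starts[k] = number of keys smaller than k
    running = 0
    for k in range(max_key + 1):
        starts[k] = running
        running += counts[k]
    ranks = []
    for k in keys:
        ranks.append(starts[k])
        starts[k] += 1                 # equal keys keep their input order
    return ranks


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)                        # residual b - A x, with x starting at zero
    p = list(r)                        # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:        # residual small enough: converged
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)    # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    print(rank_keys([3, 1, 2, 1], max_key=3))
    print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The integer kernel is dominated by memory updates whose addresses depend on the data, while the conjugate gradient kernel is dominated by matrix-vector products and dot products.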

Asked whom the MTA will benefit, Larry Carter, professor of Computer Science and Engineering at UCSD, says, "I'm fairly convinced it will be good for the scientific community." UCSD has partnered with The Boeing Company, The California Institute of Technology (Caltech), Jet Propulsion Laboratory (JPL), Sanders (a Lockheed Martin Company), the Naval Command, Control and Ocean Surveillance Center, and Tera to test Tera's MTA.

Will this architecture make its way into more mainstream processors from companies such as Intel or Sun? Snavely says, "Thread-based multiprocessing is a technology that is applicable to mainstream processors. However, mainstream processors are not going to be doing massive multithread processing on the scale of the Tera any time soon."

Smith's opinion differs: "In the next 10 years nearly all architectures will be multithreaded, except for those processors not intended for high performance. I expect both Intel and Sun will be using multithreading soon."

As for the future of the MTA, Snavely says, "What's likely is that the MTA will push the envelope on the power of the multithreaded paradigm. Lessons learned will trickle down to the commodity processor and affect the next generation of commodity microprocessors."

Victor Hazelwood, the manager of high-performance computer systems for SDSC, noted the fortuitous timing of the MTA's entry into the market. "These are interesting times... Cray is stumbling. This is the best time for Tera to enter the market. There's a window of opportunity for someone like Tera to come into."

MTA's specifications:

Up to 128 threads per processor
Up to 8 concurrent memory references per thread
Memory capacity ranges from 16 GB to 512 GB
I/O bandwidth ranges from 6 GB/s to 102 GB/s

Resources

The Tera MTA http://www.tera.com/web/mta.html

URL: http://www.sunworld.com/swol-12-1997/swol-12-tera.html