RIKEN, an anglicized acronym for Japan's Research Institute of Physical and Chemical Research, described on Tuesday the MDGrape 3, a processor it thinks will become the cornerstone of a computer capable of operating at a petaflop, or a quadrillion operations per second--far faster than the 36 trillion ops supercomputers of today.
Samples of the chip, which was designed for life sciences research, can now perform 230 gigaflops, or 230 billion operations per second, while running at 350MHz, better than standard general-purpose chips. In a worst-case scenario, the chip performs 160 gigaflops at 250MHz, said Makoto Tanji, a researcher with RIKEN's high-performance computing group. Tanji spoke at the Hot Chips conference taking place at Stanford University.
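As a rough sanity check on those figures, dividing the quoted throughput by the clock rate gives the number of operations the chip would have to complete each cycle. The sketch below is illustrative arithmetic only; the function name is hypothetical and nothing in it comes from RIKEN.

```python
def flops_per_cycle(gigaflops: float, clock_mhz: float) -> float:
    """Return floating-point operations completed per clock cycle."""
    return (gigaflops * 1e9) / (clock_mhz * 1e6)


# Figures quoted in the article for MDGrape 3 samples.
typical = flops_per_cycle(230, 350)      # 230 gigaflops at 350MHz
worst_case = flops_per_cycle(160, 250)   # 160 gigaflops at 250MHz

print(f"typical:    {typical:.0f} operations per cycle")
print(f"worst case: {worst_case:.0f} operations per cycle")
```

Both cases work out to roughly 650 operations per cycle, which suggests the speed comes from doing a great deal of work in parallel rather than from a fast clock.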
The computational power comes, he said, because the chip is specialized for workloads that involve numerous, similar calculations on a comparatively small set of data. This sort of workload is common in the life sciences and bio-nanotechnology field, where researchers need to examine, for example, how a single protein interacts with thousands of different molecules. Consequently, the chip and the computers based on it can be directly compared with general purpose supercomputers only in a limited field, but the processor excels there.
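To see why "numerous, similar calculations on a comparatively small set of data" suits such a chip, consider a simplified all-pairs model in which every particle interacts with every other particle. This is an assumption made for illustration, not a description of RIKEN's software, and the 30 operations per pair is a hypothetical figure.

```python
def compute_to_data_ratio(n_particles: int, flops_per_pair: int = 30) -> float:
    """Operations performed per coordinate value loaded, all-pairs model.

    flops_per_pair is a hypothetical cost for one pairwise interaction.
    """
    pairs = n_particles * (n_particles - 1) // 2   # each pair counted once
    total_flops = pairs * flops_per_pair
    values_loaded = n_particles * 3                # x, y, z per particle
    return total_flops / values_loaded


for n in (100, 1_000, 10_000):
    print(n, compute_to_data_ratio(n))
```

In this model the work grows with the square of the particle count while the data grows only linearly, so the ratio of computation to data rises as the problem gets bigger.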
"We can obtain about a 100 times better performance through specialization. The number of operations are more limited on a general purpose computer," Tanji said. For the MDGrape 3 to shine, "the amount of computation must be much larger than the data," he added.
The University of Tokyo initiated the MDGrape project 15 years ago to develop a chip for astrophysics. RIKEN, which is one of the world's largest biosciences institutes, has worked over the last several years to extend the chip's architecture to life sciences and molecular dynamics because the range of applications is wider, Tanji explained. The group will create computers based on the chip for its Protein 3000 project to determine the characteristics of 3,000 proteins. Those machines should appear sometime in 2007.
Commercial systems using the MDGrape 2, which can churn at 16 gigaflops and run at 100MHz, are currently on the market, Tanji said. Work on the MDGrape 3, also known as the Protein Explorer, began in 2002, and the chip should start to be used to run applications in 2006.
Research also continues at the University of Tokyo to develop a quasi general purpose chip capable of 1 teraflop, or a trillion calculations a second. IBM and the University of Texas have a similar teraflop-on-a-chip project.
Architecturally, the MDGrape 3 differs substantially from most other chips. It comes with 20 pipelines for calculations, the equivalent of an assembly line for a processor. Commercial chips typically have one or two. The chip also features what RIKEN calls a broadcast memory architecture, where data is force-fed to the different pipelines simultaneously. Parallelization, a design approach that splits work into many calculations performed at the same time, is optimized in the chip's design.
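The broadcast idea can be sketched in software, with the caveat that this is a guess at the general pattern rather than RIKEN's actual design. In the sketch, each simulated pipeline holds one target particle, and every source particle is handed to all pipelines in the same step. The function name, the inverse-square interaction and the softening term are all assumptions made for illustration.

```python
NUM_PIPELINES = 20  # pipeline count quoted in the article


def broadcast_accumulate(targets, sources, softening=1e-9):
    """Accumulate inverse-square interactions on each target.

    targets: list of (x, y, z), one per simulated pipeline.
    sources: list of (x, y, z, strength), broadcast one at a time.
    """
    if len(targets) > NUM_PIPELINES:
        raise ValueError("at most one target per pipeline")
    acc = [[0.0, 0.0, 0.0] for _ in targets]
    for sx, sy, sz, strength in sources:               # one broadcast per source
        for lane, (tx, ty, tz) in enumerate(targets):  # lanes would run at once in hardware
            dx, dy, dz = sx - tx, sy - ty, sz - tz
            r2 = dx * dx + dy * dy + dz * dz + softening
            scale = strength / (r2 * r2 ** 0.5)        # strength / r^3
            acc[lane][0] += scale * dx
            acc[lane][1] += scale * dy
            acc[lane][2] += scale * dz
    return acc


# Two targets on either side of a single source at x = 1.
forces = broadcast_accumulate(
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0, 1.0)],
)
print(forces)
```

Each source value is read once and reused by every lane, which is how a design of this kind could keep many pipelines busy without a matching increase in memory traffic.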
Despite the differences from other chips, the MDGrape 3 is built on the 130-nanometer process, a manufacturing convention that has been in place for the past few years.
The enhancements lead to huge advantages over general purpose processors. Tanji said the 350MHz MDGrape 3 can provide a gigaflop of computing power for $15, compared with $400 per gigaflop for a Pentium 4, $640 per gigaflop for the chips inside IBM's Blue Gene/L and a whopping $4,000 per gigaflop from NEC's Earth Simulator, currently the world's most powerful supercomputer.
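Scaled up to a full petaflop, those per-gigaflop prices diverge sharply. The short sketch below simply multiplies the quoted figures by one million gigaflops. It assumes the cost scales linearly and ignores everything outside the quoted price, so the totals are illustrative rather than real system prices.

```python
GIGAFLOPS_PER_PETAFLOP = 1_000_000

# Dollars per gigaflop as quoted in the article.
COST_PER_GIGAFLOP_USD = {
    "MDGrape 3 (350MHz)": 15,
    "Pentium 4": 400,
    "Blue Gene/L": 640,
    "Earth Simulator": 4_000,
}


def petaflop_cost(cost_per_gigaflop: int) -> int:
    """Cost of one petaflop, assuming simple linear scaling."""
    return cost_per_gigaflop * GIGAFLOPS_PER_PETAFLOP


for name, cost in COST_PER_GIGAFLOP_USD.items():
    print(f"{name:20s} ${petaflop_cost(cost):,}")
```

On these numbers a petaflop would cost about $15 million with the MDGrape 3 and about $4 billion at the Earth Simulator's rate.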
In terms of power consumption, the 350MHz MDGrape 3 consumes 14 watts of power, or 0.1 watts per gigaflop. A 3GHz Pentium 4 runs at 82 watts, or 14 watts per gigaflop, he said. The Blue Gene/L chip and Earth Simulator come in at 6 and 128 watts per gigaflop, respectively, he said.
RIKEN is also designing the computer that will house the MDGrape 3. Twelve chips will fit on a board, while two boards will fit into a 2U-high box (3.5 inches). The chips are all connected to each other through an 81-bit bus, and the boards are connected to the rest of the computer through PCI Express.
The petaflop computer will consist of 6,144 processors on 512 boards clustered together. In all, the system will fit into 32 boxes that will stand on 19-inch pedestals.
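The system totals follow from the per-chip figures quoted earlier in the article. The sketch below is back-of-the-envelope arithmetic: it assumes every chip sustains its quoted rate and counts processor power only.

```python
CHIPS_PER_BOARD = 12       # from the article
BOARDS = 512               # from the article
GIGAFLOPS_TYPICAL = 230    # per chip at 350MHz
GIGAFLOPS_WORST = 160      # per chip at 250MHz
WATTS_PER_CHIP = 14        # per chip at 350MHz

chips = CHIPS_PER_BOARD * BOARDS
peak_petaflops = chips * GIGAFLOPS_TYPICAL / 1_000_000
worst_petaflops = chips * GIGAFLOPS_WORST / 1_000_000
chip_power_kw = chips * WATTS_PER_CHIP / 1_000

print(f"chips:             {chips:,}")
print(f"peak at 350MHz:    {peak_petaflops:.2f} petaflops")
print(f"worst case:        {worst_petaflops:.2f} petaflops")
print(f"chip power alone:  {chip_power_kw:.0f} kW")
```

Under these assumptions the 6,144 chips reach roughly 1.4 petaflops at the 350MHz figure and just under 1 petaflop at the worst-case figure, with about 86 kilowatts drawn by the processors themselves.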
"It is very small," Tanji said.



Subject: Re: 18 gigaflops?
Newsgroups: comp.sys.mac.advocacy
Date: 2003-01-09 22:06:28 PST
I ran the old flops 2.0 test some time ago on both of my own Macs, a Pentium 4 1.7 GHz from work and an Itanium 733 MHz (also from work). I got:
CPU flops(6) - was peak for all machines
Itanium 733 2022.8883 MFLOPS (2.0 GigaFlops)
P4-1700SSE 1332.1317 MFLOPS (1.3 GigaFlops)
P4-1700 475.3642 MFLOPS (0.475 GigaFlops)
G4-800DP 308.3056 MFLOPS (0.308 GigaFlops)
G3-700 311.9328 MFLOPS (0.311 GigaFlops)
In article <dmartinZZZ-0901032106520001@192.168.0.5>,
dmartinZZZ@ufl.edu (Danny Martin) wrote:
> Apple's web site touts the twin 1.25 ghz processor turning 18 gigaflops. Is it valid to compare gigaflops between PPC chips and Pentiums? If so, how many gigaflops does a 2 ghz Pentium turn?
A 2 GHz Pentium 4 would manage about 8 GFLOPS single precision peak. This has come and gone in this newsgroup a couple of times, and I'm not really sure how Apple comes up with their numbers.
This is all theoretical peak and it doesn't really have much to do with real performance; it's mostly just a matter of adding numbers together.
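For what it's worth, the "adding numbers together" usually amounts to clock rate times operations per cycle times number of CPUs. Here is a minimal sketch. The 4 single-precision operations per cycle for the Pentium 4 is my assumption (one 4-wide SSE operation per clock), chosen because it reproduces the 8 GFLOPS figure above. The function name is made up.

```python
def theoretical_peak_gflops(clock_ghz: float, flops_per_cycle: int, cpus: int = 1) -> float:
    """Theoretical peak: clock rate x operations per cycle x CPU count."""
    return clock_ghz * flops_per_cycle * cpus


# Assumption: 4 single-precision operations per cycle on a Pentium 4.
p4_peak = theoretical_peak_gflops(2.0, 4)
print(f"2 GHz Pentium 4 peak: {p4_peak} GFLOPS")
```

A number worked out this way says nothing about memory stalls or how well real code keeps the vector unit fed, which is why measured results like the flops scores above come in far lower.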
In other words, the performance would really suck if you tried to run a normal desktop application on it, but it's really good at adding and multiplying huge mega-sets of data. Sort of like an idiot savant.
This phrase is really interesting. It seems to suggest the performance test case just holds the data in the CPU's registers and performs the same calculations over and over again, without needing to access the external memory that would slow it down.
It's fast, but it's not a very useful thing for a general purpose processor to do. It's kind of silly to compare this processor with a general purpose computer. Great, it can calculate lots of stuff if it sits in a black box and doesn't communicate with the outside world, and your program needs to repeat the same calculation over and over again.