Supercomputers in the modern world

In recent years, computer design and manufacturing companies have been working tirelessly, and as a result the computing power available in the world has been growing exponentially.

The most powerful computers

Just recently, the world had not yet heard of DirectX 10, and the graphics of Far Cry or NFS Underground 2 seemed to be the pinnacle of computer capabilities. Once upon a time, a disc capable of storing 600 megabytes of information seemed like a miracle of technology; now terabyte memory cards are freely available.

Much the same thing is happening in the field of supercomputers. In 1993, University of Tennessee professor Jack Dongarra came up with the idea of creating a ranking of the most powerful computers in the world. Since then, this list, called the TOP500, has been updated twice a year: in June and November.

Time passes, and the leaders of the early-1990s supercomputer rankings look hopelessly outdated even by the standards of ordinary PC users. The first entry on the 1993 list was the CM-5/1024, built by Thinking Machines: 1,024 processors with a clock frequency of 32 MHz and a computing speed of 59.7 gigaflops, roughly what an ordinary multi-core desktop PC delivers today. So what does the best computer look like now?


Sunway TaihuLight

Just five years ago, the top spots were consistently held by supercomputers made in the USA. In 2013, Chinese scientists seized the lead and, apparently, have no intention of giving it up.

At the moment, the most powerful computer in the world is considered to be the Sunway TaihuLight (the name translates roughly as "The Divine Light Power of Lake Taihu"), a grandiose machine with a sustained computing speed of 93 petaflops (peak speed of 125.43 petaflops). That is nearly three times the performance of the previous record holder, the Tianhe-2 supercomputer, which was considered the most powerful until June 2016.


Sunway TaihuLight has roughly 10.6 million cores in total (40,960 processors, each with 256 computing cores and 4 control cores).

This is what the most powerful computer of 2016 looks like

All of the hardware was developed and manufactured in China, whereas the processors of the previous record holder were produced by the American company Intel. The cost of Sunway TaihuLight is estimated at $270 million. The supercomputer is housed at the National Supercomputing Center in Wuxi.

Record holders of past years

Until June 2016 (the TOP500 list is updated every June and November), the most powerful and fastest computer was the Tianhe-2 supercomputer (the name translates from Chinese as "Milky Way"), developed in China at the National University of Defense Technology in Changsha with the help of the company Inspur.


Tianhe-2 delivers 33.86 petaflops (almost 34 quadrillion operations per second), with a peak performance of 54.9 petaflops. The Chinese machine topped the ranking from its launch in 2013 until June 2016 - an impressive run.

Supercomputer Tianhe-2

The characteristics of Tianhe-2 are as follows: 16 thousand nodes; 32 thousand 12-core Intel Xeon E5-2692 processors and 48 thousand 57-core Intel Xeon Phi 31S1P accelerators, for a total of 3,120,000 cores; 256 thousand DDR3 modules of 4 GB each and 176,000 GDDR5 modules of 8 GB each, for a total of 2,432,000 GB of RAM. Disk capacity exceeds 13 million GB. You won't be able to play games on it, however: it is intended solely for computing, and Milky Way-2 has no video card installed. Among other things, it helps with calculations for laying subways and for urban development.

Jaguar

For a long time, Jaguar, a supercomputer from the USA, was at the top of the ranking. How is it different from the others and what are its technical advantages?


The supercomputer, called Jaguar, consists of a large number of independent nodes divided into two partitions, XT4 and XT5. The latter contains exactly 18,688 compute nodes. Each node holds two quad-core AMD Opteron 2356 processors with a frequency of 2.3 GHz, 16 GB of DDR2 RAM and a SeaStar 2+ router. Even one node from this partition would be enough to build the most powerful gaming computer. Altogether the partition contains 149,504 computing cores, a huge amount of RAM (more than 300 TB), a performance of 1.38 petaflops and more than 6 petabytes of disk space.

Building a computer monster

The XT4 partition contains 7,832 nodes. Their characteristics are more modest than those of the XT5 partition: each node holds one quad-core processor with a frequency of 2.1 GHz, 8 GB of RAM and a SeaStar 2 router. In total, the partition has 31,328 computing cores and more than 62 TB of memory, with a peak performance of 263 teraflops and more than 600 TB of disk space. The Jaguar supercomputer runs its own operating system, Cray Linux Environment.

Breathing down Jaguar's neck is another machine, IBM's Roadrunner. This computing monster is capable of more than a quadrillion operations per second. It was developed specifically for the Department of Energy's National Nuclear Security Administration at Los Alamos, which planned to use it to monitor the operation of nuclear installations across the United States.


Roadrunner's peak processing speed is about 1.5 petaflops. The machine comprises 3,456 original tri-blade servers, each capable of about 400 billion operations per second (400 gigaflops). Inside Roadrunner are about 20 thousand high-performance processors: 12,960 Cell Broadband Engine chips, IBM's own design, and 6,948 dual-core AMD Opterons. The system has 80 terabytes of memory.

So how much space does this miracle of technology take up? The machine occupies an area of 560 square meters, and all of its equipment is packed into servers of an original architecture. The whole installation weighs about 23 tons, so transporting it would take the National Nuclear Security Administration at least 21 large tractor-trailers.

A few words about what a petaflop is. One petaflop is approximately equal to the combined power of 100 thousand modern laptops; laid end to end, those laptops would stretch for almost two and a half kilometers. Another accessible comparison: it would take the entire population of the planet, working on calculators, 46 years to do the calculations Roadrunner can do in one day. Can you imagine how little time Sunway TaihuLight, the leader of our rating, would need?

Titan

In 2012, the U.S. Department of Energy's Oak Ridge National Laboratory launched the Titan supercomputer, which is rated at 20 petaflops; in other words, it can perform about 20 quadrillion floating-point operations per second.


Titan was developed by Cray. In addition to Titan, American specialists have built two more supercomputers in recent years: Mira, intended for industrial and scientific research, and Sequoia, used to simulate nuclear weapons tests. IBM is behind both of these machines.

The most powerful computer in Russia

Alas, the Russian machine Lomonosov-2, recognized as the most powerful computer in Russia, is only in 41st place in the TOP500 (as of June 2016). It is based at the Research Computing Center of Moscow State University. The performance of the domestic supercomputer is 1.849 petaflops, with a peak of about 2.5 petaflops and 42,688 cores.




Supercomputer Titan

People still don’t fly to Mars, cancer hasn’t been cured yet, and we haven’t gotten rid of oil addiction. And yet there are areas where humanity has made incredible progress in recent decades. The computing power of computers is just one of them.

Twice a year, experts from the Lawrence Berkeley National Laboratory and the University of Tennessee publish the Top 500, which offers a list of the most powerful supercomputers in the world.

Looking ahead a little, here is a sense of the numbers: the performance of the top ten is measured in tens of quadrillions of flops. For comparison: ENIAC, the first computer in history, managed about 500 flops; today, an average personal computer delivers hundreds of gigaflops (billions of flops), the iPhone 6 about 172 gigaflops, and the PS4 about 1.84 teraflops (trillions of flops).

Armed with the latest Top 500 from November 2014, Naked Science decided to figure out what the 10 most powerful supercomputers in the world are, and what problems require such tremendous computing power to solve.

10. Cray CS-Storm

  • Location: USA
  • Performance: 3.57 petaflops
  • Theoretical maximum performance: 6.13 petaflops
  • Power: 1.4 MW

Like almost all modern supercomputers, including each of those presented in this article, CS-Storm consists of many processors united into a single computer network based on the principle of a massively parallel architecture. In reality, this system consists of many racks (“cabinets”) with electronics (nodes consisting of multi-core processors), which form entire corridors.

Cray CS-Storm is a whole series of supercomputer clusters, but one of them stands out from the rest: the mysterious CS-Storm used by the US government for unknown purposes at an undisclosed location.

What is known is that the US government bought from the American company Cray an extremely energy-efficient system (2,386 megaflops per watt) with a total of almost 79 thousand cores.

The manufacturer's website, however, says that CS-Storm clusters are suitable for high-performance computing in cybersecurity, geospatial intelligence, pattern recognition, seismic data processing, rendering and machine learning. The government's CS-Storm is probably used for something in this range of tasks.

CRAY CS-STORM

9. Vulcan – Blue Gene/Q

  • Location: USA
  • Performance: 4.29 petaflops
  • Theoretical maximum performance: 5.03 petaflops
  • Power: 1.9 MW

“Vulcan” was developed by the American company IBM, belongs to the Blue Gene family and is located at the Lawrence Livermore National Laboratory. The supercomputer, owned by the US Department of Energy, consists of 24 racks. The cluster began operating in 2013.

Unlike the already mentioned CS-Storm, the scope of application of Vulcan is well known - various scientific research, including in the field of energy, such as modeling natural phenomena and analyzing large amounts of data.

Various scientific groups and companies can gain access to the supercomputer by submitting an application to the High Performance Computing Innovation Center (HPC Innovation Center), based at the same Livermore National Laboratory.

Supercomputer Vulcan

8. Juqueen – Blue Gene/Q

  • Location: Germany
  • Performance: 5 petaflops
  • Theoretical maximum performance: 5.87 petaflops
  • Power: 2.3 MW

Since its launch in 2012, Juqueen has been the second most powerful supercomputer in Europe and the first in Germany. Like Vulcan, this supercomputer cluster was developed by IBM as part of the Blue Gene project, and belongs to the same generation Q.

The supercomputer is located in one of the largest research centers in Europe in Jülich. It is used accordingly - for high-performance computing in various scientific research.

Juqueen supercomputer

7. Stampede – PowerEdge C8220

  • Location: USA
  • Performance: 5.16 petaflops
  • Theoretical maximum performance: 8.52 petaflops
  • Power: 4.5 MW

Located in Texas, Stampede is the only cluster in the top ten of the Top 500 that was developed by the American company Dell. The supercomputer consists of 160 racks.

This supercomputer is the most powerful in the world among those used exclusively for research purposes. Access to Stampede facilities is open to scientific groups. The cluster is used in a wide range of scientific fields - from precise tomography of the human brain and earthquake prediction to identifying patterns in music and language structures.

Supercomputer Stampede

6. Piz Daint – Cray XC30

  • Location: Switzerland
  • Performance: 6.27 petaflops
  • Theoretical maximum performance: 7.78 petaflops
  • Power: 2.3 MW

The Swiss National Supercomputing Center (CSCS) boasts the most powerful supercomputer in Europe. The Piz Daint, named after the Alpine mountain, was developed by Cray and belongs to the XC30 family, within which it is the most productive.

Piz Daint is used for various research purposes, such as computer simulations in the field of high energy physics.

Supercomputer Piz Daint

5. Mira – Blue Gene/Q

  • Location: USA
  • Performance: 8.56 petaflops
  • Theoretical maximum performance: 10.06 petaflops
  • Power: 3.9 MW

The Mira supercomputer was developed by IBM as part of the Blue Gene project in 2012. Argonne National Laboratory's High Performance Computing Division, which houses the cluster, was created with government funding. The rise in interest in supercomputing technology from Washington in the late 2000s and early 2010s is believed to be due to rivalry with China in this area.

Located on 48 racks, Mira is used for scientific purposes. For example, the supercomputer is used for climate and seismic modeling, which allows obtaining more accurate data on predicting earthquakes and climate change.

Supercomputer Mira

4. K Computer

  • Location: Japan
  • Performance: 10.51 petaflops
  • Theoretical maximum performance: 11.28 petaflops
  • Power: 12.6 MW

Developed by Fujitsu and located at the Institute of Physicochemical Research in Kobe, the K Computer is the only Japanese supercomputer to appear in the top ten of the Top 500.

At one time (June 2011), this cluster took first position in the ranking, becoming the most productive computer in the world for one year. And in November 2011, K Computer became the first in history to achieve power above 10 petaflops.

The supercomputer is used in a number of research tasks. For example, for forecasting natural disasters (which is important for Japan due to the increased seismic activity of the region and the high vulnerability of the country in the event of a tsunami) and computer modeling in the field of medicine.

Supercomputer K

3. Sequoia – Blue Gene/Q

  • Location: USA
  • Performance: 17.17 petaflops
  • Theoretical maximum performance: 20.13 petaflops
  • Power: 7.8 MW

The most powerful of the four supercomputers of the Blue Gene/Q family, which are in the top ten of the rating, is located in the United States at the Livermore National Laboratory. IBM developed Sequoia for the National Nuclear Security Administration (NNSA), which needed a high-performance computer for a very specific purpose: simulating nuclear explosions.

It is worth mentioning that atmospheric nuclear tests have been banned since 1963, and the United States has observed a moratorium on all nuclear testing since the early 1990s, so computer simulation is one of the most acceptable ways of continuing research in this area.

However, the power of the supercomputer was used to solve other, much more noble problems. For example, the cluster managed to set performance records in cosmological modeling, as well as in creating an electrophysiological model of the human heart.

Sequoia supercomputer

2. Titan – Cray XK7

  • Location: USA
  • Performance: 17.59 petaflops
  • Theoretical maximum performance: 27.11 petaflops
  • Power: 8.2 MW

The most productive supercomputer ever created in the West, and the most powerful computer cluster under the Cray brand, is located in the United States at the Oak Ridge National Laboratory. Although the supercomputer at the disposal of the US Department of Energy is officially available for any scientific research, when Titan was launched in October 2012 the number of applications far exceeded its capacity.

Because of this, a special commission was convened at the Oak Ridge Laboratory, which selected only 6 of the most “advanced” projects out of 50 applications. Among them, for example, modeling the behavior of neutrons in the very heart of a nuclear reactor, as well as forecasting global climate changes for the next 1-5 years.

Despite its computing power and impressive dimensions (404 square meters), Titan did not last long on the pedestal. Just six months after the triumph in November 2012, American pride in the field of high-performance computing was unexpectedly supplanted by a native of the East, surpassing the previous leaders of the ranking in an unprecedented way.

Supercomputer Titan

1. Tianhe-2 / Milky Way-2

  • Location: China
  • Performance: 33.86 petaflops
  • Theoretical maximum performance: 54.9 petaflops
  • Power: 17.6 MW

Since its first launch, Tianhe-2, or Milky Way-2, has been the leader of the Top-500 for about two years. This monster is almost twice as powerful as the No. 2 in the ranking – the TITAN supercomputer.

Developed by the People's Liberation Army's National University of Defense Technology together with Inspur, Tianhe-2 consists of 16 thousand nodes with a total of 3.12 million cores. The RAM of this colossal structure, which occupies 720 square meters, is 1.4 petabytes, and its storage capacity is 12.4 petabytes.

Milky Way 2 was designed at the initiative of the Chinese government, so it is not surprising that its unprecedented power appears to serve the needs of the state. It was officially stated that the supercomputer is engaged in various simulations, analyzing huge amounts of data, as well as ensuring the national security of China.

Considering the secrecy inherent in Chinese military projects, one can only guess what kind of use the Milky Way-2 receives from time to time in the hands of the Chinese army.

Supercomputer Tianhe-2


Supercomputers

Andrey Borzenko

Supercomputers are the fastest computers. Their main difference from mainframes is this: all the resources of such a computer are usually directed at solving one task, or at most a few tasks, as quickly as possible, whereas mainframes, as a rule, run a fairly large number of tasks that compete with each other. The rapid development of the computer industry makes the very concept relative: what could be called a supercomputer ten years ago no longer falls under this definition today. There is also a humorous definition of a supercomputer: a device that reduces a computing problem to an input-output problem. There is some truth in it: often the only bottleneck in a high-speed system is the I/O devices. You can find out which supercomputers currently have the maximum performance from the official list of the five hundred most powerful systems in the world, the Top500 (http://www.top500.org), which is published twice a year.

In any computer, all the main parameters are closely related. It is difficult to imagine a general-purpose computer with high performance but tiny RAM, or with huge RAM but little disk space. For this reason, supercomputers are currently characterized not only by maximum performance, but also by the maximum amount of RAM and disk storage. Providing such specifications is quite expensive: the cost of supercomputers is extremely high. What tasks are so important that they require systems costing tens or hundreds of millions of dollars? As a rule, these are fundamental scientific or engineering computing problems with a wide range of applications, whose effective solution is possible only with powerful computing resources. Here are just a few areas where such problems arise:

  • predictions of weather, climate and global changes in the atmosphere;
  • materials science;
  • construction of semiconductor devices;
  • superconductivity;
  • structural biology;
  • development of pharmaceuticals;
  • human genetics;
  • quantum chromodynamics;
  • astronomy;
  • automotive industry;
  • transport tasks;
  • hydro- and gas dynamics;
  • controlled thermonuclear fusion;
  • efficiency of fuel combustion systems;
  • oil and gas exploration;
  • computational problems in ocean sciences;
  • speech recognition and synthesis;
  • image recognition.

Supercomputers calculate very quickly thanks not only to the most modern components, but also to new approaches in system architecture. The central one is the principle of parallel data processing, which embodies the idea of executing several actions simultaneously (in parallel). Parallel processing comes in two forms: pipelining and true parallelism. The essence of pipelining is to break a general operation into separate stages; each stage, having finished its work, passes the result to the next stage while simultaneously accepting a new portion of input data. The gain in processing speed comes from overlapping operations that previously ran one after another.
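
To make the pipelining idea concrete, here is a minimal illustrative sketch in Python (the operation and stage counts are made up for the example, not taken from any real machine): if an operation is split into S one-tick stages, a full pipeline finishes N operations in S + N - 1 ticks instead of S * N.

```python
def sequential_ticks(n_ops: int, n_stages: int) -> int:
    """Every operation passes through all stages before the next one starts."""
    return n_ops * n_stages

def pipelined_ticks(n_ops: int, n_stages: int) -> int:
    """A new operation enters the pipeline every tick once it has been filled."""
    return n_stages + n_ops - 1

# Example: 1000 operations, each split into 5 one-tick stages.
print(sequential_ticks(1000, 5))  # 5000 ticks without pipelining
print(pipelined_ticks(1000, 5))   # 1004 ticks with pipelining
```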

If a certain device performs one operation per unit of time, then it will perform a thousand operations in a thousand units. If there are five identical independent devices capable of operating simultaneously, then a system of five devices can perform the same thousand operations not in a thousand, but in two hundred units of time. Similarly, a system of N devices will perform the same work in 1000/N units of time.
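
The idealized model from this paragraph can be written as a one-line formula; the sketch below (a simplification that ignores communication overhead and load imbalance) simply evaluates it for the numbers used in the text.

```python
def ideal_parallel_time(total_ops: int, ops_per_unit_time: int, n_devices: int) -> float:
    """Time to finish `total_ops` operations on `n_devices` identical, fully
    independent devices, assuming perfect load balancing and no overhead."""
    return total_ops / (ops_per_unit_time * n_devices)

# The example from the text: 1000 operations, 1 operation per unit of time.
print(ideal_parallel_time(1000, 1, 1))   # 1000.0 units of time on one device
print(ideal_parallel_time(1000, 1, 5))   # 200.0 units on five devices
print(ideal_parallel_time(1000, 1, 10))  # 100.0 units on N = 10 devices
```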

Of course, today few people are surprised by parallelism in computer architecture: all modern microprocessors use some form of parallel processing, even within a single chip. Yet these ideas appeared a very long time ago. Initially, they were implemented in the most advanced, and therefore one-of-a-kind, computers of their time. Special credit here goes to IBM and Control Data Corporation (CDC). We are talking about such innovations as bit-parallel memory, bit-parallel arithmetic, independent input/output processors, the instruction pipeline, pipelined independent functional units, and so on.

Usually the word "supercomputer" is associated with Cray machines, although today that is far from the whole picture. The developer and chief designer of the first supercomputer was Seymour Cray, one of the most legendary figures in the computer industry. In 1972 he left CDC and founded his own company, Cray Research. The first supercomputer, Cray-1, was completed four years later (in 1976) and had a vector-pipeline architecture with 12 pipelined functional units. The Cray-1's peak performance was 160 Mflops (with a 12.5 ns clock cycle), and its 64-bit-word RAM (expandable to 8 MB) had a cycle time of 50 ns. The main innovation was, of course, the introduction of vector instructions that operate on entire arrays of independent data and allow efficient use of the pipelined functional units.

Throughout the 1960s-1980s, the attention of the world's leading supercomputer makers was focused on building systems that were good at large floating-point workloads. There was no shortage of such tasks: almost all of them were related to nuclear research and aerospace modeling and were carried out in the interests of the military. The drive to achieve maximum performance in the shortest possible time meant that the criterion for judging a system was not its price but its performance. The Cray-1, for example, cost from 4 to 11 million dollars depending on the configuration.

At the turn of the 1980s and 1990s, the Cold War ended and military orders gave way to commercial ones. By that time the industry had made great strides in producing commodity processors, which had roughly the same computing power as custom ones but were significantly cheaper. The use of standard components and a variable number of processors made it possible to solve the scalability problem: as the computing load grew, the performance of a supercomputer and its peripherals could be increased by adding new processors and I/O devices. Thus, in 1990 the Intel iPSC/860 supercomputer appeared with 128 processors, showing a performance of 2.6 Gflops on the LINPACK test.

Last November, the 18th edition of the list of the 500 most powerful computers in the world, the Top500, was published. The leader of the list is still IBM Corporation (http://www.ibm.com), which accounts for 32% of installed systems and 37% of total performance. An interesting development was Hewlett-Packard's move into second place by number of systems (30%); since all of these systems are relatively small, their combined performance is only 15% of the entire list. Following the merger with Compaq, the new company is expected to dominate the list. Next by number of computers on the list come SGI, Cray and Sun Microsystems.

The most powerful supercomputer in the world was still the ASCI White system (we will return to it later), installed at the Livermore Laboratory (USA) and showing a performance of 7.2 Tflops on the LINPACK test (58% of peak performance). In second place was the Compaq AlphaServer SC system installed at the Pittsburgh Supercomputing Center with a performance of 4 Tflops. The Cray T3E system closes the list with LINPACK performance of 94 Gflops.

It is worth noting that the list already included 16 systems with a performance of more than 1 teraflops, half of them installed by IBM. The number of systems built as clusters of small SMP blocks is steadily increasing - there are now 43 such systems on the list. However, the largest share still belongs to massively parallel systems (50%), followed by clusters of large SMP systems (29%).

Types of architectures

The main parameter for classifying parallel computers is the presence of shared or distributed memory. Something in between are architectures where memory is physically distributed but logically shared. From a hardware point of view, two main schemes suggest themselves for implementing parallel systems. The first is several separate systems, with local memory and processors, interacting in some environment by sending messages. The second is systems that interact through shared memory. Without going into technical details for now, let's say a few words about the types of architectures of modern supercomputers.

The idea of ​​massively parallel systems with distributed memory (Massively Parallel Processing, MPP) is quite simple. For this purpose, ordinary microprocessors are taken, each of which is equipped with its own local memory and connected through some kind of switching medium. There are many advantages to such an architecture. If you need high performance, you can add more processors, and if finances are limited or the required computing power is known in advance, then it is easy to select the optimal configuration. However, MPP also has disadvantages. The fact is that interaction between processors is much slower than data processing by the processors themselves.

In parallel computers with shared memory, all the RAM is shared among several identical processors. This removes the problems of the previous class, but adds new ones. The fact is that the number of processors with access to shared memory cannot be made large for purely technical reasons.

The main features of vector-pipeline computers are, of course, pipeline functional units and a set of vector instructions. Unlike the traditional approach, vector commands operate on entire arrays of independent data, which allows efficient loading of available pipelines.

The last direction, strictly speaking, is not independent but rather a combination of the previous three. A computing node is formed from several processors (conventional or vector-pipeline) and their shared memory. If the resulting computing power is not enough, several nodes are combined with high-speed links. As is well known, such an architecture is called a cluster.

MPP systems

Massively parallel scalable systems are designed to solve application problems that require a large amount of computing and data processing. Let's take a closer look at them. As a rule, they consist of homogeneous computing nodes, including:

  • one or more central processing units;
  • local memory (direct access to the memory of other nodes is not possible);
  • communication processor or network adapter;
  • sometimes hard drives and/or other input/output devices.

In addition, special I/O nodes and control nodes can be added to the system. They are all connected through some communication medium (high-speed network, switch, etc.). As for the OS, there are two options. In the first case, a full-fledged OS runs only on the control machine, while each node runs a greatly reduced version of the OS, providing only the operation of the branch of the parallel application located in it. In another case, each node runs a full-fledged UNIX-like OS.

The number of processors in distributed memory systems is theoretically unlimited. Using such architectures, it is possible to build scalable systems whose performance increases linearly with the number of processors. By the way, the term “massively parallel systems” itself is usually used to refer to such scalable computers with a large number (tens and hundreds) of nodes. Scalability of a computing system is necessary to proportionally speed up calculations, but, alas, it is not enough. To obtain an adequate gain in solving a problem, a scalable algorithm is also required that can load all the processors of a supercomputer with useful calculations.

Let us recall that there are two models of program execution on multiprocessor systems: SIMD (single instruction stream - multiple data streams) and MIMD (multiple instruction streams - multiple data streams). The first assumes that all processors execute the same command, but each on its own data. In the second, each processor processes its own command stream.

In distributed-memory systems, transferring information from processor to processor requires a mechanism for passing messages over the network connecting the computing nodes. To abstract away the details of the communication hardware and program at a higher level, message-passing libraries (such as MPI) are usually used.
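
As an illustration of the message-passing style (not the code of any particular supercomputer), here is a minimal sketch using the mpi4py binding to MPI; the data and process counts are invented for the example. Each process works on its own local data, and the partial results are combined by exchanging messages.

```python
# Run with, for example: mpiexec -n 4 python partial_sums.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's number
size = comm.Get_size()   # total number of processes (nodes)

# Each process computes a partial result on its own local data...
local_data = [rank * 10 + i for i in range(5)]
local_sum = sum(local_data)

# ...and the partial results are combined by passing messages.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"combined result from {size} processes:", total)
```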

Intel Supercomputers

Intel Corporation (http://www.intel.com) is well known in the world of supercomputers. Its distributed-memory Paragon multiprocessor computers have become as classic as Cray Research's vector-pipeline computers.

Intel Paragon uses five i860 XP processors with a clock frequency of 50 MHz in one node. Sometimes processors of different types are placed in one node: scalar, vector and communication. The latter serves to relieve the main processor from performing operations related to message transmission.

The most significant characteristic of the new parallel architecture is the type of communication equipment. The two most important indicators of a supercomputer’s operation depend on it: the speed of data transfer between processors and the overhead of transmitting one message.

The interconnect is designed to provide high messaging speeds with minimal latency. It provides the connection of more than a thousand heterogeneous nodes along a two-dimensional rectangular lattice topology. However, for most application development, any node can be considered to be directly connected to all other nodes. The interconnect is scalable: its throughput increases with the number of nodes. When designing, the developers sought to minimize the participation in message transmission of those processors that execute user processes. For this purpose, special message processing processors have been introduced, which are located on the node board and are responsible for processing the messaging protocol. As a result, the main processors of the nodes are not distracted from solving the problem. In particular, there is no expensive switching from task to task, and the solution of applied problems occurs in parallel with the exchange of messages.

The actual transmission of messages is carried out by a routing system based on Mesh Router Components (MRC). To give the MRC access to a node's memory, each node also has a special network interface controller: a custom VLSI chip that allows simultaneous transfers to and from the node's memory and monitors errors during message transmission.

The modular design of Intel Paragon does more than just support scalability. It also means this architecture can serve as the basis for new computers built on other microprocessors or on new messaging technologies. Scalability also relies on balancing the various parts of a supercomputer at many levels; otherwise, as the number of nodes grows, a bottleneck will appear somewhere in the system. Thus, the speed and memory capacity of the nodes are balanced against the bandwidth and latency of the interconnect, and the performance of the processors inside the nodes against the bandwidth of the cache memory and RAM, and so on.

Until recently, one of the fastest computers was Intel ASCI Red, the brainchild of the Accelerated Strategic Computing Initiative (ASCI), in which the three largest US national laboratories (Livermore, Los Alamos and Sandia) participate. Built for the US Department of Energy in 1997, ASCI Red combines 9,152 Pentium Pro processors, has 600 GB of total RAM and a total performance of 1,800 billion operations per second.

IBM supercomputers

When universal systems with scalable parallel architecture SP (Scalable POWER parallel) from IBM Corporation (http://www.ibm.com) appeared on the computer market, they quickly gained popularity. Today, such systems operate in a variety of application areas, such as computational chemistry, accident analysis, electronic circuit design, seismic analysis, reservoir modeling, decision support, data analytics, and online transaction processing. The success of SP systems is determined primarily by their versatility, as well as the flexibility of the architecture, based on a distributed memory model with message passing.

Generally speaking, an SP supercomputer is a scalable, massively parallel general-purpose computing system consisting of a set of RS/6000 base nodes connected by a high-performance switch. Who doesn't know, for example, the supercomputer Deep Blue, which managed to beat Garry Kasparov at chess? One of its configurations consists of 32 IBM RS/6000 SP nodes based on 256 P2SC (Power Two Super Chip) processors.

The RS/6000 family is IBM's second generation of computers based on the reduced instruction set computing (RISC) architecture the corporation developed in the late 1970s. Under this concept, a very simple set of instructions does all the work in the computer system. Because the instructions are simple, they can be executed at very high speed and also allow a more efficient implementation of the program being run. The RS/6000 family is based on the POWER architecture (Performance Optimized by Advanced RISC architecture) and its derivatives: PowerPC, P2SC, POWER3 and so on. Because POWER combines RISC concepts with some more traditional ones, the result is a system with good overall performance.

The RS/6000 SP system provides the power of multiple processors to solve the most complex computing problems. The SP switching system is IBM's latest innovation in high-bandwidth, low-latency interprocessor communication for efficient parallel computing. Several types of processor nodes, variable frame (rack) sizes and a variety of additional I/O options make it possible to choose the most suitable system configuration. SP is supported by leading software vendors in areas such as parallel databases and real-time transaction processing, as well as major technical software vendors in areas such as seismic processing and engineering design.

IBM RS/6000 SP extends what applications can do through parallel processing. The system removes performance limits and helps avoid problems associated with scaling and with indivisible, serially executed fragments. With over a thousand systems installed at customers worldwide, SPs provide solutions for complex and high-volume technical and commercial applications.

The SP's main building block is a processor node with the architecture of an RS/6000 workstation. There are several types of SP nodes - Thin, Wide and High - differing in a number of technical parameters. For example, High nodes based on POWER3-II include up to 16 processors and up to 64 GB of memory, while Thin nodes allow no more than 4 processors and 16 GB of memory.

The system is scalable up to 512 nodes, and it is possible to combine different types of nodes. The nodes are installed in racks (up to 16 nodes in each). SP can scale disks almost linearly along with processors and memory, allowing true access to terabytes of memory. This increase in power makes it easier to build and expand the system.

The nodes are interconnected by a high-performance switch (IBM high-performance switch), which has a multi-stage structure and operates with packet switching.

Each SP node runs a full AIX operating system, allowing you to leverage thousands of pre-existing AIX applications. In addition, system nodes can be combined into groups. For example, several nodes can act as Lotus Notes servers, while all the others process a parallel database.

Managing large systems is always a challenge. The SP uses a single graphical console for this purpose, which displays hardware and software states, running jobs and user information. Using this console (the control workstation) and the PSSP (Parallel Systems Support Programs) software supplied with the SP, the system administrator handles management tasks, including password protection and user permissions, job accounting, print management, system monitoring, and starting and shutting down the system.

The best

As already noted, according to Top500 (table), the most powerful supercomputer of our time is ASCI White, which occupies an area the size of two basketball courts and is installed at the Livermore National Laboratory. It includes 512 SMP nodes based on 64-bit POWER3-II processors (for a total of 8192 processors) and uses new Colony communications technology with a throughput of approximately 500 MB/s, which is almost four times faster than the SP high-performance switch.

Top ten Top500 (18th edition)

Position | Manufacturer | Computer | Where installed | Country | Year | Number of processors
1 | IBM | ASCI White | Livermore National Laboratory | USA | 2000 | 8192
2 | Compaq | AlphaServer SC | Pittsburgh Supercomputing Center | USA | 2001 | 3024
3 | IBM | SP Power3 | NERSC | USA | 2001 | 3328
4 | Intel | ASCI Red | Sandia National Laboratory | USA | 1999 | 9632
5 | IBM | ASCI Blue Pacific | Livermore National Laboratory | USA | 1999 | 5808
6 | Compaq | AlphaServer SC | | USA | 2001 | 1536
7 | Hitachi | SR8000/MPP | University of Tokyo | Japan | 2001 | 1152
8 | SGI | ASCI Blue Mountain | Los Alamos National Laboratory | USA | 1998 | 6144
9 | IBM | SP Power3 | NAVOCEANO Oceanographic Center | USA | 2000 | 1336
10 | IBM | SP Power3 | German Weather Service | Germany | 2001 | 1280

The architecture of the new supercomputer is based on the proven massively parallel RS/6000 architecture and provides performance of 12.3 teraflops (trillion operations per second). The system includes a total of 8 TB of RAM distributed across 16-processor SMP nodes and 160 TB of disk memory. Delivering the system from IBM laboratories in New York state to Livermore, California, required 28 truck-trailers.

All system nodes run the AIX OS. The supercomputer is being used by US Department of Energy scientists to run complex 3D models to keep nuclear weapons safe. Actually, ASCI White is the third step in ASCI's five-stage program, which plans to create a new supercomputer in 2004. Generally speaking, ASCI White consists of three separate systems, of which White is the largest (512 nodes, 8192 processors), and there is also Ice (28 nodes, 448 processors) and Frost (68 nodes, 1088 processors).

The predecessor of ASCI White was the Blue Pacific supercomputer (another name for ASCI Blue), which included 1,464 four-processor nodes based on PowerPC 604e/332 MHz chips. The nodes are connected into a single system by nearly five miles of cable, and the computer room covers 8 thousand square feet. The ASCI Blue system comprises 5,856 processors in total and provides a peak performance of 3.88 teraflops, with 2.6 TB of total RAM.

A supercomputer consists of kilometers of cables.

The US National Center for Atmospheric Research (NCAR) has selected IBM as the supplier of the world's most powerful supercomputer designed to predict climate change. The system, known as Blue Sky, will increase NCAR's climate modeling capabilities by an order of magnitude when fully operational this year. The core of Blue Sky will be the IBM SP supercomputer and IBM eServer p690 systems, the use of which will achieve peak performance of almost 7 Tflops with a volume of IBM SSA disk subsystem of 31.5 TB.

The supercomputer called Blue Storm is being created by order of the European Centre for Medium-Range Weather Forecasts (ECMWF). It will be built from 100 IBM eServer p690 servers, also known as Regatta. Each refrigerator-sized system unit contains more than a thousand processors. In 2004, Blue Storm will be fitted with new-generation p960 servers, which will make it twice as powerful. The supercomputer will run the AIX OS. Initially, the total capacity of Blue Storm's drives will be 1.5 petabytes and its computing power about 23 teraflops. The system will weigh 130 tons and will be 1,700 times more powerful than the Deep Blue chess supercomputer.

IBM researchers are working with the Livermore National Laboratory on the Blue Gene/L and Blue Gene/C computers. These machines are part of the five-year, $100 million Blue Gene project, which began back in 1999 to study proteins. The new Blue Gene/L supercomputer (200 teraflops) is due to be completed in 2004, six months to a year earlier than the expected completion of the more powerful Blue Gene/C (1,000 teraflops). The design performance of Blue Gene/L will thus exceed the combined performance of the 500 most powerful computers in the world, while the new supercomputer will occupy an area of only half a tennis court. IBM engineers have also worked on reducing energy consumption and managed to cut it by a factor of 15.

Notes

LINPACK tests.
LINPACK benchmarks are based on solving a system of linear equations with a dense coefficient matrix over the real numbers using Gaussian elimination. The real numbers are usually represented in full (64-bit) precision. Because of the large number of floating-point operations involved, LINPACK results are considered a benchmark of hardware and software performance in areas that rely heavily on intensive mathematical computation.
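
For a rough feel of how such a measurement works, here is an illustrative NumPy sketch (not the official HPL benchmark code): time a dense solve and divide the classic LINPACK operation count by the elapsed time.

```python
import time
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))   # dense coefficient matrix
b = rng.standard_normal(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)         # LU factorization (Gaussian elimination) + triangular solves
elapsed = time.perf_counter() - start

flops = 2 / 3 * n**3 + 2 * n**2   # classic LINPACK operation count
print(f"~{flops / elapsed / 1e9:.1f} Gflops, residual {np.linalg.norm(A @ x - b):.2e}")
```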

Earth Simulator.
According to New Scientist magazine, in the new, 19th edition of the Top500 list the supercomputer built for NEC Corporation's Earth Simulator project will take first place. It is installed at the Yokohama Institute for Earth Sciences in Yokohama, Kanagawa Prefecture. The developers claim that its peak performance can reach 40 Tflops.

The Earth Simulator supercomputer is designed to simulate climate change based on data received from satellites. According to NEC representatives, high computer performance is achieved through the use of specially designed vector processors. The system is based on 5,120 such processors, combined into 640 SX-6 nodes (8 processors each). The supercomputer runs the SUPER-UX OS. Development tools include compilers for C/C++, Fortran 90 and HPF, as well as automatic vectorization tools, an implementation of the MPI-2 interface and the ASL/ES mathematical library. The entire machine occupies the area of three tennis courts (50 by 65 m) and uses several kilometers of cable.

The K Computer supercomputer, which previously occupied first place, has been pushed down to third place. Its performance is 11.28 Pflops (see Figure 1). Let us recall that FLOPS (FLoating-point Operations Per Second) is a unit of computer performance showing how many floating-point operations per second a given computing system can perform.

K Computer is a joint development of RIKEN (Rikagaku Kenkyusho, the Institute of Physical and Chemical Research) and Fujitsu. It was created as part of the High-Performance Computing Infrastructure initiative led by the Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT). The supercomputer is installed at RIKEN's Advanced Institute for Computational Science in the Japanese city of Kobe.

The supercomputer is based on a distributed-memory architecture. The system consists of more than 80,000 compute nodes housed in 864 racks, each of which accommodates 96 compute nodes and 6 I/O nodes. The nodes, each containing one processor and 16 GB of RAM, are interconnected in a six-dimensional mesh/torus topology. The system uses a total of 88,128 eight-core SPARC64 VIIIfx processors (705,024 cores) manufactured by Fujitsu on a 45 nm process.

This general purpose supercomputer provides high levels of performance and support for a wide range of applications. The system is used to conduct research in the fields of climate change, disaster prevention and medicine.

The unique water cooling system reduces the likelihood of equipment failure and reduces overall energy consumption. Energy savings are achieved through the use of highly efficient equipment, a heat and electricity cogeneration system and an array of solar panels. In addition, the mechanism for reusing waste water from the cooler reduces the negative impact on the environment.

The building housing K Computer is earthquake-resistant and can withstand tremors of intensity 6 or more on the Japanese scale (0-7). To accommodate equipment racks and cables more efficiently, the third floor, measuring 50 × 60 m, is completely free of load-bearing columns. Modern construction technologies made it possible to provide an acceptable floor load (up to 1 t/m2) for installing racks that can weigh up to 1.5 tons.

SEQUOIA SUPERCOMPUTER

The Sequoia supercomputer, installed at the Lawrence Livermore National Laboratory, has a performance of 16.32 Pflops and ranks second in the rating (see Figure 2).

This petaflop supercomputer, developed by IBM based on Blue Gene/Q, was created for the US National Nuclear Security Administration (NNSA) as part of the Advanced Simulation and Computing program.

The system consists of 96 racks and 98,304 compute nodes (1024 nodes per rack). Each node includes a 16-core PowerPC A2 processor and 16 GB of DDR3 RAM. In total, 1,572,864 processor cores and 1.6 PB of memory are used. The nodes are connected to each other in accordance with the “five-dimensional torus” topology. The area occupied by the system is 280 m2. Total energy consumption is 7.9 MW.

The Sequoia supercomputer was the first in the world to carry out scientific calculations requiring more than 10 Pflops of computing power. Thus, the HACC cosmology simulation code required about 14 Pflops in a run with 3.6 trillion particles, and a run of the Cardioid code, which simulates the electrophysiology of the human heart, reached almost 12 Pflops.

TITAN SUPERCOMPUTER

The Titan supercomputer, installed at the Oak Ridge National Laboratory (ORNL) in the USA, was recognized as the world's fastest supercomputer. In Linpack benchmark tests, its performance was 17.59 Pflops.

Titan implements a hybrid CPU-GPU architecture (see Figure 3). The system consists of 18,688 nodes, each equipped with a 16-core AMD Opteron processor and an Nvidia Tesla K20X graphics accelerator, for a total of 560,640 cores. Titan is an upgrade of the Jaguar supercomputer previously operated at ORNL and occupies the same server cabinets (a total area of 404 m2).

The ability to use existing power and cooling systems saved approximately $20 million during construction. The power consumption of the supercomputer is 8.2 MW, which is 1.2 MW more than the Jaguar, while its performance in floating point operations is almost 10 times higher.

Titan will primarily be used to conduct research in materials science and nuclear energy, as well as research related to improving the efficiency of internal combustion engines. In addition, it will be used to model climate change and analyze potential strategies to address its negative impacts.

THE "GREENEST" SUPERCOMPUTER

In addition to the Top500 rating, aimed at identifying the most high-performance system, there is a Green500 rating, which recognizes the “greenest” supercomputers. Here, the energy efficiency indicator (Mflops/W) is taken as a basis. At the moment (the latest release of the rating is November 2012), the leader of the Green500 is the supercomputer Beacon (253rd place in the Top500). Its energy efficiency indicator is 2499 Mflops/W.

Beacon is powered by Intel Xeon Phi 5110P coprocessors and Intel Xeon E5-2670 processors; its peak performance reaches 112,200 Gflops with a total power consumption of 44.9 kW. Each coprocessor delivers about 1 teraflops (double precision) and carries 8 GB of GDDR5 memory with 320 GB/s of bandwidth.
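
The Green500 figure quoted above can be reproduced directly from these numbers; the snippet below is just that arithmetic (performance divided by power), using the Beacon values from this paragraph.

```python
# Energy efficiency as used by the Green500: performance per watt.
peak_gflops = 112_200          # Gflops, Beacon's peak performance
power_kw = 44.9                # kW, Beacon's total power consumption

mflops_per_watt = (peak_gflops * 1_000) / (power_kw * 1_000)
print(f"{mflops_per_watt:.0f} Mflops/W")   # ~2499 Mflops/W, matching the rating
```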

The Xeon Phi 5110P's passive cooling system is rated at 225W TDP, which is ideal for high-density servers.

SUPERCOMPUTER EURORA

However, in February 2013 reports emerged that the Eurora supercomputer in Bologna, Italy, had surpassed Beacon in energy efficiency (3,150 Mflops/W versus 2,499 Mflops/W).

Eurora was built by Eurotech and consists of 64 nodes, each containing two Intel Xeon E5-2687W processors, two Nvidia Tesla K20 GPU accelerators and other hardware. Each node is no bigger than a laptop, yet delivers roughly 30 times the performance while consuming 15 times less power.

High energy efficiency in Eurora is achieved through the use of several technologies. Water cooling makes the greatest contribution. Thus, each supercomputer node is a kind of sandwich: central equipment at the bottom, a water heat exchanger in the middle, and another electronics unit at the top (see Figure 4).

Such high results are achieved thanks to the use of materials with good thermal conductivity, as well as an extensive network of cooling channels. When installing a new computing module, its channels are combined with the channels of the cooling system, which allows you to change the configuration of the supercomputer depending on specific needs. According to the manufacturers, the risk of leaks is eliminated.

The Eurora supercomputer elements are powered by 48-volt DC sources, the introduction of which has reduced the number of energy conversions. Finally, the warm water removed from computing equipment can be used for other purposes.

CONCLUSION

The supercomputer industry is actively developing and setting more and more new records for performance and energy efficiency. It should be noted that it is in this industry, like nowhere else, that liquid cooling and 3D modeling technologies are widely used today, since specialists are faced with the task of assembling a super-powerful computing system that would be able to function in a limited volume with minimal energy losses.

Yuri Khomutsky is Chief Project Engineer at I-Teco. He can be contacted at: [email protected]. The article uses materials from the Internet portal about data centers "www.AboutDC.ru - Solutions for Data Centers".


Humanity has not yet set foot on Mars, has not invented an elixir of youth, and cars still cannot soar above the ground, but there are areas in which we have nonetheless succeeded. Building powerful supercomputers is one of them. To evaluate a computer's power, one key parameter matters: flops, a value showing how many floating-point operations a machine can perform in one second. It is on the basis of this value that our Big Rating magazine ranked the most powerful computers in the world for 2017.

Trinity

Supercomputer power – 8.1 Pflop/sec

Trinity stores data related to the security of the US military and helps maintain the readiness of the country's nuclear arsenal. Two years ago this machine was one of the most powerful and most expensive in the world, but today it has been overtaken by newer systems. The supercomputer is built on the Cray XC40 platform, which is what allows it to perform so many operations per second.

Mira

Supercomputer power – 8.6 Pflop/sec

Mira is another supercomputer commissioned by the US Department of Energy to support its work; it was built by IBM as part of the Blue Gene project. Mira is used for industry and for developing research capacity, and it delivers 8.6 petaflops.

K Computer

Supercomputer power – 10.5 Pflop/sec

The name of this device speaks to its power: the Japanese word "kei" (K) means ten quadrillion, which almost exactly matches its performance of 10.5 petaflops. The highlight of this supercomputer is its cooling system: water cooling reduces energy consumption and lowers the likelihood of equipment failure.

Oakforest-PACS

Supercomputer power – 13.6 Pflop/sec

Fujitsu, the company from the Land of the Rising Sun, did not stop with the K Computer: it immediately began a new project, the Oakforest-PACS supercomputer, which belongs to the new Knights Landing generation of machines. Its development was commissioned by the University of Tokyo and the University of Tsukuba. According to the original plan, the machine was supposed to have 900 TB of memory and a performance of 25 quadrillion operations per second, but due to a lack of funding many aspects were not completed, so its power ended up at 13.6 petaflops.

Cori

Supercomputer power – 14 Pflop/sec

Last year Cori was in sixth place on the list of the most powerful supercomputers in the world, but at today's frantic pace of technological development it has slipped one position. This supercomputer is located in the United States, at the Lawrence Berkeley National Laboratory. With Cori's help, scientists from Switzerland were able to simulate a 45-qubit quantum computer. The machine's performance is 14 petaflops.

Sequoia

Supercomputer power – 17.2 Pflop/sec

Sequoia has long been regarded as one of the fastest supercomputers on the planet, and not without reason: it can perform in an hour arithmetic calculations that would take 6.7 billion people with calculators 320 years. The size of the machine is also striking: it occupies more than 390 square meters and includes 96 racks. Roughly seventeen thousand trillion operations per second, or 17.2 petaflops, is this supercomputer's capacity.

Titan

Supercomputer power – 17.6 Pflop/sec

In addition to being one of the fastest supercomputers on the planet, Titan is also very energy efficient: 2,142.77 megaflops per watt of power consumed. The reason is its Nvidia accelerators, which provide up to 90% of the computing power. The GPU accelerators also significantly reduced the machine's footprint; it now needs only 404 square meters.

Piz Daint

Supercomputer power – 19.6 Pflop/sec

This machine was first launched in 2013 in the Swiss city of Lugano, and it is now housed at the Swiss National Supercomputing Centre. Piz Daint combines the best features of the machines above: it is highly energy efficient and very fast at calculations. Only one characteristic leaves something to be desired - its size, as it occupies 28 large racks. Piz Daint is capable of 19.6 petaflops.

Tianhe-2

Supercomputer power – 33.9 Pflop/sec

This device bears the romantic name Tianhe, which in Chinese means "Milky Way". Tianhe-2 topped the list of the 500 fastest and most powerful supercomputers for several years. It performs about 34 quadrillion operations per second, that is, 33.9 petaflops. The computer specializes in construction-related calculations, such as urban development and road building. Since its first launch in 2013, it has not lost its standing in the rankings, which shows it is one of the best machines in the world.

Sunway TaihuLight

Supercomputer power – 93 Pflop/sec

Sunway TaihuLight is the fastest supercomputer in the world. In addition to its enormous computing speed, it is also famous for its size: it occupies an area of more than 1,000 square meters. The 2016 International Supercomputing Conference, held in Germany, recognized it as the fastest in the world, and it still has no serious competitor in this regard: its speed is almost three times that of Tianhe-2, its closest rival.

Technological progress does not stand still: it develops at breakneck speed, affects many aspects of human life, and has both positive and negative sides. Technology of all kinds is now available to people: computers, robots and instruments. But the main purpose of any equipment is to simplify a person's life; technology should not become meaningless entertainment that only wastes your time.


