Sunway SW26010 Pro Marks a Milestone in Domestic Innovation
China has unveiled its latest supercomputing processor, a chip with 384 compute cores and a peak double-precision (FP64) throughput of more than 13 trillion floating-point operations per second (TFLOPS).
The Sunway SW26010 Pro CPU is a domestically developed chip and a significant step in China's effort to strengthen its supercomputing capabilities while reducing its dependence on foreign technology. Experts caution, however, that the processor's cache and memory subsystem may hold back its performance.
The National Supercomputing Center in Wuxi detailed the Sunway SW26010 Pro's architecture and design at the SC23 conference. According to Chips and Cheese, the CPU uses a proprietary 64-bit RISC instruction set and is organized into six core groups (CGs) plus a protocol processing unit (PPU). Each CG contains 64 compute processing elements (CPEs) and one management processing element (MPE), which gives the chip its 384 CPEs (6 × 64).
Sunway SW26010 Pro specifications and core diagram (image courtesy of Chips and Cheese)
Each CPE has a 512-bit vector engine and a 256 KB scratchpad memory, while each MPE has a scalar engine and a 256 KB L2 cache. Each CG also has its own 128-bit DDR4-3200 memory interface and 16 GB of DDR4 memory, for 96 GB across the chip.
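To put the memory figures above into numbers, here is a back-of-envelope sketch of the theoretical DRAM bandwidth. It assumes only what the article states (six CGs, each with a 128-bit DDR4-3200 interface and 16 GB of memory) and uses the standard rule that peak bandwidth is bus width in bytes times transfer rate. Real sustained bandwidth will be lower than these peaks.

```python
# Theoretical DDR4 bandwidth for the SW26010 Pro, using the figures quoted
# in the article: six core groups, each with a 128-bit DDR4-3200 interface
# and 16 GB of memory. These are peak numbers; sustained bandwidth is lower.

CORE_GROUPS = 6
BUS_WIDTH_BITS = 128          # per core group
TRANSFERS_PER_SEC = 3200e6    # DDR4-3200 = 3200 mega-transfers per second
MEMORY_PER_CG_GB = 16


def peak_bandwidth_gbs(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) of one memory interface."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


per_cg = peak_bandwidth_gbs(BUS_WIDTH_BITS, TRANSFERS_PER_SEC)
total = per_cg * CORE_GROUPS
capacity = MEMORY_PER_CG_GB * CORE_GROUPS

print(f"Per core group: {per_cg:.1f} GB/s")
print(f"Whole chip:     {total:.1f} GB/s, {capacity} GB total")
```

This prints about 51.2 GB/s per core group and 307.2 GB/s for the whole chip, shared by all 384 CPEs.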
The chip is an upgrade of the original Sunway SW26010, which powered the Sunway TaihuLight supercomputer when it topped the world rankings in 2016 and 2017. The new CPU raises the clock speed, extends the instruction set, and increases memory bandwidth. Together these changes deliver roughly a four-fold increase in FP64 performance, to a peak of 13.8 TFLOPS.
For comparison, AMD's 96-core EPYC 9654 peaks at around 5.4 TFLOPS FP64, less than half the Sunway chip's figure.
Despite these advances, the SW26010 Pro has clear limitations, most notably in its cache and memory hierarchy, which can hurt performance in some applications. Each CPE has only a small scratchpad and no conventional L2 cache, so data must frequently be fetched from main memory.
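To show why a small scratchpad matters, the sketch below computes how large a square block of FP64 data fits in one CPE's 256 KB scratchpad. The workload is a hypothetical assumption chosen for illustration: a blocked matrix multiply that keeps three b-by-b tiles (A, B and C) resident, or six if the program double-buffers to overlap loads with compute. It is not Sunway's own code or tiling scheme.

```python
# Illustrative only: the largest square FP64 tile that fits in a 256 KB
# scratchpad for a hypothetical blocked matrix multiply. The tiling scheme
# (3 resident tiles, or 6 with double buffering) is an assumption.

SCRATCHPAD_BYTES = 256 * 1024
FP64_BYTES = 8


def largest_tile(scratchpad_bytes: int, tiles_resident: int) -> int:
    """Largest b such that tiles_resident tiles of b*b doubles fit."""
    b = 0
    while tiles_resident * (b + 1) ** 2 * FP64_BYTES <= scratchpad_bytes:
        b += 1
    return b


single = largest_tile(SCRATCHPAD_BYTES, tiles_resident=3)
double_buffered = largest_tile(SCRATCHPAD_BYTES, tiles_resident=6)
print(f"3 tiles resident: b = {single}")                     # b = 104
print(f"6 tiles (double-buffered): b = {double_buffered}")   # b = 73
```

Under these assumptions each CPE can work on blocks of only about 100 × 100 doubles at a time, so anything larger has to be streamed in from main memory piece by piece.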
In addition, the memory subsystem would need to grow to meet the bandwidth demands of 384 cores, each capable of up to 16 FP64 FLOPs per cycle. These bottlenecks could limit the scalability and efficiency of both the CPU and the supercomputers built on it.
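The compute and bandwidth figures can be combined into a rough "machine balance" estimate. This sketch uses only numbers from the article: 384 CPEs, 16 FP64 FLOPs per cycle per core, a 13.8 TFLOPS peak, and six 128-bit DDR4-3200 channels. The article does not give the clock speed, so the code backs it out from the peak figure. The 16 FLOPs per cycle is consistent with a 512-bit vector unit (eight FP64 lanes) issuing one fused multiply-add per cycle, but that breakdown is our assumption, not something the article states.

```python
# Rough machine-balance arithmetic for the SW26010 Pro, from the article's
# figures. The clock is NOT stated in the article; it is derived here from
# the quoted peak, so treat it as an implied value only.

CPES = 384
FLOPS_PER_CYCLE = 16                        # FP64, per CPE
PEAK_TFLOPS = 13.8
DRAM_GBS = 6 * (128 / 8) * 3200e6 / 1e9     # 307.2 GB/s theoretical peak

flops_per_cycle_chip = CPES * FLOPS_PER_CYCLE                  # whole chip
implied_clock_ghz = PEAK_TFLOPS * 1e12 / flops_per_cycle_chip / 1e9
bytes_per_flop = DRAM_GBS * 1e9 / (PEAK_TFLOPS * 1e12)

print(f"FP64 FLOPs per cycle (chip): {flops_per_cycle_chip}")  # 6144
print(f"Implied CPE clock: {implied_clock_ghz:.2f} GHz")       # 2.25 GHz
print(f"DRAM bytes per FLOP: {bytes_per_flop:.3f}")            # 0.022
```

At about 0.022 bytes per FLOP, a kernel would need to perform roughly 45 FLOPs for every byte it reads from DRAM (around 360 per FP64 value) to approach peak, which is why the limited on-chip storage described above matters so much.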
The Sunway SW26010 Pro is an important step in China's supercomputing program, reflecting the country's ambitions in scientific research, artificial intelligence, and national security. Its cache and memory limitations, however, show that further advances will be needed if China is to secure supercomputing supremacy.
The SW26010 Pro is a formidable addition to China's technological arsenal, but addressing these challenges will be essential to sustaining that position in a fast-moving field.