Optimization of Linpack for Loongson 3B processor

Gang LIU, Heng ZHANG, Dian ZHANG, Rui MAO*

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review


High Performance Linpack (HPL) is an implementation of the Linpack benchmark that is widely adopted in high performance computing. An efficient partition strategy for matrix multiplication is derived from the architectural features of the Loongson 3B processor, and a cache lock mechanism, which locks frequently used data blocks into the locked cache, is introduced to reduce cache misses. So that the computation cost hides the memory access cost, a new prefetching algorithm is implemented in the memory access acceleration device. Other routines, such as dtrsm and row swapping, are also optimized, and the optimal value of each parameter is obtained by training. Experimental results indicate that both the single-node (4 cores) and double-node (8 cores) configurations achieve about 60% of theoretical peak performance, a nearly 10-fold improvement over non-optimized Linpack.
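The partitioning idea mentioned in the abstract can be illustrated with a generic cache-blocked matrix multiplication. The sketch below is not the authors' Loongson 3B kernel: the function name `blocked_matmul` and the `block` parameter are hypothetical, the tile size is arbitrary, and cache locking and hardware prefetching have no equivalent here. It only shows how the three loops are split into tiles so that each tile of the operands is reused while it is still resident in cache.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m), given as lists of lists,
    visiting the operands tile by tile. Hypothetical helper for illustration."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; `block` is an assumed tile edge length.
    for i0 in range(0, n, block):
        for p0 in range(0, k, block):
            for j0 in range(0, m, block):
                # Inner loops update one tile of c from one tile of a and b.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = row_a[p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ip * row_b[j]
    return c
```

In a tuned implementation the tile size would be chosen from the cache capacity of the target processor, which corresponds to the per-parameter tuning the abstract describes.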


Original language: English
Pages (from-to): 286-292
Number of pages: 7
Journal: Shenzhen Daxue Xuebao (Ligong Ban)/Journal of Shenzhen University Science and Engineering
Issue number: 3
Publication status: Published - May 2014
Externally published: Yes

Bibliographical note

Foundation: National High-Tech Research and Development Program of China (2012AA01A30904); Academician Workstation Construction Projects in Guangdong Province (2012B090500020).


Keywords

  • Computer architecture
  • Data prefetching
  • Linear system package
  • Loongson 3B processor
  • Matrix multiplication
  • 计算机系统结构
  • 龙芯3B处理器
  • 线性系统软件包
  • 矩阵乘法
  • 数据预取

