High Performance Computing Laboratory


High Performance On-Chip Interconnects Design for Multicore Accelerators


Bandwidth Efficient On-Chip Interconnect Designs

GPGPUs are characterized by numerous programmable compute cores that allow thousands of active threads to execute in parallel. The advent of parallel programming models such as CUDA and OpenCL has made it easier to program both graphics and non-graphics applications, making GPGPUs an excellent computing platform. The growing amount of parallelism and the rapid scaling of GPGPUs have fueled an increasing demand for performance-efficient on-chip fabrics finely tuned for GPGPU cores and memory systems.

Ideal interconnects should minimize message blocking by efficiently exploiting limited network resources, such as virtual channels (VCs) and physical channels (PCs), while ensuring deadlock freedom. Switch-based Networks-on-Chip (NoCs) have proven effective in manycore chip-multiprocessor (CMP) environments for their scalability and flexibility. Unlike CMP systems, where NoC traffic tends to be distributed fairly uniformly across cores communicating with distributed on-chip caches, communication in GPGPUs is highly asymmetric, flowing mainly between many compute cores and a few memory controllers (MCs) on a chip. The MCs therefore often become hot spots, leading to skewed usage of significant portions of the NoC resources such as wires and buffers. In particular, heavy reply traffic from MCs to cores can become a network bottleneck and degrade overall system performance. A bandwidth-efficient NoC design must therefore account for this asymmetry in on-chip traffic.
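The many-to-few hot-spot behavior described above can be made concrete with a small back-of-the-envelope sketch (not the paper's simulator): count how many core-to-MC routes cross each link of a mesh under XY dimension-order routing. The mesh size and corner MC placement below are illustrative assumptions.

```python
# Sketch: per-link request load on a 2D mesh when every core sends to its
# nearest memory controller (MC) via XY routing. Links near the MCs end up
# carrying far more traffic than the average link, i.e. the MCs are hot spots.
# Mesh size and MC placement are assumptions, not the paper's configuration.
from collections import Counter

W = H = 6                                  # 6x6 mesh (assumption)
MCS = [(0, 0), (5, 0), (0, 5), (5, 5)]     # corner MC placement (assumption)

def xy_route(src, dst):
    """Yield the directed links visited by XY routing: X dimension first."""
    (x, y), (dx, dy) = src, dst
    while x != dx:
        nx = x + (1 if dx > x else -1)
        yield ((x, y), (nx, y))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        yield ((x, y), (x, ny))
        y = ny

load = Counter()
cores = [(x, y) for x in range(W) for y in range(H) if (x, y) not in MCS]
for core in cores:
    # Each core targets its nearest MC by Manhattan distance.
    mc = min(MCS, key=lambda m: abs(m[0] - core[0]) + abs(m[1] - core[1]))
    for link in xy_route(core, mc):
        load[link] += 1

print("max link load: ", max(load.values()))
print("mean link load:", sum(load.values()) / len(load))
```

The maximum link load comes out well above the mean, which is exactly the skewed resource usage the asymmetric GPGPU traffic induces.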

Throughput-effectiveness is a crucial metric for overall performance in throughput-oriented architectures, so designing a high-bandwidth NoC for GPGPUs is of primary importance. To achieve this goal, we quantitatively analyze the impact of network traffic patterns in GPGPUs under different MC placements and dimension-order routing algorithms. Motivated by this detailed analysis, we propose VC monopolizing and partitioning schemes that dramatically improve NoC resource utilization without introducing protocol deadlocks. We also investigate the impact of different routing algorithms under diverse MC placements.
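To illustrate the kind of VC partitioning involved, here is a minimal sketch (not the DAC'15 design itself) of the standard way to avoid protocol deadlock: reserve disjoint VC classes for request and reply traffic so replies can never block behind requests. The VC counts and the class split are illustrative assumptions.

```python
# Sketch: partition a router's virtual channels (VCs) by message class.
# With disjoint request/reply partitions, a reply can never be stalled by
# requests holding VCs, which breaks request-reply (protocol) deadlock cycles.
# NUM_VCS and the 2/2 split are assumptions for illustration.
NUM_VCS = 4
REQUEST_VCS = range(0, 2)   # VCs 0-1 reserved for core -> MC requests
REPLY_VCS = range(2, 4)     # VCs 2-3 reserved for MC -> core replies

def allocate_vc(msg_class, free_vcs):
    """Return a free VC from the class's own partition, or None if all busy."""
    allowed = REQUEST_VCS if msg_class == "request" else REPLY_VCS
    for vc in allowed:
        if vc in free_vcs:
            return vc
    return None

# Even when requests occupy VCs 0-1, a reply still acquires a VC:
print(allocate_vc("reply", free_vcs={2, 3}))    # -> 2
print(allocate_vc("request", free_vcs={2, 3}))  # -> None (request must wait)
```

The cost of such static partitioning is that one class may sit idle while the other is starved for buffers, which is precisely the underutilization that schemes tuned to asymmetric GPGPU traffic aim to recover.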

[Figure: Packet type distribution for GPGPU benchmarks]


[Figure: Network traffic examples with XY and XY-YX routing]
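The two routing disciplines contrasted above can be sketched as next-hop rules on a 2D mesh (assumed coordinates `(x, y)`): XY always exhausts the X dimension first, while XY-YX lets each packet choose either the XY or the YX order at injection, spreading load over both path families.

```python
# Sketch of dimension-order next-hop selection on a 2D mesh (assumption:
# nodes are (x, y) grid coordinates). With use_yx=False the packet routes
# X-first (XY); with use_yx=True it routes Y-first (YX). XY-YX routing
# picks one of the two orders per packet at injection time.
def next_hop(cur, dst, use_yx=False):
    (x, y), (dx, dy) = cur, dst
    first, second = ("y", "x") if use_yx else ("x", "y")
    for dim in (first, second):
        if dim == "x" and x != dx:
            return (x + (1 if dx > x else -1), y)
        if dim == "y" and y != dy:
            return (x, y + (1 if dy > y else -1))
    return cur  # already at the destination

print(next_hop((0, 0), (2, 2)))               # XY path: -> (1, 0)
print(next_hop((0, 0), (2, 2), use_yx=True))  # YX path: -> (0, 1)
```

Because the two orders stress different links, mixing them can balance load around hot-spot MCs, at the cost of needing extra VCs or turn restrictions to keep the combined routing deadlock-free.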


  • Bandwidth Efficient On-Chip Interconnect Designs for GPGPUs,
    H. Jang, J. Kim, P. Gratz, K. H. Yum, and E. J. Kim,
    The 52nd Design Automation Conference (DAC), June 2015

    © 2004 High Performance Computing Laboratory, Department of Computer Science, Texas A&M University
    427C Harvey R. Bright Bldg, College Station, TX 77843-3112