Essential for Performance Optimization
February 21, 2012
What distinguishes “Programming Massively Parallel Processors: A Hands-On Approach” from other books is the precise description of how the hardware works.
The book was published about two years ago, and parts of it have since become dated: many hardware constants are given for the G80 and GT200 architectures, while the currently up-to-date Fermi architecture is only briefly introduced in Chapter 12.
Still, what sets this book apart from others is the detailed description of how the hardware functions, for example warp scheduling and memory access. In addition, two case studies show how sequential code is transformed step by step into the best possible CUDA code. Here the reader learns how to restructure and adapt code for the GPU and picks up important techniques such as "loop fission," "latency hiding," and "memory coalescing." One also learns that writing optimal code takes considerable (mental) effort.
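To give a flavour of what "memory coalescing" means in practice, here is a minimal sketch of my own (not taken from the book): two copy kernels, one in which consecutive threads of a warp read consecutive addresses, and one in which each thread walks its own chunk so that a warp's accesses are scattered across memory. Kernel names, sizes, and the bare-bones host code are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Coalesced: consecutive threads of a warp touch consecutive addresses,
// so the hardware can merge them into a few wide memory transactions.
__global__ void copyCoalesced(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Uncoalesced: each thread copies its own contiguous chunk, so in every
// iteration the threads of a warp touch addresses 'chunk' elements apart.
__global__ void copyChunked(const float *in, float *out, int n, int chunk)
{
    int base = (blockIdx.x * blockDim.x + threadIdx.x) * chunk;
    for (int k = 0; k < chunk; ++k) {
        int idx = base + k;
        if (idx < n)
            out[idx] = in[idx];
    }
}

int main()
{
    const int n = 1 << 24;   // 16M floats, purely for illustration
    const int chunk = 32;
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    const int threads = 256;
    copyCoalesced<<<(n + threads - 1) / threads, threads>>>(in, out, n);
    copyChunked<<<(n / chunk + threads - 1) / threads, threads>>>(in, out, n, chunk);
    cudaDeviceSynchronize();

    printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Both kernels copy the same data, but on real hardware the coalesced version typically reaches a large fraction of the available memory bandwidth while the chunked version does not; a profiler makes the difference immediately visible.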
There are, however, a few points of criticism: the examples require knowledge of mathematics, the index is incomplete, and the section on OpenCL doesn’t offer much, as it is too brief for an introduction. Additionally, CUDA arrays (2D, 3D) and textures are not treated in sufficient detail.
Anyone who wants to write CUDA programs that utilize the GPU as efficiently as possible cannot afford to skip this book.
- David Kirk, Wen-Mei W. Hwu
- Programming Massively Parallel Processors: A Hands-On Approach
- Morgan Kaufmann
- 2010
See also the review on Amazon