The best introduction to CUDA
March 13, 2011
“CUDA by Example” is the perfect starting point for CUDA beginners. Even advanced users and professionals may enjoy its clear explanations and examples.
Some books, especially on new topics, give the impression that the authors simply copied the documentation and embellished it a little. That is definitely not the case here.
The title “CUDA by Example” says it all: you learn through examples. Numerical algorithms such as vector addition and matrix multiplication are covered, but many readers don’t consider them “fun”. The authors have therefore also included graphical examples: Julia sets, a simple ray tracer, and heat transfer.
Parallel algorithms used to be an “esoteric” topic for theoretical computer scientists. With CUDA, anyone can now write massively parallel programs and, for suitable problems, achieve speedups of 10 to 100 times over the CPU. The book introduces parallelism step by step, slowly and carefully, and always explains it in detail. The starting point is sequential CPU code, which is then transformed into a parallel GPU version.
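To illustrate the kind of CPU-to-GPU transformation described above, here is a minimal sketch in CUDA C++ of a vector addition: first as a sequential CPU loop, then as a kernel in which each block computes one element. The names `add_cpu` and `add_gpu`, the size `N = 10`, and the test data are illustrative assumptions, not code copied from the book.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 10  // assumed, deliberately tiny problem size

// Sequential CPU version: one loop walks over every element.
void add_cpu(const int *a, const int *b, int *c) {
    int tid = 0;
    while (tid < N) {
        c[tid] = a[tid] + b[tid];
        tid += 1;  // a single CPU thread, so step by one
    }
}

// Parallel GPU version: the loop disappears. Each block handles the
// element whose index equals its block index.
__global__ void add_gpu(const int *a, const int *b, int *c) {
    int tid = blockIdx.x;
    if (tid < N) {
        c[tid] = a[tid] + b[tid];
    }
}

int main() {
    int a[N], b[N], c_cpu[N], c_gpu[N];
    for (int i = 0; i < N; ++i) {
        a[i] = -i;
        b[i] = i * i;
    }
    add_cpu(a, b, c_cpu);

    const size_t bytes = N * sizeof(int);
    int *dev_a, *dev_b, *dev_c;
    cudaMalloc((void **)&dev_a, bytes);
    cudaMalloc((void **)&dev_b, bytes);
    cudaMalloc((void **)&dev_c, bytes);
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    // Launch N blocks with one thread each.
    add_gpu<<<N, 1>>>(dev_a, dev_b, dev_c);

    cudaError_t err = cudaGetLastError();  // reports launch errors
    if (err == cudaSuccess) {
        // A device-to-host cudaMemcpy waits for the kernel to finish first.
        err = cudaMemcpy(c_gpu, dev_c, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < N; ++i) {
        if (c_gpu[i] != c_cpu[i]) {
            printf("mismatch at %d\n", i);
            return 1;
        }
    }
    printf("add_gpu matches add_cpu for all %d elements\n", N);
    return 0;
}
```

The launch configuration `<<<N, 1>>>` starts N blocks of one thread each, which corresponds to the “blocks first” approach the review praises; threads within a block come later.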
The goal of the book is to teach readers the basics of CUDA so that they can develop their own applications. The theory of parallel algorithms is not covered, but it is not needed for the book’s purposes either. As companion reading and as a reference, you will need the “NVIDIA CUDA Programming Guide” and the “NVIDIA CUDA Best Practices Guide,” both of which NVIDIA provides with the CUDA Toolkit.
As an introductory book, however, it leaves out the following advanced topics:
- Performance optimization of kernels
- Optimization of memory access (“coalescing”)
- CUDA arrays and 3D memory
- CUDA hardware in detail, e.g., warp scheduling
- Avoiding warp divergence
- New features of the Fermi architecture, e.g., caching
- CUDA Driver API
Solid knowledge of C or C++ is required, meaning you should already have read and written programs in it. Experience with the command line or a C development environment is also an advantage. Basic knowledge of parallel algorithms is helpful but not strictly necessary.
Here is a list of pros and cons to justify the rating of 4 out of 5.
Pros
- Good examples, clear explanations, manageable pace
- To introduce parallelism, the authors first use multiple blocks in a grid and only afterwards multiple threads within a block. This is a great idea and makes the most pedagogical sense.
- Visually appealing examples: Julia sets, a ray tracer to demonstrate constant memory, bitmaps to demonstrate shared memory, and heat transfer to demonstrate 1D and 2D textures.
- Nice introduction covering history and application examples.
- Installing the driver, toolkit, and SDK is explained.
- The source code is downloadable and (mostly) works.
- A good example of atomic operations in the appendix (a hash table).
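The pros above note that parallelism is introduced first through blocks and then through threads. Combining the two usually leads to a global index built from `threadIdx.x`, `blockIdx.x` and `blockDim.x`, plus a loop that advances by the total number of threads in the grid, so that arrays larger than the grid are still fully covered. The following is a minimal sketch of that pattern; the sizes (`N = 33 * 1024`, 128 blocks of 128 threads) are assumptions for illustration, and the code is not taken verbatim from the book.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N (33 * 1024)  // assumed size, deliberately larger than the grid

// Each thread starts at its global index and then jumps ahead by the total
// number of threads in the grid until it runs past the end of the array.
__global__ void add(const int *a, const int *b, int *c) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    while (tid < N) {
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;
    }
}

int main() {
    static int a[N], b[N], c[N];  // static storage keeps large arrays off the stack
    for (int i = 0; i < N; ++i) {
        a[i] = i;
        b[i] = 2 * i;
    }

    const size_t bytes = N * sizeof(int);
    int *dev_a, *dev_b, *dev_c;
    cudaMalloc((void **)&dev_a, bytes);
    cudaMalloc((void **)&dev_b, bytes);
    cudaMalloc((void **)&dev_c, bytes);
    cudaMemcpy(dev_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);

    // 128 blocks x 128 threads = 16384 threads for 33792 elements,
    // so each thread handles two or three elements.
    add<<<128, 128>>>(dev_a, dev_b, dev_c);

    cudaError_t err = cudaGetLastError();  // reports launch errors
    if (err == cudaSuccess) {
        // Waits for the kernel, then copies the result back.
        err = cudaMemcpy(c, dev_c, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < N; ++i) {
        if (c[i] != a[i] + b[i]) {
            printf("mismatch at %d\n", i);
            return 1;
        }
    }
    printf("all %d elements correct\n", N);
    return 0;
}
```

Because the stride is `blockDim.x * gridDim.x`, the same kernel works unchanged if the launch configuration or `N` changes; only the number of loop iterations per thread varies.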
Cons
- The code is always printed in full, which leads to repetition throughout the book. This may be the right choice for beginners, but advanced users will be annoyed at having to read the same function definition more than once. It is particularly bad in the section on events, where the entire ray tracer is printed twice more (once using global memory, once using constant memory).
- The book includes information that goes out of date very quickly, such as the list of devices starting on page 15.
- The filenames of the example programs are not given in the book, so you sometimes have to search for them.
- Section 3.3, “Querying Devices”, is too detailed for that stage of the book. At that point you don’t need to know all of it yet, and you can’t do anything with 95% of the terms. A reference to the documentation would have sufficed.
- For beginners, a short section on basic design patterns or parallel algorithms would have been very useful.
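Regarding the point about Section 3.3: for a first program, a handful of device properties is usually enough. A minimal sketch using the CUDA runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties` might look like this; the fields printed are an arbitrary illustrative selection, not the book’s list.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Only a few fields; cudaDeviceProp contains many more.
        printf("Device %d: %s\n", i, prop.name);
        printf("  compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  multiprocessors:    %d\n", prop.multiProcessorCount);
        printf("  global memory:      %zu MiB\n", prop.totalGlobalMem / (1024 * 1024));
        printf("  max threads/block:  %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
```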
Conclusion: Anyone wanting to learn CUDA will find this book to be the best introduction currently available.
- Sanders, Kandrot
- CUDA by Example
- Addison-Wesley
- 2010
See also the review on Amazon.