Posts
Implementing a fast Tensor Core matmul on the Ada architecture
Using Tensor Cores is now a prerequisite for getting anywhere near peak performance on NVIDIA GPUs. In this post, we work through developing an efficient Tensor Core matrix multiplication kernel targeting the Ada architecture.