KernelTuner/kernel_float
Header-only CUDA/HIP library for low-precision (16-bit, 8-bit) and vectorized GPU kernel development
GitHub repository with 23 stars and 3 forks.
Language: C++
Topics: bfloat16, cpp, cuda, floating-point, gpu, half-precision, header-only-library, hip, kernel-tuner, low-precision