Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming
- #GPU Programming
- #Machine Learning
- #WebGL
- NVIDIA introduced programmable shaders in the early 2000s, enabling sophisticated visual effects and laying the foundation for GPU computing.
- Researchers found that computations like linear algebra could be accelerated using GPU shaders, leading to the development of CUDA and OpenCL for general-purpose GPU programming.
- Traditional graphics APIs like OpenGL are tailored for rendering, while compute APIs like OpenCL and CUDA provide a direct compute model for parallel processing.
- Textures and framebuffers in WebGL can be repurposed to store and manipulate numerical data, acting as a data bus for GPU-based computations.
- Fragment shaders are used as compute kernels, with each fragment invocation acting as a parallel thread to perform neural network operations.
- A shared vertex shader draws a full-screen quad, mapping every pixel to a tensor element, and is reused across different operations.
- Each neural network operation is executed as a GPU pass, chaining together multiple operations without transferring data back to the CPU until the final step.
- The GPT-2 forward pass involves embedding, transformer layers (attention and feed-forward), normalization, and output generation, all performed on the GPU.
- Shader-based GPU computing has real limitations, including no shared memory, texture size constraints, no synchronization within a pass, and limited floating-point precision, making it less practical for real-world use than CUDA or OpenCL.
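The texture-as-data-bus idea above can be sketched on the CPU side. This is an illustrative layout, not the article's exact code: an M x N float matrix is packed into an RGBA float texture, one element per texel, using only the R channel.

```javascript
// Pack an M x N matrix into RGBA32F texture data, one element per texel's
// R channel. The texture is `cols` texels wide and `rows` texels tall,
// so matrix element (row, col) lands at pixel (x = col, y = row).
function matrixToTextureData(matrix) {
  const rows = matrix.length, cols = matrix[0].length;
  const data = new Float32Array(rows * cols * 4); // RGBA = 4 floats/texel
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      data[(r * cols + c) * 4] = matrix[r][c]; // R channel only
    }
  }
  return { width: cols, height: rows, data };
}

// Inverse mapping, mirroring what a fragment shader does with gl_FragCoord:
function texelToElement(tex, x, y) {
  return tex.data[(y * tex.width + x) * 4];
}
```

On the GPU this buffer would be uploaded with something like `gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA32F, width, height, 0, gl.RGBA, gl.FLOAT, data)` in WebGL2; packing four values per texel across all RGBA channels is a common variant that quarters the texture footprint.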
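The full-screen-quad and kernel pattern might look like the following. These shaders are an assumed sketch, not the article's sources: a pass-through vertex shader covers the viewport, and a fragment shader computes one element of C = A x B per fragment invocation, with a CPU reference function showing the same arithmetic.

```javascript
// Shared vertex shader: draws a full-screen quad so every output texel
// gets exactly one fragment invocation.
const quadVS = `#version 300 es
in vec2 pos;                       // quad corners in clip space
void main() { gl_Position = vec4(pos, 0.0, 1.0); }`;

// Fragment shader as a matmul kernel: each fragment computes C[row][col].
const matmulFS = `#version 300 es
precision highp float;
uniform sampler2D A;               // one float per texel in .r
uniform sampler2D B;
uniform int K;                     // shared inner dimension
out vec4 outColor;
void main() {
  ivec2 p = ivec2(gl_FragCoord.xy); // p.x = column, p.y = row of C
  float acc = 0.0;
  for (int k = 0; k < K; k++) {
    acc += texelFetch(A, ivec2(k, p.y), 0).r *
           texelFetch(B, ivec2(p.x, k), 0).r;
  }
  outColor = vec4(acc, 0.0, 0.0, 1.0);
}`;

// CPU reference for what the fragment at (col, row) computes:
function fragmentMatmul(A, B, row, col) {
  let acc = 0;
  for (let k = 0; k < A[0].length; k++) acc += A[row][k] * B[k][col];
  return acc;
}
```

Because the rasterizer launches one fragment per output pixel, the fragment shader plays the role a CUDA kernel would, with `gl_FragCoord` standing in for the thread index.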
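The pass-chaining point can be illustrated with a small CPU analogy (names are illustrative, not from the article): each operation maps one "texture" to another, and nothing is copied back to the CPU until the final read.

```javascript
// Each pass reads one texture (a plain array here) and writes a new one.
// On the GPU this would bind the input as a sampler, attach the output
// texture to a framebuffer, and draw the full-screen quad.
function runPass(kernel, input) {
  return input.map(kernel);
}

// Chain passes; intermediate results never leave the "GPU".
function forward(input, kernels) {
  let tex = input;
  for (const k of kernels) tex = runPass(k, tex); // no readback between passes
  return tex; // only here would gl.readPixels copy data back to the CPU
}
```

In real WebGL this is the classic ping-pong pattern: two framebuffer-attached textures alternate as source and destination, since a pass cannot read and write the same texture.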
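Two of the per-pass computations in the forward pass, layer normalization and the softmax inside attention, reduce to short formulas. A CPU sketch of each (illustrative, with the learned scale and shift omitted):

```javascript
// LayerNorm over a vector: subtract the mean, divide by the standard
// deviation (plus epsilon for numerical safety).
function layerNorm(x, eps = 1e-5) {
  const mean = x.reduce((a, b) => a + b, 0) / x.length;
  const variance = x.reduce((a, b) => a + (b - mean) ** 2, 0) / x.length;
  const inv = 1 / Math.sqrt(variance + eps);
  return x.map(v => (v - mean) * inv); // gamma/beta omitted for brevity
}

// Softmax over attention scores: subtract the max before exponentiating
// to avoid overflow, then normalize to a probability distribution.
function softmax(scores) {
  const m = Math.max(...scores);
  const exps = scores.map(s => Math.exp(s - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}
```

In the shader version each of these is its own pass: one fragment per output element, with the reductions (mean, variance, max, sum) either recomputed per fragment or staged as separate reduction passes, since fragments cannot share memory.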