
Running GPT-2 in WebGL: Rediscovering the Lost Art of GPU Shader Programming

a year ago
  • #GPU Programming
  • #Machine Learning
  • #WebGL
  • NVIDIA introduced programmable shaders in the early 2000s, enabling sophisticated visual effects and laying the foundation for GPU computing.
  • Researchers found that computations like linear algebra could be accelerated using GPU shaders, leading to the development of CUDA and OpenCL for general-purpose GPU programming.
  • Traditional graphics APIs like OpenGL are tailored for rendering, while compute APIs like OpenCL and CUDA provide a direct compute model for parallel processing.
  • Textures and framebuffers in WebGL can be repurposed to store and manipulate numerical data, serving as the storage layer for GPU-based computations (see the texture sketch after this list).
  • Fragment shaders are used as compute kernels, with each fragment invocation acting as a parallel thread that computes one element of the output tensor (see the matrix-multiply kernel after this list).
  • A shared vertex shader draws a full-screen quad, mapping every pixel to a tensor element, and is reused across all operations (sketched after this list).
  • Each neural network operation is executed as a GPU pass; passes are chained together without transferring data back to the CPU until the final step (see the pass-chaining sketch after this list).
  • The GPT-2 forward pass involves embedding, transformer layers (attention and feed-forward), normalization, and output generation, all performed on the GPU (outlined after this list).
  • Shader-based GPU computing has limitations, including no shared memory, texture size constraints, no synchronization between fragments, and precision overhead, which make it less practical for real-world use than CUDA or OpenCL.
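
The sketches below illustrate the technique under stated assumptions; they are not the article's exact code. First, a tensor can live in a float texture, with a framebuffer attachment so a shader pass can write into it. This sketch assumes a WebGL2 context `gl` and the `EXT_color_buffer_float` extension (needed to render into float textures); the helper names are illustrative.

```js
// Minimal sketch: store a row-major matrix in a WebGL2 float texture and
// attach it to a framebuffer so a later shader pass can write to it.
function createTensorTexture(gl, width, height, data /* Float32Array or null */) {
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, tex);
  // One float per texel (R32F); larger tensors can pack 4 values per texel with RGBA32F.
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.R32F, width, height, 0, gl.RED, gl.FLOAT, data);
  // Float textures are not filterable by default, so sample with NEAREST.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  return tex;
}

function createOutputFramebuffer(gl, tex) {
  const fbo = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
  gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
  return fbo;
}
```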
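A fragment shader can then serve as the compute kernel. The sketch below shows the classic example, one output element of C = A × B per fragment, assuming GLSL ES 3.00 (WebGL2) and the texture layout described in the comments; uniform and texture names are illustrative.

```js
const matmulFragmentSource = `#version 300 es
precision highp float;

uniform sampler2D uA;   // width K, height M: A[row, k] stored at texel (k, row)
uniform sampler2D uB;   // width N, height K: B[k, col] stored at texel (col, k)
uniform int uK;         // shared inner dimension
out vec4 outColor;

void main() {
  // Each fragment corresponds to one output element C[row, col].
  ivec2 pos = ivec2(gl_FragCoord.xy);   // (col, row)
  float acc = 0.0;
  for (int k = 0; k < uK; ++k) {
    float a = texelFetch(uA, ivec2(k, pos.y), 0).r;
    float b = texelFetch(uB, ivec2(pos.x, k), 0).r;
    acc += a * b;
  }
  outColor = vec4(acc, 0.0, 0.0, 1.0);  // only the red channel is used for R32F targets
}`;
```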
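The shared vertex shader only has to cover the viewport with two triangles so the rasterizer generates exactly one fragment per output texel. The buffer and attribute setup below is an assumed, minimal arrangement.

```js
const quadVertexSource = `#version 300 es
in vec2 aPosition;                  // clip-space corner of the full-screen quad
void main() {
  gl_Position = vec4(aPosition, 0.0, 1.0);
}`;

// Two triangles spanning clip space from (-1,-1) to (1,1).
const quadVertices = new Float32Array([
  -1, -1,   1, -1,  -1,  1,
  -1,  1,   1, -1,   1,  1,
]);

function createQuadBuffer(gl) {
  const buf = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, buf);
  gl.bufferData(gl.ARRAY_BUFFER, quadVertices, gl.STATIC_DRAW);
  return buf;
}

function setupQuadAttribute(gl, program, buf) {
  const loc = gl.getAttribLocation(program, 'aPosition');
  gl.bindBuffer(gl.ARRAY_BUFFER, buf);
  gl.enableVertexAttribArray(loc);
  gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0);
}
```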
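Chaining operations then amounts to running one pass per operation: bind the input textures, target the output framebuffer, draw the quad, and only read back at the very end. The function names and the `inputs` shape below are assumptions building on the earlier sketches.

```js
function runPass(gl, program, inputs /* [{name, tex}] */, outputFbo, width, height) {
  gl.useProgram(program);
  gl.bindFramebuffer(gl.FRAMEBUFFER, outputFbo);
  gl.viewport(0, 0, width, height);          // one fragment per output texel
  inputs.forEach(({ name, tex }, unit) => {
    gl.activeTexture(gl.TEXTURE0 + unit);
    gl.bindTexture(gl.TEXTURE_2D, tex);
    gl.uniform1i(gl.getUniformLocation(program, name), unit);
  });
  gl.drawArrays(gl.TRIANGLES, 0, 6);         // the shared full-screen quad
}

// Only the final result is transferred back to the CPU.
function readResult(gl, fbo, width, height) {
  const out = new Float32Array(width * height * 4);
  gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
  gl.readPixels(0, 0, width, height, gl.RGBA, gl.FLOAT, out);
  return out;
}
```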
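Finally, the GPT-2 forward pass is just a fixed sequence of such passes. The outline below is purely structural: every helper (`embed`, `layerNorm`, `attention`, `mlp`, `logits`, `add`) stands for one or more shader passes like the ones above and is a hypothetical name, not the author's actual API.

```js
// High-level sketch of the GPT-2 forward pass expressed as chained GPU passes.
function gpt2Forward(tokens, weights) {
  let x = embed(tokens, weights.wte, weights.wpe);        // token + position embeddings
  for (const layer of weights.layers) {                   // each transformer block
    x = add(x, attention(layerNorm(x, layer.ln1), layer.attn));  // attention sublayer + residual
    x = add(x, mlp(layerNorm(x, layer.ln2), layer.mlp));         // feed-forward sublayer + residual
  }
  x = layerNorm(x, weights.lnF);                           // final normalization
  return logits(x, weights.wte);                           // project onto the vocabulary
}
```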