When does fragmentation occur in the CUDA caching allocator?
3 days ago
- #Fragmentation
- #CUDA
- #Memory Allocation
- Fragmentation occurs in the CUDA caching allocator when allocation patterns leave free memory split into blocks that cannot be merged, so a request can fail even though total free memory is sufficient.
- Without expandable segments, every request that no cached block can satisfy triggers a separate cudaMalloc call, and each call creates an independent segment. Free blocks can merge within a segment but never across segment boundaries.
- Expandable segments use CUDA's virtual memory API to create contiguous virtual address ranges, allowing blocks within a segment to merge when freed, reducing fragmentation.
- Allocation order matters without expandable segments: allocating small blocks before large ones can cause fragmentation, while allocating large blocks first and small ones afterwards tends to fragment less.
- Expandable segments mitigate fragmentation by letting adjacent freed blocks merge into larger contiguous free ranges, but fragmentation can still occur when live allocations sit between free blocks and keep them from merging.
- A 1 MiB size threshold separates the small and large block pools; memory cached in one pool cannot serve requests that belong to the other, even with expandable segments.
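
The merging behaviour described above can be illustrated with a small toy model. This is a sketch only, not PyTorch's actual allocator: `ToyAllocator`, `Block`, and `run_scenarios` are hypothetical names, sizes are arbitrary units rather than bytes, and the small/large pool split is not modelled. The one rule it shares with the summary is that free neighbours merge only when they belong to the same segment.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int     # start address in arbitrary units
    size: int
    free: bool
    segment: int   # id of the segment this block was carved from


class ToyAllocator:
    """Toy best-fit allocator (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = []       # kept sorted by start address
        self.next_addr = 0
        self.next_segment = 0

    def add_segment(self, size):
        # Models one reservation (e.g. one cudaMalloc) as an independent segment.
        self.blocks.append(Block(self.next_addr, size, True, self.next_segment))
        self.next_addr += size
        self.next_segment += 1

    def malloc(self, size):
        # Best fit: pick the smallest free block that is large enough.
        fits = [(i, b) for i, b in enumerate(self.blocks) if b.free and b.size >= size]
        if not fits:
            return None
        i, best = min(fits, key=lambda pair: pair[1].size)
        if best.size > size:
            # Split off the unused tail as a new free block in the same segment.
            rest = Block(best.start + size, best.size - size, True, best.segment)
            self.blocks.insert(i + 1, rest)
            best.size = size
        best.free = False
        return best.start

    def free(self, addr):
        block = next(b for b in self.blocks if b.start == addr and not b.free)
        block.free = True
        # Coalesce adjacent free blocks, but never across a segment boundary.
        merged = []
        for b in self.blocks:
            prev = merged[-1] if merged else None
            if prev and prev.free and b.free and prev.segment == b.segment:
                prev.size += b.size
            else:
                merged.append(b)
        self.blocks = merged

    def total_free(self):
        return sum(b.size for b in self.blocks if b.free)

    def largest_free(self):
        return max((b.size for b in self.blocks if b.free), default=0)


def run_scenarios():
    results = {}

    # Two independent segments: freed blocks cannot merge across the boundary.
    a = ToyAllocator()
    a.add_segment(4)
    a.add_segment(4)
    x, y = a.malloc(4), a.malloc(4)
    a.free(x)
    a.free(y)
    results["separate_segments"] = (a.total_free(), a.largest_free(), a.malloc(8))

    # One contiguous range (the expandable-segment idea): freed neighbours merge.
    b = ToyAllocator()
    b.add_segment(8)
    x, y = b.malloc(4), b.malloc(4)
    b.free(x)
    b.free(y)
    results["one_segment"] = (b.total_free(), b.largest_free(), b.malloc(8))

    # One contiguous range, but a live allocation sits between the free blocks.
    c = ToyAllocator()
    c.add_segment(8)
    x, live, z = c.malloc(3), c.malloc(2), c.malloc(3)
    c.free(x)
    c.free(z)
    results["live_block_in_middle"] = (c.total_free(), c.largest_free(), c.malloc(6))

    return results


if __name__ == "__main__":
    for name, (total, largest, addr) in run_scenarios().items():
        print(f"{name}: total_free={total} largest_free={largest} big_request={addr}")
```

In the first and third scenarios the total free space would cover the final request, yet no single free block is large enough, so the toy `malloc` returns `None`. Only the single-segment case with no live block in between ends up with one mergeable free range.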