When does fragmentation occur in the CUDA caching allocator?


Fragmentation occurs in the CUDA caching allocator when it holds enough free memory in total to satisfy a request, but no single free block is large enough, because the free regions cannot be merged into one.
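Without expandable segments, the allocator calls cudaMalloc whenever its cache cannot satisfy a request. Each call creates an independent segment, and a free block can only merge with free neighbours inside the same segment, never across segment boundaries.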
Expandable segments use CUDA's virtual memory API to reserve a contiguous virtual address range and map physical memory into it as the segment grows, so blocks within that segment can merge when freed, which reduces fragmentation.
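Allocation order matters without expandable segments. If requests grow over time (small, then large), the segments cached for earlier requests are too small for later ones, so each new request needs a fresh segment while the old ones sit idle. If the largest request comes first, its segment can later be split to serve the smaller requests, which uses memory more efficiently.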
Expandable segments mitigate fragmentation because adjacent freed blocks coalesce into larger contiguous free regions within the segment. Fragmentation can still occur, however, when a live allocation sits between free regions and keeps them from merging.
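A 1 MiB boundary separates the small and large block pools: requests up to 1 MiB are served from the small pool and larger ones from the large pool. Free memory cached in one pool is not reused for requests that belong to the other, even with expandable segments.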
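
The merging rules for separate segments versus one expandable segment can be modelled with a list of blocks per segment. This is a sketch, not allocator code: `free_regions` and `largest_free` are hypothetical helpers, blocks are `(size, is_live)` tuples in address order, and the layouts are invented to contrast independent cudaMalloc segments with a single expandable segment that still contains a live block.

```python
MiB = 1 << 20


def free_regions(segments):
    """Sizes of contiguous free regions (hypothetical toy model).

    Each segment is a list of (size_bytes, is_live) blocks in address order.
    Adjacent free blocks inside one segment coalesce; a live block or a
    segment boundary ends the current free region.
    """
    regions = []
    for segment in segments:
        run = 0
        for size, is_live in segment:
            if is_live:
                if run:
                    regions.append(run)
                run = 0
            else:
                run += size
        if run:  # segment boundary: free space never continues into the next segment
            regions.append(run)
    return regions


def largest_free(segments):
    return max(free_regions(segments), default=0)


# Without expandable segments: two independent 20 MiB segments, each half live.
# 20 MiB is free in total, but the segment boundary keeps the halves apart.
separate = [[(10 * MiB, True), (10 * MiB, False)],
            [(10 * MiB, False), (10 * MiB, True)]]
# One expandable segment whose second block is still live.
expandable_live = [[(10 * MiB, False), (10 * MiB, True),
                    (10 * MiB, False), (10 * MiB, False)]]
# The same expandable segment after the live block is freed.
expandable_freed = [[(10 * MiB, False)] * 4]

for name, layout in [("separate", separate),
                     ("expandable, live block", expandable_live),
                     ("expandable, all freed", expandable_freed)]:
    print(name, largest_free(layout) // MiB, "MiB")
```

In this model the separate segments offer at most a 10 MiB block, the expandable segment offers 20 MiB while the live block splits it, and 40 MiB once everything is freed.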
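
The expandable-segment mechanism (reserve address space first, back it with physical memory later) can be sketched as a toy class. `ToyExpandableSegment`, its bump-pointer placement, and the 2 MiB `PAGE` granularity are assumptions for illustration only; PyTorch's real implementation goes through CUDA driver virtual memory calls such as `cuMemAddressReserve` and `cuMemMap`, which this sketch does not model.

```python
import math

MiB = 1 << 20
PAGE = 2 * MiB  # assumed physical mapping granularity (illustrative only)


class ToyExpandableSegment:
    """Hypothetical model: one reserved virtual range, backed by pages on demand."""

    def __init__(self, reserved_bytes):
        self.reserved = reserved_bytes  # virtual address space set aside up front
        self.mapped = 0                 # bytes currently backed by physical pages
        self.end = 0                    # next free offset (bump allocation for simplicity)

    def allocate(self, size):
        start = self.end
        new_end = start + size
        if new_end > self.reserved:
            raise MemoryError("reserved virtual range exhausted")
        # Map just enough whole pages to cover the new end; existing pages stay put.
        self.mapped = max(self.mapped, math.ceil(new_end / PAGE) * PAGE)
        self.end = new_end
        return start  # offset within the single contiguous range


seg = ToyExpandableSegment(reserved_bytes=64 * MiB)
a = seg.allocate(3 * MiB)
b = seg.allocate(5 * MiB)
print(a // MiB, b // MiB, seg.mapped // MiB)  # 0 3 8
```

Because `b` begins exactly where `a` ends, freeing both would leave one contiguous region; had each been served by its own cudaMalloc segment, the two free ranges could not merge.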
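
The allocation-order effect can be reproduced with a toy caching allocator that has no expandable segments. Everything here is hypothetical: `ToySegmentAllocator` gives each new segment exactly the requested size (the real allocator rounds sizes up), serves requests by best fit with splitting, and never releases cached segments, so only the relative numbers are meaningful.

```python
MiB = 1 << 20


class ToySegmentAllocator:
    """Toy caching allocator WITHOUT expandable segments (hypothetical model).

    Simplifications: a new segment is exactly the requested size, requests use
    best fit over cached free blocks, and cached segments are never returned
    to the driver.
    """

    def __init__(self):
        self.segments = []  # each segment: [offset, size, is_free] blocks in address order
        self.reserved = 0   # bytes obtained through simulated cudaMalloc calls

    def malloc(self, size):
        candidates = [(blk[1], s, i)
                      for s, seg in enumerate(self.segments)
                      for i, blk in enumerate(seg)
                      if blk[2] and blk[1] >= size]
        if candidates:
            _, s, i = min(candidates)  # best fit: smallest cached free block that fits
            off, blk_size, _ = self.segments[s][i]
            self.segments[s][i] = [off, size, False]
            if blk_size > size:
                # Split: the remainder stays cached inside the same segment.
                self.segments[s].insert(i + 1, [off + size, blk_size - size, True])
            return (s, off)
        # Nothing cached fits: simulate cudaMalloc of a new, independent segment.
        self.segments.append([[0, size, False]])
        self.reserved += size
        return (len(self.segments) - 1, 0)

    def free(self, handle):
        s, off = handle
        for blk in self.segments[s]:
            if blk[0] == off:
                blk[2] = True
        # Coalesce adjacent free blocks, but only within this one segment.
        merged = []
        for blk in self.segments[s]:
            if merged and blk[2] and merged[-1][2]:
                merged[-1][1] += blk[1]
            else:
                merged.append(blk)
        self.segments[s] = merged


def reserved_mib(sizes_mib):
    """Allocate and immediately free each size in turn; report MiB reserved."""
    alloc = ToySegmentAllocator()
    for mib in sizes_mib:
        alloc.free(alloc.malloc(mib * MiB))
    return alloc.reserved // MiB


print(reserved_mib([20, 40, 60, 80]))  # 200: each growing request needs a new segment
print(reserved_mib([80, 60, 40, 20]))  # 80: later requests split the first segment
```

Only the ordering differs between the two calls. With growing sizes every freed segment is too small for the next request, so the toy allocator reserves 200 MiB for a workload that never holds more than 80 MiB at once.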
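
The pool split amounts to a size check that decides which cache a request may draw from. `pool_for`, `can_serve_from_cache`, and the cached byte counts below are hypothetical; the sketch treats the 1 MiB boundary as inclusive and models each pool only as a count of cached free bytes, ignoring fragmentation inside a pool.

```python
MiB = 1 << 20
SMALL_POOL_MAX = 1 * MiB  # the 1 MiB boundary; treated as inclusive in this sketch


def pool_for(size):
    """Pick the pool a request of `size` bytes is served from (toy model)."""
    return "small" if size <= SMALL_POOL_MAX else "large"


def can_serve_from_cache(cached_free, size):
    """cached_free maps pool name -> free bytes cached in that pool (hypothetical).

    Only the request's own pool is consulted; the other pool's cache is ignored.
    """
    return cached_free[pool_for(size)] >= size


# 6 MiB of small-pool memory is cached and free, but the large pool is empty.
cached = {"small": 6 * MiB, "large": 0}
print(pool_for(512 * 1024), pool_for(2 * MiB))  # small large
print(can_serve_from_cache(cached, 2 * MiB))    # False
```

In this model a 2 MiB request still needs new large-pool memory even while the small pool holds idle bytes.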

Hasty Briefs (beta)