This article introduces Tensor R-Fork, a method for rapidly loading model weights into new SGLang instances using GPU-Direct RDMA. By copying weights directly from an already-running instance instead of reading them from storage, it significantly reduces both loading time and storage requirements without interrupting the existing inference service. The article details the implementation using two backends: NCCL and TransferEngine.
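To make the core idea concrete, here is a minimal sketch of broadcasting weights from a running instance to a freshly started one over the NCCL backend. This is an illustrative assumption, not the article's actual API: the helper name `rfork_load_weights` is hypothetical, and a real deployment would involve SGLang's own process group setup and GPU-Direct RDMA transport.

```python
import os
import torch
import torch.distributed as dist


def rfork_load_weights(model: torch.nn.Module, src_rank: int = 0) -> None:
    """Hypothetical helper: fill a new instance's parameters by
    broadcasting each tensor from an already-warm source rank,
    so the new instance never touches weight files on disk."""
    for param in model.parameters():
        dist.broadcast(param.data, src=src_rank)


if __name__ == "__main__":
    # Single-process demo over the CPU "gloo" backend; an actual
    # multi-instance setup would use backend="nccl" on GPUs.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=0, world_size=1)
    model = torch.nn.Linear(8, 8)
    rfork_load_weights(model)
    dist.destroy_process_group()
```

With more than one rank, every rank except `src_rank` would end up with an exact copy of the source's tensors, which is the weight-sharing effect the article describes.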