This article introduces Tensor R-Fork, a method for quickly loading model weights into SGLang instances using GPU-Direct RDMA. It significantly reduces loading times and storage requirements without interrupting the running inference service. The article details the implementation using two backends: NCCL and TransferEngine.
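The storage and load-time savings come from reusing weights that are already resident rather than re-reading every checkpoint from disk. The sketch below illustrates that sharing idea in plain Python; the class and method names (`WeightRegistry`, `fork`) are hypothetical, not SGLang APIs, and the real system transfers tensors GPU-to-GPU over GPU-Direct RDMA via NCCL or TransferEngine rather than sharing in-process objects.

```python
# Hypothetical illustration of the weight-sharing idea behind Tensor R-Fork:
# a forked instance reuses weights already resident on the source instance
# instead of re-reading them from storage. Plain lists stand in for tensors.

class WeightRegistry:
    """Tracks loaded weights so forked instances reuse them instead of reloading."""

    def __init__(self):
        self._store = {}      # weight name -> resident tensor (stand-in)
        self.disk_reads = 0   # counts expensive checkpoint reads

    def _load_from_disk(self, name):
        # Stand-in for an expensive read from checkpoint storage.
        self.disk_reads += 1
        return [0.0] * 4

    def fork(self, name):
        # Fork path: only the first request hits storage; later
        # instances share the copy that is already resident.
        if name not in self._store:
            self._store[name] = self._load_from_disk(name)
        return self._store[name]


registry = WeightRegistry()
first = registry.fork("layers.0.attn.qkv")    # first instance: reads from disk
forked = registry.fork("layers.0.attn.qkv")   # forked instance: shares the copy
assert first is forked and registry.disk_reads == 1
```

In the real system the "registry" spans machines, so the fork is a direct GPU-memory transfer from a running instance rather than an in-process reference, but the effect is the same: one storage read serves many instances.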
SGLang has integrated Hugging Face transformers as a backend, combining SGLang's inference performance with the flexibility of the transformers library. This integration enables high-throughput, low-latency serving of models that are not natively supported by SGLang, streamlining deployment and usage. Key features include automatic fallback to the transformers backend and optimized performance through mechanisms like RadixAttention.
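The "automatic fallback" behaviour can be summarized as: prefer the optimized native SGLang implementation when one exists, and route everything else to the generic transformers backend. The sketch below is a minimal illustration of that dispatch; the registry contents and function name (`select_backend`) are assumptions for illustration, not SGLang's actual loader code.

```python
# Illustrative sketch of automatic backend fallback: models with a native
# SGLang implementation get the optimized path, others fall back to the
# Hugging Face transformers backend. The architecture set is hypothetical.

NATIVE_MODELS = {"llama", "qwen2"}  # assumed natively supported architectures

def select_backend(architecture: str) -> str:
    """Prefer the native implementation; otherwise fall back to transformers."""
    if architecture in NATIVE_MODELS:
        # Native path benefits from SGLang optimizations such as RadixAttention.
        return "sglang-native"
    # Fallback keeps unsupported models usable through transformers.
    return "transformers"

print(select_backend("llama"))     # → sglang-native
print(select_backend("new-arch"))  # → transformers
```

In practice the choice is exposed as a model-implementation option when launching SGLang, so users can also force the transformers path explicitly.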