Links
This article outlines how teams can switch their inference infrastructure to FriendliAI for improved efficiency and cost savings. FriendliAI claims 99.99% reliability, up to 90% lower costs, and higher throughput, with minimal code changes required for migration. Users can get up to $50,000 in credits when they switch.
This article presents SGLang's new Pipeline Parallelism (PP) approach designed for large language models with ultra-long context windows. It combines techniques like Chunked Pipeline Parallelism and Dynamic Chunking to enhance throughput and reduce latency in multi-node deployments. The implementation shows significant performance improvements over traditional methods.
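The details live in the linked post; as a rough illustration of the dynamic-chunking idea only (this is not SGLang's code — the function name, default chunk size, and attention budget below are invented for the sketch), later prefill chunks can be shrunk so that each chunk's attention cost against the tokens already processed stays roughly constant:

```python
def dynamic_chunks(num_tokens, base_chunk=8192, budget=2**24):
    """Hypothetical sketch: split a long prompt of num_tokens into prefill
    chunks. After the first chunk, each chunk is capped so that
    chunk_len * tokens_already_seen stays under a fixed attention budget,
    evening out per-chunk compute as the KV cache grows."""
    chunks, done = [], 0
    while done < num_tokens:
        if done == 0:
            size = base_chunk
        else:
            # Attention cost of this chunk scales with tokens already seen,
            # so cap its length inversely to `done`.
            size = max(1, min(base_chunk, budget // done))
        size = min(size, num_tokens - done)
        chunks.append(size)
        done += size
    return chunks
```

Real implementations also overlap chunks across pipeline stages; this sketch only shows the shrinking-chunk schedule.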
This article discusses how Apache Hudi's Non-Blocking Concurrency Control (NBCC) improves write throughput in data lakehouses by allowing concurrent writers to append data without conflicts. It contrasts NBCC with Optimistic Concurrency Control (OCC), highlighting the inefficiencies of retries in high-frequency streaming scenarios. The piece also explains how to configure NBCC in your data pipelines.
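As a hedged sketch of what enabling NBCC looks like (based on the Apache Hudi 1.0 documentation — verify the exact keys against your Hudi version; NBCC requires a merge-on-read table and a bucket index):

```
hoodie.table.type=MERGE_ON_READ
hoodie.write.concurrency.mode=NON_BLOCKING_CONCURRENCY_CONTROL
hoodie.index.type=BUCKET
```

With these properties set, concurrent writers append to the same file group without lock-and-retry, and conflicts are resolved at read/compaction time rather than at write time.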
Tokasaurus is a newly released LLM inference engine designed for high-throughput workloads, outperforming existing engines like vLLM and SGLang by more than 3x in benchmarks. It features optimizations for both small and large models, including dynamic prefix identification and multiple parallelism techniques that improve efficiency and reduce CPU overhead. The engine supports several model families and is available as an open-source project on GitHub and PyPI.
Scalability and performance are often conflated, but they are distinct concepts in distributed systems. Performance describes how well a system handles a given load (e.g., throughput or latency), while scalability is its ability to adjust capacity as demand changes. Achieving scalability is crucial and often leads organizations to rely on cloud providers, even at a higher cost, to manage varying workloads effectively.
Celestia has introduced mamo-1, a public testnet that features 128MB blocks and achieves a throughput of 21.33MB/s, significantly surpassing the mainnet's capabilities. This testnet allows developers to test high-throughput applications in a more realistic environment, leveraging innovations like the Vacuum! data propagation protocol for enhanced performance and robustness.