Links
Tokenflood is a load-testing tool for instruction-tuned large language models (LLMs). Instead of supplying actual prompt data, users specify parameters such as prompt length and request rate, which makes it straightforward to assess latency and performance across different providers and configurations. Users should watch for costs when pointing it at pay-per-token services.
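The core idea of parameterized load testing can be sketched without Tokenflood itself. The snippet below is a minimal illustration, not Tokenflood's actual API: `synthetic_prompt` and `send` are hypothetical stand-ins, and real token counts would depend on the model's tokenizer.

```python
import time

def synthetic_prompt(n_tokens: int) -> str:
    # Approximate a prompt of roughly n_tokens by repeating a filler word;
    # a real tool would use the target model's tokenizer to hit an exact count.
    return " ".join(["lorem"] * n_tokens)

def send(prompt: str) -> str:
    # Stub standing in for a provider API call.
    return "ok"

def run_load(n_requests: int, requests_per_sec: float, prompt_tokens: int):
    """Issue n_requests at a fixed rate and record per-request latency."""
    interval = 1.0 / requests_per_sec
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        send(synthetic_prompt(prompt_tokens))
        latencies.append(time.perf_counter() - start)
        time.sleep(interval)  # pace requests to the target rate
    return latencies
```

Sweeping `prompt_tokens` and `requests_per_sec` over a grid is enough to compare latency profiles across providers without any real prompt corpus.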
The article explains how prompt caching works in large language models (LLMs) such as those from OpenAI and Anthropic. It walks through tokenization and embedding, and illustrates how caching reduces cost and latency. The author shares results from personal testing and digs into the mechanics behind LLM inference.
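Prompt caching generally hinges on requests sharing a common token prefix (e.g. a fixed system prompt), since the model's computation for that prefix can be reused. A minimal sketch of the prefix-matching idea, with hypothetical token lists rather than a real tokenizer:

```python
def shared_prefix_len(cached_tokens: list[int], new_tokens: list[int]) -> int:
    """Count the leading tokens two prompts share; a cache can skip
    recomputing the model's internal state for this prefix."""
    n = 0
    for a, b in zip(cached_tokens, new_tokens):
        if a != b:
            break
        n += 1
    return n

# A shared system prompt followed by different user messages:
cached = [101, 2023, 2003, 1996, 999]   # earlier request's tokens
fresh  = [101, 2023, 2003, 4248, 1012]  # new request, same first 3 tokens

reusable = shared_prefix_len(cached, fresh)
```

The longer the shared prefix relative to the full prompt, the larger the cost and latency savings, which is why providers encourage putting stable content (instructions, documents) before variable content.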