Quit Emailing Yourself

# benchmarking → llm

2 links tagged with all of: benchmarking + llm

Links

Brain the Size of a Planet: Are LLMs Thonking too Hard?

The author reruns security vulnerability triage experiments across 26 combinations of Claude and GPT-5 models with varying reasoning effort and context sizes. A four-model “council” achieved 86.2% unanimous votes, and GPT-5.4 at medium/high effort led overall performance, though full-chain solutions remained rare. The study also found that higher reasoning effort sometimes backfires and that function-level inputs outperformed whole-file analysis.

Last saved Jun 18, 2026 · 7 min read

llm + security-triage + benchmarking + vulnerability-analysis + context-windows

Semantic Layer vs. Text-to-SQL: 2026 Benchmark Update

This article reruns a 2023 benchmark with the latest LLMs, comparing direct SQL generation against querying through a structured dbt Semantic Layer. It finds that while text-to-SQL accuracy has jumped, a modeled Semantic Layer still delivers near-perfect, deterministic results for covered queries, making it ideal for complex or critical use cases.

Last saved Apr 11, 2026 · 3 min read

semantic-layer + text-to-sql + benchmarking + llm + data-modeling