Links
The author reruns security vulnerability triage experiments across 26 combinations of Claude and GPT-5 models with varying reasoning effort and context sizes. A four-model “council” voted unanimously 86.2% of the time, and GPT-5.4 at medium/high effort led overall performance, though full-chain solutions remained rare. The study also found that higher reasoning effort sometimes backfired and that function-level inputs outperformed whole-file analysis.
This article reruns a 2023 benchmark with the latest LLMs, comparing direct SQL generation against querying through a structured dbt Semantic Layer. It finds that while text-to-SQL accuracy has jumped, a modeled Semantic Layer still delivers near-perfect, deterministic results for covered queries, making it ideal for complex or critical use cases.