Quit Emailing Yourself


1 link tagged with all of: models + benchmarking + evaluation + scientific-research + ai

Links

GitHub - InternScience/SGI-Bench: Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

SGI-Bench is a benchmark for assessing AI systems' capabilities in scientific inquiry across the stages of deliberation, conception, action, and perception. It includes over 1,000 expert-curated samples from 10 disciplines and focuses on tasks such as deep research, idea generation, and experimental reasoning.

Saved by tldr-importer · Last saved February 14, 2026 · 4 min read
