1 link tagged with all of: models + benchmarking + evaluation + scientific-research + ai
Links
SGI-Bench is a benchmark for assessing AI systems' capabilities in scientific inquiry across the stages of deliberation, conception, action, and perception. It includes over 1,000 expert-curated samples from 10 disciplines and focuses on tasks such as deep research, idea generation, and experimental reasoning.
Tags: ai, benchmarking, scientific-research, evaluation, models