Links
A recent study finds that most users rely on AI agents for cognitive tasks rather than simple chores. The data shows a shift from low-stakes queries toward productivity and learning, which points to AI's growing role in work and decision-making. The industries driving this trend include finance, marketing, and management.
OpenAI has launched BrowseComp, a new benchmark that evaluates how well AI agents can browse the internet to locate hard-to-find information. It includes 1,266 challenging questions that require persistence and creativity, which distinguishes it from existing benchmarks that focus on simpler fact retrieval. Researchers are invited to use BrowseComp to improve the reliability and performance of AI systems.