1 link tagged with all of: language-models + benchmarking + auctions
Links
PACT (Pairwise Auction Conversation Testbed) is a benchmark that evaluates the conversational bargaining skills of language models through 20-round matches in which a buyer and a seller exchange messages and bids. The multi-round format allows analysis of negotiation strategies and of how agents adapt over the course of a match. With over 5,000 games played, it compares models' bargaining performance using metrics such as the Composite Model Score (CMS) and Glicko-2 ratings.
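For readers unfamiliar with Glicko-2, the following is a minimal sketch of one rating-period update, following Glickman's publicly described algorithm. It is not PACT's code: the summary above does not say how PACT maps a match to a win, loss, or draw, which system constant it uses, or how it groups games into rating periods. The function name `glicko2_update`, the default `tau=0.5`, and the sample opponents are assumptions chosen for illustration.

```python
import math

SCALE = 173.7178  # conversion factor between the Glicko and Glicko-2 scales


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """One Glicko-2 rating-period update (hypothetical helper, not PACT's code).

    results: list of (opp_rating, opp_rd, score), score in {0, 0.5, 1}.
    Returns (new_rating, new_rd, new_vol).
    """
    mu = (rating - 1500.0) / SCALE
    phi = rd / SCALE
    if not results:
        # No games this period: only the rating deviation grows.
        return rating, math.sqrt(phi**2 + vol**2) * SCALE, vol

    v_inv = 0.0      # inverse of the estimated variance
    delta_sum = 0.0  # sum of g * (score - expected score)
    for opp_rating, opp_rd, score in results:
        mu_j = (opp_rating - 1500.0) / SCALE
        phi_j = opp_rd / SCALE
        g = 1.0 / math.sqrt(1.0 + 3.0 * phi_j**2 / math.pi**2)
        e = 1.0 / (1.0 + math.exp(-g * (mu - mu_j)))
        v_inv += g**2 * e * (1.0 - e)
        delta_sum += g * (score - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Solve for the new volatility with the Illinois root-finding variant.
    a = math.log(vol**2)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta**2 - phi**2 - v - ex)
        return num / (2.0 * (phi**2 + v + ex) ** 2) - (x - a) / tau**2

    A = a
    if delta**2 > phi**2 + v:
        B = math.log(delta**2 - phi**2 - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    # Update the deviation and rating, then convert back to the Glicko scale.
    phi_star = math.sqrt(phi**2 + new_vol**2)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star**2 + 1.0 / v)
    new_mu = mu + new_phi**2 * delta_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_vol


# Illustrative inputs (the worked example from Glickman's Glicko-2 description):
# one win against a 1400-rated opponent, then losses to 1550 and 1700.
new_rating, new_rd, new_vol = glicko2_update(
    1500.0, 200.0, 0.06,
    [(1400.0, 30.0, 1.0), (1550.0, 100.0, 0.0), (1700.0, 300.0, 0.0)],
)
print(round(new_rating, 2), round(new_rd, 2), round(new_vol, 5))
```

With these inputs the rating falls to roughly 1464 and the deviation shrinks to roughly 151.5, which shows the two things a Glicko-2 leaderboard tracks: an estimate of skill and how uncertain that estimate still is.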