OLMoTrace is a new feature in the Ai2 Playground that lets users trace the outputs of language models back to their training data, improving transparency and trust. Researchers and the public can inspect where specific word sequences in a response may have come from, which helps with fact-checking and with understanding model capabilities. The tool reflects Ai2's commitment to an open ecosystem in which training data is accessible for scientific research and for public insight into AI systems.
EleutherAI has released the Common Pile v0.1, an 8 TB dataset of openly licensed and public domain text for training large language models, a significant advance over its predecessor, the Pile. The initiative stresses transparency and openness in AI research and aims to give researchers tools and a shared corpus that support collaboration and accountability. EleutherAI plans future collaborations with cultural heritage institutions to improve the quality and accessibility of public domain works.