This article explains Slonk, a system developed at Character.ai that combines SLURM and Kubernetes to manage GPU research clusters. It addresses the challenge of giving researchers a reliable scheduling environment while keeping the operational benefits of Kubernetes. An open-source snapshot of the project provides tools and configurations that others can use to build similar systems.