A pull request (PR) in progress adds a CUDA backend to the MLX project. The goal is a better developer experience for both local testing and deployment to supercomputers. The backend is not yet complete, but optimizations so far have led to significant performance improvements. Collaboration is encouraged for further development and for testing across different environments, including ROCm support.