OpenAI's GPT-OSS release brought several efficiency upgrades to the transformers library, including MXFP4 quantization and specialized kernels that speed up model loading and execution. These changes enable faster inference and fine-tuning while remaining compatible with other major models in transformers, and community-contributed kernels are integrated to simplify setup and performance tuning.
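For intuition about what MXFP4 quantization does, the format stores each weight as a 4-bit E2M1 float and shares one power-of-two scale across a block of 32 values. The sketch below is an illustration of that idea in plain numpy, not the optimized kernels transformers actually uses; the function name and rounding strategy are my own choices.

```python
import numpy as np

# Representable E2M1 magnitudes (4 bits: sign + 2 exponent + 1 mantissa)
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(x):
    """Illustrative MXFP4 round-trip for one block of 32 floats:
    pick a shared power-of-two scale, then snap each scaled value
    to the nearest representable E2M1 magnitude."""
    assert x.size == 32, "MXFP4 shares one scale per block of 32 values"
    amax = np.abs(x).max()
    if amax == 0.0:
        return 1.0, np.zeros_like(x)
    # Smallest power of two so scaled values fit in [-6, 6]
    scale = 2.0 ** np.ceil(np.log2(amax / E2M1_GRID[-1]))
    scaled = x / scale
    # Nearest-value rounding onto the E2M1 grid, sign handled separately
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    dequantized = np.sign(scaled) * E2M1_GRID[idx] * scale
    return scale, dequantized
```

Because the scale is a power of two, dequantization is a cheap shift-and-multiply, which is part of why MXFP4 weights can stay compressed in memory and be expanded on the fly during inference.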