StableToken is a noise-robust semantic speech tokenizer that addresses the fragility of existing tokenizers under meaning-irrelevant acoustic perturbations. It combines a multi-branch architecture with a consensus-driven bit-wise voting mechanism. This design significantly improves token stability, reducing Unit Edit Distance under noisy conditions, and improves the performance of SpeechLLMs across various tasks.
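To make the consensus idea concrete, here is a minimal sketch of bit-wise majority voting across branches. It is an illustration under stated assumptions, not the StableToken implementation: the function names `bitwise_majority_vote` and `bits_to_token`, the three-branch setup, and the 4-bit codes are all hypothetical, and the sketch assumes each branch has already produced a binary code for the same speech frame.

```python
from typing import List


def bitwise_majority_vote(branch_bits: List[List[int]]) -> List[int]:
    """Return the per-bit majority over an odd number of branch codes.

    Hypothetical helper: each inner list is one branch's binary code
    (0/1 values) for the same frame.
    """
    n_branches = len(branch_bits)
    if n_branches % 2 == 0:
        raise ValueError("use an odd, non-zero number of branches to avoid ties")
    n_bits = len(branch_bits[0])
    if any(len(bits) != n_bits for bits in branch_bits):
        raise ValueError("all branches must emit codes of the same length")
    # A bit is set in the output when more than half of the branches set it.
    return [
        1 if 2 * sum(bits[i] for bits in branch_bits) > n_branches else 0
        for i in range(n_bits)
    ]


def bits_to_token(bits: List[int]) -> int:
    """Pack a most-significant-bit-first binary code into an integer token id."""
    token = 0
    for b in bits:
        token = (token << 1) | b
    return token


# Assumed example: three branches see the same frame; noise flips one
# bit in the second branch and a different bit in the third.
branches = [
    [1, 0, 1, 1],  # unperturbed code
    [1, 1, 1, 1],  # bit 1 flipped
    [1, 0, 0, 1],  # bit 2 flipped
]
voted = bitwise_majority_vote(branches)
print(voted, bits_to_token(voted))
```

In this toy case two of the three branches disagree with the unperturbed code, yet the voted code still matches it, because the errors fall on different bits. Voting on whole token ids would have produced a three-way disagreement here, which is the intuition behind voting at the bit level.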
speech-tokenization
robustness
noise-resilience
machine-learning
language-models