UNiON Scholar: eess.SY

2 scholarly results for eess.SY

Scholar iON Academic Synthesis

Botros et al. study how to shrink the decoder (prediction network plus joint network) of the Recurrent Neural Network-Transducer (RNN-T) by combining a simplified prediction network with weight tying. The prediction network computes a weighted average of the input embeddings and shares its embedding matrix with the joint network's output layer. Used together with Edit-based Minimum Bayes Risk (EMBR) training, this design reduces the decoder from 23 million parameters to 2 million without affecting word-error rate (WER). The resulting savings in model size, inference time, and power consumption make the approach well suited to on-device speech recognition.

semanticscholar.org · scholarly article

Tied & Reduced RNN-T Decoder

Rami Botros; Tara N. Sainath; Robert David; Emmanuel Guzman; Wei Li; Yanzhang He

2021 · Interspeech · Cited 56 times · Open Access · DOI: 10.21437/Interspeech.2021-212

Previous works on the Recurrent Neural Network-Transducer (RNN-T) models have shown that, under some conditions, it is possible to simplify its prediction network with little or no loss in recognition accuracy (arXiv:2003.07705 [eess.AS], [2], arXiv:2012.06749 [cs.CL]). This is done by limiting the context size of previous labels and/or using a simpler architecture for its layers instead of LSTMs. The benefits of such changes include reduction in model size, faster inference and power savings, which are all useful for on-device applications. In this work, we study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance. Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer (a.k.a. weight tying, commonly used in language modeling arXiv:1611.01462 [cs.LG]). This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T Decoder from 23M parameters to just 2M, without affecting word-error rate (WER).
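
The two ideas in the abstract, a prediction network that averages the embeddings of the previous labels and an output layer that reuses the same embedding matrix, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy sizes, the fixed per-position weights, the additive tanh combination in the joint network, and the assumption that the encoder frame already has the embedding dimension are all hypothetical choices made for illustration.

```python
import math
import random


def make_embedding(vocab_size, dim, seed=0):
    """Random (vocab_size x dim) embedding matrix stored as a list of rows."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def prediction_network(prev_labels, embedding, position_weights):
    """Weighted average of the embeddings of the previous labels.

    Assumption: one fixed scalar weight per context position.
    """
    if len(prev_labels) != len(position_weights):
        raise ValueError("need one weight per context position")
    total = sum(position_weights)
    out = [0.0] * len(embedding[0])
    for label, weight in zip(prev_labels, position_weights):
        for i, value in enumerate(embedding[label]):
            out[i] += (weight / total) * value
    return out


def joint_logits(encoder_frame, pred_out, embedding):
    """Combine encoder and prediction outputs, then score every label.

    Weight tying: the output layer reuses the rows of `embedding`
    instead of owning a separate (vocab_size x dim) matrix.
    Assumption: encoder_frame already has the embedding dimension.
    """
    hidden = [math.tanh(e + p) for e, p in zip(encoder_frame, pred_out)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in embedding]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def embedding_param_count(vocab_size, dim, tied):
    """Parameters held by the input embedding plus the output layer."""
    return vocab_size * dim * (1 if tied else 2)


# Toy usage: 6 labels, 4-dimensional embeddings, context of 2 previous labels.
VOCAB, DIM = 6, 4
E = make_embedding(VOCAB, DIM)
pred = prediction_network([3, 1], E, position_weights=[0.25, 0.75])
probs = softmax(joint_logits([0.1, -0.2, 0.05, 0.0], pred, E))
```

In this toy setup, tying halves the parameters counted by `embedding_param_count`, because the output layer adds nothing beyond the embedding matrix. The 23M to 2M reduction reported in the abstract also depends on the simplified prediction network and EMBR training, which this sketch does not reproduce.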

arxiv.org · scholarly article

Tied & Reduced RNN-T Decoder

Rami Botros; Tara N. Sainath; Robert David; Emmanuel Guzman; Wei Li; Yanzhang He

2021 · arXiv · Open Access · DOI: 10.21437/Interspeech.2021-212
