UNiON Scholar
2 scholarly results for eess.SY
Scholar iON Academic Synthesis
Both results are listings of the same paper, "Tied & Reduced RNN-T Decoder" (Interspeech 2021, also on arXiv), which studies how to make the Recurrent Neural Network-Transducer (RNN-T) decoder (prediction network plus joint network) smaller and faster without degrading recognition accuracy. The prediction network is reduced to a simple weighted average of the input embeddings, and its embedding matrix is tied to the joint network's output layer. Combined with additional Edit-based Minimum Bayes Risk (EMBR) training, this shrinks the decoder from 23 million to 2 million parameters without affecting word-error rate (WER). The abstract also cites faster inference and power savings as benefits of such simplifications, which are particularly useful for on-device speech recognition.
semanticscholar.org · scholarly article
Tied & Reduced RNN-T Decoder
Rami Botros; Tara N. Sainath; R. David; Emmanuel Guzman; Wei Li; Yanzhang He
2021 · Interspeech · Cited 56 times · Open Access · DOI: 10.21437/Interspeech.2021-212
Previous works on the Recurrent Neural Network-Transducer (RNN-T) models have shown that, under some conditions, it is possible to simplify its prediction network with little or no loss in recognition accuracy (arXiv:2003.07705 [eess.AS], [2], arXiv:2012.06749 [cs.CL]). This is done by limiting the context size of previous labels and/or using a simpler architecture for its layers instead of LSTMs. The benefits of such changes include reduction in model size, faster inference and power savings, which are all useful for on-device applications. In this work, we study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance. Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer (a.k.a. weight tying, commonly used in language modeling arXiv:1611.01462 [cs.LG]). This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T Decoder from 23M parameters to just 2M, without affecting word-error rate (WER).
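The two ideas named in the abstract, a prediction network that averages the embeddings of a limited label context and an output layer that reuses the same embedding matrix, can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the class name `TiedReducedDecoder`, the uniform position weights, the additive `tanh` joint, and the assumption that the encoder frame already has the embedding dimension are all hypothetical choices made here for brevity.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random matrix as a list of rows (stand-in for trained weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TiedReducedDecoder:
    """Hypothetical sketch of a decoder with a reduced, weight-tied design."""

    def __init__(self, vocab_size, dim, context_size, seed=0):
        rng = random.Random(seed)
        # One matrix serves as both input embedding and output layer (weight tying).
        self.embedding = make_matrix(vocab_size, dim, rng)
        # Assumption: one scalar weight per context position, initialised uniformly.
        self.position_weights = [1.0 / context_size] * context_size
        self.dim = dim
        self.context_size = context_size

    def predict(self, previous_labels):
        """Weighted average of the embeddings of the last `context_size` labels."""
        context = previous_labels[-self.context_size:]
        if not context:
            return [0.0] * self.dim  # no history yet
        weights = self.position_weights[-len(context):]
        total = sum(weights)
        out = [0.0] * self.dim
        for weight, label in zip(weights, context):
            for j, value in enumerate(self.embedding[label]):
                out[j] += (weight / total) * value
        return out

    def joint(self, encoder_frame, prediction):
        """Combine both inputs, then score every label with the tied matrix."""
        hidden = [math.tanh(a + b) for a, b in zip(encoder_frame, prediction)]
        return [sum(e * h for e, h in zip(row, hidden)) for row in self.embedding]

    def num_parameters(self):
        # The tied matrix is counted once, plus the position weights.
        return len(self.embedding) * self.dim + self.context_size


decoder = TiedReducedDecoder(vocab_size=8, dim=4, context_size=2)
pred = decoder.predict([3, 5, 1])      # only labels 5 and 1 are used
logits = decoder.joint([0.0] * 4, pred)  # one score per vocabulary entry
```

Because the output layer reuses `embedding`, an untied variant of this toy model would need a second `vocab_size × dim` matrix; that sharing, together with replacing recurrent layers by an average, is the kind of saving the abstract attributes to the tied and reduced design.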
arxiv.org · scholarly article
Tied & Reduced RNN-T Decoder
Rami Botros; Tara N. Sainath; Robert David; Emmanuel Guzman; Wei Li; Yanzhang He
2021 · arXiv · Open Access · DOI: 10.21437/Interspeech.2021-212