UNiON Scholar: eess.IV

semanticscholar.org · scholarly article

Tied & Reduced RNN-T Decoder

Rami Botros; Tara N. Sainath; Robert David; Emmanuel Guzman; Wei Li; Yanzhang He

2021 Interspeech Cited 56 times Open Access DOI: 10.21437/Interspeech.2021-212

Previous works on Recurrent Neural Network-Transducer (RNN-T) models have shown that, under some conditions, the prediction network can be simplified with little or no loss in recognition accuracy (arXiv:2003.07705 [eess.AS], [2], arXiv:2012.06749 [cs.CL]). This is done by limiting the context size of previous labels and/or by replacing the LSTM layers with a simpler architecture. The benefits of such changes include reductions in model size, faster inference and power savings, all of which are useful for on-device applications. In this work, we study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degrading recognition performance. Our prediction network performs a simple weighted averaging of the input embeddings and shares its embedding matrix weights with the joint network's output layer (a.k.a. weight tying, commonly used in language modeling; arXiv:1611.01462 [cs.LG]). Combined with additional Edit-based Minimum Bayes Risk (EMBR) training, this simple design reduces the RNN-T decoder from 23M parameters to just 2M without affecting the word error rate (WER).
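To make the tied, averaging decoder concrete, here is a minimal NumPy sketch, assuming a prediction network that averages the embeddings of the last `CONTEXT` labels with one scalar weight per position, and a joint network whose output layer reuses the same embedding matrix. All sizes, the `tanh` joint, the blank-padding convention and the names (`prediction_network`, `joint_network`, `decoder_param_count`) are hypothetical illustrations, not the paper's exact architecture or configuration.

```python
import numpy as np

# Hypothetical toy sizes for illustration; not the paper's configuration.
VOCAB = 8      # output labels; index 0 is used as blank / start-of-sequence
DIM = 4        # embedding size, also the joint hidden size so tying works
CONTEXT = 2    # number of previous labels seen by the prediction network

rng = np.random.default_rng(0)
embedding = rng.normal(size=(VOCAB, DIM))            # shared (tied) matrix
position_weights = np.full(CONTEXT, 1.0 / CONTEXT)   # learned in practice
enc_proj = rng.normal(size=(DIM, DIM))
pred_proj = rng.normal(size=(DIM, DIM))


def prediction_network(prev_labels):
    """Weighted average of the embeddings of the last CONTEXT labels."""
    # Left-pad with blank (0) so short histories still give CONTEXT labels.
    ctx = ([0] * CONTEXT + list(prev_labels))[-CONTEXT:]
    return position_weights @ embedding[ctx]         # shape (DIM,)


def joint_network(enc_frame, pred_out):
    """Combine encoder and prediction outputs; return log-probabilities."""
    hidden = np.tanh(enc_proj @ enc_frame + pred_proj @ pred_out)
    logits = embedding @ hidden      # tied output layer reuses the embedding
    shifted = logits - logits.max()  # numerically stable log-softmax
    return shifted - np.log(np.exp(shifted).sum())


def decoder_param_count(vocab, dim, context, tied=True):
    """Weights in this toy decoder (biases omitted)."""
    total = vocab * dim + context + 2 * dim * dim
    return total if tied else total + vocab * dim    # untied: extra output matrix
```

For example, `joint_network(rng.normal(size=DIM), prediction_network([3, 5]))` returns `VOCAB` log-probabilities. With these toy sizes, `decoder_param_count` gives 66 weights when tied and 98 when untied: tying removes one `VOCAB × DIM` matrix, which is where most of the saving comes from once the vocabulary is large.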

arxiv.org · scholarly article

Tied & Reduced RNN-T Decoder

Rami Botros; Tara N. Sainath; Robert David; Emmanuel Guzman; Wei Li; Yanzhang He

2021 arXiv Open Access DOI: 10.21437/Interspeech.2021-212

arxiv.org · scholarly article

Mathematical Theory of Computational Resolution Limit in Multi-dimensions

Ping Liu; Hai Zhang

2021 arXiv Open Access DOI: 10.1088/1361-6420/ac245b

Resolving a linear combination of point sources from their band-limited Fourier data is a fundamental problem in imaging and signal processing. Because the Fourier data are incomplete and the measurements are inevitably noisy, there is a fundamental limit on the separation distance between point sources that can be resolved. This is the so-called resolution limit problem. Characterizing this resolution limit remains a long-standing puzzle despite the prevalent use of the classic Rayleigh limit. It is well known that the Rayleigh limit is heuristic, and its drawbacks become prominent when the data are subjected to delicate processing, as in modern computational imaging methods. A more precise characterization of the resolution limit therefore becomes increasingly necessary as data processing methods develop. For this purpose, we developed a theory of "computational resolution limit" for both number detection and support recovery in one dimension in [arXiv:2003.02917 [cs.IT], arXiv:1912.05430 [eess.IV]]. In this paper, we extend the one-dimensional theory to multiple dimensions. More precisely, we define and quantitatively characterize the "computational resolution limit" for the number detection and support recovery problems in a general k-dimensional space. Our results indicate that, in each of the two recovery problems, there is a phase transition with respect to the super-resolution factor and the signal-to-noise ratio. Our main results are derived using a subspace projection strategy. Finally, to verify the theory, we propose deterministic subspace projection based algorithms for the number detection and support recovery problems in dimensions two and three. The numerical results confirm the phase transition predicted by the theory.
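As a rough one-dimensional illustration of the number detection problem, the sketch below estimates how many point sources are present by forming a Hankel matrix from Fourier samples and counting its singular values above a noise threshold. This is a standard subspace-type heuristic substituted for illustration, not the authors' multi-dimensional subspace projection algorithm; the sampling grid (integer frequencies k = 0, ..., 2m), the function names and the threshold are assumptions.

```python
import numpy as np


def fourier_samples(positions, amplitudes, num_freqs, noise=0.0, seed=0):
    """Y[k] = sum_j a_j * exp(i * y_j * k) for k = 0..num_freqs-1, plus complex noise."""
    k = np.arange(num_freqs)
    y = np.exp(1j * np.outer(k, positions)) @ np.asarray(amplitudes)
    rng = np.random.default_rng(seed)
    y += noise * (rng.normal(size=num_freqs) + 1j * rng.normal(size=num_freqs))
    return y


def estimate_num_sources(samples, threshold):
    """Count singular values of the (m+1) x (m+1) Hankel matrix above threshold."""
    m = (len(samples) - 1) // 2
    hankel = np.array([[samples[r + c] for c in range(m + 1)]
                       for r in range(m + 1)])
    # Noise-free, H = V diag(a) V^T with V Vandermonde, so rank(H) = number of sources.
    sv = np.linalg.svd(hankel, compute_uv=False)
    return int(np.sum(sv > threshold))
```

Without noise, `estimate_num_sources(fourier_samples([0.0, 0.8, 2.0], [1, 1, 1], 21), 1e-6)` counts three sources. As sources move closer together relative to the sampled bandwidth, the smallest signal singular value shrinks, and once it falls below the noise-dependent threshold the count drops. That is the kind of trade-off between separation and signal-to-noise ratio the abstract describes, seen here only qualitatively.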

arxiv.org · scholarly article

Compressive radio-interferometric sensing with random beamforming as rank-one signal covariance projections

Olivier Leblanc; Yves Wiaux; Laurent Jacques

2024 arXiv Open Access

Radio interferometry (RI) observes the sky at unprecedented angular resolutions, enabling the study of distant astronomical objects such as galaxies and black holes. In RI, an array of antennas probes cosmic signals coming from the observed region of the sky. By the Van Cittert-Zernike theorem, the covariance matrix of the vector gathering all these antenna measurements provides incomplete and noisy Fourier measurements of the image of interest. The number of noisy Fourier measurements (the visibilities) scales as $\mathcal O(Q^2B)$ for $Q$ antennas and $B$ short-time integration (STI) intervals. We address the challenges posed by this vast volume of data, which is expected to grow significantly with the advent of large antenna arrays, by proposing a compressive sensing technique applied directly at the level of the antenna measurements. First, this paper shows that beamforming, a common technique of dephasing antenna signals usually used to focus on a region of the sky, is equivalent to sensing a rank-one projection (ROP) of the signal covariance matrix. We build upon our recent work arXiv:2306.12698v3 [eess.IV] to propose a compressive sensing scheme relying on random beamforming, trading the $Q^2$-dependence of the data size for a smaller number $P$ of ROPs. We provide image recovery guarantees for sparse image reconstruction. Second, the data size is made independent of $B$ by applying $M$ Bernoulli modulations to the ROP vectors obtained for the STIs. The resulting sample complexities, derived theoretically in a simpler case without modulations and obtained numerically from phase transition diagrams, are shown to scale as $\mathcal O(K)$, where $K$ is the image sparsity. This illustrates the potential of the approach.
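The equivalence between beamforming and a rank-one projection can be checked numerically. In this minimal sketch (an illustration under stated assumptions, not the authors' code), each beam $w_p$ is applied to the antenna snapshots $x_t$ of one STI, and the average output power $\frac{1}{T}\sum_t |w_p^H x_t|^2$ equals $w_p^H C w_p = \langle C, w_p w_p^H \rangle$ for the sample covariance $C$. The complex Gaussian beams, the $\pm 1$ modulation signs and all function names are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_covariance(snapshots):
    """C = (1/T) sum_t x_t x_t^H for snapshots of shape (T, Q)."""
    return snapshots.T @ snapshots.conj() / snapshots.shape[0]


def random_beamformers(num_rops, num_antennas):
    """Hypothetical choice: P complex Gaussian beamforming vectors, shape (P, Q)."""
    real = rng.normal(size=(num_rops, num_antennas))
    imag = rng.normal(size=(num_rops, num_antennas))
    return (real + 1j * imag) / np.sqrt(2)


def beamformed_power(snapshots, beams):
    """Average output power of each beam, mean_t |w_p^H x_t|^2, shape (P,)."""
    outputs = snapshots @ beams.conj().T    # entry (t, p) is w_p^H x_t
    return np.mean(np.abs(outputs) ** 2, axis=0)


def rank_one_projections(cov, beams):
    """w_p^H C w_p = <C, w_p w_p^H> for each beam, shape (P,)."""
    return np.real(np.einsum("pq,qr,pr->p", beams.conj(), cov, beams))


def bernoulli_modulate(rops_per_sti, num_mods):
    """Hypothetical: combine B per-STI ROP vectors (B, P) into M signed sums (M, P)."""
    signs = rng.choice([-1.0, 1.0], size=(num_mods, rops_per_sti.shape[0]))
    return signs @ rops_per_sti
```

Each STI thus contributes $P$ real numbers instead of the $Q^2$ entries of its covariance matrix, and `bernoulli_modulate` collapses the $B$ per-STI vectors into $M$ combinations, mirroring the two data reductions the abstract describes.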