Kernel eigenvoice
Speaker adaptation is an important technology for fine-tuning either features or speech models to compensate for the mismatch caused by inter-speaker variation. Eigenvoice (EV) speaker adaptation is one such technique. It makes use of prior knowledge of the training speakers to provide a fast adaptation algorithm; in other words, only a small amount of adaptation data is needed. Inspired by the kernel eigenface idea in face recognition, kernel eigenvoice (KEV) was proposed.[1] KEV is a non-linear generalization of EV. It incorporates kernel principal component analysis, a non-linear version of principal component analysis, to capture higher-order correlations in order to further explore the speaker space and enhance recognition performance.
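The kernel principal component analysis step can be illustrated with a minimal sketch. It is not the KEV adaptation algorithm from the cited work, only the generic kernel PCA building block applied to made-up data. The Gaussian kernel, the `gamma` value, the function names, and the random "speaker supervectors" are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_pca(X, gamma, n_components):
    """Kernel PCA on training vectors X (one row per training speaker)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    H = np.eye(n) - np.full((n, n), 1.0 / n)   # centering matrix
    Kc = H @ K @ H                             # kernel matrix centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[order]
    # Scale so each principal direction has unit norm in feature space.
    alpha = eigvecs[:, order] / np.sqrt(lam)
    return {"X": X, "K": K, "alpha": alpha, "gamma": gamma, "eigvals": lam}

def project(model, Y):
    """Coordinates of the rows of Y along the kernel principal directions."""
    X, K = model["X"], model["K"]
    Ky = rbf_kernel(Y, X, model["gamma"])
    # Center the new kernel values with the training statistics.
    Kyc = Ky - Ky.mean(axis=1, keepdims=True) - K.mean(axis=0)[None, :] + K.mean()
    return Kyc @ model["alpha"]

# Hypothetical data: 20 training speakers, each a 6-dimensional vector.
rng = np.random.default_rng(0)
speakers = rng.normal(size=(20, 6))
model = fit_kernel_pca(speakers, gamma=0.1, n_components=3)
coords = project(model, speakers)  # shape (20, 3)
```

With a linear kernel in place of `rbf_kernel`, the same procedure reduces to ordinary PCA on the centered data, which is the sense in which the kernel version is a non-linear generalization.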
References
- "Kernel Eigenvoice Thesis" (PDF). Archived from the original (PDF) on 2011-06-10. Retrieved 2009-07-17.
External links
- Kernel Eigenvoice Speaker Adaptation, ScientificCommons
- Mak, B.; Ho, S. (2005). "Various Reference Speakers Determination Methods for Embedded Kernel Eigenvoice Speaker Adaptation". IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005. Proceedings. ICASSP '05. Vol. 1. pp. 981–984. doi:10.1109/ICASSP.2005.1415280.
- Mak, B.; Kwok, J. T.; Ho, S. (September 2005). "Kernel Eigenvoice Speaker Adaptation". IEEE Transactions on Speech and Audio Processing. 13 (5): 984–992. doi:10.1109/TSA.2005.851971. ISSN 1063-6676. S2CID 7361772. Retrieved 2017-11-15.
- Speedup of Kernel Eigenvoice Speaker Adaptation by Embedded Kernel PCA, ICSLP 2004.
- Speaker Adaptation via Composite Kernel PCA, NIPS 2003.
- Mak, Brian Kan-Wing; Hsiao, Roger Wend-Huu; Ho, Simon Ka-Lung; Kwok, J. T. (July 2006). "Embedded kernel eigenvoice speaker adaptation and its implication to reference speaker weighting". IEEE Transactions on Audio, Speech, and Language Processing. 14 (4): 1267–1280. CiteSeerX 10.1.1.206.4596. doi:10.1109/TSA.2005.860836. S2CID 7527119.