-- KarenLivescu - 15 Dec 2005
Papers
An archive of papers that may be relevant. Papers are marked as follows:
-
indicates a paper that we should probably all read and have in our collective consciousness
-
indicates a useful paper that may be relevant to some parts of the project
-
indicates a paper that may be of interest, but probably does not directly impact project planning
Some papers don't have a "relevance marker" yet. Please feel free to add one, or to change an existing one, if you are familiar with the paper.
This list is incomplete--please add papers, links, comments, new sections, etc.
Some of these references are taken from Katrin Kirchhoff's compilation for the 2001 JHU workshop.
Some basic background
- F. Jelinek, Statistical Methods for Speech Recognition. Cambridge, MA: The MIT Press, 1997.
- Brief, somewhat dense, well-written introduction. The only prerequisite is basic probability. The most relevant chapters for us are 1-3, 9, and 12.
- F. V. Jensen, Bayesian Networks and Decision Graphs. Springer, 2001.
- I haven't read this in a while, but I believe it's the best truly introductory BN book out there. -Karen
Articulatory feature classification/recognition
-
M.R. Schroeder, ``Determination of the geometry of the human vocal tract by acoustic measurements'', JASA 41(2), pp. 1002-1010, 1967.
-
K. Shirai and M. Honda, "Estimation of Articulatory Motion". in Dynamic Aspects of Speech Production, pp. 279-302, Tokyo University Press, 1976.
-
B.S. Atal, J.J. Chang, M.V. Mathews & J.W. Tukey, ``Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique'', JASA 63(5), pp. 1535-1555, 1978.
- ... so acoustic-to-articulatory mapping goes back at least to the 60s.
- T. Kobayashi, M. Yagyu & K. Shirai, ``Application of neural networks to articulatory motion estimation'', Proceedings ICASSP-91, pp. 489-492.
- J. Papcun, T.R. Hochberg, T.R. Thomas, F. Larouche, J. Zacks & S. Levy, ``Inferring articulation and recognizing gestures from acoustics with a neural network trained on X-ray microbeam data'', JASA 92, pp. 688-700, 1992.
- K. Elenius and M. Blomberg, "Comparing phoneme and feature based speech recognition using artificial neural networks", Proceedings ICSLP-92 1992, 1279-1282.
- E. Eide, J.R. Rohlicek, H. Gish and S. Mitter, "A linguistic feature representation of the speech waveform", Proceedings ICASSP-93, 1993, pp. 483-486.
- M.G. Rahim, W.B. Kleijn, J. Schroeter & C.C. Goodyear, ``Acoustic to articulatory parameter mapping using an assembly of neural networks'', Proceedings of ICASSP-91, pp. 485-488, 1991.
- H.B. Richards, J.S. Mason, M.J. Hunt & J.S. Bridle, ``Deriving articulatory representations of speech'', Proceedings Eurospeech-95, pp. 761-764, 1995.
- C.S. Blackburn & S.J. Young, ``Towards improved speech recognition using a speech production model'', Proceedings Eurospeech-95, pp. 1623-1626, Madrid, Spain, 1995.
- C.S. Blackburn, Articulatory Methods for Speech Production and Recognition, Ph.D. Thesis, Cambridge University Engineering Department, 1996.
- J. Hogden et al., ``Accurate recovery of articulator positions from acoustics: new conclusions based on human data'', Journal of the Acoustical Society of America 100(3), 1996, pp. 1819-1834.
- A.V. Hansen, ``Acoustic parameters optimised for recognition of phonetic features'', Proceedings of Eurospeech-97, pp. 397-400, Rhodes, Greece, 1997.
-
S. Dusan and L. Deng, ``Acoustic-to-articulatory inversion using dynamical and phonological constraints'', Proceedings of the 5th Speech Production Workshop: models and data, Kloster Seeon, Germany, 2000.
-
S. Dusan & L. Deng, ``Estimation of articulatory parameters from speech acoustics by Kalman filtering'', Proceedings of CITO Researcher Retreat, Hamilton, Canada, 1998.
-
S. Dusan & L. Deng, ``Recovering vocal tract shapes from MFCC parameters'', Proceedings ICSLP-98, Sydney, Australia, 1998.
-
S. Dusan, Statistical Estimation of Articulatory Trajectories from the Speech Signals Using Dynamic and Phonological Constraints, University of Waterloo, Canada, 2000.
-
P. Niyogi and M. M. Sondhi. Detecting Stop Consonants in Continuous Speech. Journal of the Acoustical Society of America. 111, 1063 (2002).
-
C. Burges and P. Niyogi. Detecting and Interpreting Acoustic Features with Support Vector Machines. Tech. Report TR-2002-02. Computer Science Dept., Univ. of Chicago. 2002.
-
P. Niyogi and P. Ramesh. The Voicing Feature for Stop Consonants: Recognition Experiments with Continuously Spoken Alphabets. Speech Communication. Vol. 41, pp. 349-367, 2003.
-
P. Niyogi, C. Burges, P. Ramesh. Distinctive Feature Detection using Support Vector Machines, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Phoenix, Arizona, 1999.
- A few papers on feature classification with SVMs.
-
J. Frankel, M. Wester, and S. King. Articulatory feature recognition using dynamic Bayesian networks. In Proc. ICSLP, September 2004.
-
M. Wester, J. Frankel, and S. King. Asynchronous articulatory feature recognition using dynamic Bayesian networks. In Proc. IEICE Beyond HMM Workshop, Kyoto, December 2004.
-
J. Frankel and S. King. A hybrid ANN/DBN approach to articulatory feature recognition. In Proc. Eurospeech, Lisbon, September 2005.
- These three papers describe work that we are likely to base some of our acoustic modeling on. The second is listed as most relevant because it is more detailed and talks about embedded training, which we plan to use as well.
- M. Rajamanohar and E. Fosler-Lussier, "An Evaluation of Hierarchical Articulatory Feature Detectors," IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2005), San Juan, Puerto Rico, 2005.
Articulatory feature-based ASR
-
L. Deng & K. Erler, ``Structural design of hidden Markov model speech recognizer using multivalued phonetic features: Comparison with segmental speech units'', JASA 92(6), pp. 3058-3066, 1992.
-
L. Deng & D. Sun, ``Speech recognition using atomic speech units constructed from overlapping articulatory features'', Proceedings Eurospeech-93, pp. 1635-1638, Berlin, Germany, 1993.
-
L. Deng & D. Sun, ``Phonetic classification and recognition using HMM representation of overlapping articulatory features for all classes of English sounds'', Proceedings ICASSP-94, pp. I-45-48, Adelaide, Australia, 1994.
-
L. Deng, G. Ramsay & D. Sun, ``Production models as a structural basis for automatic speech recognition'', ETRW-96, 1996.
-
K. Erler & G. H. Freeman, ``An HMM-based speech recognizer using overlapping articulatory features'', JASA 100(4), pp. 2500-2513, 1996.
- A series of papers using HMMs in which each state corresponds to a combination of feature values.
- J. Zacks & T.R. Thomas, ``A new neural network for articulatory speech recognition and its application to vowel identification'', Computer, Speech and Language 8, pp. 189-209, 1994.
-
K. Kirchhoff. "Syllable-level desynchronisation of phonetic features for speech recognition", International Conference on Spoken Language Processing, Philadelphia, USA, October, 1996.
- Two-pass recognition approach allowing for asynchrony between articulatory features within syllable boundaries.
- D.J. Iskra & W.H. Edmondson, ``Feature-based approach to speech recognition'', Proceedings ICSLP-98, Sydney, Australia, 1998.
-
K. Kirchhoff, G.A. Fink and G. Sagerer. ``Combining acoustic and articulatory feature information for robust speech recognition.'' Speech Communication 37, 2002, pp. 303-319.
-
K. Kirchhoff, "Integrating Articulatory Features into Acoustic Models for Speech Recognition", Workshop PhonASR, Saarbruecken, Germany, May 2000.
-
K. Kirchhoff, G.A. Fink and G. Sagerer, "Conversational Speech Recognition Using Acoustic and Articulatory Input", ICASSP 2000, Istanbul, Turkey, June 2000.
-
K. Kirchhoff, "Combining Articulatory and Acoustic Information for Speech Recognition in Noisy and Reverberant Environments", Proceedings of the International Conference on Spoken Language Processing, Sydney, Australia, December 1998, pp. 891-894.
-
K. Kirchhoff, Robust Speech Recognition Using Articulatory Information, Ph.D. thesis, University of Bielefeld, Germany, July 1999.
- Katrin Kirchhoff's Ph.D. research using neural network-based articulatory feature classifiers in a hybrid HMM/ANN-like approach.
- T. Stephenson et al., ``Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory variables'', Proceedings ICSLP-00, Beijing, China, 2000.
- S. King, P. Taylor, J. Frankel, and K. Richmond. Speech recognition via phonetically-featured syllables. In PHONUS, volume 5, pages 15-34, Institute of Phonetics, University of the Saarland, 2000.
-
J. Frankel, K. Richmond, S. King, and P. Taylor. An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces. In Proc. ICSLP, 2000.
- J. Frankel and S. King. ASR - articulatory speech recognition. In Proc. Eurospeech, pages 599-602, Aalborg, Denmark, September 2001.
-
J. Frankel. Linear dynamic models for automatic speech recognition. PhD thesis, The Centre for Speech Technology Research, Edinburgh University, April 2003.
- A set of related papers using neural networks + dynamical system models.
-
Florian Metze and Alex Waibel, A Flexible Stream Architecture for ASR Using Articulatory Features, ICSLP 2002, Denver.
-
Hagen Soltau, Florian Metze, and Alex Waibel, Compensating for Hyperarticulation by Modeling Articulatory Properties, ICSLP 2002, Denver.
-
Florian Metze, Articulatory Features for "Meeting" Speech Recognition, ICSLP 2006, Pittsburgh, PA.
- HMM-based speech recognition with an observation model consisting of a product of Gaussian mixture terms P(obs | feature i), one per articulatory feature. Showed particularly improved results on hyperarticulated speech.
-
M. Richardson, J. Bilmes, and C. Diorio, Hidden-Articulator Markov Models for Speech Recognition. ISCA ITRW Conference on Automatic Speech Recognition, Paris, August 2000.
-
M. Richardson, J. Bilmes, and C. Diorio, Hidden-Articulator Markov Models: Performance Improvements and Robustness to Noise. Int. Conf. on Spoken Language Processing, Beijing, October 2000.
-
M. Richardson, J. Bilmes, and C. Diorio, Hidden-Articulator Markov Models for Speech Recognition. Speech Communication 41(2), October 2003.
- A model in which HMM states correspond to all possible combinations of articulatory feature values (similarly to the Deng papers above).
-
M. Hasegawa-Johnson, J. Baker, S. Borys, K. Chen, E. Coogan, S. Greenberg, A. Juneja, K. Kirchhoff, K. Livescu, K. Sonmez, S. Mohan, J. Muller, and T. Wang, ``Landmark-based speech recognition: Report of the 2004 Johns Hopkins Summer Workshop,'' Proc. ICASSP, Philadelphia, March 2005.
-
M. Hasegawa-Johnson, J. Baker, S. Greenberg, K. Kirchhoff, J. Muller, K. Sonmez, S. Borys, K. Chen, A. Juneja, K. Livescu, S. Mohan, E. Coogan, and T. Wang, ``Landmark-based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop,'' Johns Hopkins University 2004 Summer Workshop final report.
- Shorter and longer descriptions of the 2004 JHU project on landmark-based ASR. We will use DBNs similar to the ones here, the idea of mapping from pronunciation model features to acoustic model features, and maybe similar SVMs, but not the landmark-specific ideas.
Graphical models
-
G. Zweig and S. Russell, Speech Recognition with Dynamic Bayesian Networks, AAAI 1998.
-
G. Zweig and S. Russell, ``Probabilistic modeling with Bayesian networks for automatic speech recognition.'' Australian Journal of Intelligent Information Processing Systems, 5(4), 253-60, 1999.
-
G. Zweig and M. Padmanabhan, Dependency Modeling with Bayesian Networks in a voicemail transcription system, Eurospeech 1999.
-
G. Zweig, Speech Recognition with Dynamic Bayesian Networks, Ph.D. thesis, UC Berkeley.
- Geoff Zweig's thesis research, which showed how HMM-based speech recognizers can be formulated as DBNs, and experimented with DBNs incorporating additional hidden variables. His thesis also proposes a structure for articulatory ASR.
-
J. Bilmes. Natural Statistical Models for Automatic Speech Recognition. Ph.D. Thesis, Dept. of EECS, CS Division, U.C. Berkeley 1999.
-
J. Bilmes, What HMMs can do. (also available in postscript), UWEETR-2002-0003, Feb, 2002.
-
J. Bilmes, Graphical Models and Automatic Speech Recognition, in "Mathematical Foundations of Speech and Language Processing", Institute of Mathematical Analysis Volumes in Mathematics Series, Springer-Verlag, 2003.
- Not really introductory, but almost so.
-
G. Zweig, J. Bilmes, T. Richardson, K. Filali, K. Livescu, P. Xu, K. Jackson, Y. Brandman, E. Sandness, E. Holtz, J. Torres, and B. Byrne, "Structurally discriminative graphical models for automatic speech recognition -- results from the 2001 Johns Hopkins Summer Workshop." Proc. ICASSP, Orlando, Florida, May 2002.
-
J. Bilmes, G. Zweig, T. Richardson, K. Filali, K. Livescu, P. Xu, K. Jackson, Y. Brandman, E. Sandness, E. Holtz, J. Torres, and B. Byrne, "Discriminatively Structured Graphical Models for Speech Recognition." Johns Hopkins University 2001 Summer Workshop final report.
- Shorter/longer descriptions of the 2001 JHU workshop project on DBNs for ASR. The final report describes a DBN for articulatory ASR (based on the one suggested in Geoff Zweig's thesis above), which was implemented but not used in experiments as part of this project.
-
K. Livescu, "Graphical models and speech recognition", guest lecture in MIT 6.345 Automatic Speech Recognition. PDF and PPT.
-
Homework assignment associated with above lecture
- A lecture-plus-lab unit on graphical models in speech. Might be useful for background. We should certainly all have the "warm-up exercises" in the homework assignment down pat.
I might be able to find the files for the actual lab part too if necessary. --Karen
-
J. Bilmes, Graphical Models in Speech and Language Research, tutorial presented at the 2004 Human Language Technology conference / North American chapter of the Association for Computational Linguistics (HLT/NAACL'04).
- Another tutorial that includes more information on inference and other applications besides ASR.
Multi-stream models for ASR
- H.J. Nock, S.J. Young, Loosely Coupled HMMs for ASR. In Proc of ICSLP 2000, Beijing, China.
- H.J. Nock and S.J. Young, Modelling Asynchrony in Automatic Speech Recognition Using Loosely-Coupled HMMs. Cognitive Science. May-June 2002.
- Özgür Çetin. Multi-rate Modeling, Model Inference, and Estimation for Statistical Classifiers, Ph.D. thesis, University of Washington, 2004.
Articulatory phonology
- Browman, C. P., & Goldstein, L. (1986). Towards an articulatory phonology. Phonology Yearbook, 3, 219-252.
- Browman, C. P., & Goldstein, L. (1989). Articulatory gestures as phonological units. Phonology, 6, 201-251.
- Browman, C. P., & Goldstein, L. (1990a). Gestural specification using dynamically-defined articulatory structures. Journal of Phonetics, 18, 299-320.
- Browman, C. P., & Goldstein, L. (1990b). Representation and reality: Physical systems and phonological structure. Journal of Phonetics, 18, 411-424.
-
Browman, C. P., & Goldstein, L. (1990c). Tiers in articulatory phonology, with some implications for casual speech. In J. Kingston & M. E. Beckman (Eds.), Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech (pp. 341-376). Cambridge University Press.
-
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49, 155-180.
- A series of papers on different aspects of articulatory phonology. The last one is fairly introductory and good to start with.
- Byrd, D., and Saltzman, E.L. (2003) "The Elastic Phrase: modeling the dynamics of boundary-adjacent lengthening." Journal of Phonetics 31:149-180, 2003.
- Nam, H. and Saltzman, E.L. (2003) "A Competitive, Coupled Oscillator Model of Syllable Structure." ICPhS (International Congress on the Phonetic Sciences), Barcelona, 2003.
Pronunciation modeling
- G. Tajchman, E. Fosler, and D. Jurafsky. "Building Multiple Pronunciation Models for Novel Words using Exploratory Computational Phonology," Fourth European Conference on Speech Communication and Technology (Eurospeech '95), Madrid, Spain, 1995.
- M. Ostendorf, B. Byrne, M. Bacchiani, M. Finke, A. Gunawardana, K. Ross, S. Roweis, E. Shriberg, D. Talkin, A. Waibel, B. Wheatley and T. Zeppenfeld, “Modeling Systematic Variations in Pronunciation via a Language-Dependent Hidden Speaking Mode,” Proc. of the International Conference on Spoken Language Processing, 1996, supplementary paper.
- Results from the 1996 JHU workshop.
- Byrne W, Finke M, Khudanpur S, McDonough J, Nock H, Riley M, Saraclar M, Wooters C and Zavaliagkos G, "Pronunciation Modelling Using a Hand-Labelled Corpus for Conversational Speech Recognition," IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, WA, 1998.
- Byrne W, Finke M, Khudanpur S, McDonough J, Nock H, Riley M, Saraclar M, Wooters C and Zavaliagkos G, "Pronunciation Modelling for Conversational Speech Recognition: A Status Report from WS97," Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Santa Barbara, CA, December 1997.
- Riley, M., Byrne, W., Finke, M., Khudanpur, S., Ljolje, A., McDonough, J., Nock, H., Saraclar, M., Wooters, C., Zavaliagkos, G. "Stochastic pronunciation modeling from hand-labeled phonetic corpora," in the Proceedings of the Workshop on Modeling Pronunciation Variation for Automatic Speech Recognition, Rolduc, The Netherlands, May 4-6, 1998, pp. 109-116.
- Michael Riley, William Byrne, Michael Finke, Sanjeev Khudanpur, Andrej Ljolje, John McDonough, Harriet Nock, Murat Saraclar, Charles Wooters, George Zavaliagkos "Stochastic pronunciation modelling from hand-labelled phonetic corpora," Speech Communication, to appear.
- Papers resulting from the 1997 JHU workshop.
- D. Jurafsky, A. Bell, E. Fosler-Lussier, C. Girand, and W. Raymond. "Reduction of English function words in Switchboard," International Conference on Spoken Language Processing (ICSLP-98), Sydney, Australia, 1998.
-
E. Fosler-Lussier and N. Morgan. "Effects of Speaking Rate and Word Predictability on Conversational Pronunciations," ESCA Tutorial and Research Workshop on Modeling Pronunciation Variation for Automatic Speech Recognition, Kerkrade, Netherlands, 1998.
-
E. Fosler-Lussier. "Contextual word and syllable pronunciation models," International Workshop on Automatic Speech Recognition and Understanding (ASRU '99), Keystone, Colorado, 1999.
-
J. E. Fosler-Lussier. "Dynamic Pronunciation Models for Automatic Speech Recognition," Ph.D. thesis, University of California, Berkeley, 1999. Reprinted as International Computer Science Institute technical report TR-99-015.
-
Murat Saraclar. Pronunciation Modeling for Conversational Speech Recognition. Ph.D. thesis, Johns Hopkins University, Baltimore, MD, USA, 2000.
-
H. J. Nock, Techniques for Modelling Phonological Processes in Automatic Speech Recognition, Ph.D. Thesis, Cambridge University Engineering Department. August 2001.
- The first few chapters of these theses give a lot of good background. Harriet Nock's thesis also goes into details of multistream DBNs for ASR.
- Murat Saraclar and Sanjeev Khudanpur. Pronunciation change in conversational speech and its implications for automatic speech recognition. Computer Speech and Language, 18(4):375-395, October 2004.
- Murat Saraclar, Harriet Nock, and Sanjeev Khudanpur. Pronunciation modeling by sharing gaussian densities across phonetic models. Computer Speech and Language, 14(2):137-160, April 2000.
- Papers showing that pronunciation changes are often partial, rather than wholesale substitutions of one phone for another.
- Jurafsky, Daniel, Alan Bell, Michelle Gregory, and William D. Raymond. 2001. Probabilistic Relations between Words: Evidence from Reduction in Lexical Production. In Bybee, Joan and Paul Hopper (eds.). Frequency and the emergence of linguistic structure. Amsterdam: John Benjamins. 229-254.
- Jurafsky, Daniel, Alan Bell, Michelle Gregory, and William D. Raymond. 2001. The Effect of Language Model Probability on Pronunciation Reduction. In Proceedings of ICASSP-01 II.801--804, Salt Lake City, Utah.
-
Jurafsky, Dan, Wayne Ward, Zhang Jianping, Keith Herold, Yu Xiuyang, and Zhang Sen. 2001. What Kind of Pronunciation Variation is Hard for Triphones to Model? Proceedings of ICASSP-01, I.577-580, Salt Lake City, Utah.
- E. Fosler-Lussier. "A Tutorial on Pronunciation Modeling for Large Vocabulary Speech Recognition," in S. Renals and G. Grefenstette (eds), Text and Speech Triggered Information Access, Springer Verlag, Berlin, 2003.
- E. Fosler-Lussier, W. Byrne, and D. Jurafsky, eds. Speech Communication Special Issue on Pronunciation Modeling and Lexicon Adaptation, 46:2, June 2005.
Pronunciation modeling with articulatory features
-
K. Livescu and J. Glass, "Feature-based pronunciation modeling for speech recognition." Proc. HLT/NAACL, Boston, May 2004.
-
K. Livescu and J. Glass, "Feature-based pronunciation modeling with trainable asynchrony probabilities." Proc. ICSLP, Jeju, South Korea, October 2004.
-
K. Livescu, Feature-based pronunciation modeling for automatic speech recognition. PhD thesis, MIT Department of Electrical Engineering and Computer Science, September 2005.
- The pronunciation modeling part of the project is based on the models in these papers. The first is more of an introduction; the second presents a modified model more similar to the ones we plan to use, and discusses training of the model. The thesis gives more background and details.
Visual/Audio-visual ASR
-
A. Mashari, J. Sison, C. Neti, G. Potamianos, J.Luettin, Modeling visual co-articulation for large vocabulary continuous visual speech recognition. ICASSP 2001 Conference Student Forum.
- Explores visually meaningful co-articulation models using decision trees. Part of the JHU Summer Workshop 2000 (see below).
-
C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, and D. Vergyri, Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins Summer 2000 Workshop. Proc. Works. Signal Processing 2001.
- Focused on integration of audio and visual speech signals for large-vocabulary recognition (workshop homepage).
-
A. Nefian, L. Liang, X. Pi, X. Liu and K. Murphy, Dynamic Bayesian Networks for Audio-Visual Speech Recognition. EURASIP, Journal of Applied Signal Processing, 11:1-15, 2002.
-
A. Nefian, L. Liang, X. Pi, L. Xiaoxiang, C. Mao and K. Murphy, A Coupled HMM for Audio-Visual Speech Recognition. ICASSP 2002.
-
Chu, S.M. and Huang, T.S., Audio-visual speech modeling using coupled hidden Markov models, ICASSP '02, Volume 2, Page(s):2009 - 2012.
-
Chu, S.M. and Huang, T.S., An experimental study of coupled hidden Markov models, ICASSP '02, Volume 4, Page(s):IV-4100 - IV-4103.
Hybrid/Tandem ASR
The institutes currently pursuing these approaches include IDIAP, ICSI, and SRI. Key authors to search for are Bourlard, Morgan, and Hermansky.
- Bourlard, H., and Morgan, N. (1998), “Hybrid HMM/ANN Systems for Speech Recognition: Overview and New Research Directions,” in Adaptive Processing of Sequences and Data Structures, C.L. Giles and M. Gori (Eds.), Lecture Notes in Artificial Intelligence (1387), Springer Verlag (ISBN 3-540-64341-9), pp. 389-417.
- Morgan, N. and Bourlard, H. (1995), “Continuous Speech Recognition: An Introduction to the Hybrid HMM/Connectionist Approach,” IEEE Signal Processing Magazine, Invited Paper, vol. 12, no. 3, pp. 25-42, May 1995 (IEEE Award paper).
- Renals, S., Morgan, N., Bourlard, H., Cohen, M. and Franco, H. (1994), “Connectionist Probability Estimators in HMM Speech Recognition,” IEEE Trans. on Speech and Audio Processing, vol. 2, no. 1, pp. 161-174.
Corpora
-
Bowon Lee, Mark Hasegawa-Johnson, Camille Goudeseune, Suketu Kamdar, Sarah Borys, Ming Liu, and Thomas Huang, "AVICAR: An Audiovisual Speech Corpus in a Car Environment," ICSLP 2004
Review papers/position papers/idea papers
- O. Schmidbauer, F. Casacuberta, M.J. Castro, G. Hegerl, H. Hoge, J.A. Sanchez & I. Zlokarnik, ``Articulatory representation and speech technology'', Language and Speech 36, pp. 331-351, 1993.
- R.C. Rose, J. Schroeter & M.M. Sondhi, ``An investigation of the potential role of speech production models in automatic speech recognition'', Proceedings ICSLP-94, pp. 575-578.
- R.S. McGowan & A. Faber, ``Introduction to papers on speech recognition and perception from an articulatory point of view'', JASA 99(3), pp. 1680-1681, 1996.
-
M. Ostendorf, “Moving beyond the ‘beads-on-a-string’ model of speech,” Proc. IEEE ASRU Workshop, 1999.
-
M. Ostendorf, ``Incorporating linguistic theories of phonological variation into speech recognition models,'' Phil. Trans. Royal Society, vol. 358, no. 1769, pp. 1325-1338, 2000.
- These two papers give some good background and motivate models beyond phone-based HMMs.
Other
- N. Morgan and E. Fosler-Lussier. "Combining Multiple Estimators of Speaking Rate," International Conference on Acoustics, Speech, and Signal Processing (ICASSP-98), Seattle, Washington, 1998.
- N. Morgan, E. Fosler, and N. Mirghafori. "Speech Recognition using On-line Estimation of Speaking Rate," Fifth European Conference on Speech Communication and Technology (Eurospeech '97), Rhodes, Greece, 1997.
- N. Mirghafori, E. Fosler, and N. Morgan. "Towards Robustness to Fast Speech in ASR," International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), Atlanta, Georgia, 1996.
- N. Mirghafori, E. Fosler, and N. Morgan. "Why Is ASR Harder For Fast Speech and What Can We Do About It?" Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '95), Snowbird, Utah, 1995.
- N. Mirghafori, E. Fosler, and N. Morgan. "Fast Speakers in Large Vocabulary Continuous Speech Recognition: Analysis & Antidotes," Fourth European Conference on Speech Communication and Technology (Eurospeech '95), Madrid, Spain, 1995.
- May be useful if we want to do experiments with varying speaking rates.
Discussion
Enter any comments, questions, or discussion regarding relevant literature in the comment box below. New comments will be appended below existing ones and will be signed with your user name.
-- KarenLivescu - 11 Dec 2005