
David Harwath

Assistant Professor, University of Texas at Austin Computer Science Department
Alumnus, MIT CSAIL Spoken Language Systems Group

About Me

In September 2020, I joined the computer science department at UT Austin as an assistant professor. You can find my new homepage here. I am looking for prospective graduate students interested in machine learning applied to speech, audio, and natural language, especially within a multimodal context (e.g., in conjunction with vision). If you would like to pursue your graduate studies with me, please apply to UTCS and mention your interest in working with my group in your statement of purpose.

My research interests are in the area of machine learning for speech and language processing. The ultimate goal of my work is to discover the algorithmic mechanisms that would enable computers to learn and use spoken language the way that humans do. My approach emphasizes the multimodal and grounded nature of human language, and thus has a strong connection to other machine learning disciplines such as computer vision.

While modern machine learning techniques such as deep learning have made impressive progress across a variety of domains, it is doubtful that existing methods can fully capture the phenomenon of language. State-of-the-art deep learning models for tasks such as speech recognition are extremely data hungry, requiring many thousands of hours of speech recordings that have been painstakingly transcribed by humans. Even then, they are highly brittle when used outside of their training domain, breaking down when confronted with new vocabulary, accents, or environmental noise. Because of its reliance on massive training datasets, the technology we do have is completely out of reach for all but several dozen of the 7,000 human languages spoken worldwide.

In contrast, human toddlers are able to grasp the meaning of new word forms from only a few spoken examples, and learn to carry on a meaningful conversation long before they are able to read and write. There are critical aspects of language that are currently missing from our machine learning models. Human language is inherently multimodal; it is grounded in embodied experience; it holistically integrates information from all of our sensory organs into our rational capacity; and it is acquired via immersion and interaction, without the kind of heavy-handed supervision relied upon by most machine learning models. My research agenda revolves around finding ways to bring these aspects into the fold.

I hold a B.S. in electrical and computer engineering from the University of Illinois at Urbana-Champaign (2010), an S.M. in computer science from MIT (2013), and a Ph.D. in computer science from MIT (2018).

Datasets and Code




Media Coverage