Suwon Shon

32 Vassar St., 32-G436
Cambridge, MA, USA

swshon (at) csail (dot) mit (dot) edu

Curriculum Vitae / CSAIL Profile / Google Scholar

I am a postdoctoral associate in the Spoken Language Systems group at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), working with Dr. James Glass. I received my B.S. and integrated Ph.D. degrees in electrical engineering from Korea University, South Korea, in 2010 and 2017, respectively. My research focuses on machine learning for speech signal processing; in particular, I have been working on speaker and language recognition and related pre-processing techniques.

Recent work and projects

  • (May–Oct. 2018) Participated in the NIST Speaker Recognition Evaluation (SRE) 2018 as a member of the JHU-MIT team [system description]
  • (May 2018) Organizing the MCE 2018 challenge [website] [plan] [code] [dataset]
  • (Mar. 2018) Organizing the Arabic Dialect Identification task at VarDial 2018 [code]
    • If you want to get started on the Arabic Dialect Identification task with dialect embeddings, you can download them here
    • The complete program and papers are available from the VarDial workshop at COLING 2018 [link]
  • (Feb. 2018) Real-time Arabic dialect identification is online! You can try the system here
    This system was demonstrated at ICASSP 2018 [slide]
    The detailed system architecture can be found in the GitHub repo.
  • (Dec. 2017) I led the MIT-QCRI team in the Arabic Dialect Identification (ADI) task of the 3rd Multi-Genre Broadcast (MGB-3) Challenge, and we won the challenge!
    • The MIT-QCRI team paper was presented at ASRU 2017
    • We achieved 75% overall accuracy, significantly higher than the second-place team (70%); detailed results can be found in the summary paper.
    • We have since further improved the system to 81% accuracy on the MGB-3 test set, the best result reported to date. See paper [14] below. (Feb. 28, 2018)


Talks

  • "Analyzing hidden representation of end-to-end speaker recognition system", KAIST, Daejeon, South Korea, Jul. 5, 2018
    • The same talk was given at Kookmin University (Jul. 7, 2018), Korea University (Jul. 7, 2018), NCSOFT (Jul. 6, 2018) and Naver (Jul. 11, 2018).
  • "Recent Speaker Recognition Progress", Philips Visit Day, Cambridge, MA, USA, Apr. 11, 2018
  • "Speaker / Dialect Recognition under Limited Resources", Qatar Computing Research Institute, Doha, Qatar, Nov. 14, 2017
  • "Autoencoder based Domain Adaptation for Speaker Recognition under Insufficient Channel Information", Interspeech 2017, Stockholm, Sweden, Aug. 22, 2017

Publications (Peer-reviewed)

[-] “VoiceID Loss: Speech Enhancement for Speaker Verification”, in preparation
[-] “Learning pronunciation from a foreign language in speech synthesis networks”, in preparation
[-] “MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation”, in preparation
[-] “State-of-the-art Speaker Recognition for Telephone and Video Speech: the JHU-MIT Submission for NIST SRE18”, in preparation
[-] “Large-scale Speaker Retrieval on Random Speaker Variability Subspace”, in preparation
[20] Suwon Shon, Tae-Hyun Oh, James Glass, “Noise-tolerant Audio-Visual Online Person Verification using an Attention-based Neural Network Fusion”, to appear in ICASSP, Brighton, UK, May 2019 [arxiv]
[19] Suwon Shon, Ahmed Ali, James Glass, “Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain”, to appear in ICASSP, Brighton, UK, May 2019 [arxiv]
[18] Seongkyu Mun, Suwon Shon, “Domain Mismatch Robust Acoustic Scene Classification using Channel Information Conversion”, to appear in ICASSP, Brighton, UK, May 2019 [arxiv]
[17] Suwon Shon, Wei-Ning Hsu and James Glass, “Unsupervised Representation Learning of Speech for Dialect Identification”, IEEE Workshop on Spoken Language Technology (SLT), pp. 105-111, Athens, Greece, Dec. 2018 [pdf] [arxiv] [poster]
[16] Suwon Shon, Hao Tang and James Glass, “Frame-level Speaker Embeddings for Text-independent Speaker Recognition and Analysis of End-to-end Model”, IEEE Workshop on Spoken Language Technology (SLT), pp. 1007-1013, Athens, Greece, Dec. 2018 [pdf] [arxiv] [poster] [supplementary figures]
[15] Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Ahmed Ali, Suwon Shon, James Glass and others, “Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign”, in Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial) of COLING, pp. 1-17, Santa Fe, USA, Aug. 2018 [pdf]
[14] Suwon Shon, Ahmed Ali and James Glass, “Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition”, Speaker Odyssey 2018, The Speaker and Language Recognition Workshop, pp. 98-104, Les Sables d'Olonne, France, June 2018 [pdf] [poster] [code]
[13] Maryam Najafian, Sameer Khurana, Suwon Shon, Ahmed Ali and James Glass, “Exploiting Convolutional Neural Network for Phonotactic based Dialect Identification”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5174-5178, Calgary, Canada, April 2018 [pdf] [poster]
[12] Suwon Shon, Ahmed Ali and James Glass, “MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge”, IEEE Automatic Speech Recognition and Understanding (ASRU) Workshop, pp. 374-380, Okinawa, Japan, December 2017 [pdf] [poster] [code]
[11] Suwon Shon, Seongkyu Mun, Wooil Kim and Hanseok Ko, “Autoencoder based Domain Adaptation for Speaker Recognition under Insufficient Channel Information”, Interspeech, pp. 1014-1018, Stockholm, Sweden, August 2017 [pdf] [slide]
[10] Suwon Shon, Seongkyu Mun and Hanseok Ko, “Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition”, Interspeech, pp. 2869-2873, Stockholm, Sweden, August 2017 [pdf] [poster]
[9] Seongkyu Mun, Suwon Shon, Wooil Kim, David Han and Hanseok Ko, “Deep Neural Network based Learning and Transferring Mid-level Audio Features for Acoustic Scene Classification”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 796-800, New Orleans, USA, March 2017 [pdf]
[8] Seongkyu Mun, Suwon Shon, Wooil Kim and Hanseok Ko, “Deep Neural Network Bottleneck Features for Acoustic Event Recognition”, Interspeech, pp. 2954-2957, San Francisco, CA, USA, September 2016 [pdf]
[7] Suwon Shon, Seongkyu Mun, David Han and Hanseok Ko, “A non-negative matrix factorization based subband decomposition for acoustic source localization”, Electronics Letters, Vol. 51, No. 22, pp. 1723-1724, 2015 [pdf]
[6] Suwon Shon, Seongkyu Mun, David Han and Hanseok Ko, “Maximum Likelihood Linear Dimension Reduction of Heteroscedastic Feature for Robust Speaker Recognition”, IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), Karlsruhe, Germany, August 25-28, 2015 [pdf]
[5] Seongkyu Mun, Suwon Shon, Wooil Kim and Hanseok Ko, “Generalized cross-correlation based noise robust abnormal acoustic event localization utilizing non-negative matrix factorization”, IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), Seoul, South Korea, September 26-29, 2014 [pdf]
[4] Suwon Shon, David K. Han and Hanseok Ko, “Abnormal Acoustic Event Localization based on Selective Frequency Bin in High Noise Environment for Audio Surveillance”, IEEE International Conference on Advanced Video and Signal-based Surveillance (AVSS), pp. 87-92, Krakow, Poland, August 2013 [pdf]
[3] Suwon Shon, David K. Han, Jounghoon Beh and Hanseok Ko, “Full Azimuth Multiple Sound Source Localization with 3-channel Microphone Array”, IEICE Trans. on Fundamentals, Vol. E95-A, No. 4, pp. 745-750, April 2012 [pdf]
[2] Suwon Shon, Eric Kim, Jongsung Yoon and Hanseok Ko, “Sudden Noise Source Localization System for Intelligent Automobile Application with Acoustic Sensors”, IEEE International Conference on Consumer Electronics, pp. 237-238, Las Vegas, NV, USA, January 2012 [pdf]
[1] Suwon Shon, Jounghoon Beh, Cheoljong Yang, David K. Han and Hanseok Ko, “Motion Primitives for Designing Flexible Gesture Set in Human-Robot Interface”, International Conference on Control, Automation and Systems, pp. 1501-1504, Il-san, South Korea, October 2011 [pdf]

Manuscripts (Non-peer-reviewed)

[3] J. Villalba, N. Chen, D. Snyder, D. Garcia-Romero, A. McCree, G. Sell, J. Borgstrom, F. Richardson, S. Shon, F. Grondin, R. Dehak, L. P. Garcia-Perera, P. A. Torres-Carrasquillo and N. Dehak, “The JHU-MIT System Description for NIST SRE18”, Proc. NIST Speaker Recognition Evaluation Workshop, Athens, Greece, December 2018 [pdf]
[2] Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass, “MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan”, MCE 2018 evaluation plan [pdf] [website]
[1] Suwon Shon and Hanseok Ko, “KU-ISPL Speaker Recognition Systems under Language Mismatch Condition for NIST 2016 Speaker Recognition Evaluation”, NIST SRE16 Workshop, San Diego, USA, December 2016 [pdf] [poster]