Dong Deng's Homepage

Dong Deng

Dong Deng is a postdoctoral associate in the Database Group at MIT CSAIL, where he works with Mike Stonebraker and Sam Madden. He received his Ph.D. from Tsinghua University, proudly under the guidance of Guoliang Li. His research interests include data management, data science, data integration, and data cleaning. He is a Siebel Scholar.

Database Group CSAIL MIT

Email: dongdeng AT

Office: The Stata Center, 32-G904B,
          32 Vassar Street,
          Cambridge MA 02139

What's New
  • 2018-04, serving as a PC member for SIGMOD and CIKM this year. It's time to give back to the community.

  • 2017-11, a paper was accepted to SIGMOD 2018. We designed an algorithm to find all set pairs
    whose overlap size is at least a given threshold. It is the first algorithm that escapes the quadratic trap.

  • 2017-08, "Approximate String Joins with Abbreviations" was accepted to VLDB 2018.

  • 2017-04, "SILKMOTH: Finding Related Sets with Maximum Matching" was accepted to VLDB 2017.

Education
  • Postdoc: Massachusetts Institute of Technology, CSAIL, Jul 2016 - Now,

    Supervisors: Michael Stonebraker and Samuel Madden

  • Ph.D.: Tsinghua University, Department of Computer Science and Technology, Sep 2011 - Jun 2016,

    Supervisors: Guoliang Li and Jianhua Feng

  • Bachelor: Beihang University, Sep 2007 - Jul 2011

Research Experience
  • Research Assistant: Qatar Computing Research Institute, DA Group, Dec 2015 - Mar 2016 and Mar - Apr 2017,

    Supervisors: Mourad Ouzzani and Nan Tang

  • Research Assistant: University of Michigan, Ann Arbor, EECS, Jan 2014 - Jun 2014,

    Supervisor: H. V. Jagadish

Publications
  1. Overlap Set Similarity Joins with Theoretical Guarantees.

    Dong Deng, Yufei Tao, Guoliang Li. SIGMOD 2018

  2. Approximate String Joins with Abbreviations.

    Wenbo Tao, Dong Deng, Michael Stonebraker. VLDB 2018

  3. SILKMOTH: An Efficient Method for Finding Related Sets with Maximum Matching Constraints.

    Dong Deng*, Albert Kim*, Samuel Madden, Michael Stonebraker. VLDB 2017 *equal contribution

  4. DIMA: A Distributed In-Memory Similarity-Based Query Processing System.

    Ji Sun, Zeyuan Shang, Guoliang Li, Dong Deng, Zhifeng Bao. VLDB 2017 (Demo)

  5. The Data Civilizer System.

    Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, Michael Stonebraker,
    Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Nan Tang. CIDR 2017 [News]

  6. A Demo of the Data Civilizer System.

    Raul Castro Fernandez, Dong Deng, Essam Mansour, Abdulhakim Ali Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang. SIGMOD 2017 (Demo)

  7. A Unified Framework for String Similarity Search with Edit-Distance Constraint.

    Minghe Yu, Jin Wang, Guoliang Li, Yong Zhang, Dong Deng, Jianhua Feng. VLDB J. 2017

  8. Database Decay and How To Avoid It.

    Michael Stonebraker, Dong Deng, Michael L. Brodie. IEEE BigData 2016

  9. Detecting Data Errors: Where Are We and What Needs To Be Done?

    Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez,
    Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker, Nan Tang. VLDB 2016

  10. Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach.

    Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng. SIGMOD 2016

  11. META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion.

    Dong Deng, Guoliang Li, He Wen, H. V. Jagadish, Jianhua Feng. VLDB 2016

  12. An Efficient Partition based Method for Set Similarity Join.

    Dong Deng, Guoliang Li, He Wen, Jianhua Feng. VLDB 2016 [More]

  13. String Similarity Search and Join: A Survey.

    Minghe Yu, Guoliang Li, Dong Deng, Jianhua Feng. FCS 2016.

  14. Efficient Similarity Search and Join on Multi-Attribute Data.

    Guoliang Li, Jian He, Dong Deng, Jian Li, Jianhua Feng. SIGMOD 2015.

  15. A Unified Framework for Approximate Dictionary-based Entity Extraction.

    Dong Deng, Guoliang Li, Jianhua Feng, Yi Duan, Zhiguo Gong. VLDB Journal 2015. [More]

  16. An Efficient Hierarchical Framework for Top-k and Threshold-based String Similarity Search.

    Jin Wang, Guoliang Li, Dong Deng, Yong Zhang, Jianhua Feng. ICDE 2015.

  17. A Pivotal Prefix Based Filtering Algorithm for String Similarity Search.

    Dong Deng, Guoliang Li, Jianhua Feng. SIGMOD 2014. [More]

  18. Distributed Graph Simulation: Impossibility and Possibility.

    Wenfei Fan, Xin Wang, Yinghui Wu, Dong Deng. VLDB 2014.

  19. State-of-the-art in String Similarity Search and Join.

    Sebastian Wandelt, Dong Deng, Stefan Gerdjikov, et al. SIGMOD Record, 2014.

  20. MassJoin: A MapReduce-based Algorithm for String Similarity Joins.

    Dong Deng, Guoliang Li, Shuang Hao, Jiannan Wang, Jianhua Feng. ICDE 2014.

  21. Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases.

    Dong Deng, Yu Jiang, Guoliang Li, Jian Li, Cong Yu. VLDB 2014.

  22. A Partition-based Method for String Similarity Joins with Edit-Distance Constraints.

    Guoliang Li, Dong Deng, Jianhua Feng. ACM Transactions on Database Systems (TODS), 2013. [More]

  23. Efficient Parallel Partition-based Algorithms for Similarity Search and Join with Edit Distance Constraints.

    Yu Jiang, Dong Deng, Jiannan Wang, Guoliang Li, Jianhua Feng. EDBT/ICDT Workshop 2013. [More]

  24. Top-k String Similarity Search with Edit-Distance Constraints.

    Dong Deng, Guoliang Li, Jianhua Feng, Wen-Syan Li. ICDE 2013. [More]

  25. An Efficient Trie-based Method for Approximate Entity Extraction with Edit-Distance Constraints.

    Dong Deng, Guoliang Li, Jianhua Feng. ICDE 2012. [More]

  26. Pass-Join: A Partition based Method for Similarity Joins.

    Guoliang Li, Dong Deng, Jiannan Wang, Jianhua Feng. VLDB 2012. [More]

  27. Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction.

    Guoliang Li, Dong Deng, Jianhua Feng. SIGMOD 2011. [More]

  28. Extending dictionary-based entity extraction to tolerate errors.

    Guoliang Li, Dong Deng, Jianhua Feng. CIKM 2010. [More]


Professional Service
  • Program Committee Member, SIGMOD 2019

  • Program Committee Member, CIKM 2017, 2018

  • Program Committee Member, SISAP (International Workshop on Similarity Search and Its Application) 2014, 2015

  • Reviewer of The International Journal on Very Large Data Bases (The VLDB Journal)

  • Reviewer of IEEE Transactions on Knowledge and Data Engineering (TKDE)

  • Reviewer of ACM Transactions on Intelligent Systems and Technology (TIST)

  • Reviewer of IEEE Transactions on Systems, Man, and Cybernetics: Systems (TSMC)

  • Reviewer of Journal of Computer Science and Technology (JCST)

  • Reviewer of Computational Intelligence

Last modified on Dec 12, 2017