Dong Deng's Homepage

Dong Deng

Dong Deng is a postdoctoral associate in the Database Group at MIT CSAIL, where he works with Mike Stonebraker and Sam Madden. He received his Ph.D. from Tsinghua University, proudly under the guidance of Guoliang Li. His research interests include data management, data science, data integration, and data cleaning. He is a Siebel Scholar.



Database Group CSAIL MIT

Email: dongdeng AT csail.mit.edu

Office: The Stata Center, 32-G904B,
          32 Vassar Street,
          Cambridge MA 02139

What's New
  • I am recruiting PhD students and interns starting in Fall 2019. Send me an email if you are interested in doing research with me at Rutgers University.

  • 2018-04, serving as a PC member for SIGMOD and CIKM this year. It's time to serve the community.

  • 2017-11, a paper was accepted to SIGMOD 2018. We designed an algorithm that finds all set pairs
    whose overlap size is large enough. It is the first algorithm that escapes the quadratic trap.

  • 2017-08, "Approximate String Joins with Abbreviations" was accepted to VLDB 2018.

  • 2017-04, "SILKMOTH: Finding Related Sets with Maximum Matching" was accepted to VLDB 2017.
Education
  • Postdoc: Massachusetts Institute of Technology, CSAIL, Jul 2016 - Jul 2018

    Supervisors: Michael Stonebraker and Samuel Madden

  • Ph.D.: Tsinghua University, Department of Computer Science and Technology, Sep 2011 - Jun 2016

    Supervisors: Guoliang Li and Jianhua Feng

  • Bachelor: Beihang University, Sep 2007 - Jul 2011
Research Experience
  • Senior Scientist: Inception Institute of Artificial Intelligence, Abu Dhabi UAE, Jul 2018 - present

  • Research Assistant: Qatar Computing Research Institute, Doha Qatar, Dec 2015 - Mar 2016 and Mar - Apr 2017

  • Research Assistant: University of Michigan, Ann Arbor MI, Jan 2014 - Jun 2014, Supervisor: H. V. Jagadish
Publications
  1. Overlap Set Similarity Joins with Theoretical Guarantees

    Dong Deng, Yufei Tao, Guoliang Li. SIGMOD 2018


  2. Approximate String Joins with Abbreviations.

    Wenbo Tao, Dong Deng, Michael Stonebraker. PVLDB 2018


  3. A Partial-Order-based Framework for Cost-Effective Crowdsourced Entity Resolution

    Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng. VLDB Journal 2018


  4. Building Data Civilizer Pipelines with an Advanced Workflow Engine

    Essam Mansour, Dong Deng, Raul Castro Fernandez, Abdulhakim A. Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang. ICDE 2018 (Demo)


  5. SILKMOTH: An Efficient Method for Finding Related Sets with Maximum Matching Constraints.

    Dong Deng*, Albert Kim*, Samuel Madden, Michael Stonebraker. PVLDB 2017 *equal contribution


  6. DIMA: A Distributed In-Memory Similarity-Based Query Processing System.

    Ji Sun, Zeyuan Shang, Guoliang Li, Dong Deng, Zhifeng Bao. PVLDB 2017 (Demo)


  7. The Data Civilizer System.

    Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, Michael Stonebraker,
    Ahmed Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Nan Tang. CIDR 2017 [News]


  8. A Demo of the Data Civilizer System.

    Raul Castro Fernandez, Dong Deng, Essam Mansour, Abdulhakim Ali Qahtan, Wenbo Tao, Ziawasch Abedjan, Ahmed K. Elmagarmid, Ihab F. Ilyas, Samuel Madden, Mourad Ouzzani, Michael Stonebraker, Nan Tang. SIGMOD 2017 (Demo)


  9. A Unified Framework for String Similarity Search with Edit-Distance Constraint.

    Minghe Yu, Jin Wang, Guoliang Li, Yong Zhang, Dong Deng, Jianhua Feng. VLDB Journal 2017


  10. Database Decay and How To Avoid It.

    Michael Stonebraker, Dong Deng, Michael L. Brodie. IEEE BigData 2016


  11. Detecting Data Errors: Where Are We and What Needs To Be Done?

    Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez,
    Ihab F. Ilyas, Mourad Ouzzani, Paolo Papotti, Michael Stonebraker, Nan Tang. PVLDB 2016


  12. Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach.

    Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng. SIGMOD 2016


  13. META: An Efficient Matching-Based Method for Error-Tolerant Autocompletion.

    Dong Deng, Guoliang Li, He Wen, H. V. Jagadish, Jianhua Feng. PVLDB 2016


  14. An Efficient Partition based Method for Set Similarity Join.

    Dong Deng, Guoliang Li, He Wen, Jianhua Feng. PVLDB 2016 [More]


  15. String Similarity Search and Join: A Survey

    Minghe Yu, Guoliang Li, Dong Deng, Jianhua Feng. FCS 2016.


  16. Efficient Similarity Search and Join on Multi-Attribute Data.

    Guoliang Li, Jian He, Dong Deng, Jian Li, Jianhua Feng. SIGMOD 2015.


  17. A Unified Framework for Approximate Dictionary-based Entity Extraction

    Dong Deng, Guoliang Li, Jianhua Feng, Yi Duan, Zhiguo Gong. VLDB Journal 2015. [More]


  18. An Efficient Hierarchical Framework for Top-k and Threshold-based String Similarity Search.

    Jin Wang, Guoliang Li, Dong Deng, Yong Zhang, Jianhua Feng. ICDE 2015.


  19. A Pivotal Prefix Based Filtering Algorithm for String Similarity Search.

    Dong Deng, Guoliang Li, Jianhua Feng. SIGMOD 2014. [More]


  20. Distributed Graph Simulation: Impossibility and Possibility.

    Wenfei Fan, Xin Wang, Yinghui Wu, Dong Deng. PVLDB 2014.


  21. State-of-the-art in String Similarity Search and Join.

    Sebastian Wandelt, Dong Deng, Stefan Gerdjikov, et al. SIGMOD Record 2014.


  22. MassJoin: A MapReduce-based Algorithm for String Similarity Joins.

    Dong Deng, Guoliang Li, Shuang Hao, Jiannan Wang, Jianhua Feng. ICDE 2014.


  23. Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases.

    Dong Deng, Yu Jiang, Guoliang Li, Jian Li, Cong Yu. PVLDB 2014.


  24. A Partition-based Method for String Similarity Joins with Edit-Distance Constraints.

    Guoliang Li, Dong Deng, Jianhua Feng. ACM Transactions on Database Systems (TODS) 2013. [More]


  25. Efficient Parallel Partition-based Algorithms for Similarity Search and Join with Edit Distance Constraints.

    Yu Jiang, Dong Deng, Jiannan Wang, Guoliang Li, Jianhua Feng. EDBT/ICDT Workshop 2013. [More]


  26. Top-k String Similarity Search with Edit-Distance Constraints.

    Dong Deng, Guoliang Li, Jianhua Feng, Wen-Syan Li. ICDE 2013. [More]


  27. An Efficient Trie-based Method for Approximate Entity Extraction with Edit-Distance Constraints.

    Dong Deng, Guoliang Li, Jianhua Feng. ICDE 2012. [More]


  28. Pass-Join: A Partition based Method for Similarity Joins.

    Guoliang Li, Dong Deng, Jiannan Wang, Jianhua Feng. PVLDB 2012. [More]


  29. Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction.

    Guoliang Li, Dong Deng, Jianhua Feng. SIGMOD 2011. [More]


  30. Extending dictionary-based entity extraction to tolerate errors.

    Guoliang Li, Dong Deng, Jianhua Feng. CIKM 2010. [More]

Awards
Service
  • Program Committee Member, SIGMOD 2019

  • Program Committee Member, ICDE 2019

  • Program Committee Member, CIKM 2017, 2018

  • Program Committee Member, SISAP (International Workshop on Similarity Search and Its Application) 2014, 2015

  • Reviewer of The International Journal on Very Large Data Bases (The VLDB Journal)

  • Reviewer of IEEE Transactions on Knowledge and Data Engineering (TKDE)

  • Reviewer of ACM Transactions on Intelligent Systems and Technology (TIST)

  • Reviewer of IEEE Transactions on Systems, Man, and Cybernetics: Systems (TMC)

  • Reviewer of Journal of Computer Science and Technology (JCST)

  • Reviewer of Computational Intelligence

Last modified on Dec 12, 2017