December 15, 2015
This was the announcement for my thesis defense:
Language Technologies for Understanding Law, Politics, and Public Policy
Seminar Series: 2015 Thesis Defense
Speaker: William Li
Speaker Affiliation: MIT CSAIL
Host: Andrew Lo
Date: Tuesday, December 15, 2015
Time: 1:00 PM to 2:30 PM
Location: 32-G882
This thesis focuses on machine learning techniques for uncovering patterns and insights in large, text-based government datasets. First, we present an authorship attribution model for unsigned U.S. Supreme Court opinions, offering insights into the authorship of important cases and the dynamics of Supreme Court decision-making. Second, we apply software engineering metrics to analyze the complexity of the United States Code, revealing the structure and evolution of the U.S. Code over the past century. Third, we trace the policy trajectories of bills in Congress, making it possible to visualize the contents of four key bills considered during the Financial Crisis. Finally, this thesis presents a novel model, Probabilistic Text Reuse (PTR), for finding repeated passages of text. Text reuse is common in legal and political documents for several reasons: documents present similar ideas, different versions of a document are often quite similar, and there are legitimate reasons for copying text. We illustrate the utility of PTR by capturing the structure of a large collection of public comments on the FCC's proposed net neutrality regulations.