Tarfah Alrashed

I am a Ph.D. student in Computer Science at MIT CSAIL, advised by David Karger, and a member of the Haystack Group at MIT. My research spans Human-Computer Interaction, Information Retrieval, Data Science, and the Semantic Web. I design and build systems that empower people to access and manipulate data on the web without programming.

  tarfah@mit.edu
     tarfahalrashed

Projects



Shapir: Standardizing and Democratizing Access to Web APIs

Today, many web sites offer third-party access to their data through web APIs. However, manually encoding URLs with arbitrary endpoints, parameters, authentication handshakes, and pagination, among other things, makes API use challenging and laborious for programmers, and untenable for novices. In addition, each site offers its own idiosyncratic data model, properties, and methods that a new user must learn, even when the sites manage the same common types of information as many others. In this work, we show how working with web APIs can be dramatically simplified by describing the APIs using a standardized, machine-readable ontology. By surveying a statistical sample of web APIs, we develop a simple ontology that can effectively describe the core functionality of nearly all of them. We then present Shapir, a system that includes a graphical, form-based authoring tool for the API description, from which Shapir can automatically generate a standardized JavaScript library for accessing data on the website as objects with readable and writeable properties. This enables programmers to access data without learning the details of each API, and indeed allows them to use the same unchanged code for multiple websites. We then integrate Shapir with Mavo, a declarative HTML-based programming language for novices, to also empower non-programmers to access these APIs.
Shapir: Standardizing and Democratizing Access to Web APIs. (UIST'21)
GitHub shapir.org
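The central idea, that one standardized description lets the same client code read data from different sites, can be illustrated with a small Python sketch. This is not Shapir's implementation (Shapir generates a JavaScript library from descriptions authored in its GUI); the `ApiDescription` and `StandardObject` classes, the `property_map` field, and the two example sites are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ApiDescription:
    """Hypothetical machine-readable description of one site's API.

    `property_map` maps a shared vocabulary term (e.g. "name") to the
    site-specific field that holds it in a raw JSON record.
    """

    site: str
    property_map: dict[str, str]


class StandardObject:
    """Exposes a raw record through shared, site-independent property names."""

    def __init__(self, raw: dict[str, Any], description: ApiDescription) -> None:
        self._raw = raw
        self._desc = description

    def get(self, prop: str) -> Any:
        # Translate the shared property name into this site's field name.
        return self._raw[self._desc.property_map[prop]]

    def set(self, prop: str, value: Any) -> None:
        # Writes go through the same mapping; this sketch only updates the
        # local record and does not send anything back to the site.
        self._raw[self._desc.property_map[prop]] = value


# Two hypothetical sites that store the same information under different fields.
site_a = ApiDescription("site-a.example", {"name": "title", "creator": "owner"})
site_b = ApiDescription("site-b.example", {"name": "full_name", "creator": "user"})

records = [
    StandardObject({"title": "Notes", "owner": "ana"}, site_a),
    StandardObject({"full_name": "Repo", "user": "ben"}, site_b),
]

# The same client code works unchanged for both sites.
names = [r.get("name") for r in records]
```

The design choice being illustrated is that site-specific knowledge lives entirely in the description, so the loop over `records` never needs to know which site a record came from.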




Dataset or Not? A study on the veracity of semantic markup for dataset pages

Semantic markup, such as Schema.org, allows providers on the Web to describe content using a shared controlled vocabulary. This markup is invaluable in enabling a broad range of applications, from vertical search engines, to rich snippets in search results, to actions on emails, to many others. Our team relied on Schema.org to build a search engine for datasets, providing search over metadata from Web pages with Schema.org/Dataset. While Schema.org was the core enabling technology for this vertical search, we also discovered that we need to address the following problem: pages from 61% of internet hosts that provide Schema.org/Dataset markup do not actually describe datasets. In this work, we analyze the veracity of dataset markup for a Web-scale corpus and categorize pages where this markup is not reliable. We then propose a way to drastically increase the quality of the dataset metadata corpus by developing a deep neural-network classifier that identifies whether or not a page with Schema.org/Dataset markup is a dataset page. Our classifier achieves 96.7% recall at the 95% precision point. This level of precision can enable a vertical search engine to circumvent the noise in semantic markup and to use the metadata to provide high quality results to users.
Dataset or Not? A study on the veracity of semantic markup for dataset pages. (ISWC'21)
Dataset & Code datasetsearch.google.com
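The headline metric, recall at the 95% precision point, is the highest recall the classifier reaches at any score threshold whose precision is still at least 95%. A minimal Python sketch of that computation follows; it is not the paper's evaluation code, and the function name and toy inputs are illustrative.

```python
def recall_at_precision(scores, labels, target=0.95):
    """Highest recall over score thresholds whose precision is >= target.

    scores: classifier scores (higher = more likely a dataset page)
    labels: 1 if the page truly describes a dataset, else 0
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best = 0.0
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Evaluate only after the last item of a run of tied scores, so the
        # threshold "score >= t" counts every tied item as positive.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= target:
            best = max(best, tp / total_pos)
    return best
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.5]` and labels `[1, 1, 0, 1, 0]`, only the top two pages can be accepted while keeping precision at 95% or above, so recall at that point is 2/3.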




ScrAPIr: Making Web Data APIs Accessible to End Users

Users have long struggled to extract and repurpose data from websites by laboriously copying or scraping content from web pages. An alternative is to write scripts that pull data through APIs. This provides a cleaner way to access data than scraping; however, APIs are effortful for programmers and nigh-impossible for non-programmers to use. In this work, we empower users to access APIs without programming. We evolve a schema for declaratively specifying how to interact with a data API. We then develop ScrAPIr, a standard query GUI that enables users to fetch data through any API for which a specification exists, and a second GUI that enables users to author the specification for a given API. From a lab evaluation, we find that even non-programmers can access APIs using ScrAPIr, while programmers can access APIs 3.8 times faster on average with ScrAPIr than by writing code.
ScrAPIr: Making Web Data APIs Accessible to End Users. (CHI'20)
GitHub Video Demo scrapir.org
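To make "declaratively specifying how to interact with a data API" concrete, here is a minimal Python sketch in which a generic function turns a specification plus user-supplied values into paginated request URLs. The spec fields (`endpoint`, `params`, `pagination`, `page_param`, `size_param`) and the endpoint URL are hypothetical and do not reproduce ScrAPIr's actual schema; the sketch only builds URLs and makes no network calls.

```python
from urllib.parse import urlencode

# Hypothetical declarative spec; field names are illustrative only.
SPEC = {
    "endpoint": "https://api.example.com/v1/items",
    "params": {"q": None, "sort": "recent"},  # None = the user must supply it
    "pagination": {"page_param": "page", "size_param": "per_page", "size": 50},
}


def page_urls(spec, user_params, max_results):
    """Yield request URLs for enough pages to cover max_results items."""
    params = dict(spec["params"])  # copy so the spec itself is not mutated
    params.update(user_params)
    missing = [k for k, v in params.items() if v is None]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    pg = spec["pagination"]
    pages = -(-max_results // pg["size"])  # ceiling division
    for page in range(1, pages + 1):
        query = {**params, pg["page_param"]: page, pg["size_param"]: pg["size"]}
        yield f"{spec['endpoint']}?{urlencode(query)}"
```

Because all API-specific details sit in `SPEC`, the same `page_urls` function could serve any API described in this hypothetical format, which is the property a standard query GUI relies on.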




Evaluating User Actions as a Proxy for Email Significance

The number of emails people receive every day can be overwhelming. A good estimate of each email's significance forms the foundation for many downstream tasks (e.g., email prioritization), but determining significance at scale is expensive and challenging. In this work, we hypothesize that the cumulative set of actions taken on an individual email can serve as a proxy for the perceived significance of that email. We propose two approaches to summarize observed actions on emails, which we then evaluate against perceived significance. The first is a fixed-form utility function parameterized by a set of weights. The second uses machine-learning models to capture users' perceived significance directly from the observed actions. Our analysis suggests that actions and email significance are positively correlated, and that the actions people perform on personal and work emails differ. We also find that the degree of correlation varies across people, which may reflect the individualized nature of email activity patterns or of significance.
Evaluating User Actions as a Proxy for Email Significance. (WWW'19)
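A fixed-form utility function of this kind can be read as a weighted sum, U(e) = Σₐ wₐ · nₐ(e), where nₐ(e) counts how often action a was taken on email e. The Python sketch below computes that sum and a Pearson correlation against significance ratings. The action names and weight values are hypothetical placeholders, not the weights used in the paper.

```python
from math import sqrt

# Hypothetical action names and weights, for illustration only.
WEIGHTS = {"read": 1.0, "reply": 3.0, "forward": 2.0, "flag": 4.0, "delete": -2.0}


def utility(action_counts, weights=WEIGHTS):
    """Weighted sum of how often each action occurred on one email.

    Actions without a weight contribute nothing.
    """
    return sum(weights.get(a, 0.0) * n for a, n in action_counts.items())


def pearson(xs, ys):
    """Pearson correlation between utilities and perceived-significance ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy example: three emails and made-up ratings on a 1-5 scale.
emails = [{"read": 1}, {"read": 2, "reply": 1}, {"read": 1, "delete": 1}]
ratings = [2, 5, 1]
scores = [utility(e) for e in emails]
```

Here `scores` is `[1.0, 5.0, -1.0]`, and `pearson(scores, ratings)` is positive, which is the shape of relationship the hypothesis predicts; real weights would be fitted or chosen against user judgments.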




The Lifetime of Email Messages: A Large-Scale Analysis of Email Revisitation

Email continues to be one of the most important means of online communication, leading to a number of challenges related to information overload and email management. To better understand email management practices in detail, we examined the distribution of visits to emails over time. During their lifetime, emails may be visited once or several times, and with each visit different actions may be taken. Emails that are revisited over time are especially interesting because they represent an opportunity to improve email management and search. We present a large-scale log analysis of email revisitation, the activities that people perform on revisited email messages, and the strategies they use to return to these emails. Most emails have a short lifetime: more than 33% have a lifetime of less than 5 minutes. Our findings have implications for designing email clients and intelligent agents that support both short- and long-term revisitation patterns.
The Lifetime of Email Messages: A Large-Scale Analysis of Email Revisitation. (CHIIR'18)
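As a rough illustration of the kind of log processing behind lifetime and revisitation statistics, the Python sketch below takes a visit log and computes per-email lifetimes, the share of emails below a cutoff, and which emails were revisited. The log format `(email_id, timestamp_seconds)` is hypothetical, and "lifetime" is assumed here to mean the span between the first and last recorded visit; the paper's exact definition may differ.

```python
from collections import Counter


def lifetimes(visits):
    """Map email_id -> lifetime in seconds (assumed: last visit minus first visit)."""
    first, last = {}, {}
    for email_id, ts in visits:
        first[email_id] = min(ts, first.get(email_id, ts))
        last[email_id] = max(ts, last.get(email_id, ts))
    return {e: last[e] - first[e] for e in first}


def share_shorter_than(lifetimes_by_id, seconds):
    """Fraction of emails whose lifetime is below the cutoff."""
    values = list(lifetimes_by_id.values())
    return sum(v < seconds for v in values) / len(values)


def revisited_ids(visits):
    """Emails visited more than once."""
    counts = Counter(email_id for email_id, _ in visits)
    return {e for e, n in counts.items() if n > 1}


# Toy log: "a" revisited after 2 minutes, "b" after an hour, "c" seen once.
log = [("a", 0), ("a", 120), ("b", 0), ("b", 3600), ("c", 50)]
lt = lifetimes(log)
```

On this toy log, two of the three emails (`a` and `c`) have a lifetime under 5 minutes (300 seconds), and `a` and `b` count as revisited.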




Extending a Reactive Expression Language with Data Update Actions for End-User Application Authoring

Mavo is a small extension to the HTML language that empowers non-programmers to create simple web applications. Authors can mark up any normal HTML document with attributes that specify data elements that Mavo makes editable and persists. But while applications authored with Mavo allow users to edit individual data items, they do not offer any programmatic data actions that can act in customizable ways on large collections of data simultaneously or that modify data according to a computation. We explore an extension to the Mavo language that enables non-programmers to author these richer data update actions. We show that it lets authors create a more powerful set of applications than they could previously, while adding little additional complexity to the authoring process. Through user evaluations, we assess how closely our data update syntax matches how novice authors would instinctively express such actions, and how well they are able to use the syntax we provided.
Extending a Reactive Expression Language with Data Update Actions for End-User Application Authoring. (UIST'18)
GitHub Video Demo mavo.io




CoTI: Collaborative Tangible Interface for Complex Decision Support Systems

CoTI is a Collaborative Tangible Interface for complex systems that provides multi-touch interactive capabilities along with analytical and visualization components to facilitate the decision-making process. In CoTI, stakeholders interact with 3D objects, which we call smart blocks, and with the multi-touch surface to get immediate feedback on the impact of their decisions, not only on the system under study but also on other related systems affected by those decisions. This adds another dimension to the thinking process, enhancing users’ experience and enabling them to make more informed decisions by understanding the implications of their decisions for other systems. The objective of CoTI is to support decision making in complex systems; integration with a simulation engine that performs real model analysis is therefore essential.
Collaborative Tangible Interface (CoTI) for Complex Decision Support Systems. (DUXU'15)
An Observational Study of Usability in Collaborative Tangible Interfaces for Complex Planning Systems. (AHFE'15)
Coding Schemes for Observational Studies of Usability in Collaborative Tangible User Interfaces. (HCII'15)
GitHub Video Demo


Publications

2021

Shapir: Standardizing and Democratizing Access to Web APIs.
Tarfah Alrashed, Lea Verou, David R. Karger.
UIST '21: In Proceedings of the ACM Symposium on User Interface Software and Technology.

2021

Dataset or Not? A study on the veracity of semantic markup for dataset pages.
Tarfah Alrashed, Dimitris Paparas, Omar Benjelloun, Ying Sheng, Natasha Noy.
ISWC '21: In the International Semantic Web Conference.

2020

ScrAPIr: Making Web Data APIs Accessible to End Users.
Tarfah Alrashed, Jumana Almahmoud, Amy X. Zhang, David R. Karger.
CHI '20: In Proceedings of the ACM Conference on Human Factors in Computing Systems.

2019

Evaluating User Actions as a Proxy for Email Significance.
Tarfah Alrashed, CJ Lee, Peter Bailey, Christopher Lin, Milad Shokouhi, Susan Dumais.
WWW '19: In The World Wide Web Conference.

2018

Extending a Reactive Expression Language with Data Update Actions for End-User Application Authoring.
Lea Verou, Tarfah Alrashed, David R. Karger.
UIST '18: In Proceedings of the ACM Symposium on User Interface Software and Technology.

2018

The lifetime of email messages: a large-scale analysis of email revisitation.
Tarfah Alrashed, Ahmed Hassan Awadallah, Susan Dumais.
CHIIR '18: In Proceedings of the 2018 Conference on Human Information Interaction & Retrieval.

2018

Perception of speaker personality traits using speech signals.
Leilani H. Gilpin, Danielle M. Olson, Tarfah Alrashed.
CHI '18: In Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems.

2016

Social Communities in Urban Mobility Systems.
Tarfah Alrashed, Jumana Almahmoud, Mohamad Alrished, Sattam Alsubaiee, Mansour Alsaleh, Carlos S. Olascoaga.
SCSM '16: In International Conference on Social Computing and Social Media.

2016

Prototyping Complex Systems: A Diary Study Approach to Understand the Design Process.
Jumana Almahmoud, Almaha Almalki, Tarfah Alrashed, Areej Alwabil.
DUXU '16: In International Conference on Design, User Experience, and Usability.

2015

An observational study of usability in collaborative tangible interfaces for complex planning systems.
Tarfah Alrashed, Almaha Almalki, Salma Aldawood, Tariq Alhindi, Ira Winder, Ariel Noyman, Anas Alfaris, Areej Alwabil.
AHFE '15: In 6th International Conference on Applied Human Factors and Ergonomics and the Affiliated Conferences.

2015

Coding Schemes for Observational Studies of Usability in Collaborative Tangible User Interfaces.
Tarfah Alrashed, Almaha Almalki, Salma Aldawood, Anas Alfaris, Areej Alwabil.
HCII '15: In International Conference on Human-Computer Interaction.

2015

Collaborative Tangible Interface (CoTI) for Complex Decision Support Systems.
Salma Aldawood, Faisal Aleissa, Almaha Almalki, Tarfah Alrashed, Tariq Alhindi, Riyadh Alnasser, Mohammad K. Hadhrawi, Anas Alfaris, Areej Al-Wabil.
DUXU '15: In International Conference on Design, User Experience, and Usability.