# CCS'16 Key Sharing Datasets
# WHOIS Email Record Dataset
# Frank Cangialosi
# November 2016

This directory contains two versions of the same data:

1. domain-to-emails: for each domain in our dataset, this file lists all of
   the emails* that appeared in that domain's WHOIS record. Two columns,
   separated by a space:

       [domain] [comma-separated list of emails]

2. email-to-domains is simply a reverse mapping of (1). For each email that
   appeared in the second column of (1), we list all of the domains whose
   WHOIS record this email appeared in. Again two columns, separated by a
   space:

       [email] [comma-separated list of domains]

* NOTE: Some WHOIS records add specific qualifiers to email addresses
  (e.g. "technical", "administrative"), but we make no distinction between
  these in our dataset and treat all of them as equal. We found in practice
  that different registrars used these fields differently and some didn't
  include them at all, so they were often misleading and relying on them
  caused more harm than good.

Sources:
[1] S. Liu, I. Foster, S. Savage, G. M. Voelker, and L. K. Saul.
    Who is .com? Learning to Parse WHOIS Records. IMC, 2015.
[2] whoisxmlapi.com
[3] bulkwhoisapi.com

For more information contact Frank Cangialosi (frankc@csail.mit.edu)
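
Appendix: reading the two-column format

The two-column layout described above could be read with something like the following Python sketch. It is not part of the dataset tooling. It assumes that the first space on each line separates the two columns and that the comma-separated list contains no spaces. The function names and the sample records are made up for illustration.

```python
from collections import defaultdict


def parse_line(line):
    """Split one record into (key, [values]).

    Assumes the first space separates the two columns and the second
    column is a comma-separated list with no embedded spaces.
    """
    key, _, rest = line.strip().partition(" ")
    values = [v for v in rest.split(",") if v]
    return key, values


def load_mapping(lines):
    """Build a dict from an iterable of records, skipping blank lines."""
    mapping = {}
    for line in lines:
        if not line.strip():
            continue
        key, values = parse_line(line)
        mapping[key] = values
    return mapping


def invert(mapping):
    """Reverse a key -> [values] mapping into value -> [keys]."""
    inverted = defaultdict(list)
    for key, values in mapping.items():
        for value in values:
            inverted[value].append(key)
    return dict(inverted)


# Hypothetical sample records in the domain-to-emails layout.
sample = [
    "example.com admin@example.com,tech@example.net",
    "example.org admin@example.com",
]

domain_to_emails = load_mapping(sample)
email_to_domains = invert(domain_to_emails)
```

To read the real data, an open file handle can be passed to load_mapping in place of the sample list. Inverting the domain-to-emails mapping this way should reproduce the content of email-to-domains, possibly with the domains in a different order.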