Friday, February 22
CS11 Theme 2: Data Modeling and Analysis #4
Fri, Feb 22, 3:15 PM - 4:45 PM
Uncovering the Truths Behind Internet Domain Registrations (302439)
*Edward J Mulrow, NORC at the University of Chicago
Steven Pedlow, NORC at the University of Chicago
Keywords: web-bot, WHOIS, DNSBL, Natural Language Processing, Machine Learning
The Internet Corporation for Assigned Names and Numbers (ICANN) requested a study of its internet domain registration data. In 2011, ICANN’s WHOIS databases cataloged more than 220 million website registrations. A representative sample was selected from the five most common generic top-level domains, which cover 98.5 percent of ICANN-registered websites, and WHOIS, DNSBL, and organic data were extracted via a customized web-bot. WHOIS registrant name and registrant organization data were used to classify the types of entities that register domain names, such as natural persons, corporations, privacy and proxy service providers, and others. With these data, we analyze the content associated with each domain name to classify entity types and identify the activities associated with them. Entity and commercial activity classifications are developed using a variety of techniques ranging from manual coding to natural language processing and machine learning. Interrelationships among entity types and activities are examined to help ICANN better understand the correlations that may emerge and their potential policy implications.
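As an illustration of the entity classification step, the following is a minimal rule-based sketch in Python. It is not the study's method: the keyword list, the legal-suffix pattern, the label names, and the function `classify_registrant` are all hypothetical, chosen only to show how registrant name and registrant organization fields might be mapped to the entity types named above before more sophisticated techniques are applied.

```python
import re

# Hypothetical keyword list for privacy/proxy services (illustrative only).
PRIVACY_PROXY_TERMS = ("privacy", "proxy", "private registration")

# Hypothetical pattern for common corporate suffixes (illustrative only).
LEGAL_SUFFIX = re.compile(r"\b(inc|llc|ltd|corp|corporation|gmbh)\b")


def classify_registrant(name: str, organization: str) -> str:
    """Assign a coarse entity type from two WHOIS registrant fields.

    Rules are checked in order; the first match wins.
    """
    name = name.strip()
    organization = organization.strip()
    text = f"{name} {organization}".lower()

    # Rule 1: privacy/proxy keywords anywhere in either field.
    if any(term in text for term in PRIVACY_PROXY_TERMS):
        return "privacy_proxy"

    # Rule 2: a corporate suffix suggests a corporation.
    if LEGAL_SUFFIX.search(text):
        return "corporation"

    # Rule 3: a name with no distinct organization suggests a natural person.
    if name and (not organization or organization.lower() == name.lower()):
        return "natural_person"

    # Everything else is left for manual coding or a trained model.
    return "other"


if __name__ == "__main__":
    samples = [
        ("Domain Admin", "Example Privacy Services"),
        ("Jane Doe", "Acme Widgets Inc."),
        ("Jane Doe", ""),
        ("Jane Doe", "Riverside Chess Club"),
    ]
    for name, org in samples:
        print(f"{name!r}, {org!r} -> {classify_registrant(name, org)}")
```

Records that fall into the "other" bucket are the natural candidates for manual coding, and manually coded records could in turn serve as training data for a machine learning classifier.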