Science University Research Symposium (SURS)

Unifying Data In Publishing

Publication Date



Sciences and Mathematics, College of


Math and Computer Science, Department of

SURS Faculty Advisor

Dr Christina Davis

Presentation Type

Poster Presentation


The literary publishing industry struggles to gather data. With thousands of titles published each year, the industry would benefit greatly from wider use of metadata, yet a lack of standardization and structured data makes it difficult to assemble large datasets. The goal of this research project is to create a dataset from the New York Times Best Sellers list for 2022. An HTML parser from the Python package Beautiful Soup will be used to scrape a list of titles from the New York Times Best Sellers webpage, and that list will then be fed through the ISBNdb API. Column values such as publisher, author, publish date, and language will be sourced from the ISBNdb API, while additional columns such as number of weeks spent on the list, rank on the list, and category will be scraped directly from the New York Times webpage.
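The pipeline described above might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the HTML snippet, its tag and class names, the titles, and the `lookup_metadata` function are all hypothetical placeholders. The abstract does not describe the real structure of the New York Times page, and the stub stands in for a real ISBNdb API request so that the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real New York Times page structure is not given
# in the abstract, so these tags, attributes, and class names are placeholders.
SAMPLE_HTML = """
<section data-category="Hardcover Fiction">
  <ol>
    <li class="book">
      <h3 class="title">EXAMPLE TITLE ONE</h3>
      <p class="weeks">3 weeks on the list</p>
    </li>
    <li class="book">
      <h3 class="title">EXAMPLE TITLE TWO</h3>
      <p class="weeks">New this week</p>
    </li>
  </ol>
</section>
"""


def parse_weeks(text):
    """Return the leading integer of a weeks label, or None if there is none."""
    words = text.split()
    return int(words[0]) if words and words[0].isdigit() else None


def scrape_list(html):
    """Extract title, category, rank, and weeks on list from the sample markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for section in soup.find_all("section"):
        category = section.get("data-category")
        books = section.find_all("li", class_="book")
        # Rank is assumed to follow the order in which books appear on the page.
        for rank, item in enumerate(books, start=1):
            rows.append({
                "title": item.find(class_="title").get_text(strip=True),
                "category": category,
                "rank": rank,
                "weeks_on_list": parse_weeks(
                    item.find(class_="weeks").get_text(strip=True)
                ),
            })
    return rows


def lookup_metadata(title):
    """Hypothetical stand-in for an ISBNdb API lookup; returns made-up values."""
    placeholder_records = {
        "EXAMPLE TITLE ONE": {"publisher": "Example Press", "language": "en"},
        "EXAMPLE TITLE TWO": {"publisher": "Sample House", "language": "en"},
    }
    return placeholder_records.get(title, {})


def build_dataset(html, lookup=lookup_metadata):
    """Merge each scraped row with the metadata returned for its title."""
    return [{**row, **lookup(row["title"])} for row in scrape_list(html)]


rows = build_dataset(SAMPLE_HTML)
```

Passing the lookup in as a parameter keeps the scraping step independent of the metadata source, so a function that issues real ISBNdb requests could replace the stub without changing the rest of the sketch.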
