Unifying Data In Publishing
Sciences and Mathematics, College of
Math and Computer Science, Department of
SURS Faculty Advisor
Dr. Christina Davis
The literary publishing industry struggles to gather data. With thousands of titles published each year, the industry would benefit greatly from wider use of metadata, yet a lack of standardization and structured data makes large datasets difficult to build. The goal of this research project is to create a dataset from the New York Times Best Sellers list throughout 2022. An HTML parser from the Python package Beautiful Soup will be used to scrape a list of titles from the New York Times Best Sellers webpage, and that list will then be fed through the ISBNdb API. Column values such as publisher, author, publish date, and language will be sourced from the ISBNdb API, and additional columns such as number of weeks spent on the list, rank on the list, and category will be scraped directly from the New York Times webpage.
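The final step described above, combining the columns scraped from the list page with the columns returned by a metadata lookup, might be sketched as follows. This is only an illustration under stated assumptions: the column names, the `unify` and `to_csv` helpers, and the sample records are all hypothetical placeholders, and the scraping and ISBNdb requests themselves are not shown. Matching on a normalized title is also an assumption; the project may join on a different key.

```python
import csv
import io

# Column order for the unified dataset. These names are illustrative
# assumptions, not the project's actual schema.
COLUMNS = ["title", "category", "rank", "weeks_on_list",
           "author", "publisher", "publish_date", "language"]


def unify(list_rows, metadata_by_title):
    """Join scraped list rows with metadata records keyed by normalized title."""
    unified = []
    for row in list_rows:
        # Normalize the key so case and stray whitespace do not block a match.
        key = row["title"].strip().lower()
        meta = metadata_by_title.get(key, {})
        # Prefer the scraped value; fall back to metadata, then to an empty cell.
        record = {col: row.get(col, meta.get(col, "")) for col in COLUMNS}
        unified.append(record)
    return unified


def to_csv(records):
    """Serialize unified records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Placeholder rows standing in for values scraped from the list page.
scraped = [
    {"title": "Example Title A", "category": "Fiction",
     "rank": 1, "weeks_on_list": 3},
    {"title": "Example Title B", "category": "Nonfiction",
     "rank": 2, "weeks_on_list": 1},
]

# Placeholder records standing in for metadata lookup results.
metadata = {
    "example title a": {"author": "Author One", "publisher": "Publisher X",
                        "publish_date": "2022-01-01", "language": "en"},
}

records = unify(scraped, metadata)
print(to_csv(records))
```

Titles with no metadata match keep their scraped columns and leave the remaining cells empty, so gaps in coverage stay visible in the dataset rather than being dropped.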
Aebi, Katherine E.; Davis, Christina PhD; and Wigal, Sara, "Unifying Data In Publishing" (2023). Science University Research Symposium (SURS). 132.