Abstract: Amazon product reviews are a rich source of data for natural language processing research. However, the available data sets have become dated and lack review metadata that has been introduced more recently. To support a related research project, we built a custom system for obtaining Amazon product reviews, and we used the project to explore modern cloud-based services and practices. The system combined several cloud-based distributed services, including Azure Data Factory, Azure Functions, and Azure Data Lake Storage, with a third-party web scraping service. It was used to obtain 17,962 product reviews and to produce data sets in several formats. This paper describes the system in full and offers lessons learned from the experience.
Recommended Citation: Woodall, J., Kline, D., Vetter, R., Modaresnezhad, M. (2022). A Scalable Amazon Review Collection System. Journal of Information Systems Applied Research, 15(3), pp. 24-34. http://JISAR.org/2022-3/ ISSN: 1946-1836. A preliminary version appears in The Proceedings of CONISAR 2021.