Abstract: The volume of textual data, and the rate at which it accumulates, now outstrips our ability to comprehend it by conventional means. We extend our existing framework for the interactive visualization of digital text by adding near-real-time text analysis using the R statistical language and its packages. We use R to programmatically gather and preprocess raw text generated by social media and incorporate it into textual corpora. The extended framework’s back end is built on Django and relies on both the Natural Language Toolkit (NLTK 2.0) and the R language and its rich set of packages. These tools are combined to present the user with a web-based “Interactive n-gram wordCloud” front end for visually and statistically analyzing corpora built by the back end. We illustrate the framework by using the Twitter API to glean social trends, in effect visualizing the “zeitgeist.” The framework will allow subject-matter experts, typically in the humanities and social sciences, to develop alternative analyses of social phenomena through text mining and visualization. The tool is intended to let subject-matter experts manipulate text without a technical background in the tools typically used for such analyses, and without having to read the entire body of text themselves, which is becoming impossible.
Keywords: Natural Language Processing, word cloud, social media, corpus linguistics, n-gram, data visualization
Recommended Citation: Jafar, M. J., Babb, J. S., & Dana, K. (2014). Decision-Making via Visual Analysis using the Natural Language Toolkit and R. Journal of Information Systems Applied Research, 7(1), pp. 33-46. http://jisar.org/2014-7/ ISSN: 1946-1836. (A preliminary version appears in The Proceedings of CONISAR 2013)