Evaluation of GloWbE as a Tool for Big Data Corpus Linguistics
Keywords: Big Data, Corpus Linguistics, GloWbE, Digital Divide, Mark Davies, Robert Fuchs, Christian Mair, Digital Humanities
Categories: Artificial Intelligence, Modeling and Simulation
Researchers were interested in media and their impact on spoken and written language long before the rise of the internet and Web 2.0: Plato discussed the relationship between written and spoken language as early as around 400 BC. Digitalization has brought new opportunities, above all technical means for detecting, analyzing, and evaluating language. Academic disciplines such as linguistics and the digital humanities are dedicated not only to dealing with digitalization and language but also to exploring their interrelationship, the opportunities they open up, and emerging (linguistic) phenomena. This paper critically questions Big Data practices in corpus linguistics research. I examine the Big Data corpus GloWbE (Corpus of Global Web-based English) as a linguistic research tool, highlighting its advantages, emphasizing critical aspects, and presenting constructive approaches and concepts. I make some suggestions on how researchers can deal with the issues raised here: ethics, geographical boundaries and non-native speakers, technological challenges, access and participation, and heterogeneous authorship. Finally, I summarize the results as chances and challenges of working with Big Data corpora as linguistic research tools and propose topics for further research. As always, I am happy to receive constructive criticism.