Tuesday 17 September 2019

You are what you Tweet!

In the time it takes you to read this article, millions of users will have sent a Snapchat, uploaded an Insta Story and updated their Twitter profile. The age of digital culture is very much upon us. For linguists, the contemporary networked society offers a way to explore language use beyond the traditional methods of recording and interviewing speakers. This includes studies that examine how words and features are distributed across different parts of the country. One such study is Grieve and colleagues’ recent Twitter-based analysis of lexical variation in British English.

Traditionally, linguists interested in dialectal variation (i.e., linguistic features specific to a particular geographic region or group) have researched it by conducting surveys and interviews with speakers of a particular variety. For instance, a linguist might ask someone what they call “a narrow passageway between or behind buildings”. If you’re from the south, you might say ‘alleyway’, but northern speakers might call it a ‘snicket’ or a ‘ginnel’.

With the advent of social media, however, linguists no longer have to elicit these words directly. Instead, they can compile massive datasets from social media sites and examine where in the country these words are used most.

In their 2019 paper, Grieve and colleagues used a corpus (i.e., a dataset) of 180 million tweets to examine lexical variation in British English. Helpfully, because tweets can include ‘metadata’ recording the location from which they were sent, Grieve and colleagues were able to plot the tweets on maps and identify where each word was most frequent. They then compared their results with those of the more traditional approach taken in the BBC Voices project.

Their analysis convincingly shows that the lexical variation observed in the Twitter data mirrors that identified in more traditional analyses! This finding is shown in the graphic below: for all eight words, the Twitter maps look comparable to those created for the BBC Voices project. For instance, consider the maps for ‘bairn’, a word meaning ‘child’ that is typically heard in northern UK dialects (second row, right). The BBC Voices map and the Twitter map are virtually indistinguishable. On both maps, the word appears largely confined to the north and north-east of the UK, as expected.

Whilst the traditional dialect maps and the Twitter dialect maps look very similar for the most part, Grieve and colleagues note some differences. For instance, in the Twitter dataset, ‘bairn’ accounts for a maximum of 7.2% of the uses of ‘bairn’ and ‘child’ combined, even in the areas where it is stereotypically associated with the local dialect. By comparison, in the BBC Voices dataset ‘bairn’ accounts for up to 100% of these uses in some areas. Grieve and colleagues explore several possible reasons for this difference. First, they suggest that it may reflect a decline in the use of the word: ‘bairn’ may simply have become less popular over time. However, the difference might also have something to do with the type of data we get from Twitter and the way it is analysed in large-scale studies such as this one. In particular, the authors note that it is impossible to examine the conversational context of each tweet. As such, there may be contexts in which users write ‘child’ rather than ‘bairn’ even if they use the dialectal term ‘bairn’ in speech, for instance when reporting someone else’s speech.
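To make the 7.2% figure a little more concrete, here is a minimal sketch of how such a share can be calculated: for each area, count the tweets using either ‘bairn’ or ‘child’, then work out what proportion of those use ‘bairn’. The area names, the counts and the `variant_share` function are all hypothetical and invented for illustration; this is not Grieve and colleagues’ actual pipeline.

```python
from collections import Counter

# Hypothetical toy data: (area, word) pairs standing in for geolocated tweets.
# Area names and counts are invented for illustration only.
tweets = [
    ("Area A", "bairn"), ("Area A", "child"), ("Area A", "child"),
    ("Area A", "child"),
    ("Area B", "child"), ("Area B", "child"), ("Area B", "child"),
]

def variant_share(observations, variant, alternatives):
    """Return, per area, the share of `variant` among all tokens of the
    variant and its alternatives (e.g. 'bairn' out of 'bairn' + 'child')."""
    words = {variant, *alternatives}
    totals = Counter()  # tokens of any variant, per area
    hits = Counter()    # tokens of the target variant, per area
    for area, word in observations:
        if word in words:
            totals[area] += 1
            if word == variant:
                hits[area] += 1
    return {area: hits[area] / totals[area] for area in totals}

shares = variant_share(tweets, "bairn", ["child"])
for area, share in sorted(shares.items()):
    print(f"{area}: {share:.1%}")
print(f"maximum: {max(shares.values()):.1%}")
```

With this toy data, Area A comes out at 25.0% and Area B at 0.0%. Working with shares rather than raw counts means that an area with more Twitter users does not automatically look more ‘bairn’-heavy just because it produces more tweets.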

These issues aside, Grieve and colleagues’ analysis suggests that the findings of large-scale dialect surveys are largely mirrored in the Twitter data. As such, we can expect more and more sociolinguistic research to examine data from social media sites such as Twitter in the future! So, it seems, you really are what you tweet!


Grieve, Jack; Chris Montgomery; Andrea Nini; Akira Murakami & Diansheng Guo (2019) Mapping Lexical Dialect Variation in British English Using Twitter. Frontiers in Artificial Intelligence.

This summary was written by Christian Ilbury