Do you think you would be able to guess the gender of the author of an anonymous tweet? David Bamman, Jacob Eisenstein and Tyler Schnoebelen found some distinct but complex differences in the way men and women use language on the Twitter microblogging site.
The researchers amassed a corpus of over 9 million tweets from more than 14,000 American users. They then assigned gender to each account using historical census information on given names. Using computational methods and statistical tests, they found that:
· pronouns are used more frequently by women. These include alternative spellings such as u and yr;
· emotion terms (sad, love, etc.) and emoticons are also associated with female authors;
· although previous research has found that kinship terms are used more often by women, in this study it really is a mixed bag. Most kinship words, including mom, sister, daughter, child, dad and husband, are used more by women. However, a few words, including wife, bro, and brotha, are associated with male authors;
· some abbreviations such as lol and omg are used more by women, as are ellipses, expressive lengthening (e.g. coooooool), emoticons, exclamation marks, question marks, representations of sounds like ah, hmmm, and grr as well as hesitation words such as um;
· assent terms such as okay and yess are used more by women, but yessir is used more by men. Similarly, the negation terms nooo and cannot are associated with women, while nah, nobody, and ain’t suggest the author is likely to be a man;
· swearwords and taboo words are mostly used by male writers whereas women choose milder terms such as darn.
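The gender labels behind these findings came from census information on given names. A minimal sketch of how such labelling might work is shown here. The name counts are invented for illustration, and the 0.9 threshold and the choice to leave ambiguous names unlabelled are assumptions, not details reported in this summary.

```python
# Hypothetical counts (made-up numbers, not real census data):
# first name -> (times recorded as female, times recorded as male)
NAME_COUNTS = {
    "emily": (9500, 20),
    "james": (40, 12000),
    "jordan": (1500, 4200),
}


def assign_gender(first_name, counts=NAME_COUNTS, threshold=0.9):
    """Return 'F' or 'M' when a name is strongly skewed, otherwise None.

    The threshold is an assumed cut-off for illustration only.
    """
    key = first_name.strip().lower()
    if key not in counts:
        return None  # unknown name: leave the account unlabelled
    female, male = counts[key]
    total = female + male
    if total == 0:
        return None
    share_female = female / total
    if share_female >= threshold:
        return "F"
    if share_female <= 1 - threshold:
        return "M"
    return None  # ambiguous name: leave the account unlabelled
```

With these toy counts, `assign_gender("Emily")` gives `"F"`, `assign_gender("James")` gives `"M"`, and `assign_gender("Jordan")` gives `None`, because that name is not skewed strongly enough either way.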
The researchers suggest that the male/female distinction used in much previous research is too simplistic. For example, some linguists claim that women use language in a more expressive way than men by lengthening words like yess and noo. However, using swearwords may also be seen as expressive, and this is done more frequently by men. Similarly, the fact that women use more abbreviations such as omg and lol goes against the common view that women prefer to use more standard language. And although men mention 'named entities' such as Apple or Steve about 30% more often than women, this does not support previous claims that men use language mainly to convey information while women tend to engage with others. When one looks more closely at the data, it becomes clear that many of the named entities are sports figures and teams, and are used by men to engage with others with similar sports interests.
The researchers then identified groups of tweeters who used similar sets of words, regardless of their gender. Many groups turned out to have a substantial majority of either men or women. While some of these clusters matched the linguistic expectations for their gender, others didn’t. For example, although swearwords are generally preferred by men, some of the male-associated clusters used taboo terms far less often than women. On a closer look, many of these messages turned out to be work-related, where taboo language would be discouraged.
Finally, the researchers wanted to find out whether individuals with a greater proportion of same-gender people in their social networks use more linguistic items associated with their gender. In other words, do birds of a feather tweet together?
Well, sort of. There was a strong correlation between the use of gendered language and the composition of people’s social networks. The women in the dataset had networks which were on average 58% female. However, women whose tweets contained the most strongly marked female characteristics had social networks which were 77% female. Conversely, women who displayed the least gender-marked language had social networks that were on average only 40% female. The results for the men followed a similar pattern.
This fits with previous work showing that people change the way they communicate to match their addressees. People can use language to position themselves in relation to others, and they can do this by either conforming to or defying gendered expectations. So, it seems that it is not so straightforward to match language use with gender after all.
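The network comparison described above comes down to averaging the female share of each author's network within groups of authors. The sketch below illustrates that arithmetic only. The group labels, the data and the function names are hypothetical, and it does not reproduce the statistical tests used in the study.

```python
from statistics import mean


def female_share(network_genders):
    """Fraction of gender-labelled contacts who are female, or None if none are labelled."""
    labelled = [g for g in network_genders if g in ("F", "M")]
    if not labelled:
        return None
    return sum(1 for g in labelled if g == "F") / len(labelled)


def mean_share_by_group(authors):
    """Average the female share of authors' networks within each group.

    `authors` is a list of (group_label, network_genders) pairs.
    """
    shares = {}
    for group, network in authors:
        share = female_share(network)
        if share is not None:
            shares.setdefault(group, []).append(share)
    return {group: mean(values) for group, values in shares.items()}


# Toy data, invented for illustration: each author has a group label
# (how gender-marked her language is) and the genders of her contacts.
authors = [
    ("most_marked", ["F", "F", "F", "M"]),
    ("most_marked", ["F", "F", "F", "F"]),
    ("least_marked", ["F", "M", "M", "M"]),
    ("least_marked", ["F", "F", "M", "M"]),
]

result = mean_share_by_group(authors)
```

In this toy data the most strongly marked group has networks that are mostly female and the least marked group does not, which is the same direction as the pattern reported for the women in the dataset.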
Bamman, David, Jacob Eisenstein & Tyler Schnoebelen (2014) Gender identity and lexical variation in social media. Journal of Sociolinguistics 18(2): 135–160.
This summary was written by Danniella Samos