I was reading a post on Ted Underwood’s blog, and he linked to an article by a group of linguists about gendered language on social media sites and the use of computers to determine an author’s gender or social group.
The article, “Gender on Twitter: Styles, Stances, and Social Networks” (Bamman, Eisenstein, and Schnoebelen), explores the multifaceted relationship between gender and language, examining cases that both adhere to and deviate from gendered language norms. Through a combination of computational techniques and social media theory, the authors offer a new perspective on how individuals “act out” language in social media forums.
The article begins with a broad review of previous research and theory. The authors identify several problems with earlier work, noting that one previous study focused largely on groups “with social network connections to unambiguously gendered entities: sororities, fraternities, and hygiene products.” Focusing on these groups undoubtedly led researchers toward individuals with very specific gender identities, perhaps skewing their data. The authors also discuss the findings of Eckert and McConnell-Ginet, which show that gender differences are not entirely stable across socioeconomic classes, and they draw on Eckert’s work to argue that the social meaning of a linguistic expression depends deeply on the social and linguistic context in which it is used.
When the authors turn to their own findings, they list the three major contributions they make to the field:
“1. We attempt a large-scale replication of previous work on the gender distribution of several word classes, and introduce new word classes specifically for corpora of computer-mediated communication.
2. We show that clustering authors by their lexical frequencies reveals a range of coherent styles and topical interests, many of which are strongly connected with gender or other social variables. But while some of these styles replicate the aggregated correlations between gender and various linguistic resources, others are in contradiction. This provides large-scale evidence for the existence of multiple gendered styles.
3. We examine the social network among authors in our dataset, and find that gender homophily correlates with the use of gendered language. Individuals with many same-gender friends tend to use language that is strongly associated with their gender (as measured by aggregated statistics), and individuals with more balanced social networks tend not to. This provides evidence that the performance of popular gender norms in language is but one aspect of a coherent gendered persona that shapes an individual’s social interactions.” (10-11)
The authors present their findings in a graph (15-16).

Among those results, expressive lengthening was also found to be a female marker (e.g., nooo, yessss, coooool).
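To make “expressive lengthening” concrete, here is a minimal Python sketch of how such a marker might be counted in a batch of posts. It assumes a simple rule of my own choosing (a word counts as lengthened if any letter repeats three or more times in a row); the helper names `lengthened_tokens` and `lengthening_rate` are hypothetical, and the paper’s actual operationalization may well differ.

```python
import re

# Assumption (not from the paper): a token is "lengthened" if any letter
# appears three or more times in a row, e.g. "nooo" or "coooool".
LENGTHENED = re.compile(r"([a-z])\1{2,}", re.IGNORECASE)
# Simple word tokenizer: runs of letters and apostrophes.
TOKEN = re.compile(r"[a-z']+", re.IGNORECASE)


def lengthened_tokens(text: str) -> list[str]:
    """Return the tokens in `text` that show expressive lengthening."""
    return [tok for tok in TOKEN.findall(text) if LENGTHENED.search(tok)]


def lengthening_rate(text: str) -> float:
    """Fraction of tokens in `text` that are lengthened (0.0 for empty text)."""
    tokens = TOKEN.findall(text)
    if not tokens:
        return 0.0
    return len(lengthened_tokens(text)) / len(tokens)


post = "nooo, yessss that is coooool but not cool"
print(lengthened_tokens(post))  # ['nooo', 'yessss', 'coooool']
print(lengthening_rate(post))   # 0.375
```

Note that ordinary “cool” is not flagged, since it has only two o’s. A rule this crude would also catch strings like “www” or “zzz”, so a real study would need a more careful filter.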
Though their results initially seem to cohere with previous findings, the authors point out that categorizing them will undoubtedly lead to problems. To say, for example, that women are more expressive (because of expressive lengthening, emoticons, and punctuation) would be difficult, because swear words are also an expressive form of language and are used predominantly by men. Women also use many abbreviations (lol, omg), which undercuts any argument that women avoid swear words because they wish to adhere to standard language. These tensions lead the authors to conclude that a much more nuanced analysis is needed, one that allows for several distinct categories of language.
The authors then group authors into “clusters” according to subject matter or area of interest. Gender played no part in forming these clusters, yet the authors discovered that the clusters had a strong gender skew.

Cluster Results, p. 23
Fourteen of the sixteen clusters show a significant gender imbalance, with the dominant gender making up at least 60% of authors, even in the smallest cluster. The authors point out that, at these sample sizes, a 60/40 split would arise by chance with a probability of less than 1%, which makes their findings all the more indicative of an important relationship between gender and language use.
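Whether a 60/40 split is “unlikely to be chance” depends on how many authors are in the cluster. Below is a minimal Python sketch of an exact two-sided binomial test that shows this. It assumes the null hypothesis is an even 50/50 gender split, and the cluster sizes are hypothetical, not the paper’s numbers; the authors’ own test and baseline may differ.

```python
import math


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test p-value.

    Sums the probability, under Binomial(n, p), of every outcome that is no
    more likely than observing k successes. Requires 0 < p < 1.
    """
    # Work in log space so the binomial coefficient cannot overflow for large n.
    log_pmf = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log(1 - p)
        for i in range(n + 1)
    ]
    observed = log_pmf[k]
    # Small tolerance so outcomes tied with the observed one (e.g. the mirror
    # image n - k when p = 0.5) are not dropped because of floating-point noise.
    total = sum(math.exp(lp) for lp in log_pmf if lp <= observed + 1e-7)
    return min(1.0, total)


# Hypothetical cluster sizes, each with a 60/40 split toward one gender.
for n in (100, 200, 500):
    k = round(0.6 * n)
    print(f"n={n:4d}  k={k:4d}  p={binomial_two_sided_p(k, n):.4f}")
```

Under these assumptions, a 60/40 split among 100 authors gives a p-value of roughly 0.06, which is not significant. Among 200 or more authors, the same split falls below the 1% threshold. So the strength of the authors’ claim rests on their clusters being reasonably large.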
The cluster method allows the authors to break down how gender is constructed within each individual group. Men, for example, are heavily represented in the ‘sports’ cluster, but do men who enjoy baseball define masculinity in the same way as men who enjoy wrestling (26)? Examining these differences lets us see how gender is performed and constructed in each group, and understand how and why it differs.
This article is a fascinating read, shedding light on the ways we use social media to communicate (and, in a way, define) who we are. I can’t claim a deep understanding of linguistics (I took one course on it in college), but I think the article is accessible enough that readers without a linguistics background can follow it.
I hope you get the chance to read the article. It is truly interesting and may change the way you think about gender, language, and social media sites.