Edinburgh Research Archive

The Language of Weblogs: A study of genre and individual differences

dc.contributor.advisor
Oberlander, Jon
en
dc.contributor.author
Nowson, Scott
en
dc.date.accessioned
2006-05-18T14:30:00Z
dc.date.available
2006-05-18T14:30:00Z
dc.date.issued
2006-06
dc.description
Institute for Communicating and Collaborative Systems
en
dc.description.abstract
This thesis describes a linguistic investigation of individual differences in online personal diaries, or 'blogs.' There is substantial evidence of gender differences in language (Lakoff, 1975), and to a lesser extent linguistic projection of personality (Pennebaker & King, 1999). Recent work has investigated these latter differences in the area of computer-mediated communication (CMC), specifically e-mail (Gill, 2004). This thesis employs a number of analytic techniques, both top-down (dictionary-based) and bottom-up (data-driven), in order to explore personality and gender differences in the language of blogs. A corpus was constructed by asking authors to submit a month of text and complete a sociobiographic questionnaire. The corpus consists of over 400,000 words and five-factor personality data (Buchanan, 2001) for 71 subjects. The thesis begins by framing blogs in the context of other genres, both CMC and traditional, in order to show both the distinctiveness and representativeness of the genre. Top-down content analysis techniques are then employed to investigate the relationship between personality and linguistic features. A number of features correlate with each trait, but upon regression, very little variance is explained. Bottom-up techniques are more successful. The corpus was stratified into high, low and neutral personality groups to identify distinctive collocations for each. Returning to the raw personality scores, it becomes clear that even a small amount of n-gram context helps account for much more variance in personality. A measure of contextuality (Heylighen & Dewaele, 2002) shows that authors considered high in Agreeableness pay more attention to differences between their extra-linguistic context and that of their audience. Attention turns to gender, where similar methods are applied to investigate gender differences in language. Many previous findings are confirmed in the blog corpus. In addition, women are found to write more in their blogs than men. More generally, using the British National Corpus, it is shown that women are more contextual, except in the least contextual of genres (academic writing) where there is no difference. The study concludes by confirming that both gender and personality are projected by language in blogs; furthermore, approaches which take the context of language features into account can be used to detect more variation than those which do not.
en
dc.format.extent
3284490 bytes
en
dc.format.mimetype
application/pdf
en
dc.identifier.uri
http://hdl.handle.net/1842/1113
dc.language.iso
en
dc.publisher
University of Edinburgh. College of Science and Engineering. School of Informatics.
en
dc.relation.hasversion
Nowson, S., Oberlander, J., & Gill, A. (2005). Weblogs, genres and individual differences. Proceedings of the 27th Annual Conference of the Cognitive Science Society, (pp. 1666-1671). Hillsdale, NJ: LEA.
en
dc.subject.other
Personality
en
dc.subject.other
Language
en
dc.subject.other
Gender
en
dc.subject.other
Weblogs
en
dc.title
The Language of Weblogs: A study of genre and individual differences
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
thesis.pdf
Size:
3.13 MB
Format:
Adobe Portable Document Format
