Annotating speaker stance in discourse: the Brexit Blog Corpus (BBC)

Creator/Principal investigator(s):

Andreas Kerren - Linnaeus University

Carita Paradis - Lund University, Center for Language and Literature

Description:

In this study, we explore the extent to which language users agree about the kinds of stance expressed in natural language use, or whether their interpretations diverge. To this end, a comprehensive cognitive-functional framework of ten stance categories was developed, based on previous work on speaker stance in the literature. A corpus of opinionated texts in which speakers take stance and position themselves, the Brexit Blog Corpus (BBC), was compiled. An analytical interface for the annotations was set up, and the data were annotated independently by two annotators. The annotation procedure, the annotation agreement and the co-occurrence of more than one stance category in the utterances are described and discussed. The careful, analytical annotation process has by and large returned satisfactory inter- and intra-annotation agreement scores, resulting in a gold standard corpus, the final version of the BBC.
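As an illustration of how agreement between two annotators might be quantified when an utterance can carry more than one stance category, the sketch below computes Cohen's kappa separately for each category. This is only a sketch under stated assumptions: the record does not say which agreement measure the study used, and the category names `CAT_A` and `CAT_B`, the function names and the toy annotations are hypothetical placeholders, not taken from the corpus.

```python
from typing import Dict, List, Set


def cohen_kappa(a: List[bool], b: List[bool]) -> float:
    """Cohen's kappa for two binary label sequences of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(a)
    # Proportion of items on which the two annotators gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's rate of assigning the label.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both annotators gave one identical label to every item.
        return 1.0
    return (observed - expected) / (1 - expected)


def per_category_kappa(
    ann1: List[Set[str]], ann2: List[Set[str]], categories: List[str]
) -> Dict[str, float]:
    """Treat each category as a yes/no decision per utterance."""
    return {
        c: cohen_kappa([c in s for s in ann1], [c in s for s in ann2])
        for c in categories
    }


# Hypothetical toy data: one set of category labels per utterance.
annotator_1 = [{"CAT_A"}, {"CAT_A", "CAT_B"}, set(), {"CAT_B"}]
annotator_2 = [{"CAT_A"}, {"CAT_B"}, set(), {"CAT_B"}]
scores = per_category_kappa(annotator_1, annotator_2, ["CAT_A", "CAT_B"])
print(scores)
```

Scoring each category as its own binary decision keeps utterances with several co-occurring categories comparable across annotators; other measures may suit the data equally well.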

Download data:

Brexit blog corpus - Excel

Brexit blog corpus - text files

Identifiers:

SND-ID: SND 1037

Purpose:

The aim of this study is to explore the possibility of identifying speaker stance in discourse, to provide an analytical resource for it, and to evaluate the level of agreement across speakers in the area of stance-taking in discourse.

Language:

English

Time period(s) investigated:

2015-06-01 — 2016-05-31

Funding:

Swedish Research Council - 2012-5659

Contact person for questions about the data:

Andreas Kerren

Publications

Vasiliki Simaki, Carita Paradis, Maria Skeppstedt, Magnus Sahlgren, Kostiantyn Kucher, and Andreas Kerren. Annotating speaker stance in discourse: the Brexit Blog Corpus. In Corpus Linguistics and Linguistic Theory, 2017. De Gruyter, published electronically before print. https://doi.org/10.1515/cllt-2016-0060

If you have published anything based on these data, please notify us with a reference to your publication(s).

Version 1.0:

2017-10-13 doi:10.5878/002925

Brexit Blog Corpus (BBC)

Citation:

Andreas Kerren, Carita Paradis. Linnaeus University, Department of Computer Science (2017). Brexit Blog Corpus (BBC). Swedish National Data Service. Version 1.0. https://doi.org/10.5878/002925

Description:

The BBC is a collection of texts from blog sources. The corpus texts are thematically related to the 2016 UK referendum on whether the UK should remain a member of the European Union. The texts were extracted from the Internet from June to August 2015. Using the Gavagai API (https://developer.gavagai.se), the texts were detected with seed words such as Brexit, EU referendum, pro-Europe, europhiles, eurosceptics, United States of Europe, David Cameron, or Downing Street. The retrie

[...]
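The idea of detecting texts by seed words can be sketched with a plain case-insensitive match. This is a minimal illustration only and does not use or reproduce the Gavagai API; the seed words are those listed in the description, while `build_seed_pattern`, `matched_seeds` and the sample sentences are hypothetical.

```python
import re
from typing import Iterable, List, Pattern

# Seed words as listed in the corpus description.
SEED_WORDS = [
    "Brexit",
    "EU referendum",
    "pro-Europe",
    "europhiles",
    "eurosceptics",
    "United States of Europe",
    "David Cameron",
    "Downing Street",
]


def build_seed_pattern(seeds: Iterable[str]) -> Pattern[str]:
    """Compile one case-insensitive pattern that matches any seed as a whole term."""
    # Longest seeds first, so multi-word seeds win over shorter alternatives.
    alternatives = sorted((re.escape(s) for s in seeds), key=len, reverse=True)
    # The lookarounds stop a seed from matching inside a longer word.
    return re.compile(
        r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE
    )


def matched_seeds(text: str, pattern: Pattern[str]) -> List[str]:
    """Return the distinct seed words found in the text, lower-cased and sorted."""
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


pattern = build_seed_pattern(SEED_WORDS)
sample = "David Cameron spoke about the EU referendum in Downing Street."
print(matched_seeds(sample, pattern))
```

A text would be kept as a candidate when `matched_seeds` returns a non-empty list; whole-term matching is a design choice here, so a word such as "Brexiteers" would not count as a hit for the seed "Brexit".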

Data format / data structure:

Text

Data collection:

Time period(s) for data collection: 2015-06-01 — 2016-05-31

Source of the data: Research data

Variables:

8

Number of individuals/objects:

1682