Gold standard for English-Swedish Europarl data (GES)

SND-ID: EXT 0283

Creator/Principal investigator(s)

Lars Ahrenberg - Linköping University, Department of Computer and Information Science

Maria Holmqvist - Linköping University, Department of Computer and Information Science

Description

Reference corpus for word alignment (word linking), divided into training data and test data. The sentences come from the English and Swedish parts of Europarl.
Language resources

Resource type

Corpus

Foreseen use

NLP application

Text corpus

  • Linguality

    Bilingual
  • Language

    • (eng)

    • (swe)

      Sentences: 1164

  • Modality

    Written Language
  • Size

    Sentences: 1164

  • Annotation

    • Alignment

      Manual annotation

Publications

Maria Holmqvist and Lars Ahrenberg (2011). A Gold Standard for English-Swedish Word Alignment. In Proceedings of the 18th Nordic Conference on Computational Linguistics, Riga, Latvia, May 11-13, 2011.

Dataset
Gold standard for English-Swedish Europarl data (GES)

Description

The data are derived from the English-Swedish part of the Europarl corpus. For each sentence pair in the selected subset, token correspondences are given as pairs of integer token identifiers.
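
As an illustration of how pairs of integer token identifiers can encode a word alignment, here is a minimal Python sketch. The on-disk format of GES is not specified in this record, so the `i-j` notation, the 0-based indexing, the function names `parse_links` and `linked_words`, and the sentence pair are all assumptions made up for the example and are not taken from the dataset.

```python
from typing import List, Tuple

# A link is a pair (source token index, target token index).
Link = Tuple[int, int]


def parse_links(line: str) -> List[Link]:
    """Parse whitespace-separated 'i-j' items into integer pairs.

    The 'i-j' notation is an assumption for this sketch; the actual
    GES files may use a different layout.
    """
    links = []
    for item in line.split():
        src, tgt = item.split("-")
        links.append((int(src), int(tgt)))
    return links


def linked_words(source: List[str], target: List[str],
                 links: List[Link]) -> List[Tuple[str, str]]:
    """Resolve index pairs to the tokens they connect (0-based, assumed)."""
    return [(source[i], target[j]) for i, j in links]


# Hypothetical sentence pair, invented for illustration only.
english = ["I", "declare", "the", "session", "resumed"]
swedish = ["Jag", "förklarar", "sessionen", "återupptagen"]

links = parse_links("0-0 1-1 2-2 3-2 4-3")
pairs = linked_words(english, swedish, links)

for en, sv in pairs:
    print(f"{en} <-> {sv}")
```

In this invented example both "the" and "session" link to the single Swedish token "sessionen", which shows why an alignment is stored as a set of index pairs rather than a one-to-one mapping.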

Data format / data structure

Numeric

Text

License

Creative Commons Attribution 4.0 International (CC BY 4.0)