Gold standard for English-Swedish Europarl data (GES)

Creator/Principal investigator(s)

Lars Ahrenberg - Linköping University, Department of Computer and Information Science

Maria Holmqvist - Linköping University, Department of Computer and Information Science

Description

Reference corpus for word linking (word alignment), divided into training data and test data. The sentences are drawn from the English and Swedish parts of Europarl.

Responsible department/unit

Linköping University, Department of Computer and Information Science

Identifiers

SND-ID: EXT 0283

Language resources

Resource type

Corpus

Foreseen use

NLP application

Text corpus

  • Linguality

    Bilingual
  • Language

    • English (eng)

    • Swedish (swe)

  • Modality

    Written Language
  • Size

    Sentences: 1164

  • Annotation

    • Alignment

      Manual annotation

Contact for questions about the data

Lars Ahrenberg

Publications

Maria Holmqvist and Lars Ahrenberg (2011). A Gold Standard for English-Swedish Word Alignment. In Proceedings of the 18th Nordic Conference on Computational Linguistics, Riga, Latvia, May 11-13, 2011.

If you have published anything based on these data, please notify us with a reference to your publication(s).


License

Creative Commons License

Gold standard for English-Swedish Europarl data (GES)

Description

The data are derived from the English-Swedish part of the Europarl corpus. For each sentence pair in the selected subset, token correspondences are given as pairs of integer token identifiers.
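
As an illustration of how links stated as pairs of integer token identifiers might be read, here is a minimal Python sketch. The on-disk layout is not specified in this entry, so everything about the format here is an assumption: one line per sentence pair, links written as `source-target`, and identifiers counted from 0. The function names `parse_links` and `linked_words` are hypothetical, and the example sentence is made up rather than taken from the corpus.

```python
from typing import List, Tuple


def parse_links(line: str) -> List[Tuple[int, int]]:
    """Parse a line such as '0-0 1-1 1-3' into (source, target) integer pairs.

    Assumed format (not confirmed by the catalogue entry): whitespace-separated
    links, each written as '<source id>-<target id>'.
    """
    pairs = []
    for item in line.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def linked_words(
    source: List[str], target: List[str], links: List[Tuple[int, int]]
) -> List[Tuple[str, str]]:
    """Map integer identifier pairs back to the tokens they point at (0-based, assumed)."""
    return [(source[i], target[j]) for i, j in links]


# Made-up sentence pair for illustration only.
english = ["I", "agree", "completely"]
swedish = ["Jag", "håller", "helt", "med"]

links = parse_links("0-0 1-1 1-3 2-2")
for en_word, sv_word in linked_words(english, swedish, links):
    print(en_word, "<->", sv_word)
```

If the actual files count tokens from 1 or use a different separator, only `parse_links` and the indexing in `linked_words` would need to change.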

Data format / data structure

Numeric

Text

Published: 2019-05-15