Gold standard for English-Swedish Europarl data (GES)

Creator/Principal investigator(s):

Lars Ahrenberg - Linköping University, Department of Computer and Information Science

Maria Holmqvist - Linköping University, Department of Computer and Information Science

Description:

Reference corpus for word linking, divided into training data and test data. The sentences come from the English and Swedish parts of Europarl.

Identifiers:

SND-ID: EXT 0283

Language resources

Resource type

Corpus

Foreseen use

NLP application

Text corpus

  • Linguality

    Bilingual
  • Language

    • English (eng)

    • Swedish (swe)

      Sentences: 1164

  • Modality

    Written Language
  • Size

    Sentences: 1164

  • Annotation

    • Alignment

      Manual annotation

Contact person for questions about the data:

Lars Ahrenberg

Publications

Maria Holmqvist and Lars Ahrenberg (2011). A Gold Standard for English-Swedish Word Alignment. In Proceedings of the 18th Nordic Conference on Computational Linguistics, Riga, Latvia, May 11-13, 2011.

If you have published anything based on these data, please notify us with a reference to your publication(s).

License:

Creative Commons License

Description:

The data were created from the English-Swedish part of the Europarl corpus. For each sentence pair in the selected subset, token correspondences are stated as pairs of integer token identifiers.
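The record does not document the exact file layout. As a minimal sketch, assume that the links for one sentence pair are written on a single line as whitespace-separated `i-j` pairs (source token ID, target token ID). This is a common convention in word-alignment tools, but it is not confirmed for GES. Under that assumption, the correspondences can be read into sets of integer pairs and compared with a system's output by precision and recall. The names `parse_links` and `precision_recall` are hypothetical.

```python
from typing import Set, Tuple

# A link is a (source token ID, target token ID) pair.
Link = Tuple[int, int]


def parse_links(line: str) -> Set[Link]:
    """Parse whitespace-separated 'i-j' pairs into a set of integer links.

    Assumption: one sentence pair per line in 'i-j' notation; the real
    GES file layout may differ.
    """
    links: Set[Link] = set()
    for pair in line.split():
        src, tgt = pair.split("-")
        links.add((int(src), int(tgt)))
    return links


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Compare predicted links against gold-standard links."""
    correct = len(predicted & gold)  # links found in both sets
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    gold = parse_links("1-1 2-3 3-2 4-4")
    predicted = parse_links("1-1 2-3 4-5")
    p, r = precision_recall(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Representing the links as sets makes the comparison independent of the order in which pairs are listed.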

Data format / data structure:

Numeric

Text