Texts from the Swedish Migration Agency

SND-ID: EXT 0329

This study is part of the collection Parallel Texts from Public Agencies


Creator/Principal investigator(s)

Simon Dahlberg - Institute for Language and Folklore, Language Council of Sweden

Institute for Language and Folklore, Language Council of Sweden

Description

Parallel texts downloaded from the website of the Swedish Migration Agency using the command "w3m -dump" in an Ubuntu shell.

Protection and ethical review

Method

Sampling procedure

Multilingual parallel content.

Language resources

Resource type

Corpus

Foreseen use

NLP application

Text corpus

  • Linguality

    Multilingual
  • Language

    • Swedish (swe)

      Texts: 33

    • Amharic (amh)

      Texts: 23

    • Arabic (ara)

      Texts: 33

    • Azerbaijani (aze)

      Texts: 27

    • Central Kurdish (ckb)

      Texts: 29

    • English (eng)

      Texts: 33

    • Persian (fas)

      Texts: 32

    • Croatian (hrv)

      Texts: 23

    • Armenian (hye)

      Texts: 24

    • Georgian (kat)

      Texts: 1

    • Northern Kurdish (kmr)

      Texts: 28

    • Mongolian (mon)

      Texts: 25

    • Dari (prs)

      Texts: 28

    • Pushto (pus)

      Texts: 28

    • Romany (rom)

      Arli (dialect)

      Texts: 24

    • Russian (rus)

      Texts: 33

    • Somali (som)

      Texts: 29

    • Spanish (spa)

      Texts: 31

    • Albanian (sqi)

      Texts: 27

    • Thai (tha)

      Texts: 4

    • Tigrinya (tir)

      Texts: 29

    • Turkish (tur)

      Texts: 2

    • Uzbek (uzb)

      Texts: 25

    • Chinese (zho)

      Texts: 3

    • French (fra)

      Texts: 31

  • Modality

    Written Language
  • Size

    Words: 29008 (swe)

    Texts: 33 (swe)

    Words: 438614 (total)

    Texts: 580 (total)

  • Original source

    Migrationsverket (www.migrationsverket.se)
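The catalog does not state how the Size figures were computed. As a rough sketch, assuming each language's texts are stored in a hypothetical folder named by its ISO 639-3 code (for example `swe/` or `amh/`) and that a word is any whitespace-separated token, per-language and total counts could be tallied as follows. The function names `corpus_size` and `totals` are illustrative only.

```python
from __future__ import annotations

from pathlib import Path


def corpus_size(root: Path) -> dict[str, tuple[int, int]]:
    """Map each language folder under root to (number of texts, number of words).

    Assumes a hypothetical layout root/<iso639-3 code>/*.txt and counts a
    word as any whitespace-separated token.
    """
    sizes: dict[str, tuple[int, int]] = {}
    for lang_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(lang_dir.glob("*.txt"))
        # Whitespace tokenization: a simplification, see the note below.
        words = sum(len(f.read_text(encoding="utf-8").split()) for f in files)
        sizes[lang_dir.name] = (len(files), words)
    return sizes


def totals(sizes: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum texts and words over all languages."""
    return (
        sum(texts for texts, _ in sizes.values()),
        sum(words for _, words in sizes.values()),
    )
```

Whitespace splitting is only an approximation: Thai and Chinese do not separate words with spaces, so this approach would undercount those languages, and its results need not match the figures reported above.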
Geographic coverage

Geographic spread

Geographic location: Sweden

Topic and keywords

Subject area

Legislation and legal systems, International politics and organisations, Conflict, security and peace, Social welfare policy and systems, Society and culture (CESSDA Topic Classification)
Social Sciences, Languages and Literature (The Swedish standard of fields of research 2011)

Publications

Dataset

Parallel texts from the Swedish Migration Agency

Description

The texts were downloaded with the command "w3m -dump" from an Ubuntu shell. The resulting text files were then stripped so that only the main content remained; navigation menus and similar page elements were removed.
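The collection step described above can be sketched roughly as follows. This is a minimal sketch, not the creator's actual script: it assumes w3m is installed, and the `start_marker`/`end_marker` strings used to cut away menus are hypothetical placeholders, since the description does not say how the main text was identified on each page.

```python
import subprocess


def dump_page(url: str) -> str:
    """Render a web page as plain text via `w3m -dump` (w3m must be installed)."""
    result = subprocess.run(
        ["w3m", "-dump", url],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as text
        check=True,           # raise CalledProcessError if w3m fails
    )
    return result.stdout


def extract_main_text(dump: str, start_marker: str, end_marker: str) -> str:
    """Keep only the lines strictly between the first line containing
    start_marker and the next line containing end_marker.

    The markers are hypothetical placeholders for whatever text surrounds
    the main content on a page; menus and footers outside them are dropped.
    """
    lines = dump.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if start_marker in line)
        end = next(j for j in range(start + 1, len(lines)) if end_marker in lines[j])
    except StopIteration:
        raise ValueError("start or end marker not found in dump") from None
    # Remove trailing whitespace left over from the text layout, then trim blank edges.
    body = [line.rstrip() for line in lines[start + 1:end]]
    return "\n".join(body).strip()
```

For a single page, `extract_main_text(dump_page(url), start, end)` would return the cleaned text, where `url` is a page on www.migrationsverket.se and `start` and `end` are chosen by inspecting that page's dump. Marker-based cutting is just one simple way to drop menus; the original stripping may have been done differently, for instance by hand.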

Data format / data structure

Text

Data collection

  • Mode of collection: Self-administered writings and/or diaries: web-based
  • Time period(s) for data collection: 2019-01-01–2019-01-31

Published: 2020-03-30