Texts from the Swedish Migration Agency

This study is part of the collection Parallel Texts from Public Agencies

Creator/Principal investigator(s):

Institute for Language and Folklore, Language Council of Sweden

Simon Dahlberg - Institute for Language and Folklore, Language Council of Sweden

Description:

Parallel texts from the website of the Swedish Migration Agency, downloaded with "w3m -dump" from an Ubuntu shell.

Responsible department/unit:

Institute for Language and Folklore, Language Council of Sweden

Contributor(s):

Institute for Language and Folklore, Language Council of Sweden

Identifiers:

SND-ID: EXT 0329

URL: http://liljeholmen.sprakochfolkminnen.se/sprakresurser/version/20190124/myndighetsdata/texter

URL: http://liljeholmen.sprakochfolkminnen.se/sprakresurser/version/20190124/myndighetsdata/texter/Migrationsverket

Geographic spread:

Geographic location: Sweden

Language resources

Resource type

Corpus

Foreseen use

NLP application

Text corpus

  • Linguality

    Multilingual
  • Language

    • Swedish (swe)

      Texts: 33

    • Amharic (amh)

      Texts: 23

    • Arabic (ara)

      Texts: 33

    • Azerbaijani (aze)

      Texts: 27

    • Central Kurdish (ckb)

      Texts: 29

    • English (eng)

      Texts: 33

    • Persian (fas)

      Texts: 32

    • Croatian (hrv)

      Texts: 23

    • Armenian (hye)

      Texts: 24

    • Georgian (kat)

      Texts: 1

    • Northern Kurdish (kmr)

      Texts: 28

    • Mongolian (mon)

      Texts: 25

    • Dari (prs)

      Texts: 28

    • Pushto (pus)

      Texts: 28

    • Romany (rom)

      Arli (dialect)

      Texts: 24

    • Russian (rus)

      Texts: 33

    • Somali (som)

      Texts: 29

    • Spanish (spa)

      Texts: 31

    • Albanian (sqi)

      Texts: 27

    • Thai (tha)

      Texts: 4

    • Tigrinya (tir)

      Texts: 29

    • Turkish (tur)

      Texts: 2

    • Uzbek (uzb)

      Texts: 25

    • Chinese (zho)

      Texts: 3

    • French (fra)

      Texts: 31

  • Modality

    Written Language
  • Size

    Words: 29008 (swe)

    Texts: 33 (swe)

    Words: 438614 (TOT)

    Texts: 580 (TOT)

  • Original source

    migrationsverket
    www.migrationsverket.se

Parallel texts from the Swedish Migration Agency

Creator/Principal investigator(s):

Institute for Language and Folklore, Language Council of Sweden

Simon Dahlberg - Institute for Language and Folklore, Language Council of Sweden

Description:

The texts were downloaded with the command 'w3m -dump' from an Ubuntu shell. The resulting text files were then stripped so that they contain only the relevant text (no menus and similar page elements).
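
The collection procedure described here can be illustrated with a minimal Python sketch. It is not the script used by the creators, which is not part of this record. Only the command 'w3m -dump' comes from the description; the function names, the marker-based stripping rule and the marker strings in the usage note are hypothetical assumptions chosen for illustration.

```python
from __future__ import annotations

import shutil
import subprocess


def build_dump_command(url: str) -> list[str]:
    """Return the argument list for rendering a web page as plain text."""
    # 'w3m -dump' is the command named in the description above.
    return ["w3m", "-dump", url]


def dump_page(url: str) -> str:
    """Run 'w3m -dump' on a URL and return the rendered text."""
    if shutil.which("w3m") is None:
        raise RuntimeError("w3m is not installed on this system")
    result = subprocess.run(
        build_dump_command(url),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def strip_navigation(text: str, start_marker: str, end_marker: str) -> str:
    """Keep the lines from start_marker up to, but not including, end_marker.

    ASSUMPTION: the record does not say how menus were removed. This
    marker-based rule is a hypothetical stand-in for that step.
    """
    lines = text.splitlines()
    # Find the first line that contains the start marker.
    start = next(
        (i for i, line in enumerate(lines) if start_marker in line), None
    )
    if start is None:
        return ""
    # Find the first later line that contains the end marker, if any.
    end = next(
        (i for i in range(start + 1, len(lines)) if end_marker in lines[i]),
        len(lines),
    )
    return "\n".join(lines[start:end]).strip()
```

A page would be fetched with dump_page("https://www.migrationsverket.se") and then passed to strip_navigation together with two marker strings, for instance the page heading and the first line of the footer. Suitable markers depend on the page layout and are not specified in this record.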

Data format / data structure:

Text

Data collection:

Mode of collection: Self-administered writings and/or diaries: web-based

Time period(s) for data collection: 2019-01-01 — 2019-01-31

Data collector:

Published: 2020-03-30