ACROBAT - a multi-stain breast cancer histological whole-slide-image data set from routine diagnostics for computational pathology

SND-ID: 2022-190

Alternative title

ACROBAT

Creator/Principal investigator(s)

Mattias Rantalainen - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics

Johan Hartman - Karolinska Institutet, Department of Oncology-Pathology

Description

The ACROBAT data set consists of 4,212 whole slide images (WSIs) from 1,153 female primary breast cancer patients. The WSIs in the data set are available at 10X magnification and show tissue sections from breast cancer resection specimens stained with hematoxylin and eosin (H&E) or immunohistochemistry (IHC). For each patient, one WSI of H&E stained tissue and at least one, and up to four, WSIs of corresponding tissue stained with the routine diagnostic stains ER, PGR, HER2 and KI67 are available. The data set was acquired as part of the CHIME study (chimestudy.se) and its primary purpose was to facilitate the ACROBAT WSI registration challenge (acrobat.grand-challenge.org). The histopathology slides originate from routine diagnostic pathology workflows and were digitised for research purposes at Karolinska Institutet (Stockholm, Sweden). The image acquisition process resembles the routine digital pathology image digitisation workflow, using three different Hamamatsu WSI scanners, specifically one NanoZoomer S360 and two NanoZoomer XR. The WSIs in this data set are accompanied by a data ta

Language

English

Research principal, contributors, and funding

Research principal

Karolinska Institutet

Responsible department/unit

Department of Medical Epidemiology and Biostatistics

Contributor(s)

Leena Latonen - University of Eastern Finland, Institute of Biomedicine

Constance Boissin - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics

Yanbo Feng - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics

Philippe Weitz - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics

Dusan Rasic - Zealand University Hospital, Department of Surgical Pathology

Funding 1

  • Funding agency: Swedish Research Council

Funding 2

  • Funding agency: ERA PerMed
  • Funding agency's reference number: ERAPERMED2019-224-ABCAP
  • Project name on the application: Advancing Breast Cancer histopathology towards AI-based Personalised medicine

Funding 3

  • Funding agency: Swedish Cancer Society

Protection and ethical review

Data contains personal data

No

Ethics Review

Stockholm - Ref. 2017/2106-31

Amendment: 2018/1462-32

Method and time period

Unit of analysis

Population

Anonymised female primary breast cancer patients from the Stockholm region

Study design

Observational study

Sampling procedure

A subset of the whole-slide-images generated as part of the CHIME study was randomly selected for the ACROBAT data set. The training and validation data are random subsets, whereas the test data were selected by stratified sampling that took into account biomarker status and the scanner model used to generate each whole-slide-image.
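
The record does not give the exact stratification procedure, so the following is only a minimal sketch of what stratified sampling over biomarker status and scanner model could look like. The field names (case_id, er_status, scanner), the toy records and the sampling fraction are all hypothetical and are not taken from the ACROBAT data tables.

```python
import random
from collections import defaultdict


def stratified_sample(cases, strata_keys, fraction, seed=0):
    """Draw roughly `fraction` of the cases from every stratum.

    A stratum is the combination of values a case has for `strata_keys`.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[tuple(case[key] for key in strata_keys)].append(case)

    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        # Keep at least one case per stratum so that rare combinations survive.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical toy records; real biomarker and scanner fields may differ.
cases = [
    {
        "case_id": i,
        "er_status": "pos" if i % 3 else "neg",
        "scanner": "S360" if i % 2 else "XR",
    }
    for i in range(120)
]

test_cases = stratified_sample(cases, ("er_status", "scanner"), fraction=0.25)
```

Sampling within each stratum keeps the proportions of biomarker and scanner combinations in the drawn subset close to those of the full pool, which a purely random draw does not guarantee for small subsets.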

Time period(s) investigated

2012 – 2018

Geographic coverage

Geographic spread

Geographic location: Stockholm County

Publications

Weitz, P. et al. (2022). ACROBAT - a multi-stain breast cancer histological whole-slide-image data set from routine diagnostics for computational pathology.
DOI: https://doi.org/10.48550/ARXIV.2211.13621

Dataset
ACROBAT - a multi-stain breast cancer histological whole-slide-image data set from routine diagnostics for computational pathology

Description

The data set consists of three subsets (training, validation and test), based on the ACROBAT WSI registration challenge. The training set contains 750 cases, each with one H&E WSI and one to four IHC WSIs, for 3,406 WSIs in total. The validation set consists of 100 cases with 200 WSIs in total, and the test set of 303 cases with 606 WSIs in total. For each case in the validation and test sets, one H&E WSI and one randomly selected IHC WSI are available.
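
As a quick consistency check, the subset counts above can be summed and compared with the totals stated elsewhere in this record (1,153 patients and 4,212 WSIs). The snippet below is a small sketch of that arithmetic; the dictionary layout and variable names are illustrative only.

```python
# Case and WSI counts per subset, as stated in the description above.
subsets = {
    "training": {"cases": 750, "wsis": 3406},
    "validation": {"cases": 100, "wsis": 200},
    "test": {"cases": 303, "wsis": 606},
}

total_cases = sum(s["cases"] for s in subsets.values())
total_wsis = sum(s["wsis"] for s in subsets.values())

# Each training case has exactly one H&E WSI, so the remaining WSIs are IHC.
train = subsets["training"]
mean_ihc_per_training_case = (train["wsis"] - train["cases"]) / train["cases"]

print(total_cases, total_wsis)               # 1153 4212
print(round(mean_ihc_per_training_case, 2))  # 3.54
```

The sums match the record's totals, and a training case has on average about 3.5 IHC WSIs, whereas each validation and test case has exactly one.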

WSIs were a

Version 1

Citation

Mattias Rantalainen, Johan Hartman. Karolinska Institutet (2023). ACROBAT - a multi-stain breast cancer histological whole-slide-image data set from routine diagnostics for computational pathology. Swedish National Data Service. Version 1. https://doi.org/10.48723/w728-p041

Data format / data structure

Still image

Creator/Principal investigator(s)

Mattias Rantalainen - Karolinska Institutet, Department of Medical Epidemiology and Biostatistics

Johan Hartman - Karolinska Institutet, Department of Oncology-Pathology

Data collection

  • Description of the mode of collection: Archived routine clinical diagnostic tissue slides with tissue material were scanned using whole-slide-image scanners at Karolinska Institutet.
  • Time period(s) for data collection: 2012–2018
  • Data collector: Karolinska Institutet
  • Instrument: NanoZoomer S360 (Technical instrument(s)) - Hamamatsu whole-slide-imaging scanner
  • Instrument: NanoZoomer XR (Technical instrument(s)) - Hamamatsu whole-slide-imaging scanner

Number of individuals/objects

1153

License

Creative Commons Attribution 4.0 International (CC BY 4.0)
Published: 2023-01-02
Last updated: 2023-04-27