Prepare the data for deposit

The purpose of depositing research data in a data repository is to enable others to review the research results and to reuse the data in future studies. To make this possible, data need to be organised and presented in a self-explanatory way. A certified repository (like SND) employs a quality review process, which means that deposited data have to meet certain minimum requirements before they are published. It is therefore important that you as a researcher take an active part in the review process. Data that do not meet the minimum requirements will be rejected and not published.

There are also data repositories that do not employ a review process (i.e. non-certified repositories). Bear in mind, however, that some journals and research funders require data to be deposited in a certified repository.

Here are some useful recommendations:  

Data files 

  • At present, data shared through the SND research data catalogue cannot contain personal data (with the exception of studies from the University of Gothenburg). This includes data with indirect identifiers which, in combination with other information, could be used to re-identify an individual. Data deposited in another registry can still be described in the SND catalogue.
  • Data files should be saved in a standard file format, which is also open and freely accessible (see the SND guides to good research data management, and the web page on file formats).
  • File and folder names should be meaningful and consistent. File names that contain sequential numbers or codes should be explained, for instance in a .txt file.
  • Datasets that consist of several files have to be structured in a way that can be understood by other users. The structure and file relations can be described in a .txt file.
  • At present, all data are made accessible as a single collective .zip file. For large datasets, the file size can cause download problems. Consider whether well-delimited data subsets can or should be published separately. Studies that contain several such subsets can then be linked by SND.
  • Files should be cleared of irrelevant information (e.g. unused dummy variables which have no importance for the research results). 
  • If possible, include relevant metadata in the data files (e.g. variable names and codes for variable values for tabular data, or information about coding standard or the meaning of different formatting etc. for textual data).
  • Make sure that all data files are complete and contain relevant information. 
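
The recommendations on file names, folder structure, and file size can be supported by a small script. The following is a minimal sketch in Python, not an SND tool: the function name `build_manifest` and the layout of the listing are assumptions made for illustration. It walks a dataset folder and returns a plain-text listing of every file with its size, which can serve as a starting point for the explanatory .txt file. The descriptions still have to be written by hand.

```python
from pathlib import Path


def build_manifest(root):
    """Return a plain-text listing of all files under root with sizes in bytes."""
    root = Path(root)
    # Sort by POSIX-style relative path so the listing is stable across runs.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    lines = []
    total = 0
    for path in files:
        size = path.stat().st_size
        total += size
        rel = path.relative_to(root).as_posix()
        # The bracketed placeholder is meant to be replaced with a real description.
        lines.append(f"{rel}\t{size} bytes\t[describe contents here]")
    lines.append(f"TOTAL\t{total} bytes in {len(files)} files")
    return "\n".join(lines)
```

The final TOTAL line gives a rough indication of how large the collective .zip file may become before compression, which can help when deciding whether to publish subsets separately.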

Metadata 

Metadata are structured information used to describe and categorise digital information. In the SND research data catalogue, metadata make it easier for users to search for, find, and understand various research materials.

  • When you use the SND data description form, metadata will automatically be linked to the data files.
  • The more metadata you use to describe the data files, the easier it is for another user to understand the file contents. Mandatory fields signify the minimum level of information that SND, as a certified repository, requires. Additional, non-required information can be invaluable to others who are interested in your study’s research data.
  • Remember to describe the data as specifically as possible. If the project data concern fieldwork in Colombia and Peru, enter Colombia and Peru in the “Geographic coverage” tab, rather than just South America.
  • Link to articles or other publications which describe or are based on the study data. 

Documentation 

Relevant documentation must be appended to the data description in order to enable future researchers to understand and reuse the data. Give careful thought to what kind of documentation is needed to improve the understanding of the data.

Such documentation can include:

  • Variable lists with explanations of the contents in each variable
  • Questionnaires or surveys 
  • Interview forms, including interview instructions  
  • Code lists and code books  
  • An inventory of the data material 
  • Links to articles or other publications 
  • Method descriptions or technical reports
  • Information about which data have been processed and how they were processed
  • Syntax for derived variables  
  • End of project reports 
  • Instructions for how to manage the data in custom-developed software
  • Fieldwork diaries or log books.
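
For tabular data, a variable list can be started from the column headers of the data file. The sketch below is one possible approach in Python, assuming the data are stored as delimited text with the variable names in the first row. The function name `variable_list_skeleton` and the placeholder text are hypothetical. It produces one line per variable, to be completed manually with explanations.

```python
import csv
import io


def variable_list_skeleton(csv_text, delimiter=","):
    """Return one line per column: position, variable name, and a placeholder."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    # The first row is assumed to hold the variable names; empty input gives no rows.
    header = next(reader, [])
    return [
        f"{position}\t{name.strip()}\t[explanation, unit, value codes]"
        for position, name in enumerate(header, start=1)
    ]
```

To use it on a file, read the file contents first, for example with `Path("data.csv").read_text(encoding="utf-8")`, where `data.csv` is a stand-in for your own file name.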

SND has no specific requirements for the form of the documentation. The nature of the documentation, and what it is called, varies across research areas and within disciplines. What matters most to SND is what the documentation contains.

If there is no existing, completed documentation, relevant information can be collected in a ReadMe file (see an example developed by Cornell).
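
A ReadMe file can be started from a simple skeleton. The Python sketch below is only an illustration: it reproduces neither the Cornell example nor any SND template, and the function name `readme_skeleton` and the default headings are assumptions, loosely based on the kinds of documentation listed above.

```python
def readme_skeleton(title, sections=None):
    """Return plain text for a ReadMe file with empty sections to fill in."""
    if sections is None:
        # Hypothetical default headings; adapt them to the study at hand.
        sections = [
            "Inventory of the data material",
            "Variable explanations",
            "Method description",
            "Processing of the data",
            "Related publications",
        ]
    lines = [f"ReadMe for: {title}", ""]
    for heading in sections:
        lines += [heading.upper(), "[to be completed]", ""]
    return "\n".join(lines)
```

The returned text can be saved as a .txt file alongside the data files and then completed by hand.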

If you are unsure what documentation is needed, feel free to contact SND or your local data support unit for advice.