To document research data means that you describe the research activities, data structures, and decisions you make during the research process. In short: you document everything that can be important if someone wants to analyse the data. This data analysis may come later in the same research project, by you or another researcher or reviewer, or in another project altogether, by someone who has no knowledge of the data material.

Documentation serves several purposes:

  • It can help you, as an individual researcher, remember the conditions during the data collection, or the reason why you made a certain decision. This is helpful if someone asks you to explain why the data and results appear in a certain way, perhaps years after the results have been published.
  • It can help a research group communicate within the group and provide mutual understanding of the data.
  • Good documentation makes data useful for researchers who analyse them in future projects, regardless of whether these projects are in five, ten, or fifty years’ time.

Data management plan

Planning simplifies the documentation process. In a data management plan (DMP) you can describe, for instance, the folder structure in the project, how you name the files, and how you record the changes between file versions. If you follow the DMP, you don’t need any extra documentation about these things; you just state that you follow the DMP. In the DMP you can also describe what other documentation you may need to help you remember decisions, actions, and processes, and to support your work routines.
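
As an illustration of how a naming rule from a DMP could be applied consistently, here is a minimal Python sketch. The convention itself (project, content, date, and a two-digit version number, separated by underscores) and the function names are assumptions made for this example; your own DMP may prescribe something quite different.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<content>_<YYYY-MM-DD>_v<NN>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<content>[a-z0-9-]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_filename(project, content, day, version, ext):
    """Compose a file name that follows the hypothetical convention."""
    return f"{project}_{content}_{day.isoformat()}_v{version:02d}.{ext}"


def follows_convention(name):
    """Return True if the file name matches the hypothetical convention."""
    return FILENAME_PATTERN.match(name) is not None


name = build_filename("soil", "ph-samples", date(2024, 3, 1), 2, "csv")
print(name, follows_convention(name))
```

A check like this could be run over a project folder from time to time to spot files that deviate from the plan.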

If you occasionally deviate from the data management plan, you should note this in the documentation. If you keep making many changes because the DMP doesn’t suit how the project develops, you should revise the DMP instead of documenting constant deviations from it.

What to document

Data materials can be documented at different levels of detail, and the documentation can cover different parts of the project work. What is necessary to document for a certain project depends on the area of research. The basic principle is to document all information that someone else (or you) may need to analyse the data correctly. Bear in mind that this “someone else” may come from another scientific discipline.

It is important to describe:

  • how the data have been collected, created, or modelled
  • how the data files and file versions are organised
  • which changes have been made between different versions of data files
  • what the codes, abbreviations, variable names etc. that you use mean
  • which definitions you use for encoding and markup of the material
  • which legal, ethical, and other restrictions limit how the data can be reused.

Exactly which information is necessary is, however, up to you to decide, as the expert on your data material.
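
One common way to record what codes and variable names mean is a codebook kept next to the data. The Python sketch below shows one possible shape for it. The variables, labels, and codes are invented for illustration, and the column layout is an assumption rather than a required standard.

```python
import csv
import io

# Hypothetical variables from an imagined survey; replace with your own.
CODEBOOK = [
    {"variable": "resp_id", "label": "Respondent identifier",
     "type": "integer", "values": "", "missing": ""},
    {"variable": "edu_lvl", "label": "Highest completed education",
     "type": "categorical",
     "values": "1=primary; 2=secondary; 3=tertiary",
     "missing": "-9=no answer"},
]

FIELDS = ["variable", "label", "type", "values", "missing"]


def codebook_to_csv(entries):
    """Render the codebook as CSV text that can be saved next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented(columns, entries):
    """List the data columns that have no entry in the codebook."""
    documented = {entry["variable"] for entry in entries}
    return [column for column in columns if column not in documented]


print(codebook_to_csv(CODEBOOK))
print(undocumented(["resp_id", "edu_lvl", "income"], CODEBOOK))
```

The second function compares the column names of a data file with the codebook, which gives a quick indication of variables that still lack a description.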

Data documentation needs to be written and updated continuously, so that you don’t forget any details. The best way to make sure that you record all relevant information about what you have done with the data, which decisions you have made, and which definitions you have used, is to write it down right away. Ideally, your documentation will be structured and continuously updated, but an unstructured file that still contains all the information is better than no file at all. The worst data documentation is the one that doesn’t exist.
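
If you work in a scripting environment, writing things down right away can be as simple as appending a dated line to a plain-text log. The following Python sketch is one way to do that. The entry format, the file name, and the function name are assumptions made for this example.

```python
from datetime import datetime
from pathlib import Path


def log_decision(log_path, author, text, when=None):
    """Append one timestamped entry to a plain-text documentation log."""
    when = when or datetime.now()
    entry = f"{when:%Y-%m-%d %H:%M} | {author} | {text}\n"
    # Open in append mode so that earlier entries are never overwritten.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


# Hypothetical usage: the file name and the entry text are examples only.
log_decision("documentation_log.txt", "AB",
             "Excluded sensor 4 readings after calibration drift was found.")
```

Such a log is unstructured, but it keeps the information in one place until you compile the final version of the documentation.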

When the project is finished and the data have been archived and possibly made accessible and disseminated, you will prepare a final version of the documentation, which is intended for archives and other researchers. In this final version, you make sure that the documentation is complete and easy to understand. An archivist or staff in your local research support unit can assist with or give you feedback on your data documentation.

Documentation software

Depending on which tools you use for data processing and analysis, there are different kinds of support for documentation. If you can document the process in the same program that you use for the analyses, as you analyse the data, this may lower the threshold for documenting the work. Some analysis programs have integrated documentation functions. Other tools have features that may not be intended for documentation but can nevertheless be used for it, even if you need to make some adjustments to the output. Many analysis tools don’t have a function for documenting data. In that case, you need to keep the documentation in a separate file.

Examples of documentation capabilities

  • Built-in functions: SPSS, which can be used to analyse survey data, and Dedoose, which can be used for qualitative data analyses, contain functions for documenting variables.
  • Plug-in programs: The plug-in Colectica for Excel adds functions to Excel for documenting variables from, for instance, surveys.
  • Use existing functions: Transana (for qualitative analyses of text, image, audio, and video files) and Kinovea (for video analyses of human and other motion) contain commenting functions that can be used for documenting data. The comments can be exported, but you may need to compile the documentation yourself.
  • No functions: Many analysis tools lack features that can be used to document data. Write your documentation in another program.