When you choose a file format, remember that all formats can become outdated. If that happens, future software may not be able to read the files or show the information in them correctly; in other words, it may no longer be possible to open a file and use the data saved in it. To minimise the risk that files become unreadable, choose a file format that is likely to remain usable in the future, and bear in mind that the future may be as soon as five or ten years from now.
File formats that stand a good chance of survival meet four criteria:
- They are commonly used
- They can be read by multiple software applications
- They are well-documented, meaning that it is possible to find a technical specification which details how information is stored in the format
- They are open/non-proprietary.
You can see examples of suitable file formats for a number of different data types in the Choosing a file format section.
You should choose a file format that is appropriate for your data collection and intended analysis method, but you should also consider whether that file format is suited to long-term data preservation. If it isn’t possible to choose a preservation-friendly file format from the beginning of the project, one option is to use one file format for short-term processing of files (a work format), and then convert the finalised files that are to be preserved to a format that is suitable for long-term preservation. Remember to record in the project documentation which formats you have chosen and for which purposes. If you want advice on recommended file formats for long-term preservation, consult the local research data support unit at your university.
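To document which formats a project uses, it can help to take an inventory of the project folder. The following is a minimal Python sketch, not a definitive tool. The function names and the contents of PRESERVATION_EXTENSIONS are assumptions made for illustration; replace the list with the formats recommended for your data type.

```python
from collections import Counter
from pathlib import Path

# Placeholder list (an assumption for this example): replace it with the
# formats recommended for your own data type.
PRESERVATION_EXTENSIONS = {".csv", ".txt", ".xml"}


def inventory_formats(project_dir):
    """Count the files under project_dir by lower-cased file extension."""
    counts = Counter()
    for path in Path(project_dir).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


def format_report(counts, preservation_extensions=PRESERVATION_EXTENSIONS):
    """Build a tab-separated report: extension, file count, status."""
    lines = []
    for extension, number in sorted(counts.items()):
        if extension in preservation_extensions:
            status = "listed as preservation format"
        else:
            status = "not listed, check or convert before archiving"
        lines.append(f"{extension}\t{number}\t{status}")
    return "\n".join(lines)
```

Calling format_report(inventory_formats("my_project")), where my_project is a hypothetical folder name, returns text that can be pasted into the project documentation. The sketch only looks at file extensions, so it cannot tell whether a file really conforms to the format its extension suggests.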
Proprietary file formats
If two file formats are otherwise equivalent, but one can be read by several software applications and the other by only one, it is safer to choose the format that can be read by several applications. A proprietary file format has an owner (usually a company) who decides how the format works, and this affects which applications can use the format. Many proprietary file formats are also locked and can only be opened in software developed by the same owner.
What happens to the information in a file with a proprietary format if the owner goes bankrupt, or decides that it’s too expensive to keep supporting the format? In a worst-case scenario, all information may be lost. (Read more about software in the Software section.)
Some proprietary file formats can be opened in other software applications, either directly or by importing/converting them into the application's own format. However, doing so carries a risk that the file contents will be displayed incorrectly and that not all data will be converted properly. Similarly, some applications let you choose from a list of file formats when you save files, including formats developed by other companies, some of them proprietary. In this case, too, you should check that all data remain correct in the saved format. The Microsoft Office applications (Word, Excel, PowerPoint etc.) are proprietary, and even if you can open a Word file (.docx) in another application, text and tables may look different from the original. One reason is that several of the fonts used are owned by Microsoft and may not be available in other applications; another is that other applications may not handle the formatting in exactly the same way.
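For tabular data, one way to check that the data remain correct after a conversion is to export both the original and the converted file to CSV and compare them row by row. The following is a minimal Python sketch under that assumption; the function names are made up for this example, and the check covers stored values only, not layout, fonts or formulas.

```python
import csv


def read_rows(path, encoding="utf-8"):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))


def compare_tables(original_rows, converted_rows):
    """Return a list of readable differences between two tables."""
    differences = []
    if len(original_rows) != len(converted_rows):
        differences.append(
            f"row count differs: {len(original_rows)} vs {len(converted_rows)}"
        )
    # zip stops at the shorter table; the row count check above reports the rest.
    rows = zip(original_rows, converted_rows)
    for number, (before, after) in enumerate(rows, start=1):
        if before != after:
            differences.append(f"row {number} differs: {before!r} vs {after!r}")
    return differences
```

An empty result from compare_tables(read_rows("original.csv"), read_rows("converted.csv")), where both file names are hypothetical, means that every cell has the same text in both exports. Values that are written differently, such as 1.0 and 1, are reported as differences and need to be judged by hand.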