Research material with personal data
- Fundamental principles
- Lawful bases for personal data processing in research
- Sensitive personal data
- Direct and indirect personal data
- Indirect identification
- Personal data in audio, image, and video files
- Storing personal data
Personal data are any type of information that directly or indirectly refers to a living, identified or identifiable natural person (a “data subject”). Typical examples are a person’s name, personal identification number, e-mail address, ethnic origin, health status, or political opinions. Some personal data are categorized as sensitive (special category data) and are therefore in need of extra protection.
All processing of personal data is regulated by the General Data Protection Regulation (GDPR). Processing covers everything from collection, registration, and storage to sharing, dissemination, and deletion of personal data.
All processing of personal data shall be carried out according to the fundamental principles in Article 5 of the General Data Protection Regulation. The personal data shall, for instance, be processed in a lawful, fair, and transparent manner in relation to the data subject. The data may be collected only for specified, explicit, and legitimate purposes, and may not at a later time be processed further in a way that’s incompatible with these initial purposes. The controller (typically the higher education institution, HEI) shall be responsible for and able to demonstrate compliance with these principles (accountability).
- Lawfulness. To process the data in a lawful manner means that there must be a lawful basis for processing them.
- Fairness. To process the data in a fair manner means that the processing must be proportional. You may not process more data than is necessary for your purpose.
- Transparency. To process the data in a transparent manner means that you shall inform the data subjects of the processing in concise and plain language.
Contact the data protection officer if you want to know more about personal data processing at your HEI. You can read more about the fundamental principles on the Swedish Authority for Privacy Protection website.
Lawful bases for personal data processing in research
All processing of personal data must be based on one of the lawful bases for processing mentioned in Article 6 of the General Data Protection Regulation. For public authorities such as HEIs, the lawful basis for processing personal data for research purposes is almost exclusively public interest. As HEIs have a legal obligation to carry out research, the processing of personal data is necessary for the performance of a task carried out in the public interest, which is then the lawful basis for the processing.
Consent can be another lawful basis for personal data processing. However, authorities can rarely base their processing on the data subject’s consent. The reason is that there is significant inequality between the data subject and the authority. The relation between an individual research subject and a research principal (typically an HEI) can also be unbalanced. In order for consent to be a lawful basis for processing, there can be no situation of dependence between the research subject and the HEI. (A research subject is a person who is part of a research study. Different disciplines use different terminologies for study participants, for instance informants, respondents, or interviewees.)
Regardless of which is the appropriate lawful basis, the research subjects must always be given concise and clear information about the personal data processing involved, and about which rights they have. These rights include that the data subject shall be given information about when and how their personal data are processed, and they shall have control over their own data.
Note: Consent as a lawful basis in the GDPR is not connected to the consent to participate in research in accordance with The Act concerning the Ethical Review of Research Involving Humans (2003:460, in Swedish), see Ethical Review.
Sensitive personal data
Sensitive personal data, or special category data, concern for instance:
- racial or ethnic origin
- political opinions
- religious or philosophical beliefs
- trade union membership
- data concerning a person’s health, sex life, or sexual orientation
- genetic data
- biometric data which uniquely identify a person.
These special category data are covered by special regulations in the GDPR. As a general rule, processing of special category data is prohibited. To be allowed, the processing must have a lawful basis in accordance with Article 6 and also meet one of the exceptions in Article 9. Two such exceptions are if the processing is necessary for reasons of substantial public interest, or if the processing is necessary for scientific or historical research purposes.
In those cases, there is a further requirement for ethical approval and appropriate security measures. One such measure is pseudonymization (you can read more under the next heading).
Direct and indirect personal data
Direct personal data are data that clearly identify a natural person, such as name, photo, or personal identification number. Indirect personal data are data that in combination can be used to uniquely identify a person. Examples are place of residence, membership in a particular organization, IP address, vehicle registration plate, information about income, or health-related information. In a small town where only one person has a certain profession, information about place of residence and profession can be sufficient to identify a participant in a research study.
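The small-town example above can be illustrated with a minimal Python sketch that counts how many records share each combination of indirect identifiers. The function name `risky_combinations`, the field names, and the threshold `min_group_size` are assumptions made for this illustration and are not prescribed by the GDPR. The sketch only looks at the dataset itself, so a combination that is unique in the data is not necessarily unique in the population, and the reverse also holds.

```python
from collections import Counter


def risky_combinations(records, indirect_identifiers, min_group_size=2):
    """Return combinations of indirect identifiers shared by fewer than
    min_group_size records, together with how many records share them."""
    combos = Counter(
        tuple(record[field] for field in indirect_identifiers)
        for record in records
    )
    return {combo: n for combo, n in combos.items() if n < min_group_size}


# Hypothetical study data: only one participant is a pharmacist in Smalltown.
participants = [
    {"residence": "Smalltown", "profession": "pharmacist"},
    {"residence": "Smalltown", "profession": "teacher"},
    {"residence": "Smalltown", "profession": "teacher"},
]

flagged = risky_combinations(participants, ["residence", "profession"])
print(flagged)  # {('Smalltown', 'pharmacist'): 1}
```

A flagged combination is a signal to look more closely at those variables before the material is shared, not proof that a person can be identified.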
One way of hiding a person’s identity is to pseudonymize the data. Pseudonymization means that you replace data that can directly identify a person, such as name or personal identification number, with codes. There is a code key, which is safely locked away and only accessible to certain people. As long as there is a code key, it is still possible to connect the data to a natural person. Thus, pseudonymized data are still personal data and fall within the scope of the GDPR. Note that personal data can be pseudonymized regardless of how they appear in the material: in neat data columns (such as in a database), in an information entry for each individual person (such as in information about who appears in a photograph), or scattered throughout the material (such as names in an unstructured interview).
If you destroy the code key or erase everything that directly or indirectly identifies a natural person, so that it is no longer possible to connect the information to an individual, you have de-identified, or anonymized, the material. In that case, the data are no longer considered to be personal data, and the GDPR does not apply.
Data that are going to be made accessible have to be checked for any information that may pose a disclosure risk for subjects in the study. Re-identification is when someone combines a number of indirect identifiers in anonymized material, such as profession, municipality, and age, to uncover the individual behind the data. You can minimize the risk of re-identification by re-coding variables so that age and income are entered in larger intervals, and geographical location is entered as a wider area. Which indirect identifiers need to be re-coded varies between projects, as the risk of re-identification depends on which types of data have been collected in the project.
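As an illustration of pseudonymizing data in columns, here is a minimal Python sketch that replaces direct identifiers with random codes and returns the code key separately. The function name `pseudonymize`, the field names, and the code format (`P` followed by eight hexadecimal characters) are assumptions made for this example. How the code key is stored and protected is outside the sketch.

```python
import secrets


def pseudonymize(records, direct_identifiers, code_field="participant_code"):
    """Replace direct identifiers with random codes.

    Returns the pseudonymized records and the code key as two separate
    objects, so that they can be stored apart from each other.
    """
    code_key = {}  # code -> original identifying values
    lookup = {}    # identifying values -> code (same person, same code)
    pseudonymized = []
    for record in records:
        identity = tuple(record[field] for field in direct_identifiers)
        if identity not in lookup:
            code = "P" + secrets.token_hex(4)
            while code in code_key:  # extremely unlikely, but avoid collisions
                code = "P" + secrets.token_hex(4)
            lookup[identity] = code
            code_key[code] = dict(zip(direct_identifiers, identity))
        cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
        cleaned[code_field] = lookup[identity]
        pseudonymized.append(cleaned)
    return pseudonymized, code_key


# Hypothetical survey rows; the names and numbers are made up.
rows = [
    {"name": "Person A", "personal_id": "0001", "answer": "yes"},
    {"name": "Person B", "personal_id": "0002", "answer": "no"},
    {"name": "Person A", "personal_id": "0001", "answer": "maybe"},
]
coded_rows, key = pseudonymize(rows, ["name", "personal_id"])
```

The codes are random rather than derived from the identifiers, so they reveal nothing on their own. The coded rows may still contain indirect identifiers, and as long as `key` exists the material remains personal data.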
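The re-coding described above can be sketched in Python as follows. The function names, the interval widths, and the municipality-to-region mapping are hypothetical and chosen only for illustration. Suitable intervals and areas depend on the project and its data.

```python
def recode_interval(value, width):
    """Return the interval of the given width that contains value, as text."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


def recode_record(record, municipality_to_region, age_width=10, income_width=10000):
    """Return a copy of record with age, income, and location generalized."""
    recoded = dict(record)
    recoded["age"] = recode_interval(record["age"], age_width)
    recoded["income"] = recode_interval(record["income"], income_width)
    # Replace the municipality with a wider area; unknown places become "Other".
    municipality = recoded.pop("municipality")
    recoded["region"] = municipality_to_region.get(municipality, "Other")
    return recoded


# Hypothetical mapping and record.
regions = {"Town A": "Region North", "Town B": "Region North"}
person = {"age": 37, "income": 31500, "municipality": "Town A", "profession": "nurse"}

print(recode_record(person, regions))
# {'age': '30-39', 'income': '30000-39999', 'profession': 'nurse', 'region': 'Region North'}
```

Wider intervals lower the risk of re-identification but also reduce the level of detail available for later analysis, so the widths are a trade-off to be decided per project.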
Personal data in audio, image, and video files
Before audio, image, and video files are made accessible for further research, it is important to consider GDPR and other legislation. A person’s face can easily be used to identify that person, which means that it is personal data. A voice is personal data if the person can be identified by the voice alone, that is, as long as the voice has not been distorted or changed due to age or other factors. It is often impossible to anonymize these types of data without labour-intensive efforts and technical tools, and doing so may make the data more or less useless for further research. (For example, a distorted voice cannot be used to study dialects.) These data should still be pseudonymized as far as possible, for instance by replacing names of people, dates, locations, and other information with ID numbers, serial numbers, pseudonyms, or more general terms. Consult the data protection officer if you collect audio, image, or video data that may contain personal data.
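For transcripts or descriptions that accompany such files, replacing names and locations with more general terms could look like the minimal Python sketch below. The function name `replace_terms`, the example sentence, and the replacement labels are assumptions for illustration. The sketch only replaces terms that are listed explicitly, so a manual review of the text is still needed.

```python
import re


def replace_terms(text, replacements):
    """Replace each listed term in text with its pseudonym or general term."""
    if not replacements:
        return text
    # Try longer terms first so that "Anna Berg" is matched before "Anna".
    terms = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda match: replacements[match.group(1)], text)


# Hypothetical transcript line and replacement list.
line = "Anna Berg met Anna in Uppsala on 3 May."
terms = {"Anna Berg": "[Person 1]", "Anna": "[Person 2]", "Uppsala": "[City]"}

print(replace_terms(line, terms))
# [Person 1] met [Person 2] in [City] on 3 May.
```

The replacement list works as a code key in this case and needs the same protection as any other code key.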
These are some examples of audio, image, and video material that may contain personal data:
- x-ray images
- recordings of in-depth interviews or focus group interviews
- video recordings of dance recitals (how a person moves may be sufficient to identify them)
- audio recordings for dialect studies
- classroom video recordings.
Storing personal data
Just like other types of research data, personal data need to be stored in a secure manner, depending on their information security classification level. GDPR and other legislation stipulate some additional requirements.
Personal data should never be stored on a cloud service, as that will make it more difficult to guarantee that the data are securely stored, and that the project complies with legal requirements. If personal data are stored on a cloud service, there is a risk that they are accidentally disclosed or transferred to a third country (outside of the EU/EEA). A better storage solution is, for instance, a secure, password-protected server in a known location.
Code keys and personal data shall be kept apart, and each shall be stored on a password-protected computer or server, or in a safety cabinet.
Consult with the data protection officer at your HEI to determine the best storage solution for material with personal data in your place of research.