
Data Services

Data Management Services assists Brandon researchers with the organization, management, and curation of research data to enhance its preservation and access, now and into the future.

What is Sensitive Data?


Not all data can or need to be open. In some instances, data must remain in a secure, protected location and must be coded in a manner that keeps the data secure. In these instances, team members' access to the data must take certain forms, and the data can be shared after the study only if certain procedures are followed.

Sensitive data / proprietary data include:

  • Personal data which contain identifiers such as name, age, gender, physical traits, genetic information
  • Confidential data such as trade secrets, financial information, intellectual property rights 
  • Biological data such as location data of endangered species 

Before starting a data management plan (DMP), make sure you understand all Institutional / Government / Funder Research Policies or Legal Agreements related to research data and data security; important privacy legislation such as FIPPA; and information pertaining to managing Indigenous data (C.A.R.E. Principles).

Types of Identifying Information 


Identifying information is classified as one of two types: direct and indirect.


Direct identifiers

These data point directly to an individual and are typically removed from data sets before sharing with the public.  These may include:

  • name
  • initials
  • mailing address
  • phone number
  • email address
  • unique identifying numbers, like Social Security numbers or driver's license numbers
  • vehicle identifiers
  • medical device identifiers
  • web or IP addresses
  • biometric data
  • photographs of the person
  • audio recordings
  • names of relatives
  • dates specific to the individual, like date of birth, marriage, etc.
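
As an illustration of removing direct identifiers before sharing, here is a minimal Python sketch. The field names and the sample record are hypothetical and would need to be matched to your own data set; it is one possible approach, not a prescribed procedure.

```python
# Field names are hypothetical; rename them to match your own data set.
DIRECT_IDENTIFIERS = {
    "name", "initials", "mailing_address", "phone_number",
    "email_address", "date_of_birth",
}


def strip_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return new records with every direct-identifier field left out."""
    return [
        {field: value for field, value in record.items()
         if field not in identifiers}
        for record in records
    ]


# Made-up sample record, for illustration only.
participants = [
    {"name": "A. Example", "email_address": "a@example.org",
     "occupation": "teacher", "response": "agree"},
]
shareable = strip_direct_identifiers(participants)
print(shareable)
```

The original records are left unchanged, so the identifiable version can still be kept in secure storage. Dropping fields is only a first step: free-text fields, photographs, and audio recordings may need separate handling.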

Indirect identifiers

These may seem harmless on their own, but can point to an individual when combined with other data. It has been recommended (see BMJ article below) that datasets containing three or more indirect identifiers should be reviewed by an independent researcher or ethics committee to evaluate identification risk. Any indirect information not needed for the analysis should be removed. It may be reasonable to supply some of these types of data in aggregated form (like ranges of annual incomes instead of exact numbers). Indirect identifiers may include:

  • place of medical treatment or doctor's name
  • gender
  • rare disease or treatment
  • sensitive data like illicit drug use or other "risky behaviors"
  • place of birth
  • socioeconomic data, like workplace, occupation, annual income, education, etc.
  • general geographic indicators, like postal code of residence
  • household and family composition
  • ethnicity
  • birth year or age
  • verbatim responses or transcripts
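
The two suggestions above (flagging data sets with three or more indirect identifiers for review, and reporting values such as annual income as ranges) can be sketched in Python. The column names, the threshold parameter, and the 20,000 band width are assumptions chosen for illustration; the appropriate choices depend on your data and on the advice of your ethics committee.

```python
# Column names are hypothetical; adjust them to match your own data set.
INDIRECT_IDENTIFIERS = {
    "gender", "place_of_birth", "occupation", "annual_income",
    "postal_code", "ethnicity", "birth_year",
}


def count_indirect_identifiers(columns, indirect=INDIRECT_IDENTIFIERS):
    """Count how many of the given columns are indirect identifiers."""
    return len(set(columns) & indirect)


def needs_independent_review(columns, threshold=3):
    """Flag a data set that holds `threshold` or more indirect identifiers."""
    return count_indirect_identifiers(columns) >= threshold


def income_range(income, width=20000):
    """Replace an exact income with the band it falls into."""
    lower = (int(income) // width) * width
    return f"{lower}-{lower + width - 1}"


columns = ["gender", "birth_year", "postal_code", "response"]
print(needs_independent_review(columns))
print(income_range(45500))
```

A flag from a check like this is only a prompt for human review; it does not measure identification risk by itself.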

Further reading:

Handling Sensitive Data


Portage released a series of documents in October 2020 as part of a toolkit for researchers working with sensitive data in the Canadian research context. These give Canadian researchers important information on how to manage sensitive data:

In addition, there are a number of international guides that may be useful, including:

Further Reading:

Adapted from: 

McGill Libraries, Research Data Management. “Ethics and Compliance”. Accessed Feb 24, 2021, https://libraryguides.mcgill.ca/c.php?g=718144&p=5127408#s-lg-box-16205010

Modifying Sensitive Data for Public Release


Sensitive data that contain potentially identifying information -- whether human subject data or other types of sensitive data -- will likely need to be modified before they are shared with the public. These modifications protect participant confidentiality, the location of endangered wildlife, or other interests that warrant protection. However, they may alter the data to the point where reproducing the results, or subsequent research by others, is no longer possible. You might consider retaining multiple versions of the data: one that is suitable for public release, and one that is suitable for further research but available only on a highly restricted basis.
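
As one illustration of keeping two versions, the Python sketch below derives a public copy of wildlife sighting records by rounding coordinates, while the precise records stay in the restricted version. Rounding coordinates is only one possible modification, chosen here as an assumption; the field names, the sample record, and the one-decimal precision (about 11 km of latitude) are hypothetical.

```python
def make_public_version(sightings, decimals=1):
    """Return copies of the sightings with coordinates rounded.

    The input list is not modified, so it can be kept as the
    restricted-access version.
    """
    return [
        {**sighting,
         "latitude": round(sighting["latitude"], decimals),
         "longitude": round(sighting["longitude"], decimals)}
        for sighting in sightings
    ]


# Made-up record, for illustration only.
restricted = [
    {"species": "example owl", "latitude": 49.8485, "longitude": -99.9312},
]
public = make_public_version(restricted)
print(public)
```

Whether a given level of rounding protects a species adequately is a judgment for the research team and any relevant review body; the code only shows the mechanics of producing the second version.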
 

Further Reading