Hands-on Data Anonymisation 2023

In this workshop of two three-hour sessions, we will cover the concepts of anonymisation and pseudonymisation and apply them in practice to various types of quantitative and qualitative research data. Come and learn by doing!

Description

Hands-on personal data anonymisation and pseudonymisation.

The goals for this two-day workshop are practical: participants will actually minimise, pseudonymise, and anonymise personal data in many of its forms, and use modern techniques for working with personal and sensitive data. Day 1 starts with a conceptual introduction; the remaining sessions cover tools for pseudonymising and anonymising personal data.

Who can participate?

Anyone who works with personal data in any of its forms (background variables from questionnaires, medical images, health data, geospatial location data, speech, videos, pictures, etc.).

Learning Outcomes

  • Understanding the basic concepts and limitations of anonymisation and pseudonymisation.
  • Automating anonymisation for tabular data using Amnesia.
  • Anonymisation for complex datasets: faces in pictures and videos, speech, geospatial data, medical data.
  • Anonymisation in qualitative research (interviews, text).
  • Advanced techniques for working with sensitive data when anonymisation cannot be achieved (data synthesis, federated learning, and differential privacy).

Format

A two-day webinar with one three-hour session per day. 1 ECTS credit is available for students who are willing to do some extra homework.

Schedule and location

The training will be held online via Zoom on April 18th and April 20th, at 12:00–15:00 Eastern European Time (EET).

The structure of the workshop below is a draft. The goal is to adapt to the actual needs of the majority of the participants in the room. A third optional day might be added if specific data types cannot be covered in two days.

Day 1
12:00 - 12:10: Intro
12:10 - 12:25: Hands-on exercise #1: tabular data and spreadsheet programs
12:25 - 12:50: Basics of data anonymisation pt1
13:00 - 13:30: Learning k-anonymity with the Amnesia tool (demo and exercise #2)
13:30 - 14:00: Basics of data anonymisation pt2
14:10 - 14:30: Hands-on exercise #3: being a data peer reviewer
14:30 - 15:00: Questions and wrap-up

Day 2
12:00 - 12:10: Intro + recap from day 1
12:10 - 12:50: Working with audio/visual/text material
13:00 - 13:30: Hands-on exercise #4: anonymisation of a transcribed interview
13:30 - 13:50: Overview of more advanced data types (depending on the interest of the audience): medical images, geospatial data
14:00 - 14:30: When anonymisation is not possible and data is sensitive: secure data analysis workflows, data synthesis (this can be made longer depending on the interest of the audience)
14:30 - 15:00: Questions, future directions, and various unsolved issues between data protection, open science, and research integrity. 
 

Not covered unless there is interest: visualising personal data, sharing personal data, making personal data FAIR through data minimisation and data protection, federated approaches.

Requested: using AI safely for minimising (and processing) personal data; examples of local "GPT"-style large language models.

Instructor(s)

Dr. Enrico Glerean, Data Agent, Staff Scientist, School of Science, Aalto University
