From the course: Data Steward Foundations
Data anonymization
- [Instructor] One way that many organizations seek to protect themselves against accidental disclosures of personal information is to remove all identifying information from data sets before placing them in the cloud or with another service provider. De-identification is the process of moving through a data set and removing data that may be individually identifying. For example, you would certainly want to remove names, Social Security numbers, and other obvious identifiers. However, simple data de-identification is often insufficient to completely safeguard information. The reason for this is that you can often combine seemingly innocuous fields to uniquely identify an individual. A study done at Carnegie Mellon University analyzed three fields commonly retained in de-identified data sets: ZIP codes, dates of birth, and gender. Now, you wouldn't think any one of these fields used alone would allow you to identify someone. After all, a lot of people live in the same town as me and…
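
As a rough illustration of the point about combining fields, here is a minimal Python sketch. It is not taken from the course: the records, the field names (`name`, `ssn`, `zip`, `dob`, `gender`, `diagnosis`), and the helper functions are all hypothetical. It strips the obvious identifiers, then counts how many of the remaining records are still unique on the combination of ZIP code, date of birth, and gender.

```python
from collections import Counter

# Hypothetical records; every name and value is invented for illustration.
records = [
    {"name": "Person A", "ssn": "000-00-0001", "zip": "15213",
     "dob": "1980-04-12", "gender": "F", "diagnosis": "asthma"},
    {"name": "Person B", "ssn": "000-00-0002", "zip": "15213",
     "dob": "1980-04-12", "gender": "F", "diagnosis": "diabetes"},
    {"name": "Person C", "ssn": "000-00-0003", "zip": "15213",
     "dob": "1975-09-30", "gender": "M", "diagnosis": "migraine"},
    {"name": "Person D", "ssn": "000-00-0004", "zip": "15217",
     "dob": "1980-04-12", "gender": "F", "diagnosis": "asthma"},
]

# Assumed split between obvious identifiers and fields that are often retained.
DIRECT_IDENTIFIERS = {"name", "ssn"}
QUASI_IDENTIFIERS = ("zip", "dob", "gender")


def deidentify(rows):
    """Return copies of the rows with the direct identifiers removed."""
    return [
        {field: value for field, value in row.items()
         if field not in DIRECT_IDENTIFIERS}
        for row in rows
    ]


def quasi_key(row):
    """Combine the quasi-identifier fields into a single tuple."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def unique_on_quasi_identifiers(rows):
    """Return the rows whose quasi-identifier combination occurs exactly once."""
    counts = Counter(quasi_key(row) for row in rows)
    return [row for row in rows if counts[quasi_key(row)] == 1]


deidentified = deidentify(records)
still_unique = unique_on_quasi_identifiers(deidentified)
print(f"{len(still_unique)} of {len(deidentified)} de-identified records "
      f"are unique on {QUASI_IDENTIFIERS}")
```

In this toy data set, Person A and Person B share the same three values, so they cannot be told apart by those fields alone, while the other two records remain unique even though their names and Social Security numbers are gone. Anyone holding another data set that lists names alongside the same three fields could match those unique records back to a person.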