NanoToolkit Blog

Why Obfuscation does not mean Security

Whether you choose to redact, mask, or anonymize a document or a form, be aware that doing so does not mean the data is now fully protected from prying eyes. For instance, consider the following data report from a hospital.

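Patient Visit Report:

| First Name | Last Name | Gender | Zip Code | Date of Birth | Chief Complaint  |
|------------|-----------|--------|----------|---------------|------------------|
| Eric       | Gardner   | Male   | 89876    | 05/08/1978    | Red Eyes         |
| Tony       | Foyle     | Male   | 89877    | 09/07/1986    | Sore Throat      |
| Amanda     | Sampson   | Female | 89879    | 11/23/1990    | Nasal congestion |
| Dona       | Garcia    | Female | 89872    | 02/17/1972    | Chest Pain       |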

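Masked Patient Visit Report:

| First Name | Last Name  | Gender | Zip Code | Date of Birth | Chief Complaint  |
|------------|------------|--------|----------|---------------|------------------|
| EXXX       | xxxxxxxer  | Male   | 89876    | XX/XX/1978    | Red Eyes         |
| XXXy       | Fxxxxxx    | Male   | 89877    | XX/XX/1986    | Sore Throat      |
| Axxxxa     | Sxxxxxxxxx | Female | 89879    | XX/XX/1990    | Nasal congestion |
| Doxxxxx    | XXXXXia    | Female | 89872    | XX/XX/1972    | Chest Pain       |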

At first glance, our masked report appears to conceal the patients' identities. We might hand this data to a researcher with peace of mind, comfortable that we have not violated any patient's data privacy rights under HIPAA. But if the same report were handed to an insurance company investigator, they might very well be able to cross-reference it with existing data sets and fully identify a person.

Thus it is important to remember that obfuscating identity is effective only when the consumer of the obfuscated data has no access to other data that is in some way associated with the shared data.

It is a bit like solving a multi-variable algebra problem in high school, such as the following.

Example 1: X = Y + 10, Z = Y + 30, Z = 40 -> Y = 10, X = 20

Example 2: K = 7 + J, J = J * 0 -> J = 0, K = 7

Example 3: A = B^2, A = 49 -> B = +/- 7

The point of the algebra examples above is that it is sometimes possible to pinpoint, or at least narrow down, missing pieces of data by cross-referencing the limited known data against other known data, or against lists of values that have already been ruled out.
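To make the "narrowing down" idea concrete, here is a minimal Python sketch that treats each example as a set of constraints and simply tries every integer in a small range. The helper name `solutions` and the search range of -50 to 50 are assumptions chosen for illustration, not part of the original examples.

```python
from itertools import product


def solutions(constraint, names, domain=range(-50, 51)):
    """Return every integer assignment in `domain` that satisfies `constraint`.

    `solutions` is a hypothetical helper: it brute-forces all combinations,
    which is only practical for tiny problems like these.
    """
    return [
        dict(zip(names, values))
        for values in product(domain, repeat=len(names))
        if constraint(*values)
    ]


# Example 1: X = Y + 10, Z = Y + 30, Z = 40
ex1 = solutions(
    lambda x, y, z: x == y + 10 and z == y + 30 and z == 40,
    ("X", "Y", "Z"),
)

# Example 2: K = 7 + J, J = J * 0
ex2 = solutions(lambda k, j: k == 7 + j and j == j * 0, ("K", "J"))

# Example 3: A = B^2, A = 49
ex3 = solutions(lambda a, b: a == b ** 2 and a == 49, ("A", "B"))

print(ex1)  # [{'X': 20, 'Y': 10, 'Z': 40}]
print(ex2)  # [{'K': 7, 'J': 0}]
print(ex3)  # [{'A': 49, 'B': -7}, {'A': 49, 'B': 7}]
```

Examples 1 and 2 leave exactly one possibility, while Example 3 leaves two. That is the same situation an attacker is in: the known facts may not pinpoint the answer, but they can shrink the candidate list dramatically.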

In the patient's case, if I knew the letter the first name started with, the gender, the postal code of the neighborhood where the person lived, and the year of birth, I would have effectively eliminated many possibilities from my search population. Imagine how quickly somebody with access to marketing databases could come up with a list of the candidates most likely to fill in the missing data.
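The elimination process described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Person` record, the `narrow` helper, and the six-entry `external` list, which stands in for a marketing database and contains made-up people alongside the first patient from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first_name: str
    last_name: str
    gender: str
    zip_code: str
    birth_year: int


# Hypothetical external data set (e.g. a marketing list); entries are invented.
external = [
    Person("Eric", "Gardner", "Male", "89876", 1978),
    Person("Evan", "Gardiner", "Male", "89876", 1981),
    Person("Ellen", "Gartner", "Female", "89876", 1978),
    Person("Eric", "Gould", "Male", "89877", 1978),
    Person("Tony", "Foyle", "Male", "89877", 1986),
    Person("Mark", "Garber", "Male", "89876", 1978),
]


def narrow(population, clues):
    """Apply each (label, predicate) clue in turn.

    Returns the surviving records and the population size after every step.
    """
    remaining = list(population)
    history = [("start", len(remaining))]
    for label, keep in clues:
        remaining = [p for p in remaining if keep(p)]
        history.append((label, len(remaining)))
    return remaining, history


# Clues read straight off the first masked row: EXXX, Male, 89876, XX/XX/1978.
clues = [
    ("zip code 89876", lambda p: p.zip_code == "89876"),
    ("gender Male", lambda p: p.gender == "Male"),
    ("born in 1978", lambda p: p.birth_year == 1978),
    ("first name starts with E", lambda p: p.first_name.startswith("E")),
]

candidates, history = narrow(external, clues)

for label, count in history:
    print(f"{label}: {count} candidate(s)")
print(candidates[0].first_name, candidates[0].last_name)  # Eric Gardner
```

In this toy data set the candidate count drops from 6 to 4, 3, 2, and finally 1. A real population would be far larger, but each unmasked attribute still removes a large share of the remaining candidates.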