One of the biggest worries for businesses using AI, and for the customers those AI services target, is that their data is not protected. More precisely, there is a fear, understandable but ultimately unfounded, that the data used by an AI agent can be used to “expose” its sources, i.e., the business and its customers.

As it turns out, these fears should dissipate once you learn about the process of de-identification, which is becoming increasingly popular among AI service providers and is on its way to becoming common practice and eventually, hopefully, a requirement.

However, the process of de-identifying data has yet to be perfected, so there are a few caveats about this costly process that ought to be considered. We will cover them later in the article.

What is De-Identification?

De-identification is the algorithmic stripping of the identifying parts of a data set, such as names and addresses, from the desirable information.

What is desirable depends on the task at hand for the AI agent. For example, the gender of a person could be important for certain marketing efforts, while irrelevant to certain healthcare-related tasks. 

De-identifying algorithms can be programmed to preserve certain parameters while removing others, keeping only what is relevant to the overall goal.
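As a rough illustration of that idea, the sketch below drops identifying fields from a single customer record and keeps only the fields a given task asks for. It is a minimal example rather than a production method: the field names, the `IDENTIFYING_FIELDS` set, and the `de_identify` function are all hypothetical and chosen only for this illustration.

```python
from typing import Any

# Hypothetical field names, chosen only for illustration.
IDENTIFYING_FIELDS = {"name", "address", "email", "phone"}


def de_identify(record: dict[str, Any], keep: set[str]) -> dict[str, Any]:
    """Return a copy of the record holding only the task-relevant fields.

    Fields listed in IDENTIFYING_FIELDS are always dropped, even if
    they were requested in `keep`.
    """
    allowed = keep - IDENTIFYING_FIELDS
    return {field: value for field, value in record.items() if field in allowed}


# Fabricated example record.
customer = {
    "name": "Jane Doe",
    "address": "12 Example Street",
    "gender": "female",
    "age": 34,
    "favorite_category": "outdoor gear",
}

# A marketing task might keep gender; a healthcare-related task might not.
marketing_view = de_identify(customer, keep={"gender", "favorite_category"})
healthcare_view = de_identify(customer, keep={"age"})
```

Real systems usually go further than removing whole fields, for example by generalizing or masking values, but the principle of keeping only what the task needs is the same.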

This ought to be a relief for those who hold reservations about the fact that, often without your permission, your (typically publicly posted) information on social media and other open internet platforms is collected and used.

Why is it Helpful?

Often, data is gathered by one company and then sold to another for marketing or other customer outreach efforts. Alternatively, a company outsources its AI work to a service provider and then uses the insights that service produces.

De-identification helps ensure that none of these companies can exploit this data beyond its stated use.

For example, an AI service called persona modeling gathers large amounts of data about and from the customers of any given business, organization, or institution. Sentiments are extracted from this data, which includes self-authored posts like tweets, and cognitive insights are created from them. These “cognitive insights” function more or less like predictions about what your customers will most eagerly respond to in, say, a digital ad you put up on social media or an email you send to them.

These insights are sometimes delivered in the form of “persona models,” which are fake customer profiles that represent the typical customers you serve. They are free of any actual identifying information about your customers and contain only information relating to likes, dislikes, beliefs, etc.

This is important because neither party involved can readily exploit the data offered by the AI agent to, say, reach out to a specific customer and pursue them based on their social media posting history, or sell that specific customer’s data to another company or advertiser to make targeting easier.

Generally, this is what de-identification protects customers, and the businesses that hold their precious data, against. However, de-identification as both an art and a science is far from perfected, as the next section explains.

Re-Identification is the Enemy of De-Identification

In a recent article published by Stanford’s Institute for Human-Centered Artificial Intelligence (HAI), Stanford professor Nigam Shah was quoted as saying that “de-identification is not anonymization.”

This is because anyone in possession of de-identified data sets can re-identify them by combining them with other sources, which can make it easy to match the supposedly safe data with the people it was sourced from.

In short, with some effort, anyone in possession of de-identified data can exploit it to uncover real-life information about the people involved.
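To make the risk concrete, here is a small sketch of how such a match could work. It assumes the de-identified records still carry a few ordinary attributes (ZIP code, birth year, and gender, often called quasi-identifiers) that also appear in some other source that includes names. All data, field names, and the `link` function are fabricated for illustration.

```python
# Fabricated example data; all names and field names are hypothetical.
de_identified = [
    {"zip": "94305", "birth_year": 1985, "gender": "F", "diagnosis": "asthma"},
    {"zip": "94110", "birth_year": 1972, "gender": "M", "diagnosis": "diabetes"},
]

public_records = [
    {"name": "Jane Doe", "zip": "94305", "birth_year": 1985, "gender": "F"},
    {"name": "John Roe", "zip": "94110", "birth_year": 1972, "gender": "M"},
    {"name": "Ann Poe", "zip": "94110", "birth_year": 1990, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def link(de_identified_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Match rows across data sets that share the same quasi-identifier values."""
    # Index the public rows by their combination of quasi-identifier values.
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = []
    for row in de_identified_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        # Exactly one candidate means the row points to a single person.
        if len(names) == 1:
            matches.append((names[0], row["diagnosis"]))
    return matches


re_identified = link(de_identified, public_records)
```

In this toy data, every de-identified row matches exactly one named person, so removing the names alone protected no one. The fewer people who share a given combination of attributes, the easier this kind of matching becomes.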

This is especially important in the field of healthcare, which is the subject of the Stanford article. Keeping sensitive information such as Social Security numbers and medical histories private is very important to the people whose data is being mined, so de-identification processes involving such information deserve real wariness.

Caution, then, must be exercised by those using de-identifying processes. 


De-identification, though not perfect, is a useful process for increasing data privacy in the world of AI. Depending on the parameters of the task, a de-identification algorithm will strip the sensitive information from a data set. Businesses or organizations that own the data can then take some relief in knowing that trustworthy parties who provide AI services will not be selling or exploiting their data for gain.