Personal Data & Schrödinger's big red dog
Expanding definitions of personal data ruin de-identification
I’m Alan Chapell. I’ve been working at the intersection of privacy, competition, advertising and music for decades and I’m now proudly mixing metaphors at The Monopoly Report. If you have a tip to share in confidence, ping me at my last name at Gmail or find me on Twitter or Bluesky.
Our latest Monopoly Report podcast is out with Ana Milicevic of Sparrow Advisors. We discuss how content monetization lost its way, the risks of oversimplification of privacy and the end of an Internet without borders. Ana also has a few choice words for the DMV.

Meet err… “Gifford”, the big, red and ever expanding definition of personal data (pls don’t sue)
What is the definition of personal data?
“We don’t touch PII.”
It’s amazing to me how many in the ads space continue to lead with the above when asked about privacy. It’s one of my favorite McPrivacy-isms.
There used to be a time when ad tech companies could plausibly state that they weren’t impacted by most privacy laws - as very few of those laws defined personal data broadly. For that reason, if you “only” processed pseudonymous data like a cookie ID, a MAID, or an IP address, you didn’t have much to worry about. Even in the EU, if you truncated the IP address, your data set may not have been deemed personal data under EU data protection law.
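For readers who haven't seen IP truncation in practice, here is a minimal Python sketch. It uses the standard `ipaddress` module to zero out the host portion of an address. The function name `truncate_ip` and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for illustration, not a legal or industry standard. Whether the result still counts as personal data is exactly the question this post is about.

```python
import ipaddress


def truncate_ip(address: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Zero out the host bits of an IP address, keeping only the network prefix.

    The default prefix lengths are illustrative assumptions, not a legal standard.
    """
    ip = ipaddress.ip_address(address)
    prefix = v4_prefix if ip.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get back its enclosing network
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(truncate_ip("203.0.113.77"))                  # 203.0.113.0
print(truncate_ip("2001:db8:abcd:12:34:56:78:9a"))  # 2001:db8:abcd::
```

The truncated value still narrows a user down to a network, so it is less precise but not necessarily anonymous.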
What in the world happened to my PII?
Expanding the definition of personal data was a gradual process that took years.
By way of analogy, you may remember reading children’s books about a big red dog - who was born as the tiniest pup in the litter, but grew to be as large as a house. That’s what happened to the definition of personal data. It grew and grew over time. And now, like anything that’s gotten “too big”, it’s starting to break the privacy rules.
Here’s a quick history of how all this went down:
Blame Canada. Back in 2011, the Privacy Commissioner of Canada came out with guidance on behavioral targeting. The challenge for Canada was - how do they regulate behavioral advertising under their comprehensive privacy law (PIPEDA) if that law “only” applied to “PII”? So the guidance from Canada was that pseudonymous data processed in connection with behavioral targeting WAS personal data - but that the profiles created COULD still be subject to an opt-out standard (i.e., provided that the profiles were non-sensitive).
In my view, we would have been much better off if that same notice and opt-out standard had been adopted everywhere for non-sensitive profiling (as opposed to all these consent and enhanced notice prompts literally popping up everywhere). Canada implicitly focused on intrusiveness of the profile, scale and data minimization - things that don’t get nearly enough attention from regulators or the ad industry. That said, ascertaining which profiles are sensitive remained a challenge (e.g., Google got dinged for creating “sleep apnea” segments). Anyway, the rule in Canada from that time forward was that pseudonymous data is personal data.
FTC and COPPA. Then in 2012, the FTC came out with updated rules for the Children’s Online Privacy Protection Act (COPPA) that (among other things) indicated that pseudonymous data (e.g., Ad ID, IP address) was personal data when processed using certain targeting techniques in connection with children’s sites. And a few years later, the FTC’s Jessica Rich told a stunned audience at the NAI summit that IP addresses were always personal data.
EU and GDPR. In 2018, the GDPR further expanded the definition of personal data as a matter of EU data protection law. More on that in a minute.
U.S. State Laws. The California Consumer Privacy Act (CCPA) even further expanded the definition of personal data. In case there was any ambiguity regarding pseudonymous data, the CCPA specifically calls out “browsing history, search history, and information regarding a consumer’s interaction with an internet website application, or advertisement” as personal information. It also calls out “inferences” as personal data when attached to a profile. Many other states followed suit with similarly broad definitions.
The definition of personal data has expanded so much that you’d be hard pressed to find data collected in an ads context that is anything but personal data. One of the ironies about the move to ID-less solutions is that - somewhere along the chain - many of them still process personal data. I don’t mean that as an indictment of ID-less solutions. Rather, it’s a critique of a ruleset that doesn’t necessarily create incentives to move towards those types of solutions.

A very scientific chart illustrating the expanding definition of personal data in select markets
Why else does the definition of personal data matter?
There’s a long-standing privacy concept known as “de-identification”. De-identification is a process that removes or alters personally identifiable information in a data set (e.g., protected health information) so that the data set can no longer be used to identify a person. If you’re familiar with the U.S. health privacy law, the Health Insurance Portability and Accountability Act (HIPAA), you’re probably familiar with de-identification. Without going into all the details, the idea is that if you remove the ability to use a data set to identify someone, you reduce risks.
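To make "removes or alters" concrete, here is a toy Python sketch. It drops direct identifiers and coarsens two quasi-identifiers (ZIP code to its first three digits, birth date to year). The field names and the `deidentify` helper are hypothetical, and this is only an illustration of the general idea - it is not a HIPAA-compliant de-identification method and should not be treated as one.

```python
from typing import Any

# Hypothetical field names treated as direct identifiers in this toy example
DIRECT_IDENTIFIERS = {"name", "email", "phone", "ssn"}


def deidentify(record: dict[str, Any]) -> dict[str, Any]:
    """Drop direct identifiers and generalize a few quasi-identifiers."""
    # Remove: fields that identify a person on their own
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Alter: keep only the first three digits of the ZIP code
    if "zip" in out:
        out["zip"] = str(out["zip"])[:3]
    # Alter: replace an ISO-format birth date (YYYY-MM-DD) with the year only
    if "birth_date" in out:
        out["birth_year"] = str(out.pop("birth_date"))[:4]
    return out


patient = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "zip": "10027",
    "birth_date": "1980-05-17",
    "diagnosis": "asthma",
}
print(deidentify(patient))
```

Note what the sketch does not do: it leaves the remaining fields free to be joined to some other identifier later, which is where the trouble described below begins.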
De-identification isn’t perfect. As we’ve learned, it’s really hard to remove ANY chance of re-identification of a data set collected in a digital setting. But de-identification is a standard that has been around for a long time - and has enabled the (relatively) privacy safe use of large health data sets for all kinds of public good. And de-identification allowed the business community to effectively remove certain data sets from the privacy rules (i.e., because the data set was relatively low-risk). It was a win / win.
But now that personal data is defined so broadly by U.S. state laws, it has pretty much swallowed de-identification as a concept. As a result, it’s technically possible for a data set to be BOTH “personal data” (i.e., subject to state law rules) and “de-identified” (i.e., not subject to state privacy law rules).
In other words, policymakers have created a “Schrödinger's dog” issue where the definition of personal data has expanded to the point where it’s starting to break things.
Case in point: How should you be thinking about a profile created via HIPAA de-identified data once appended onto a cookie UID? Is the data set either:
Exempt from CCPA because the data set is regulated under HIPAA which pre-empts CCPA,
OR
A potentially sensitive health inference because it’s appended to a pseudonymous UID?
It’s a great question to run past your favorite privacy lawyer at the next industry event. Don’t be shy - we LOVE giving out free advice.
Europe is grappling with the big red dog too!
It isn’t just the U.S. states. Europe is struggling with this concept as well.
(Editor’s note: My friend Peter Craddock has some particularly interesting takes on the GDPR definition of personal data - we’ll have Peter on a TMR podcast soon.)
You may remember when the Belgian data privacy regulator took issue with the IAB EU TCF - saying that the TC string was personal data. And more recently, there’s a case that is making its way to the EU Court of Justice which might help provide clarity.
Overall, the EU definition of personal data may be narrowing slightly (see chart above). That said, there remains a tension in EU data protection regarding the definition of personal data. The tension is partly driven by the fact that EU data protection regulators are reluctant to concede that any data set falls outside the definition of personal data.
If the data set falls outside the scope of GDPR, how can they possibly demand that you get explicit consent for your ads related processing of that data set?
OK, sarcasm off, I can understand why regulators might be reluctant to remove any data flows from their remit. But at some point, the desire to call everything personal data might be creating disincentives for the market to adopt privacy preserving techniques.
And don’t get me started on how that approach is faring as the market focuses on AI.