Part of the problem is that there are still no good criteria available to define anonymity. Concepts like differential privacy are a step in the right direction but they still provide room for error, and in many cases they are either too restrictive (transformed data is not useful anymore) or too lax (transformed data is useful but can be easily re-identified).
Society is a tapestry of bullshit, and low-level swindling is generally tolerated or quickly forgotten. Thus, there's nothing to prod the unprincipled in charge to do the right thing. As long as something seems to be good (anonymized, in this case), and problems can be hidden behind the corporate veil long enough, the unwritten rule is to half-ass security solutions because, well, security is boring and there are other things to devote company time and resources to (things that will advance upper management).
Security measures, especially those that protect the users, don't make money. At best, they're insurance against the fallout that might occur when it's revealed that your company has been silently screwing people over. Like most human beings, businesses often put off serious consideration of the future in order to enjoy quick and immediate gain.
I wouldn't put it past most companies to screw up an approach like differential privacy. Not enough people actually care that much.
This is why the government has to make regulations with teeth in this space (of course, the government could be the "unprincipled in charge" you referred to).
Lots of companies are content to stop at "our data can't be linked back to a person's identity", which doesn't prevent building a uniquely-identifying user profile (e.g. via browser fingerprinting, plus enough metadata to associate a user's computer and phone accounts). Even if they do better than that, it's typically "our data is not uniquely identifying in isolation", which still isn't enough. If your differential privacy model says that these four pieces of data have a specificity of 10,000 possible individuals, that's a good start. But if someone with an individual's PII and three of those keys comes looking, they can still narrow down information about the fourth value from your aggregates.
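To make the narrowing concrete, here's a minimal sketch with made-up keys and counts (the attribute names and numbers are my own assumptions, not from any real dataset): an aggregate keyed by four quasi-identifiers looks safe bucket-by-bucket, but an attacker who already knows three of the keys can condition on them and read off a distribution over the fourth.

```python
# Hypothetical aggregate: counts of individuals per
# (zip, age band, gender, diagnosis). Each bucket is large in
# isolation, but filtering on three known keys isolates the fourth.
aggregate = {
    ("90210", "30-39", "F", "diabetes"): 12,
    ("90210", "30-39", "F", "none"): 9800,
    ("90210", "30-39", "M", "diabetes"): 7500,
}

def narrow_fourth_key(aggregate, known_zip, known_age, known_gender):
    """Return the distribution over the unknown fourth attribute,
    given the three quasi-identifiers the attacker already has."""
    return {
        diagnosis: count
        for (z, a, g, diagnosis), count in aggregate.items()
        if (z, a, g) == (known_zip, known_age, known_gender)
    }

print(narrow_fourth_key(aggregate, "90210", "30-39", "F"))
# The target is probably "none", but if she also appears in some
# diabetes-care dataset, only 12 candidates remain in this bucket.
```

The point isn't the toy numbers; it's that aggregates conditioned on known side information stop being aggregates.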
And even if no one screws up, what happens when someone queries a half dozen differential datasets for different subsets of a uniquely identifying key? It's something like the file-drawer problem, where one researcher hiding bad data is malicious, but a dozen studies failing to coordinate produces the same result innocently. If outright failures to anonymize become rarer, cross-dataset approaches become more rewarding.
https://fpf.org/wp-content/uploads/2017/06/FPF_Visual-Guide-...
You keep data because data is economically valuable, but even when you care enough to implement techniques that depend on the right invariants, you still fall short of doing it well, both because of scale and because nobody wants to spend the time refining the techniques. It also means that somewhere, somebody may have a technique that, given enough pieces of data, can reverse your transformation.
If companies were required to aggregate information in this way and throw away their logs, perhaps leaks would be much less risky for their users.
Today this might seem far-fetched, but it could come to pass in the future, when people raised in this environment and able to understand the implications and technical aspects come to political power.
The main take-away from the talk - and in fact all the talks I saw on the same day - was that while DP is touted as a silver bullet and the new hotness, in reality it cannot protect against the battery of information-theoretic attacks advertisers have been aware of for a couple of decades, and intelligence agencies must have been using for a lot longer. Hiding information is really hard. Cross-correlating data across different sets, even if each set in itself contains nothing but weak proxies, remains a powerful deanonymisation technique.
After all, if you have huge pool of people and dozens or even hundreds of unique subgroups, the Venn-diagram-like intersection of just a handful will carve out a small and very specific population.
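A quick sketch of that Venn-diagram intersection, with made-up subgroups (the proxies and sizes are invented for illustration): each set alone matches 10,000 people out of a million, but assuming rough independence, membership in all three narrows the pool to about one person.

```python
import random

random.seed(1)
population = range(1_000_000)

# Three weak proxies, each matching 1% of the pool on its own.
drives_a_red_car = set(random.sample(population, 10_000))
shops_at_store_x = set(random.sample(population, 10_000))
jogs_at_6am      = set(random.sample(population, 10_000))

# Independence predicts 0.01^3 * 1,000,000 = roughly one person.
candidates = drives_a_red_car & shops_at_store_x & jogs_at_6am
print(len(candidates))
```

Real proxies are correlated rather than independent, which usually makes the carving even sharper, not duller.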
There's a lot of privacy snakeoil out there and even large govt departments fall for it.
https://pursuit.unimelb.edu.au/articles/the-simple-process-o...
It's possible I'm misreading, but your paper seems to focus on the very anonymization techniques diff privacy was invented to improve on, specifically because these kinds of attacks exist. While I agree it's no silver bullet, the reason is because it's too strong (it's hard to get useful results while providing such powerful guarantees) rather than not strong enough.
I've found the introduction to this textbook on it to be useful and very approachable if others are interested: https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf
After spending three years working on privacy technologies, I'm convinced that anonymization of high-dimensional datasets (say, more than 1000 bits of information entropy per individual) is simply not possible for information-theoretic reasons; the best we can do for such data is pseudonymization or deletion.
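The back-of-the-envelope arithmetic behind that claim: about 33 bits of entropy are enough to single out one individual among roughly eight billion people, so a record carrying 1000 bits is unique by an absurd margin.

```python
import math

# Bits needed to uniquely identify one person on Earth.
world_population = 8_000_000_000
bits_to_identify = math.log2(world_population)
print(round(bits_to_identify, 1))  # ≈ 32.9
```

With ~33 bits sufficing, a 1000-bit record has roughly 967 bits to spare; you'd have to destroy almost all of its information content before it stopped being identifying, at which point it's no longer useful data.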
I want to be better equipped to respond to this slowly emerging "DP is a silver bullet" meme and your response implies that you'd have actual research to back the position up.
Also, it's not a magical solution. Here's one of the issues from the linked paper (edited for clarity):
"The proponents of differential privacy have always maintained that the setting of the [trade-off between privacy loss (ε) and accuracy] is a policy question, not a technical one. [...] To date, the Census committee has set the values of ε far higher than those envisioned by the creators of differential privacy. (In their contemporaneous writings, differential privacy’s creators clearly imply that they expected values of ε that were “much less than one.”)
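The ε trade-off the quote describes is easy to see concretely. Below is a sketch of the standard Laplace mechanism (noise with scale sensitivity/ε added to a count); the specific counts and ε values are my own illustrative choices. Small ε means strong privacy and noisy answers; the large ε values the Census reportedly chose give much more accurate, and much less private, releases.

```python
import math
import random

def laplace_mechanism(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity/epsilon
    (inverse-CDF sampling of a Laplace(0, scale) variate)."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

random.seed(0)
for eps in (0.1, 1.0, 10.0):
    samples = [round(laplace_mechanism(1000, eps)) for _ in range(5)]
    print(f"epsilon={eps}: {samples}")
```

At ε = 0.1 the answers wander tens of counts away from the truth; at ε = 10 they barely move, which is exactly why "how big is ε" is a policy fight and not a technical detail.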
us census, RIP
One of the leaks they talk about was from Experian, a credit reporting agency. Not only would this approach work poorly for them, it wouldn't be legal (they need to be able to back up any claims they make about people, which requires going back to the source data).
For freeways, lots of small segments, and fuzzing of timestamps to co-mingle users. Where there's a stoplight, snap the intersection crossing time to the (estimated) green light for everyone in the queue.
The anonymity would come from breaking up both requests and observed telemetry to fragments too small to tie back to a single user or session (and thus form a pattern; I hope).
Do NOT record end-times, only an intended route. Do NOT associate that movement to any particular user or persistent session (ideally in memory on the mobile device only, not saved: though it could save favorite routes locally). Packages of transition times between various freeway exits would generally help add to anonymity.
That would also be part of generally improving the UI for the user. The application on the device should be making most of the decisions, by asking about the traffic in a given region on a grid. I also want it to show me (the driver) the data (heatmap) on the rejected routes so I know what isn't a good option.
https://www.hhs.gov/hipaa/for-professionals/privacy/special-...
https://www.deccanchronicle.com/technology/in-other-news/201...
There's tons of PHI on the internet. Your local hospital's online medical chart, your insurance company's bill-pay, etc...