Can data science save social media?
The unfettered internet is too often used for malicious purposes and is frequently woefully inaccurate. Social media, especially Facebook, has failed miserably at protecting user privacy and blocking miscreants from sowing discord.
That’s why CEO Mark Zuckerberg was just forced to testify about user privacy before both houses of Congress. And now governmental regulation of Facebook and other social media appears to be a fait accompli.
Facebook must promulgate and embrace two techniques: what is known in high-level security circles as homomorphic encryption (HE), often considered the “Holy Grail” of cryptography, and data provenance (DP). HE would enable Facebook, for example, to generate aggregated reports about its users’ psychographic profiles so that advertisers could still accurately target groups of prospective customers without knowing their actual identities.
Meanwhile, data provenance — the process of tracing and recording true identities and the origins of data and its movement between databases — could unearth the true identities of Russian perpetrators and other malefactors, or at least identify unknown provenance, adding much-needed transparency in cyberspace.
Both methodologies are extraordinarily complex. IBM and Microsoft, in addition to the National Security Agency, have been working on HE for years, but the technology has suffered from significant performance challenges. Progress is being made, however. IBM, for example, has been granted a patent on a particular HE method (a strong hint it’s seeking a practical solution) and last month proudly announced that its rewritten HE library now works up to 75 times faster. Maryland-based ENVEIL, a startup staffed by the former NSA HE team, has broken the performance barriers required to produce a commercially viable version of HE, benchmarking millions of times faster than IBM in tested use cases.
At this key juncture, the crucial question is whether regulation, in concert with Facebook’s promises to aggressively mitigate its weaknesses, will correct the privacy abuses while still allowing Facebook to fulfill its goal of giving people the power to build transparent communities and bringing the world closer together.
The answer is maybe.
What has not been said is that Facebook must embrace data science methodologies initially created in the bowels of the federal government to help protect its two billion users. Simultaneously, Facebook must still enable advertisers, its primary source of revenue, to get the user data required to justify their expenditures.
How homomorphic encryption would help Facebook
HE is a technique used to operate on and draw useful conclusions from encrypted data without decrypting it, simultaneously protecting the source of the information. It is useful to Facebook because its massive inventory of personally identifiable information is the foundation of the economics underlying its business model. The more comprehensive the data sets about individuals, the more precisely advertising can be targeted.
HE could keep Facebook information safe from hackers and inappropriate disclosure, but still extract the essence of what the data tells advertisers. It would convert data into encrypted strings of numbers, do math with these strings, then decrypt the results to get the same answer it would if the data weren’t encrypted at all.
A particularly promising sign for HE emerged last year, when Google revealed a new marketing measurement tool that relies on this technology to allow advertisers to see whether their online ads result in in-store purchases.
Unearthing this information requires analyzing data sets belonging to separate organizations, notwithstanding the fact that these organizations pledge to protect the privacy and personal information of the data subjects. HE skirts this by generating aggregated, non-specific reports about the comparisons between these data sets.
In pilot tests, HE enabled Google to successfully analyze encrypted data about who clicked on an advertisement in combination with another encrypted multi-company data set that recorded credit card purchase records. With this data in hand, Google was able to provide reports to advertisers summarizing the relationship between the two databases to conclude, for example, that five percent of the people who clicked on an ad wound up purchasing in a store.
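To make the "do math on encrypted numbers" idea in this section concrete, here is a minimal sketch using the Paillier cryptosystem, a textbook scheme that supports addition on ciphertexts. It is an illustration only: nothing here is claimed to be what Google, IBM, ENVEIL or Facebook actually use. The function names, the tiny demo primes and the purchase data are all assumptions made up for the example, and the key sizes are far too small to be secure.

```python
import math
import secrets

# Two well-known Mersenne primes, chosen only so the demo runs instantly.
# Assumption for illustration: real systems use much larger random primes.
P = 2**31 - 1
Q = 2**61 - 1


def generate_keypair(p, q):
    """Return (public_n, private_key) for a toy Paillier scheme with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (needs Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer m (0 <= m < n) under the public key n."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq


def add_encrypted(n, c1, c2):
    """Multiplying two ciphertexts adds the plaintexts hidden inside them."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (((u - 1) // n) * mu) % n


# Made-up data: 1 means an ad clicker later bought in a store, 0 means not.
purchase_flags = [0] * 19 + [1]

n, private_key = generate_keypair(P, Q)
encrypted_flags = [encrypt(n, flag) for flag in purchase_flags]

# The party running the analysis only ever handles ciphertexts.
encrypted_total = encrypt(n, 0)
for ciphertext in encrypted_flags:
    encrypted_total = add_encrypted(n, encrypted_total, ciphertext)

# Only the key holder decrypts, and only the aggregate count.
buyers = decrypt(n, private_key, encrypted_total)
print(f"{buyers} of {len(purchase_flags)} clickers purchased")
```

The analyst in this sketch learns the aggregate (how many clickers bought) without seeing any individual flag. Paillier supports only addition; the fully homomorphic schemes the companies above are working on allow richer computation at a much higher performance cost.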
Data provenance
Data provenance has a markedly different core principle. It’s based on the fact that digital information is atomized into 1s and 0s with no intrinsic truth. The dual digits exist only to disseminate information, whether accurate or widely fabricated. A well-crafted lie can easily be indistinguishable from the truth and distributed across the internet. What counts is the source of these 1s and 0s. In short, is it legitimate? What is the history of the 1s and 0s?
The art market, as an example, deploys DP to combat fakes and forgeries of the world’s greatest paintings, drawings and sculptures. It uses DP techniques to create a verifiable chain of custody for each piece of artwork, preserving the integrity of the market.
Much the same thing can be done in the online world. For example, a Facebook post referencing a formal statement by a politician, with an accompanying photo, would have provenance records directly linking the post to the politician’s press release and even the specifics of the photographer’s camera. The goal — again — is ensuring that data content is legitimate.
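One common way to build such a chain of custody for digital content is a hash chain, in which every provenance record stores the hash of the record before it. The sketch below is a minimal illustration under stated assumptions: the record fields, the function names (`record_hash`, `append_record`, `verify_chain`) and the sample events are hypothetical, and it does not describe how any of the companies mentioned here actually store provenance data.

```python
import hashlib
import json


def record_hash(body):
    """Hash a record body deterministically (keys sorted before hashing)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain, event, details):
    """Append a provenance record that points at the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else None
    body = {"event": event, "details": details, "prev_hash": prev_hash}
    chain.append({**body, "hash": record_hash(body)})


def verify_chain(chain):
    """Return True only if every record is unmodified and correctly linked."""
    prev_hash = None
    for entry in chain:
        body = {key: entry[key] for key in ("event", "details", "prev_hash")}
        if entry["prev_hash"] != prev_hash or entry["hash"] != record_hash(body):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical history of one post: photo, then press release, then the post.
chain = []
append_record(chain, "photo_captured", {"camera": "example-camera-model"})
append_record(chain, "press_release_published", {"title": "Formal statement"})
append_record(chain, "post_created", {"platform": "social network"})

intact = verify_chain(chain)

# Quietly rewriting an earlier record breaks every check that depends on it.
chain[1]["details"]["title"] = "Altered statement"
after_tampering = verify_chain(chain)

print(intact, after_tampering)
```

A hash chain on its own only shows that the recorded history has not been altered after the fact. Establishing who created each record would additionally require something like digital signatures, which this sketch leaves out.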
Companies such as Walmart, Kroger, British-based Tesco and Swedish-based H&M, an international clothing retailer, are using or experimenting with new technologies to provide provenance data to the marketplace.
Let’s hope that Facebook and its social media brethren begin studying HE and DP thoroughly and implement them as soon as feasible. Other strong measures, such as the upcoming implementation of the European Union’s General Data Protection Regulation, which will use a big stick to secure personally identifiable information, should essentially be cloned in the U.S. What is best, however, is to pursue multiple avenues to enhance user privacy and security, while hopefully preventing breaches in the first place. Nothing less than the long-term viability of social media giants is at stake. (Via TechCrunch)