Google adds Python support to privacy-preserving data analytics tool

Google has extended its open-source Differential Privacy (DP) Platform to support the Python programming language, expanding availability to millions more developers and data analysts.

The announcement makes Python the fourth language supported by the project after its initial launch in 2019 with support for C++, Java and the Google-created Go language, sometimes referred to as Golang.

It comes after Google reported that a significant number of developers have contacted the company to express interest in using the open-source library for their Python projects. Google has worked for more than a year with OpenMined on Python support and said many projects have already used its DP library, including Australian developers who have accelerated scientific discovery by analyzing medical data privately.

DP is a system used by data analysts to maintain the privacy of individuals whose data appears in an analyzed data set. Work on developing rigorous DP techniques goes back decades, but it’s only in recent years that tech giants like Google and Apple have embraced the system.

One of DP’s main areas of development for Google over the past year has been providing a tool for library developers to fine-tune “epsilon” – a mathematical measure of privacy. Finding the optimal epsilon takes a lot of trial and error, and having a tool in the library that lets developers adjust it to produce a lower epsilon – which indicates a more private build – means individual projects can be made as private as possible.
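As a rough sketch of what epsilon controls – not the API of Google’s library – here is the classic Laplace mechanism, where a lower epsilon means larger noise and therefore stronger privacy (the counting query and figures are invented for illustration):

```python
import math
import random

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise calibrated to epsilon.

    A counting query has sensitivity 1 (one person changes the count by
    at most 1), so the Laplace noise scale is 1 / epsilon: the smaller
    the epsilon, the larger the noise and the stronger the privacy.
    """
    scale = 1.0 / epsilon
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sampling from a Laplace(0, scale) distribution
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

rng = random.Random(0)
for eps in (1.0, 0.1):
    samples = [dp_count(100, eps, rng) for _ in range(1000)]
    print(f"epsilon={eps}: noisy counts range over ~{max(samples) - min(samples):.1f}")
```

Running this shows the noisy counts for epsilon = 0.1 spread roughly ten times wider than those for epsilon = 1.0 – the accuracy cost of the stronger privacy guarantee that tuning tools help developers balance.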

Google said that with Python now supported, the DP library is available to almost half of all developers worldwide, meaning more developers and researchers will be able to analyze data and make new discoveries while preserving the confidentiality of the users to whom the data belongs.

Python is one of the most popular programming languages in use today and won the “Language of the Year 2021” award from the TIOBE index, which ranks programming languages based on their popularity. Python is useful for a wide range of programming activities, but it’s especially known for its data analysis capabilities, which makes it a natural progression for Google’s DP library.

As part of the launch, Google released a new tool, available at pipelinedp.io, which allows any Python developer to analyze their datasets with differential privacy. Google also said it has seen organizations experiment with new use cases, such as anonymously displaying a website’s most visited pages by country.

The library is compatible with the major big data processing frameworks Apache Spark and Apache Beam, and Google will release an additional tool to help users “visualize and better tune the parameters used to produce differentially private information.”

“We encourage developers around the world to take this opportunity to experiment with differential privacy use cases like statistical analysis and machine learning, but most importantly, let us know what you think,” Google said in announcing the news. “We’re excited to learn more about the apps you can all build and the features we can provide to help you along the way.

“We will continue to invest in democratizing access to critical privacy-enhancing technologies and hope that developers will join us on this journey to improve usability and coverage. As we’ve said before, we believe that every Internet user in the world deserves world-class privacy, and we will continue to partner with organizations to achieve this goal.”

What is Differential Privacy?

Differential privacy is a tool that has grown in popularity in recent years as data and identity protection has become a focal point for researchers, businesses, and regulators.

Some argue that it is fundamentally necessary in data analysis to preserve confidentiality and hide the identities of the people whose data is analyzed. For tech companies in particular, it has been at the forefront of how their users expect them to handle the data held about them.

DP works by adding “controlled noise” to datasets so that individuals cannot be identified from the data they contribute. For example, suppose residents of a neighborhood provided their salaries for an analysis that published only the average. If a resident then left the neighborhood, their salary could be deduced – and linked to their identity – by comparing the averages published before and after the move.
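That differencing attack can be made concrete in a few lines of Python (the salary figures below are invented for illustration):

```python
# Differencing attack: exact published averages leak an individual's salary.
salaries = [52_000, 61_000, 48_000, 75_000, 58_000]  # five residents (made-up data)

avg_before = sum(salaries) / len(salaries)  # average published with all five present
leaver = salaries.pop()                     # the last resident (58,000) moves away
avg_after = sum(salaries) / len(salaries)   # average published after the move

# Anyone who saw both exact averages recovers the leaver's salary precisely:
recovered = avg_before * 5 - avg_after * 4
print(recovered)  # 58000.0
```

The attack works because the exact statistic changes by precisely the leaver’s contribution; differential privacy defeats it by making that change statistically deniable.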

Similarly, if two databases were analyzed – one containing a single data point on each of 50 people, and the other the same data plus a 51st person – the results of the two analyses should be indistinguishable from each other, so that the 51st person cannot be identified. Only then does the analysis qualify as differentially private.

Adding controlled noise to a data set would remove the possibility of identifying an individual by skewing the statistics just enough to remove the identifying element, without significantly compromising the accuracy of the results.
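A minimal sketch of that defense, again using invented salary data and a simplified Laplace mechanism rather than the Google library’s actual implementation:

```python
import math
import random

def noisy_average(values, epsilon, upper_bound, rng):
    """Release an average with Laplace noise added to the sum (simplified sketch).

    upper_bound caps each person's contribution, so one person joining or
    leaving changes the sum by at most upper_bound -- the query's sensitivity.
    """
    scale = upper_bound / epsilon
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return (sum(values) + noise) / len(values)

rng = random.Random(7)
# 1,000 residents with made-up salaries; the eventual leaver earns 58,000.
salaries = [rng.randrange(40_000, 90_000) for _ in range(999)] + [58_000]

before = noisy_average(salaries, epsilon=1.0, upper_bound=100_000, rng=rng)
after = noisy_average(salaries[:-1], epsilon=1.0, upper_bound=100_000, rng=rng)

true_avg = sum(salaries) / len(salaries)
print(f"true average {true_avg:,.0f}, noisy average {before:,.0f}")

# The differencing attack now yields 58,000 plus noise on the order of the
# Laplace scale, so the leaver's exact salary can no longer be pinned down:
recovered = before * 1000 - after * 999
print(f"'recovered' salary: {recovered:,.0f}")
```

With 1,000 residents, the noise barely moves the published average, yet it swamps the one-person difference the attack relies on – the trade-off the epsilon parameter governs.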

All major Big Tech companies have adopted DP in different ways. Microsoft’s Artificial Intelligence Lab is working with Harvard University on projects to facilitate DP-based research. Apple has used DP on its products since macOS Sierra and iOS 10, and Facebook and Amazon also have experience working with the system.
