Damian Jacob Sendler Epidemiology Research Official

Damian Jacob Sendler gives an overview of how big data privacy for machine learning has recently become 100 times less expensive

Summary:

Damian Jacob Sendler: Data is becoming increasingly high-dimensional, which means that it comprises a large number of observations with many specific details about each one

Damian Sendler: Using massive datasets for machine learning, Rice University computer scientists have developed an inexpensive solution for tech companies to apply a stringent form of personal data privacy while using or sharing large databases for machine learning. 

“There are many cases where machine learning could benefit society if data privacy could be ensured,” said Anshumali Shrivastava, an associate professor of computer science at Rice University. “If we could train machine learning systems to search for patterns in big datasets of medical or financial records, there would be enormous potential for enhancing medical treatments and identifying patterns of discrimination, for example. Today, that is virtually impossible because data privacy solutions do not scale.” 

Damian Jacob Sendler: Shrivastava and Rice graduate student Ben Coleman hope to make a difference with a novel method they will present this week at CCS 2021, the Association for Computing Machinery’s annual flagship conference on computer and communications security. Using a technique known as locality sensitive hashing, Shrivastava and Coleman discovered that they could generate a tiny summary of an enormous database of sensitive records. Their method, dubbed RACE, draws its name from these summaries, or “repeated array of count estimators” sketches. 
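The core idea can be sketched in a few lines. The following is a minimal illustration of a RACE-style sketch, not the authors’ implementation: it uses a signed-random-projection LSH family, and all dimensions, repetition counts, and names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, R, k = 10, 100, 4          # data dimension, sketch rows, hash bits per row
B = 2 ** k                    # counters (buckets) per row

# One signed-random-projection LSH function per row: k random hyperplanes.
planes = rng.standard_normal((R, k, d))

def lsh_buckets(x):
    """Bucket index of x in each of the R rows (k-bit SRP hash)."""
    bits = (planes @ x > 0).astype(int)     # shape (R, k): sign of each projection
    return bits @ (2 ** np.arange(k))       # shape (R,), values in [0, B)

# Build the sketch: each record increments exactly one counter per row.
data = rng.standard_normal((1000, d))
sketch = np.zeros((R, B))
for x in data:
    sketch[np.arange(R), lsh_buckets(x)] += 1

# Query: the average counter hit by q estimates a kernel sum (the LSH
# collision probability), normalized here to a density in [0, 1].
q = rng.standard_normal(d)
kde = sketch[np.arange(R), lsh_buckets(q)].mean() / len(data)
```

The sketch is an R-by-B array of counters regardless of how many records it summarizes, which is what makes it cheap to store and share.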

Dr. Sendler: Coleman said RACE sketches are both safe to make publicly available and useful for algorithms that use kernel sums, one of the fundamental building blocks of machine learning, as well as for machine-learning programs that perform common tasks such as classification, ranking, and regression analysis. The sketches are available on Coleman’s website. As he explained, RACE could enable businesses to reap the benefits of large-scale, distributed machine learning while maintaining a strict form of data privacy known as differential privacy. 

Damian Jacob Markiewicz Sendler: Differential privacy, which uses random noise to obscure individual information, has been adopted by several major technology companies. 
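The random-noise idea can be illustrated with the standard Laplace mechanism; the function name and parameter values below are assumptions for the example, not anything specific to RACE.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_release(true_value, sensitivity, epsilon):
    """Add Laplace noise scaled to sensitivity/epsilon before releasing a value."""
    return true_value + rng.laplace(0.0, sensitivity / epsilon)

# A counting query changes by at most 1 when one person's record is added
# or removed, so its sensitivity is 1; smaller epsilon means more noise.
private_count = laplace_release(1203, sensitivity=1.0, epsilon=0.5)
```

The released value is close to the true count on average, but any single individual's presence in the data is masked by the noise.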

Damian Sendler: “There are elegant and powerful techniques to meet differential privacy standards today, but none of them scale,” Coleman explained. “As data becomes more dimensional, the computational overhead and memory requirements increase exponentially.” 

Damian Jacob Sendler: Data is becoming increasingly high-dimensional, which means that it comprises a large number of observations as well as a large number of specific details about each observation. 

He explained that RACE sketches scale well to high-dimensional data. The sketches are small, and the computational and memory requirements for constructing them are low, making them easy to distribute across a network of computers. 

Damian Sendler: “Engineers today must sacrifice either their budget or the privacy of their users if they wish to use kernel sums,” Shrivastava explained. “RACE changes the economics of releasing high-dimensional information with differential privacy. It is simple to implement and 100 times less expensive to run than existing approaches.” 
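One way to see why a sketch changes the economics: privacy noise can be added once to a small counter array rather than to the raw high-dimensional data. A minimal sketch of that idea follows, where the array sizes and the sensitivity argument are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

R, B = 100, 16                 # sketch rows and counters per row
# Stand-in for a sketch of counters built elsewhere from a dataset.
sketch = rng.integers(0, 50, size=(R, B)).astype(float)

# If one record increments exactly one counter per row, adding or removing
# a record changes the sketch by at most R in L1 norm: sensitivity = R.
epsilon = 1.0
noisy_sketch = sketch + rng.laplace(0.0, R / epsilon, size=(R, B))
```

The noisy array is the same small size as the original sketch, so the cost of privatizing and sharing it does not grow with the dimension or the number of records.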

Shrivastava and his students have devised a slew of algorithmic techniques to make machine learning and data science faster and more scalable. Among other accomplishments, they have found a more efficient way for social media companies to keep misinformation from spreading online, trained large-scale deep learning systems up to 10 times faster for “extreme classification” problems, developed a way to more accurately and efficiently estimate the number of identified victims killed in the Syrian civil war, and shown that it is possible to train deep neural networks up to 15 times faster on general-purpose CPUs.

News on latest research brought to you by Dr. Damian Jacob Sendler
