JCSE, vol. 5, no. 3, pp. 167-168, 2011
Preface for the Special Issue on Privacy-Aspects of Data Mining
Aris Gkoulalas-Divanis, Kun Liu, Vassilios S. Verykios, Ran Wolff
Information Analytics Lab, IBM Research-Zurich, Switzerland; Yahoo! Silicon Valley Labs, Santa Clara, CA, USA; School of Science and Technology, Hellenic Open University, Patras, Greece; University of Haifa, Mt. Carmel, Israel
Abstract: The collection, analysis and sharing of person-specific data for various purposes, such as data publication or data mining, raises serious concerns. These concern the privacy of the individuals represented in the data. They also concern the sensitive knowledge patterns that existing data mining technology can expose when the data are mined. To address these concerns, the field of privacy-preserving data mining emerged about a decade ago. Since then, a variety of methodologies for privacy-preserving data publishing, data mining and data linkage have been developed. Emerging research areas, such as stream mining, mobility data mining, web mining, clinical genomics data mining and social network analysis, have also recently been investigated from the perspective of privacy protection. This work has led to several interesting theoretical and applied privacy-preserving techniques.
The trade-off between the conflicting goals of data privacy and data utility is of paramount importance in privacy-preserving data mining. Accordingly, methodologies have been designed that offer guarantees on the achieved level of privacy or data utility. These guarantees keep the released data useful for data-demanding applications such as biomedical and healthcare studies, location-based services and e-commerce. Moreover, the need to integrate data from multiple sources in a privacy-preserving manner has led to approaches that provide privacy while achieving high accuracy in the resulting linkage.
This special issue focuses on state-of-the-art approaches in privacy-preserving data mining. We invited five papers. Three of them were recommended submissions selected from the best-ranked papers presented at the IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2010), held in Sydney, Australia (workshop website: http://www.zurich.ibm.com/padm2010/). All papers were reviewed by external reviewers on the basis of technical quality, originality, significance and clarity. The accepted papers reflect the diversity of research in privacy-preserving data mining. In what follows, we outline the contribution of each invited paper to the state of the art.
The first paper is titled “Limiting Attribute Disclosure in Randomization based Microdata Release” and is authored by Ling Guo, Xiaowei Ying, and Xintao Wu. It provides a systematic study of the randomization method for protecting microdata from attribute disclosure. It also proposes a randomization model, together with an efficient solution for improving the utility of the released dataset. The authors introduce a uniform definition of attribute disclosure that applies to both randomization-based and generalization-based models. They use this definition to compare their approach with l-diversity and anatomy in terms of data utility under the same privacy requirements. Their evaluation covers three aspects of data utility, namely reconstructed distributions, accuracy of answering queries, and preservation of correlations. It indicates that randomization can incur lower utility loss than these approaches.
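To make the idea of randomization concrete, the sketch below shows classic uniform randomized response on a single categorical attribute. It also reconstructs the original distribution from the perturbed data.

This is not the randomization model or the optimized solution of Guo, Ying and Wu. The function names, the retention probability p and the toy attribute domain are our own illustrative assumptions.

Each value is kept with probability p and otherwise replaced by a value drawn uniformly from the domain D. The expected observed frequency of a value a is therefore p * pi_a + (1 - p) / |D|, which can be inverted to estimate pi_a.

```python
import random
from collections import Counter


def randomize(values, domain, p, rng):
    """Uniform randomized response (illustrative, not the paper's model):
    keep each value with probability p, otherwise replace it with a value
    drawn uniformly from the domain (possibly the original value again)."""
    return [v if rng.random() < p else rng.choice(domain) for v in values]


def reconstruct(randomized, domain, p):
    """Estimate the original distribution by inverting
    P(observe a) = p * pi_a + (1 - p) / |domain|."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    n = len(randomized)
    counts = Counter(randomized)
    d = len(domain)
    return {a: (counts[a] / n - (1 - p) / d) / p for a in domain}


# Usage with a hypothetical sensitive attribute (50% flu, 40% cold, 10% hiv).
rng = random.Random(7)
domain = ["flu", "cold", "hiv"]
original = ["flu"] * 10000 + ["cold"] * 8000 + ["hiv"] * 2000
released = randomize(original, domain, 0.7, rng)
estimate = reconstruct(released, domain, 0.7)
print({a: round(x, 3) for a, x in estimate.items()})
```

Smaller values of p hide individual values better but make the reconstructed distribution noisier. This is the kind of privacy-utility trade-off the authors study.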
The second paper is “Privacy Disclosure and Preservation in Learning with Multi-relational Databases” by Hongyu Guo, Herna L. Viktor, and Eric Paquet. It studies the following problem. Suppose we are given a relational database with multiple tables, a target attribute for future learning, and confidential attributes that should not be divulged. How can one derive a subschema that maintains the predictive performance for the target attribute while protecting the confidential attributes? To solve this problem, the authors propose Target Shifting Multi-relational Classification (TSMC), an algorithm that operates in four steps:
1. The attributes that are correlated with a confidential attribute are identified using the CrossMine algorithm.
2. Based on the computed correlations, the degree of sensitivity of each table in the database is calculated.
3. Subschemas consisting of different combinations of the database tables are constructed.
4. Each subschema is evaluated with a combined metric called Subschema Privacy-Informativeness (PI), which accounts for both its performance in predicting the target attribute and its privacy sensitivity level.
As a result, a ranked list of subschemas with different PI values is returned to the user, who can then select the subschema of interest. The effectiveness of the proposed method is demonstrated on a financial database and an insurance database.
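The overall shape of the third and fourth steps can be sketched as follows. The sketch enumerates candidate subschemas, scores each one on predictive performance and sensitivity, and returns a ranked list. It is only an illustration under stated assumptions, and the following elements are hypothetical:

- the table names;
- the rule that every subschema contains the target table;
- the use of the maximum table sensitivity as the subschema sensitivity;
- above all, the weighted score standing in for PI.

The paper's actual PI metric and the CrossMine-based analysis of the first two steps are not reproduced. Predictive accuracy is supplied as a stub function.

```python
from itertools import combinations


def rank_subschemas(target, background, accuracy_of, sensitivity, alpha=0.5):
    """Enumerate subschemas and rank them by a HYPOTHETICAL stand-in for PI.

    target      -- table holding the target attribute (always kept; an
                   assumption of this sketch)
    background  -- other tables that may be added to a subschema
    accuracy_of -- callable returning predictive accuracy in [0, 1] for a
                   subschema (stubbed here; in practice a trained classifier)
    sensitivity -- per-table sensitivity in [0, 1], i.e. the output of step 2
    """
    ranked = []
    for r in range(len(background) + 1):
        for subset in combinations(background, r):      # step 3: build subschemas
            schema = (target,) + subset
            acc = accuracy_of(schema)
            # Assumption: a subschema is as sensitive as its most sensitive table.
            sens = max((sensitivity[t] for t in subset), default=0.0)
            score = alpha * acc - (1 - alpha) * sens     # hypothetical PI stand-in
            ranked.append((score, schema))
    ranked.sort(key=lambda item: item[0], reverse=True)  # step 4: ranked list
    return ranked


# Toy usage: hypothetical tables; each extra table adds 0.1 stub accuracy.
sens = {"account": 0.2, "client": 0.9, "district": 0.1}
ranked = rank_subschemas("loan", ["account", "client", "district"],
                         lambda s: 0.6 + 0.1 * (len(s) - 1), sens)
for score, schema in ranked[:3]:
    print(round(score, 3), schema)
```

With this toy scoring, subschemas that include the highly sensitive table sink to the bottom of the list even though they add accuracy. That is the tension the PI metric is meant to expose to the user.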
The third paper is titled “Anonymizing Graphs Against Weight-based Attacks with Community Preservation” and is authored by Yidong Li and Hong Shen. It considers the re-identification problem in social networks, and in particular in weighted graphs. Weighted graphs are more vulnerable to identity disclosure than their unweighted counterparts, because the edge weights provide additional distinguishing information that makes re-identification easier. The authors formalize two important background-knowledge attacks on weighted graphs. These attacks exploit the volume (sum of adjacent edge weights) and the histogram (set of adjacent edge weights) of each vertex. Accordingly, they propose an algorithm that blocks these attacks by transforming the original graph to create structural uniformity. The algorithm aims to minimize information loss, measured as the change in the graph spectrum caused by the anonymization process. It is evaluated on both real and synthetic weighted graphs. The authors also provide an extension of their approach that preserves the quality of community detection, a popular graph mining application.
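The two kinds of background knowledge can be illustrated as follows. For each vertex, compute its volume and its weight histogram. Then check whether every observed value is shared by at least k vertices, which is a k-anonymity style condition.

The sketch below is our own illustration, not the authors' anonymization algorithm. It only detects vulnerability and does not transform the graph. Two choices are assumptions made for the example: representing the histogram as a sorted tuple of adjacent edge weights (a multiset), and the k-anonymity check itself.

```python
from collections import Counter, defaultdict


def vertex_signatures(edges):
    """For an undirected weighted graph given as (u, v, weight) triples, return
    each vertex's volume (sum of adjacent edge weights) and weight histogram
    (here: sorted tuple of adjacent edge weights, i.e. a multiset)."""
    adjacent = defaultdict(list)
    for u, v, w in edges:
        adjacent[u].append(w)
        adjacent[v].append(w)
    volume = {x: sum(ws) for x, ws in adjacent.items()}
    histogram = {x: tuple(sorted(ws)) for x, ws in adjacent.items()}
    return volume, histogram


def is_k_anonymous(signature, k):
    """True if every signature value is shared by at least k vertices, so an
    adversary who knows a vertex's signature faces at least k candidates."""
    return all(count >= k for count in Counter(signature.values()).values())


# A 4-cycle with uniform weights is safe; raising one weight breaks 3-anonymity.
ring = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("d", "a", 1)]
skewed = [("a", "b", 3), ("b", "c", 1), ("c", "d", 1), ("d", "a", 1)]
for graph in (ring, skewed):
    vol, hist = vertex_signatures(graph)
    print(vol, is_k_anonymous(vol, 3), is_k_anonymous(hist, 3))
```

A single modified edge weight already splits the vertices into distinguishable groups. This is why the weights themselves must be considered when anonymizing such graphs.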
The fourth paper is “Uncertainty for Privacy and 2-Dimensional Range Query Distortion” by Spyros Sioutas, Emmanouil Magkos, Ioannis Karydis, and Vassilios S. Verykios. It studies privacy in location-based services (LBS), focusing on the efficient answering of range queries over users moving on the plane. The authors adopt the well-known centralized model for location privacy, in which a trusted server mediates between the users' mobile devices and the untrusted LBS provider. Within this model, they introduce a framework for k-trajectory privacy. The framework has two aims: to protect users' location privacy when they request location-based services, and to prevent the linkage of two or more successive positions of the same user. It uses two-dimensional surfaces to introduce uncertainty about the users' whereabouts and clusters mobile users according to their motion patterns. The resulting clusters are then used to answer queries over user locations. The authors develop a set of efficient spatio-temporal access methods. They experimentally evaluate the distortion their algorithm introduces into the original data by comparing the results of the same spatio-temporal range queries executed on the original and on the anonymized data.
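The sketch below illustrates cluster-based cloaking and one way to measure range-query distortion. It works in three stages:

1. Group users into clusters of at least k members.
2. Replace each user's position with the bounding rectangle of its cluster.
3. Compare a range count on the original points with the lower and upper counts obtainable from the rectangles.

It is a simplified stand-in, not the authors' k-trajectory framework. The clustering here is a naive sort-and-chunk over a single snapshot, with no motion patterns or trajectories. All function names are hypothetical.

```python
def cluster_cloak(points, k):
    """Naive cluster-based cloaking (illustrative stand-in, not the paper's method):
    sort users by (x, y), cut them into consecutive groups of k, fold a short last
    group into its predecessor, and give every member its group's bounding box."""
    if len(points) < k:
        raise ValueError("need at least k users")
    order = sorted(range(len(points)), key=lambda i: points[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    rects = [None] * len(points)
    for group in groups:
        xs = [points[i][0] for i in group]
        ys = [points[i][1] for i in group]
        box = (min(xs), min(ys), max(xs), max(ys))  # (xmin, ymin, xmax, ymax)
        for i in group:
            rects[i] = box
    return rects


def true_count(points, q):
    """Exact answer of a 2-D range query q = (xmin, ymin, xmax, ymax)."""
    return sum(q[0] <= x <= q[2] and q[1] <= y <= q[3] for x, y in points)


def possible_count(rects, q):
    """Users whose cloaking rectangle intersects q (upper bound)."""
    return sum(r[0] <= q[2] and r[2] >= q[0] and r[1] <= q[3] and r[3] >= q[1]
               for r in rects)


def certain_count(rects, q):
    """Users whose cloaking rectangle lies entirely inside q (lower bound)."""
    return sum(q[0] <= r[0] and r[2] <= q[2] and q[1] <= r[1] and r[3] <= q[3]
               for r in rects)


pts = [(0, 0), (1, 1), (5, 5), (6, 6), (10, 10), (11, 11)]
boxes = cluster_cloak(pts, k=2)
query = (0.5, 0.5, 5.5, 5.5)
print(certain_count(boxes, query), true_count(pts, query), possible_count(boxes, query))
```

Every member of a cluster receives the same rectangle, so the rectangle alone does not distinguish among the k users. Meanwhile, the gap between the certain and possible counts gives one simple measure of the distortion introduced by anonymization.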
The fifth paper is titled “Secure Blocking + Secure Matching = Secure Record Linkage” and is authored by Alexandros Karakasidis and Vassilios S. Verykios. It addresses privacy-preserving approximate record linkage using the well-known Levenshtein distance. The authors introduce a framework consisting of two basic components:
- a secure blocking component, based on phonetic algorithms that are statistically enhanced to improve security;
- a secure matching component, which performs approximate matching using a private version of the Levenshtein distance algorithm.
Together, these components combine the speed of private blocking with the higher accuracy offered by approximate secure matching. As the authors demonstrate, th…
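The non-private skeleton of this pipeline, phonetic blocking followed by approximate matching with the Levenshtein distance, can be sketched as follows. For illustration we chose standard American Soundex as the phonetic algorithm and the textbook dynamic-programming Levenshtein distance.

The sketch deliberately omits the statistical enhancements and the private Levenshtein protocol that make the authors' framework secure, so it offers no privacy by itself. The threshold max_dist and the function names are assumptions.

```python
from collections import defaultdict

# Standard American Soundex digit classes; vowels, y, h and w get no digit.
_SOUNDEX = {ch: d for letters, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                     ("l", "4"), ("mn", "5"), ("r", "6"))
            for ch in letters}


def soundex(name):
    """Plain (non-private) Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic program (not private)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def link(names_a, names_b, max_dist=2):
    """Block on Soundex codes, then match within blocks by edit distance."""
    blocks = defaultdict(list)
    for name in names_b:
        blocks[soundex(name)].append(name)
    matches = []
    for a in names_a:
        for b in blocks.get(soundex(a), []):
            if levenshtein(a.lower(), b.lower()) <= max_dist:
                matches.append((a, b))
    return matches


print(link(["Robert", "Smith"], ["Rupert", "Smyth", "Jones"]))
# [('Robert', 'Rupert'), ('Smith', 'Smyth')]
```

Blocking restricts the Levenshtein comparisons to records that share a phonetic code, instead of comparing every pair. This is where the speed advantage of blocking comes from.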