JCSE

JCSE, vol. 5, no. 3, pp. 167-168, 2011

Preface for the Special Issue on Privacy-Aspects of Data Mining

Aris Gkoulalas-Divanis, Kun Liu, Vassilios S. Verykios, Ran Wolff
Information Analytics Lab, IBM Research-Zurich, Switzerland / Yahoo! Silicon Valley Labs, Santa Clara, CA, USA / School of Science and Technology, Hellenic Open University, Patras, Greece / University of Haifa, Mt. Carmel, Israel

Abstract: The collection, analysis and sharing of person-specific data for various purposes, such as data publication or data mining, raises serious concerns about the privacy of the individuals who are represented in the data, as well as about the sensitive knowledge patterns that existing data mining technology can expose when the data are mined. To address these concerns, the domain of privacy-preserving data mining was brought into existence a decade ago. Since then, a variety of methodologies for privacy-preserving data publishing, data mining and data linkage have been developed. Emerging research areas, such as stream mining, mobility data mining, web mining, clinical genomics data mining, and social network analysis, have also recently been investigated under the prism of privacy protection, leading to several interesting theoretical and applied techniques for offering privacy. The trade-off between the conflicting goals of data privacy and data utility is of paramount importance in privacy-preserving data mining. Accordingly, privacy-preserving methodologies have been designed that offer certain guarantees about the level of achieved privacy or data utility, thereby allowing the released data to remain useful for data-demanding applications such as biomedical and healthcare studies, location-based services and e-commerce. Moreover, the need to integrate data from multiple data sources in a privacy-preserving manner has led to the proposal of approaches that provide privacy while achieving high accuracy in the performed linkage.

This special issue focuses on state-of-the-art approaches in the area of privacy-preserving data mining.
We invited five papers, three of which were recommended from among the best-ranked papers presented at the IEEE International Workshop on Privacy Aspects of Data Mining (PADM 2010; the workshop website is available at http://www.zurich.ibm.com/padm2010/), which was held in Sydney, Australia. All papers were reviewed by external reviewers on the basis of technical quality, originality, significance and clarity. The accepted papers represent the diversity of research in privacy-preserving data mining. In what follows, we elaborate on the contribution of each invited paper to the state of the art.

The first paper is titled “Limiting Attribute Disclosure in Randomization based Microdata Release” and is authored by Ling Guo, Xiaowei Ying, and Xintao Wu. It provides a systematic study of the randomization method for protecting microdata from attribute disclosure, and proposes a randomization model together with an efficient solution to improve the utility of the released dataset. The authors introduce a uniform definition of attribute disclosure that is compatible with both randomization-based and generalization-based models, and use it to evaluate their approach against l-diversity and anatomy in terms of data utility preservation, under the same privacy requirements. Their evaluation covers three aspects of data utility, namely reconstructed distributions, accuracy of answering queries, and preservation of correlations, and indicates that randomization can incur lower utility loss than these alternatives.

The second paper is “Privacy Disclosure and Preservation in Learning with Multi-relational Databases” by Hongyu Guo, Herna L. Viktor, and Eric Paquet. It studies the following problem: given a relational database with multiple tables, a target attribute for future learning, and confidential attributes that should not be divulged, how can one derive a subschema that maintains the predictive performance for the target attribute while protecting the confidential attributes?
To solve this problem, the authors propose Target Shifting Multi-relational Classification (TSMC), an algorithm that operates in four steps. First, the attributes that are correlated with a confidential attribute are identified using the CrossMine algorithm. Second, based on the computed correlations, the degrees of sensitivity of the different tables of the database are calculated. Third, subschemas consisting of different tables of the database are constructed. Last, for each subschema, its performance in predicting the target attribute, along with its privacy sensitivity level, is computed using a combined metric called Subschema Privacy-Informativeness (PI). As a result, a ranked list of subschemas with different PI values is returned to the user, who can then select the subschema of interest. The effectiveness of the proposed method is demonstrated using a financial and an insurance database.

The third paper is titled “Anonymizing Graphs Against Weight-based Attacks with Community Preservation” and is authored by Yidong Li and Hong Shen. It considers the re-identification problem in the context of social networks and, in particular, in weighted graphs. Weighted graphs are more vulnerable to identity disclosure than their non-weighted counterparts because the weights add distinguishing information that makes disclosure easier. The authors formalize two important background knowledge attacks on weighted graphs, which concern the volume (sum of weights) and the histogram (set of adjacent weights) of each vertex. Accordingly, they propose an algorithm that effectively blocks these attacks by transforming the original graph to create structural uniformity. The proposed algorithm aims to minimize information loss, captured as the change in the graph spectrum due to the anonymization process, and is evaluated on both real and synthetic weighted graphs.
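To illustrate why weights make re-identification easier, the following minimal Python sketch computes the degree, volume, and histogram of each vertex in a small weighted graph and lists the vertices that each kind of background knowledge singles out. The toy graph, the function names, and the representation of a histogram as a sorted tuple of the weights on a vertex's incident edges are assumptions made here for illustration; they are not taken from the paper, and the sketch covers only the attack side, not the authors' anonymization algorithm.

```python
from collections import Counter

# Hypothetical toy graph (not from the paper): undirected weighted edges.
edges = [
    ("a", "b", 2),
    ("b", "c", 2),
    ("c", "d", 1),
    ("d", "a", 3),
]


def vertex_signatures(edges):
    """Return {vertex: (degree, volume, histogram)} for an undirected weighted graph.

    volume    = sum of the weights on the vertex's incident edges
    histogram = sorted tuple of those weights (an assumed representation)
    """
    adjacent = {}
    for u, v, w in edges:
        adjacent.setdefault(u, []).append(w)
        adjacent.setdefault(v, []).append(w)
    return {
        vertex: (len(ws), sum(ws), tuple(sorted(ws)))
        for vertex, ws in adjacent.items()
    }


def unique_vertices(signatures, index):
    """Vertices whose signature component at `index` is shared by no other vertex."""
    counts = Counter(sig[index] for sig in signatures.values())
    return sorted(v for v, sig in signatures.items() if counts[sig[index]] == 1)


signatures = vertex_signatures(edges)
by_degree = unique_vertices(signatures, 0)     # adversary knows only the degree
by_volume = unique_vertices(signatures, 1)     # adversary knows the volume
by_histogram = unique_vertices(signatures, 2)  # adversary knows the histogram
```

In this toy graph every vertex has degree 2, so degree knowledge alone identifies no one, whereas volume knowledge isolates two vertices and histogram knowledge isolates all four.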
The authors also provide an extension of the proposed approach that preserves the quality of community detection, a popular application in graph mining.

The fourth paper is “Uncertainty for Privacy and 2-Dimensional Range Query Distortion” by Spyros Sioutas, Emmanouil Magkos, Ioannis Karydis, and Vassilios S. Verykios. It studies the problem of privacy in the context of location-based services (LBS) by focusing on the efficient answering of range queries involving users moving on the plane. The authors adopt the well-known centralized model for location privacy, where a trusted server mediates between the mobile devices of the users and the untrusted LBS provider. By employing this model, they introduce a framework for k-trajectory privacy, which aims to protect the location privacy of the users when requesting location-based services, as well as to prevent the linkage of two or more successive user positions. The proposed framework uses two-dimensional surfaces to introduce uncertainty regarding the whereabouts of the users, clusters mobile users according to their motion patterns, and utilizes the generated clusters to answer queries over the user locations. The authors develop a set of efficient spatiotemporal access methods and experimentally evaluate the impact of the distortion that their algorithm introduces to the original data, by comparing the performance results of the same spatiotemporal range queries when executed on the original and on the anonymized data.

The fifth paper is titled “Secure Blocking + Secure Matching = Secure Record Linkage” and is authored by Alexandros Karakasidis and Vassilios S. Verykios. It proposes an approach to tackle the problem of privacy-preserving approximate record linkage by using the well-known Levenshtein Distance algorithm.
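For readers unfamiliar with it, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The sketch below is the standard, non-private dynamic-programming formulation in Python; it is not the private variant used in the paper, and the function name and sample strings are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn `a` into `b`."""
    # previous[j] holds the distance between the first i-1 characters of `a`
    # and the first j characters of `b`; row 0 is the cost of j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or match
            ))
        previous = current
    return previous[-1]


# Illustrative name pair, chosen here as an example of a near-match.
distance = levenshtein("smith", "smyth")
```

Under this definition, "smith" and "smyth" are at distance 1, which is the kind of near-match that approximate linkage is meant to tolerate.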
The authors introduce a framework that consists of two basic components: a secure blocking component, based on phonetic algorithms that are statistically enhanced to improve security, and a secure matching component, which performs approximate matching using a private version of the Levenshtein Distance algorithm. Together, these components combine the speed of private blocking with the increased accuracy offered by approximate secure matching. As the authors demonstrate, th
