Privacy Preserving Classification and Association Rules Mining over Centralized Data
Supervisor: Marzena Kryszkiewicz, Professor PhD, DSc
tel. +48 22 234 77 01
fax. +48 22 234 60 91
Beginning: 2009-10-02
End: 2010-09-30
Aim of project
Large amounts of data can now be collected and stored, so data mining is used in almost every domain of life. Users, however, are reluctant to reveal sensitive values about themselves, because both the data they provide and the hidden knowledge discovered by data mining can be misused. This reluctance makes it harder to gather high-quality data. The goal of privacy preservation is to encourage people to provide true information, even about sensitive values. It also allows organizations to release data from which hidden knowledge can be discovered without revealing the individual characteristics of the objects described, e.g., customers.

In privacy preserving data mining, information can be hidden at either the individual or the aggregate level. In the former case, the individual characteristics of the objects are not revealed. In the latter, the knowledge that a data miner could discover is hidden. One privacy preserving method for hiding information at the individual level is value distortion. For continuous attributes, random noise is added to the original values. For nominal attributes, the original values are changed according to a given probability distribution. Only the distorted values are stored, so the data miner can keep the data in a centralized database.

Value distortion entails a trade-off between privacy and accuracy: the higher the level of privacy, the lower the accuracy of the mining results. This trade-off is a challenge for data miners. The project will therefore propose new, more effective privacy preserving classification and association rules mining algorithms for centralized data. Ordered attributes, meta-learning, and hierarchical combining of classifiers will be used to reduce the loss of accuracy. In addition, a modification of the association rules mining algorithm is expected to reduce its time complexity. An experimental system will be used to test the new algorithms and compare them with existing solutions.
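The value distortion idea described above can be illustrated with a minimal Python sketch. It is not the project's algorithm: the function names are hypothetical, and the uniform noise, the uniform replacement distribution, and the `keep_prob` parameter are assumptions chosen only to keep the example short. The last function shows why distorted data remain useful: an aggregate (here, the share of one nominal value) can be estimated by inverting the known randomization, even though individual values are hidden.

```python
import random


def distort_continuous(values, noise_scale, rng):
    """Add zero-mean uniform noise from [-noise_scale, noise_scale] to each value.

    Assumption: uniform noise; other zero-mean distributions could be used.
    """
    return [v + rng.uniform(-noise_scale, noise_scale) for v in values]


def distort_nominal(values, domain, keep_prob, rng):
    """Keep each value with probability keep_prob, otherwise replace it
    with a value drawn uniformly from the whole domain.

    Assumption: a uniform replacement distribution, for illustration only.
    """
    distorted = []
    for v in values:
        if rng.random() < keep_prob:
            distorted.append(v)
        else:
            distorted.append(rng.choice(domain))
    return distorted


def estimate_share(distorted, value, domain, keep_prob):
    """Estimate the original share of `value` from distorted data.

    Under the scheme above:
        observed = keep_prob * original + (1 - keep_prob) / len(domain)
    so the original share is recovered by solving for `original`.
    """
    observed = distorted.count(value) / len(distorted)
    return (observed - (1 - keep_prob) / len(domain)) / keep_prob


if __name__ == "__main__":
    rng = random.Random(0)
    domain = ["low", "medium", "high"]
    original = ["low"] * 6000 + ["medium"] * 3000 + ["high"] * 1000
    stored = distort_nominal(original, domain, keep_prob=0.7, rng=rng)
    # Only `stored` would be kept in the centralized database.
    print(estimate_share(stored, "low", domain, keep_prob=0.7))
```

Lowering `keep_prob` or raising `noise_scale` hides more about each individual but makes the estimates noisier, which is the privacy versus accuracy trade-off that the project aims to improve.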
Expected results
The project will deliver new methods and algorithms for privacy preserving classification and association rules mining. The most important results will be presented at conferences and published in scientific journals. An experimental system will be developed to verify the proposed algorithms and measure their efficiency. The proposed algorithms and the experimental results will be presented in the Ph.D. thesis.