
Thursday, 22 February 2018

A Scalable Two Phase Top Down Specialization Approach for Data Anonymization Using MapReduce on Cloud(2014)



ABSTRACT:
A large number of cloud services require users to share private data like electronic health records for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements such as k-anonymity is a widely used category of privacy preserving techniques. At present, the scale of data in many cloud applications increases tremendously in accordance with the Big Data trend, thereby making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. As a result, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets due to their insufficiency of scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.
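To make the k-anonymity requirement mentioned above concrete, here is a minimal sketch in plain Java (no Hadoop). It assumes each record has already been reduced to its list of quasi-identifier values; the class name `KAnonymityCheck` and the method `isKAnonymous` are hypothetical names chosen for illustration and do not come from the paper.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: checks whether a table of quasi-identifier values is k-anonymous.
final class KAnonymityCheck {

    /**
     * Returns true if every distinct combination of quasi-identifier values
     * is shared by at least k records.
     */
    static boolean isKAnonymous(List<List<String>> quasiIdRows, int k) {
        // Count how many records fall into each quasi-identifier group.
        Map<List<String>, Integer> groupSizes = new HashMap<>();
        for (List<String> row : quasiIdRows) {
            groupSizes.merge(row, 1, Integer::sum);
        }
        // A single group smaller than k violates k-anonymity.
        for (int size : groupSizes.values()) {
            if (size < k) {
                return false;
            }
        }
        return true;
    }
}
```

With two records whose ages are 34 and 36, the table is not 2-anonymous. After generalizing both ages to the range 30-39, the two records share one group and the check passes.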
EXISTING SYSTEM:
Ø A widely adopted parallel data processing framework such as MapReduce can be used to address the scalability problem of the top-down specialization (TDS) approach for large-scale data anonymization. The TDS approach, which offers a good tradeoff between data utility and data consistency, is widely applied for data anonymization. Most TDS algorithms are centralized, which makes them inadequate for handling large-scale data sets. Although some distributed algorithms have been proposed, they mainly focus on secure anonymization of data sets from multiple parties rather than on scalability.
DISADVANTAGES OF EXISTING SYSTEM
Ø Designing proper MapReduce jobs for TDS under the MapReduce computation paradigm is still a challenge.
Ø The overall privacy-preservation performance is low.
Ø It is suitable only for small data sets.
Ø The degree of anonymization at each level is low.
PROPOSED SYSTEM:
Ø In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud.
Ø  In both phases of our approach, we deliberately design a group of innovative MapReduce jobs to concretely accomplish the specialization computation in a highly scalable way.
Ø The approach takes the input data and splits it into small data sets. Anonymization is then applied to each small data set to obtain an intermediate result.
Ø The small data sets are then merged, and anonymization is applied again.
Ø We analyze the sensitive fields of every data set and assign a priority to each sensitive field. Anonymization is then applied only to these sensitive fields, according to the scheduling.
ADVANTAGES OF PROPOSED SYSTEM:
Ø Accomplish the specializations in a highly scalable fashion.
Ø Gain high scalability.
Ø  Significantly improve the scalability and efficiency of TDS for data anonymization over existing approaches.
Ø The overall privacy-preservation performance is high.
Ø It is able to handle large volumes of data.
Ø The anonymization is effective in providing privacy for the data sets.
Ø Scheduling strategies are used to handle large volumes of data.
MODULES:
Ø DATA PARTITION
Ø ANONYMIZATION
Ø MERGING
Ø SPECIALIZATION
Ø OBS
MODULES DESCRIPTION:
DATA PARTITION:
ü In this module, data partitioning is performed on the cloud.
ü A large number of data sets is collected.
ü The large data set is split into small data sets.
ü A random number is then assigned to each small data set.
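The partitioning step can be sketched as follows. This is one plausible reading of the description above, in which every record draws a random number that serves as the id of the small data set it lands in. The class `RandomPartitioner`, its `partition` method, and the `seed` parameter are hypothetical and exist only for illustration; in a real MapReduce job this logic would typically sit in a mapper or partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: splits a large data set into smaller ones by random assignment.
final class RandomPartitioner {

    /**
     * Distributes the records over numPartitions small data sets.
     * Each record receives a random partition id in [0, numPartitions).
     */
    static <T> List<List<T>> partition(List<T> records, int numPartitions, long seed) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // A fixed seed keeps the sketch reproducible; it is an assumption, not part of the post.
        Random random = new Random(seed);
        for (T record : records) {
            int partitionId = random.nextInt(numPartitions);
            partitions.get(partitionId).add(record);
        }
        return partitions;
    }
}
```

Random assignment tends to give each small data set a similar distribution to the original, which is useful when each partition is anonymized independently before the results are merged.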
ANONYMIZATION:
ü After getting the individual data sets, we apply anonymization to each of them.
ü Anonymization means hiding or removing the sensitive fields in the data sets.
ü This yields an intermediate result for each small data set.
ü The intermediate results are used in the specialization process.
ü All intermediate anonymization levels are merged into one in the second phase. The merging of anonymization levels is carried out by merging cuts. To ensure that the merged intermediate anonymization level ALI never violates the privacy requirements, the more general value is selected as the merged one.
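The rule "select the more general one" can be illustrated with a small sketch. It assumes that a cut for one attribute is a set of taxonomy-tree values and that the taxonomy is given as a child-to-parent map; the class `CutMerger`, its methods, and this representation are assumptions made for illustration, not the paper's data structures.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: merges two cuts of one attribute, keeping the more general values.
final class CutMerger {

    /** Returns true if some proper ancestor of value is contained in candidates. */
    static boolean hasAncestorIn(String value, Set<String> candidates, Map<String, String> parentOf) {
        String current = parentOf.get(value);
        while (current != null) {
            if (candidates.contains(current)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }

    /**
     * Merges two cuts. Whenever one cut holds a value and the other holds
     * an ancestor of it, only the ancestor (the more general value) is kept.
     */
    static Set<String> mergeCuts(Set<String> cutA, Set<String> cutB, Map<String, String> parentOf) {
        Set<String> union = new HashSet<>(cutA);
        union.addAll(cutB);
        Set<String> merged = new HashSet<>();
        for (String value : union) {
            if (!hasAncestorIn(value, union, parentOf)) {
                merged.add(value);
            }
        }
        return merged;
    }
}
```

For example, with a taxonomy in which Engineer and Lawyer are children of Professional, merging the cut {Professional, Artist} with the cut {Engineer, Lawyer, Artist} gives {Professional, Artist}. Because the merged cut is never more specific than either input, it cannot reveal more than the intermediate levels already did.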
MERGING:
ü The intermediate results of the small data sets are merged here.
ü The MRTDS driver is used to organize the small intermediate results.
ü For merging, the data sets to be merged are collected on the cloud.
ü Anonymization is applied again to the merged result; this step is called specialization.
SPECIALIZATION:
ü After the intermediate results are obtained, they are merged into one.
ü Anonymization is then applied again to the merged data; this step is called specialization.
ü Two kinds of jobs are used here: IGPL Initialization and IGPL Update.
ü The jobs are organized through the driver.
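The post does not define IGPL. The sketch below assumes it means information gain per privacy loss and scores a candidate specialization as IG / (PL + 1), where IG is the entropy reduction over class labels and PL is the drop in anonymity caused by the specialization. Both the formula and the class `IgplScore` are assumptions for illustration; the actual IGPL Initialization and IGPL Update jobs would compute the underlying counts with MapReduce.

```java
// Illustrative only: scores a candidate specialization under the assumed IGPL definition.
final class IgplScore {

    /** Shannon entropy (base 2) of a histogram of class-label counts. */
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    /** Entropy of the parent value minus the size-weighted entropy of its children. */
    static double informationGain(int[] parentCounts, int[][] childCounts) {
        int parentTotal = 0;
        for (int c : parentCounts) {
            parentTotal += c;
        }
        if (parentTotal == 0) {
            return 0.0;
        }
        double weightedChildEntropy = 0.0;
        for (int[] child : childCounts) {
            int childTotal = 0;
            for (int c : child) {
                childTotal += c;
            }
            weightedChildEntropy += ((double) childTotal / parentTotal) * entropy(child);
        }
        return entropy(parentCounts) - weightedChildEntropy;
    }

    /** Assumed score: information gain divided by (privacy loss + 1). */
    static double igpl(double informationGain, int anonymityBefore, int anonymityAfter) {
        int privacyLoss = anonymityBefore - anonymityAfter;
        return informationGain / (privacyLoss + 1.0);
    }
}
```

Under this assumed definition, the specialization with the highest score would be performed first, since it gains the most information for the least loss of anonymity.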
OBS:
ü OBS stands for optimized balancing scheduling.
ü It focuses on two kinds of scheduling: time-based and size-based.
ü Data sets are split into pieces of a specified size, and anonymization is applied to them at specified times.
ü The OBS approach improves the ability to handle large data sets.
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
Ø System                           :         Pentium IV 2.4 GHz.
Ø Hard Disk                      :         40 GB.
Ø Floppy Drive                 :         1.44 MB.
Ø Monitor                          :         15" VGA Colour.
Ø Mouse                           :         Logitech.
Ø RAM                              :         512 MB.
SOFTWARE REQUIREMENTS:
Ø Operating system   :         Windows XP/7.
Ø Coding Language  :         JAVA/J2EE/Hadoop
Ø IDE                            :         Eclipse Europa, Hadoop 0.18.0
Ø Database                 :         MySQL
