# Closeness: A New Privacy Measure for Data Publishing (2010)

**ABSTRACT**

The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain “identifying” attributes) contains at least k records. Recently, several authors have recognized that k-anonymity cannot prevent attribute disclosure. The notion of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class has at least ℓ well-represented values for each sensitive attribute. In this article, we show that ℓ-diversity has a number of limitations. In particular, it is neither necessary nor sufficient to prevent attribute disclosure. Motivated by these limitations, we propose a new notion of privacy called “closeness”. We first present the base model t-closeness, which requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table (i.e., the distance between the two distributions should be no more than a threshold t). We then propose a more flexible privacy model called (n, t)-closeness that offers higher utility. We describe our desiderata for designing a distance measure between two probability distributions and present two distance measures. We discuss the rationale for using closeness as a privacy measure and illustrate its advantages through examples and experiments.

**Modules:**

**1.**

**Publishing privacy:**

You do not need to set up separate security for the data you publish, yet it remains safe. Only the administrator can see the full details; a third party cannot. A third party who searches this records database for a person sees only records that have been split and blocked using ℓ-diversity and closeness.

**2.**

**ℓ-diversity and closeness:**

ℓ-diversity and closeness supply the formulas used for secure data publishing. The ℓ-diversity formula used here is the entropy of an equivalence class $E$:

$$\mathrm{Entropy}(E) = -\sum_{s \in S} p(E, s) \log p(E, s)$$

where $S$ is the domain of the sensitive attribute and $p(E, s)$ is the fraction of records in $E$ that have the sensitive value $s$. The system evaluates this formula over the data recursively and then splits the data row by row. For example, *Count* indicates the number of individuals. The probability of cancer among the population in the dataset is 700/3000 = 0.23, while the probability of cancer among individuals in the first equivalence class is as high as 300/600 = 0.5.

**3.**

**Anonymization Algorithms:**

The distance is calculated to check closeness as part of this privacy process. The desiderata for designing the distance measure are the following properties.

i. Identity of indiscernibles: An adversary has no information gain if her belief does not change. Mathematically, $D[P,P] = 0$, for any $P$.

ii. Non-negativity: When the released data is available, the adversary has a non-negative information gain. Mathematically, $D[P,Q] \geq 0$, for any $P$ and $Q$.

iii. Probability scaling: The belief change from probability $\alpha$ to $\alpha + \gamma$ is more significant than that from $\beta$ to $\beta + \gamma$ when $\alpha < \beta$ and $\alpha$ is small. $D[P,Q]$ should reflect the difference.

iv. Zero-probability definability: $D[P,Q]$ should be well-defined when there are zero probability values in $P$ and $Q$.
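As a rough illustration of how a candidate distance can be checked against these properties, the sketch below implements the Earth Mover's Distance (EMD) in two simple forms that are commonly used with t-closeness: an equal ground distance for categorical values and an ordered ground distance for numerical values. The function names and the sample distributions `P` and `Q` are assumptions made for this example and are not taken from the project code. Only properties i, ii and iv are exercised; probability scaling is not tested here.

```python
from itertools import accumulate


def emd_equal(p, q):
    """EMD with equal ground distance (categorical values).

    With every pair of values at distance 1, EMD reduces to half the
    L1 distance between the two distributions.
    """
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def emd_ordered(p, q):
    """EMD with ordered ground distance (numerical values).

    Adjacent values are at distance 1/(m-1), so the result lies in [0, 1].
    """
    m = len(p)
    if m < 2:
        return 0.0
    running = accumulate(pi - qi for pi, qi in zip(p, q))
    return sum(abs(r) for r in running) / (m - 1)


# Made-up distributions over three values; P contains a zero probability.
P = [0.5, 0.5, 0.0]
Q = [0.2, 0.3, 0.5]

for name, dist in (("equal", emd_equal), ("ordered", emd_ordered)):
    print(name, "D[P,P] =", dist(P, P), "D[P,Q] =", round(dist(P, Q), 3))
```

Both functions stay well-defined when a probability is zero, unlike a measure such as the Kullback-Leibler divergence, which takes a logarithm of a ratio of probabilities.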

**4.**

**Data Processing:**

This step applies one of the properties in the desiderata for designing the distance measure. When a particular record is searched for, part of each value is replaced by asterisks (*); that is, publicly visible data is blocked or masked with * symbols based on the EMD analysis. The closeness ratio is easy to read off. When this ratio is very low, the security level is very high; when it is very high, the security level is very low.
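The masking step described here could look something like the following minimal sketch. The helper `mask_suffix`, the field names and the sample record are all hypothetical; how many characters stay visible for each field is an assumption, since the text does not say how that is decided.

```python
def mask_suffix(value, visible=2, symbol="*"):
    """Keep the first `visible` characters and replace the rest with `symbol`."""
    text = str(value)
    visible = max(0, min(visible, len(text)))
    return text[:visible] + symbol * (len(text) - visible)


# Made-up record, used only to show the shape of the public view.
record = {"name": "Alice", "zip": "47677", "disease": "cancer"}

public_view = {
    "name": mask_suffix(record["name"], visible=0),  # identifier fully blocked
    "zip": mask_suffix(record["zip"], visible=3),    # partially masked
    "disease": record["disease"],                    # left as published
}
print(public_view)
```

An administrator view would simply return `record` unchanged, while a public search would return `public_view`.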

**Existing System:**

Previously, privacy in data publishing depended on a security code: every person had to register and obtain a security code, which wastes time. In addition, a member of the public could run a semantic search and obtain results about an individual, so that individual is not anonymous. Clearly, released data containing such information about individuals should not be considered anonymous. Sometimes information can also be obtained by searching or filtering on a particular name.

**Proposed System:**

Full information is not visible to the public. If a member of the public searches for information about a particular person, the result is returned as split records whose values are blocked or masked with asterisks (*) using ℓ-diversity and closeness. Here the person being searched for remains anonymous to the public and to any unauthorized user. We can analyse the percentage of possible privacy loss. A utility check (EMD analysis) using the anonymization algorithm is also available.

The closeness ratio is easy to read off. When this ratio is very low, the security level is very high; when it is very high, the security level is very low. The distance measure should satisfy the following properties:

i. Identity of indiscernibles: An adversary has no information gain if her belief does not change. Mathematically, $D[P,P] = 0$, for any $P$.

ii. Non-negativity: When the released data is available, the adversary has a non-negative information gain. Mathematically, $D[P,Q] \geq 0$, for any $P$ and $Q$.

iii. Probability scaling: The belief change from probability $\alpha$ to $\alpha + \gamma$ is more significant than that from $\beta$ to $\beta + \gamma$ when $\alpha < \beta$ and $\alpha$ is small. $D[P,Q]$ should reflect the difference.

iv. Zero-probability definability: $D[P,Q]$ should be well-defined when there are zero probability values in $P$ and $Q$.
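To make the privacy-loss analysis concrete, the sketch below reuses the cancer figures quoted earlier (700 of 3000 records overall, 300 of 600 in the first equivalence class). It computes the entropy used by ℓ-diversity and a closeness distance for that class. Several things here are assumptions for illustration only: the sensitive attribute is collapsed to two values (cancer, other), the remaining 300 records in the class are treated as sharing one value, the threshold `t = 0.1` is arbitrary, and the function names are hypothetical.

```python
import math


def entropy(dist):
    """Entropy(E) = -sum p log p; zero-probability values contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def closeness_distance(p, q):
    """Half the L1 distance (EMD with equal ground distance)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Distributions over [cancer, other]; "other" is an assumed single value.
overall = [700 / 3000, 2300 / 3000]
first_class = [300 / 600, 300 / 600]

ell = 2   # entropy l-diversity asks for Entropy(E) >= log(l)
t = 0.1   # hypothetical closeness threshold

is_diverse = entropy(first_class) >= math.log(ell) - 1e-12
d = closeness_distance(first_class, overall)
is_close = d <= t

print("entropy 2-diverse:", is_diverse)
print("distance to overall table:", round(d, 3), "within t:", is_close)
```

Under these assumptions the class passes the entropy check for ℓ = 2 but sits far from the overall distribution, which is the kind of gap that closeness is meant to expose.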

**HARDWARE SPECIFICATION**

**Processor**: Any Processor above 500 MHz.

**RAM**: 128 MB.

**Hard Disk**: 10 GB.

**Input device**: Standard Keyboard and Mouse.

**Output device**: VGA and High Resolution Monitor.

**SOFTWARE SPECIFICATION**

**Operating System**: Windows Family.

**Pages developed using**: Java Server Pages and HTML.

**Techniques**: Apache Tomcat Web Server 5.0, JDK 1.5 or higher

**Web Browser**: Microsoft Internet Explorer.

**Database**: MySQL 5.0

**Client Side Scripting**: JavaScript
