Contact at mumbai.academics@gmail.com or 8097636691
Responsive Ads Here

Thursday, 22 February 2018

Optimizing Bloom Filter Settings in Peer-to-Peer Multi-keyword Searching(2012)


Optimizing Bloom Filter Settings in 

Peer-to-Peer Multi-keyword Searching(2012)


Abstract:                                                              
Peer-to-Peer multi-keyword searching requires distributed intersection/union operations across wide area networks, raising a large amount of traffic cost. Existing schemes commonly utilize Bloom Filters (BFs) encoding to effectively reduce the traffic cost during the intersection/union operations. In this paper, we address the problem of optimizing the settings of a BF. We show, through mathematical proof, that the optimal setting of BF in terms of traffic cost is determined by the statistical information of the involved inverted lists, not the minimized false positive rate as claimed by previous studies. Through numerical analysis, we demonstrate how to obtain optimal settings. To better evaluate the performance of this design, we conduct comprehensive simulations on TREC WT10G test collection and query logs of a major commercial web search engine. Results show that our design significantly reduces the search traffic and latency of the existing approaches.
Existing System:
Existing P2P retrieval mechanisms provide a scalable distributed hash table (DHT) [3] that allows every individual keyword to be mapped to a set of documents/nodes across the network that contains the keyword. Using this single-keyword- based index, a list of entries for each keyword in a query can be retrieved by using existing DHT lookups. For multikeyword search, the simple solution which merges the results of each keyword search incurs a lot of traffic.
Proposed System:
It is well known that Bloom Filter (BF) [4] is an effective way to reduce such communication cost [5], [6]. A BF is a lossy but succinct and efficient data structure to represent a set S, which can efficiently process the membership query such as “is the element x in the set S.” By transmitting the encoded sets instead of raw sets among peers, the communication cost can be effectively saved. Applying BF is not difficult, but how to get optimal results in terms of minimum communication cost is not trivial
In this work, we show mathematically that the optimal setting of BF in terms of communication cost is determined by the global statistical information of the involved items on both sides, not the minimized false positive rate as claimed by the previous studies [5]. We further demonstrate how to get optimal settings through numerical analysis. Moreover, we find that the intersection order between sets is indeed important for multikeyword search; thus, we design optimal order strategies based on BF for both queries with “AND” and “OR” operators.
Modules:
·         Authorization
·         Finding Network Computers
·         DHT creation
·         Single-Keyword Search
·         Multi-Keyword Search
Software and Hardware Requirements
Hardware Required:                            
System                                    :           Pentium IV
Hard Disk                   :           80 GB
RAM                           :           512 MB
Software Required:                              
Operating System       :           Windows XP
Language                   :           JAVA

No comments:

Post a Comment