Monday, 12 February 2018

CHARM: A Cost-Efficient Multi-Cloud Data Hosting Scheme with High Availability (2015)

Nowadays, more and more enterprises and organizations are hosting their data in the cloud in order to reduce IT maintenance cost and enhance data reliability. However, faced with numerous cloud vendors and their heterogeneous pricing policies, customers may well be perplexed as to which cloud(s) are suitable for storing their data and which hosting strategy is cheaper. The general status quo is that customers put their data into a single cloud (which exposes them to vendor lock-in risk) and then simply trust to luck. Based on a comprehensive analysis of various state-of-the-art cloud vendors, we propose a novel data hosting scheme, named CHARM, which integrates two desired key functions. The first is selecting several suitable clouds and an appropriate redundancy strategy to store data with minimized monetary cost and guaranteed availability. The second is triggering a transition process to redistribute data according to variations in data access patterns and cloud pricing. We evaluate the performance of CHARM using both trace-driven simulations and prototype experiments. The results show that, compared with the major existing schemes, CHARM not only saves around 20 percent of monetary cost but also exhibits sound adaptability to data and price adjustments.
KEYWORDS: Multi-cloud, data hosting, cloud storage.
More and more enterprises and organizations are hosting all or part of their data in the cloud in order to reduce IT maintenance cost (including hardware, software, and operational cost) and enhance data reliability.
Existing clouds exhibit great heterogeneity in terms of both performance and pricing policies. Different cloud vendors build their own infrastructures and keep upgrading them with newly emerging hardware. They also design different system architectures and apply various techniques to make their services competitive. Such system diversity leads to observable performance variations across cloud vendors. Moreover, the pricing policies of storage services offered by different cloud vendors differ in both pricing levels and charging items. For instance, Rackspace does not charge for Web operations, Google Cloud Storage charges more for bandwidth consumption, and Amazon S3 charges more for storage space. Faced with numerous cloud vendors and their heterogeneous performance and policies, customers may be perplexed as to which cloud(s) are suitable for storing their data and which hosting strategy is cheaper. The general status quo is that customers put their data into a single cloud and then simply trust to luck. This exposes them to the so-called “vendor lock-in risk”, because customers face a dilemma if they want to switch to other cloud vendors.
The vendor lock-in risk lies first in the fact that data migration inevitably incurs considerable expense. For example, moving 100 TB of data from Amazon S3 (California datacenter) to Aliyun OSS (Beijing datacenter) would cost as much as 12,300 US dollars. Besides, vendor lock-in leaves customers exposed to price adjustments by cloud vendors, which are not uncommon. For example, fluctuations in a region's electricity bills will affect the prices of cloud services in that region. We notice that giant cloud vendors like Windows Azure and Google Cloud Storage have been adjusting their pricing terms.
Unexpected bankruptcy of cloud vendors further aggravates the situation. Nirvanix, which had thousands of customers including top 500 companies, suddenly shut down its cloud storage service in Sep. 2013. Ubuntu One, also a well-known player in the cloud storage market, announced its shutdown in Apr. 2014. Clearly, then, it is unwise for an enterprise or organization to host all its data in a single cloud: “your best bet is probably not to put all your eggs in one basket.”
Problems with hosting data in a single cloud:
  1. Vendor lock-in risk.
  2. Service level agreement (SLA) failures and outages.
  3. Uncontrolled availability.
We propose a novel cost-efficient data hosting scheme with high availability in heterogeneous multi-cloud environments, named “CHARM”. It intelligently places data into multiple clouds with minimized monetary cost and guaranteed availability. Specifically, we combine two widely used redundancy mechanisms, replication and erasure coding, into a uniform model to meet the required availability under different data access patterns. Next, we design an efficient heuristic-based algorithm to choose proper data storage modes (involving both clouds and redundancy mechanisms). Moreover, we implement the necessary procedure for storage mode transition (efficiently re-distributing data) by monitoring variations in data access patterns and pricing policies. We evaluate the performance of CHARM using both trace-driven simulations and prototype experiments. The traces are collected from two online storage systems, Amazing Store and Corsair, each of which has hundreds of thousands of users. In the prototype experiments, we replay samples from the two traces for a whole month on top of four mainstream commercial clouds: Amazon S3, Windows Azure, Google Cloud Storage, and Aliyun OSS.
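Advantages of CHARM:
  1. Saves around 20% of monetary cost compared with the major existing schemes.
  2. Adapts soundly to changes in data access patterns and cloud price adjustments.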
There are four main components in CHARM: Data Hosting, Storage Mode Switching (SMS), Workload Statistic, and Predictor. Workload Statistic continuously collects and processes access logs to guide data placement. It also sends statistical information to Predictor, which guides the actions of SMS. Data Hosting stores data using replication or erasure coding according to the size and access frequency of the data. SMS decides, according to the output of Predictor, whether the storage mode of certain data should be changed from replication to erasure coding or vice versa. Storage mode changes run in the background so as not to impact online service. Predictor forecasts the future access frequency of files. The prediction interval is one month; that is, access frequencies from previous months are used to predict each file's access frequency in the next month. We do not emphasize the design of Predictor, because many good prediction algorithms already exist. Moreover, a very simple predictor using the weighted moving average approach works well in our data hosting model.
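A weighted moving average predictor of the kind mentioned above can be sketched in a few lines. This is a minimal illustration, not CHARM's actual Predictor: the linear weights (1, 2, ..., k), the three-month default window, and the function name wma_predict are all assumptions made for the example.

```python
from typing import Sequence


def wma_predict(history: Sequence[float], window: int = 3) -> float:
    """Forecast next month's read count (hypothetical helper).

    `history` holds monthly read counts, oldest month first. Only the
    last `window` months are used, with linearly increasing weights
    1, 2, ..., k so that the most recent month counts the most
    (the weighting scheme is an assumption, not from the text).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = list(history)[-window:]
    if not recent:
        return 0.0  # no history yet: treat the file as cold
    weights = range(1, len(recent) + 1)
    weighted_sum = sum(w * x for w, x in zip(weights, recent))
    return weighted_sum / sum(weights)
```

For monthly reads of 10, 20, and 30, the forecast is (1·10 + 2·20 + 3·30) / 6 ≈ 23.3, which SMS could then compare against a mode-switching threshold.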
We first formally define the mathematical model applied in Data Hosting. When talking about erasure coding, we usually mean m > 1; however, replication is a special case of erasure coding with m = 1, so we combine the two storage mechanisms into a unified model. Assume we have N clouds that meet the performance requirements. If we choose n clouds (n ≤ N) to store a file, the file is encoded into n blocks of equal size, comprising m data blocks and n - m coding blocks. If m = 1, the n - m coding blocks are identical copies of the data block, i.e., replication. The n blocks are then distributed over the n clouds, one block per cloud. We call an (m, n) pair together with its corresponding clouds a storage mode.
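To make the unified (m, n) model concrete, the sketch below computes two properties of a storage mode: its availability (the file is readable whenever at least m of its n blocks are reachable) and its storage overhead (n blocks, each 1/m of the file). It assumes cloud failures are independent, which the text does not state, and the function names are hypothetical.

```python
from itertools import combinations
from math import prod


def mode_availability(cloud_avail: list[float], m: int) -> float:
    """Probability that a file stored in an (m, n) mode can be read.

    Each of the n = len(cloud_avail) clouds holds one block, and any m
    blocks suffice to rebuild the file. Independent cloud failures are
    an assumption of this sketch.
    """
    n = len(cloud_avail)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= n")
    total = 0.0
    for k in range(m, n + 1):  # exactly k clouds up, with k >= m
        for up in combinations(range(n), k):
            up_set = set(up)
            total += prod(a if i in up_set else 1.0 - a
                          for i, a in enumerate(cloud_avail))
    return total


def storage_overhead(m: int, n: int) -> float:
    """Bytes stored per byte of file: n blocks, each 1/m of the file."""
    return n / m
```

Replicating a file across three clouds (m = 1, n = 3) stores three times its size and survives two cloud outages, whereas a (2, 3) erasure-coded mode stores only 1.5 times its size but survives just one outage; the availability function quantifies that trade-off for given per-cloud availabilities.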
We first assign each cloud a value d_i, calculated from four factors (availability, storage price, bandwidth price, and operation price), to indicate the preference for that cloud. We choose the n most preferred clouds, and then heuristically exchange clouds in the preferred set with clouds in the complementary set to search for a better solution. This is similar to the idea of the Kernighan-Lin heuristic algorithm, which is applied to partition graphs effectively so as to minimize the sum of the costs of all cut edges.
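The selection procedure above (rank clouds by preference, take the top n, then swap members with non-members while cost improves) can be sketched as a local search. Everything concrete here is an assumption for illustration: the Cloud fields, the preference formula (full-copy monthly cost divided by availability), the read-cost model (each read fetches one block from each of the m cheapest clouds), independent failures, and all function names. It is a Kernighan-Lin-style swap heuristic, not CHARM's published algorithm.

```python
from dataclasses import dataclass
from itertools import combinations
from math import prod


@dataclass(frozen=True)
class Cloud:
    name: str
    avail: float    # probability the cloud is reachable
    storage: float  # $ per GB-month (hypothetical price field)
    egress: float   # $ per GB read out (hypothetical price field)
    op: float       # $ per read request (hypothetical price field)


def availability(sel: list[Cloud], m: int) -> float:
    """P(at least m selected clouds are up); independent failures assumed."""
    n = len(sel)
    total = 0.0
    for k in range(m, n + 1):
        for up in combinations(range(n), k):
            up_set = set(up)
            total += prod(c.avail if i in up_set else 1.0 - c.avail
                          for i, c in enumerate(sel))
    return total


def monthly_cost(sel: list[Cloud], m: int, size_gb: float, reads: float) -> float:
    block = size_gb / m
    storage = sum(block * c.storage for c in sel)
    # Assumption: each read fetches one block from each of the m cheapest clouds.
    per_read = sum(sorted(block * c.egress + c.op for c in sel)[:m])
    return storage + reads * per_read


def swap_search(clouds, n, m, size_gb, reads, target):
    """Start from the n most preferred clouds, then apply improving swaps."""
    def score(sel):
        if availability(sel, m) < target:
            return float("inf")  # infeasible: misses the availability target
        return monthly_cost(sel, m, size_gb, reads)

    # Hypothetical preference d_i: full-copy monthly cost / availability.
    ranked = sorted(clouds, key=lambda c: (size_gb * (c.storage + reads * c.egress)
                                           + reads * c.op) / c.avail)
    chosen, rest = ranked[:n], ranked[n:]
    best = score(chosen)
    improved = True
    while improved:  # terminates: best strictly decreases on every swap
        improved = False
        for i in range(n):
            for j in range(len(rest)):
                cand = chosen[:i] + [rest[j]] + chosen[i + 1:]
                s = score(cand)
                if s < best:
                    rest[j], chosen, best = chosen[i], cand, s
                    improved = True
    return best, chosen


def best_mode(clouds, size_gb, reads, target):
    """Try every (m, n) with 1 <= m <= n <= N; keep the cheapest feasible mode."""
    best = (float("inf"), 0, [])
    for n in range(1, len(clouds) + 1):
        for m in range(1, n + 1):
            cost, sel = swap_search(clouds, n, m, size_gb, reads, target)
            if cost < best[0]:
                best = (cost, m, sel)
    return best
```

Enumerating every (m, n) pair is affordable because a customer typically considers only a handful of clouds; the swap search then avoids enumerating every subset of clouds for each pair, at the price of possibly stopping at a local optimum.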
Intuitively, when a file changes from “hot” to “cold” (or vice versa), we should change its storage mode. More specifically, when the file's read frequency drops below or rises above a certain threshold, changing its storage mode saves money. The threshold is determined by the prices of the clouds. Given the available clouds with their prices and availability, we can determine the storage mode and the selected clouds from the file's size and read count.
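The threshold described above follows from treating each mode's monthly cost as linear in the read count: a fixed storage cost plus a per-read cost. The sketch assumes replication has the higher fixed cost and the lower per-read cost (it reads one whole block instead of m pieces), so the two cost lines cross at (fixed_rep - fixed_ec) / (read_ec - read_rep) reads per month. The function names, the (fixed, per-read) tuple encoding, and the rule of switching only when one month's savings exceed a one-off transition cost are hypothetical.

```python
Mode = tuple[float, float]  # (fixed monthly storage cost, cost per read)


def monthly_cost(mode: Mode, reads: float) -> float:
    fixed, per_read = mode
    return fixed + reads * per_read


def crossover_reads(rep: Mode, ec: Mode) -> float:
    """Reads per month at which replication and erasure coding cost the same."""
    (fixed_rep, read_rep), (fixed_ec, read_ec) = rep, ec
    if not (fixed_rep >= fixed_ec and read_rep < read_ec):
        raise ValueError("expects replication: costlier storage, cheaper reads")
    return (fixed_rep - fixed_ec) / (read_ec - read_rep)


def choose_mode(predicted_reads: float, rep: Mode, ec: Mode) -> str:
    """Pick the cheaper mode for next month's predicted read count."""
    if monthly_cost(rep, predicted_reads) < monthly_cost(ec, predicted_reads):
        return "replication"
    return "erasure coding"


def should_switch(current: Mode, target: Mode, predicted_reads: float,
                  transition_cost: float) -> bool:
    """Hypothetical rule: switch only if one month's savings cover the move."""
    saving = (monthly_cost(current, predicted_reads)
              - monthly_cost(target, predicted_reads))
    return saving > transition_cost
```

With hypothetical costs rep = (3.0, 1.0) and ec = (1.0, 2.0), the break-even point is 2 reads per month: a file predicted to receive 5 reads costs 8 under replication versus 11 under erasure coding, while a file predicted to receive 1 read costs 3 under erasure coding versus 4 under replication.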
Software Requirements:
Language            :   JDK (1.7.0)
Frontend            :   JSP, Servlets
Backend             :   Oracle 10g
IDE                 :   MyEclipse 8.6
Operating System    :   Windows XP
Server              :   Tomcat 7
Hardware Requirements:
Processor           :   Pentium IV
Hard Disk           :   80 GB
RAM                 :   2 GB
