Building effective storage the key to low-cost cloud?

16th November 2011

Virsto Software’s VP for Product Management, Eric Burgener, provides an insight into the technologies required to build a cost-effective storage infrastructure, and explains why understanding performance, scalability and availability issues is so vital in the process.

Storage the key to aggressively-priced cloud
Eric Burgener, Virsto Software

One of the key reasons for moving to a cloud-based infrastructure is to lower overall infrastructure costs. This is true regardless of whether you are building an in-house (private) cloud or are a public cloud provider looking to price your offerings competitively.

Because storage makes up such a large part of the outlay for any cloud-based infrastructure, it is an obvious place to look for optimisations that lower overall costs. A lower-cost virtual infrastructure gives cloud providers pricing leeway, which can be used either to undercut competitors or to increase margins.

In building a cost-effective storage infrastructure it is important to look at the following critical technologies:

Scalable, resilient networked storage subsystems

Ensure that the storage you choose is modularly expandable and will scale to meet your business requirements. Networked storage architectures offer better opportunities not only for expansion, but also for redundancy and for storage sharing – which is critical to support the live migration of virtual machines (VMs) necessary to meet uptime requirements. Storage layouts should use RAID for redundancy, and provide multiple paths to each storage device for high availability, as well as supporting online expansion and maintenance.

Thin provisioning

Historically, storage has been significantly over-provisioned to accommodate growth. Allocated but unused storage is an expensive waste of space, and thin provisioning is a storage technology which effectively addresses this. By transparently allocating storage on demand as environments grow, administrators no longer have to over-provision. When thin provisioning technology is initially deployed in an environment, it’s not uncommon for it to decrease storage capacity consumption by 70% or more. It allows higher utilisation of existing storage assets, reducing not only hardware infrastructure costs but also energy and floor space costs.

However, thin provisioning must be watched carefully when it is deployed in virtual environments. It can be difficult to stay on top of capacity planning, and running out of capacity shuts down VMs, so utilisation and growth need to be tracked closely enough that new storage is added before the pool fills. The savings are significant, however, so the effort is always worth it.
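The capacity-planning concern above can be made concrete with a little arithmetic. The following is a minimal sketch, not a vendor tool: the `ThinPool` class, its field names, the linear growth assumption and all the figures are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThinPool:
    """Hypothetical model of a thin-provisioned storage pool (sizes in GB)."""
    physical_gb: float      # capacity actually installed
    provisioned_gb: float   # capacity promised to VMs
    used_gb: float          # capacity actually written so far

    def overcommit_ratio(self) -> float:
        # How much more has been promised than physically exists.
        return self.provisioned_gb / self.physical_gb

    def days_until_full(self, growth_gb_per_day: float) -> float:
        # Simple linear projection; real growth is rarely this regular.
        if growth_gb_per_day <= 0:
            return float("inf")
        return (self.physical_gb - self.used_gb) / growth_gb_per_day

    def needs_attention(self, growth_gb_per_day: float, lead_time_days: float) -> bool:
        # Flag the pool when it could fill before new capacity can arrive.
        return self.days_until_full(growth_gb_per_day) <= lead_time_days


# Illustrative numbers only: 10 TB installed, 30 TB promised, 8.5 TB written.
pool = ThinPool(physical_gb=10_000, provisioned_gb=30_000, used_gb=8_500)
print(pool.overcommit_ratio())        # 3.0
print(pool.days_until_full(50))       # 30.0
print(pool.needs_attention(50, 45))   # True
```

In this example the pool would fill in about 30 days at 50 GB of growth per day, which is sooner than an assumed 45-day procurement lead time, so it is flagged.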

On a related topic, pay attention to how storage space reclamation occurs in your virtual infrastructure. When files are deleted, is the storage space that is freed up immediately returned to the storage pool, or does that only occur when the VMs that owned that data are rebooted? Both storage space reclamation and thin provisioning pose additional management challenges when multiple layers of virtualisation exist, as is the case when a hypervisor runs on top of an array that itself uses virtual storage technology.

Scalable snapshot technologies

Snapshots have all sorts of uses – working from VM templates, ensuring a safety net during software updates, creating copies for test/dev environments, cloning desktops in VDI environments, etc. – all of which have significant operational value. If you’ve worked with snapshots in the past, you probably already know that they can degrade performance. In fact, the impact can be so bad that administrators consciously limit their use of snapshots in some situations. In others, the value snapshots can provide has helped to drive the purchase of very high-end, very expensive storage arrays that overcome snapshot performance issues. In virtual computing environments, hypervisor-based snapshots generally impose the same types of performance penalties.

Snapshots can also be very valuable when used for disk-based backup. Your customers will expect you to protect their data, and provide fast recovery with minimal data loss. To provide the best service to your customers, data protection operations should be as transparent as possible. The best way to meet these requirements will be to use snapshot backups, working with well-defined APIs like the Windows Volume Shadow Copy Service (VSS) to ensure that you can create application-consistent backups for fast, reliable recovery.

The use of disk for backups also allows you to leverage storage capacity optimisation (SCO) technologies like data deduplication to minimise the secondary storage capacity needed for data protection operations. Disk also makes it easier to leverage replication in creating disaster recovery (DR) plans for those customers that need them, and asynchronous replication products that use IP networks and support heterogeneous storage offer cost-effective DR options for virtual environments.
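To illustrate why deduplication reduces the secondary capacity needed for backups, here is a minimal sketch of fixed-size, hash-based chunk deduplication. It is one simple approach among several and is not taken from any particular product; the `DedupStore` class, the 4 KB chunk size and the sample data are assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy chunk store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}   # SHA-256 digest -> chunk bytes

    def put(self, data: bytes) -> list:
        # Split the data into fixed-size chunks and store each unique chunk once.
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe   # ordered digests needed to rebuild the data

    def get(self, recipe: list) -> bytes:
        # Reassemble the original data from its chunk digests.
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


# Two backups that share their first 4 KB chunk.
backup1 = b"A" * 4096 + b"B" * 4096
backup2 = b"A" * 4096 + b"C" * 4096

store = DedupStore()
r1 = store.put(backup1)
r2 = store.put(backup2)

logical = len(backup1) + len(backup2)
print(logical, store.stored_bytes())   # 16384 12288
```

The two backups total 16,384 logical bytes but occupy 12,288 bytes in the store, because the shared chunk is kept once. Successive backups of largely unchanged data would share far more.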

For cloud computing environments, the ability to use high performance, scalable snapshot technology has real operational value. Each cloud provider will need to evaluate how best to meet this need while still staying within budgetary constraints.

Primary storage optimisation

SCO technology is not limited to use with secondary storage, and a number of large storage vendors offer what is called “primary storage optimisation” in their product portfolios today. Similar in concept to deduplication (but not in implementation), these products effectively reduce the amount of primary storage capacity required to store a given amount of information. Because of their high performance requirements, primary data stores pose an additional challenge that does not exist for secondary storage: whatever optimisation work is done must not impact production performance. Describing the different approaches for achieving primary storage optimisation is beyond the scope of this article, but suffice it to say that they can generally reduce the amount of primary storage required for many environments by 70% or more, reducing not only primary storage costs but also secondary storage costs (since less primary storage is being backed up).

Storage cost challenges in cloud computing environments

To meet performance, scalability, and availability requirements, cloud providers often invest in high-end, enterprise-class storage to support their virtual infrastructure. Higher per-terabyte costs can be understood upfront, but there is another issue here which can hit cloud providers unexpectedly.

Server virtualisation is critical to cloud computing, but it poses cost challenges for legacy storage architectures. Because many VMs, each with its own independent workload, will be placed on each host, the I/O patterns these hosts generate are much more random and much more write-intensive than those generated by physical servers running dedicated applications. This randomness lowers storage performance, driving the purchase of more spindles or exotic storage technologies like solid state disk (SSD) to meet performance requirements. This poses a conundrum for those building cloud infrastructures: how do I create a cost-effective platform when the use of server virtualisation requires more storage, actually driving storage costs up?

Cloud providers often consider the judicious use of SSD to reduce spindle count while maintaining high performance. SSD offers great read performance, quite good sequential write performance, but quite poor random write performance. The challenge in virtual computing environments is in managing random, write-intensive workloads, so SSD by itself is only a partial solution.

Interestingly, if there were a way to turn all those random writes into sequential writes, it could deliver a significant performance improvement without requiring any other infrastructure changes. Enterprise databases have used a logging architecture for decades to do just that. All writes are sent to a persistent log, which generates the write acknowledgement back to the database, and this takes the randomness out of the I/O stream. The performance of the environment is then determined by the sequential, not random, write performance characteristics of the device hosting the log. The writes are later de-staged asynchronously to primary storage, an operation which has zero performance impact on the database. This trick increases the IOPS per spindle any given storage technology can sustain, a speedup which varies between 3x and 10x, depending on the storage technology in use.
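The logging idea can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not how any database or storage product actually implements it: the `WriteLog` class and its method names are hypothetical, a Python list stands in for the persistent log device, and a dictionary stands in for primary storage.

```python
class WriteLog:
    """Toy model: random block writes are appended to a sequential log,
    acknowledged immediately, and de-staged to primary storage later."""

    def __init__(self):
        self.log = []        # append-only list of (block_address, data)
        self.primary = {}    # stands in for primary storage: address -> data

    def write(self, block_address: int, data: bytes) -> int:
        # Append at the tail of the log regardless of the target address,
        # so the device hosting the log only ever sees sequential writes.
        self.log.append((block_address, data))
        return len(self.log) - 1   # log position serves as the acknowledgement

    def read(self, block_address: int):
        # The newest logged copy wins over whatever is on primary storage.
        for address, data in reversed(self.log):
            if address == block_address:
                return data
        return self.primary.get(block_address)

    def destage(self) -> int:
        # Later, off the write path: keep only the latest write per block
        # and apply the survivors to primary storage in address order.
        latest = dict(self.log)   # later entries overwrite earlier ones
        for address in sorted(latest):
            self.primary[address] = latest[address]
        self.log.clear()
        return len(latest)


wl = WriteLog()
wl.write(907, b"a")   # scattered addresses, as a VM host might issue them
wl.write(12, b"b")
wl.write(455, b"c")
wl.write(12, b"d")    # overwrites block 12 before it is de-staged

print(wl.read(12))    # b'd'
print(wl.destage())   # 3
print(wl.primary)     # {12: b'd', 455: b'c', 907: b'a'}
```

Four scattered writes land in the log in arrival order, and only three blocks need to be written to primary storage afterwards because the two writes to block 12 collapse into one.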

What makes this particularly relevant for virtual computing environments is that several vendors have implemented it in software at the storage layer rather than the application layer. Implemented there, the performance speedups it produces are available to all applications, not just a given database application. For any given storage configuration, it reduces the number of spindles required to meet a given performance requirement, regardless of the type of storage technology in use – generally by at least 30%. It even speeds up SSD, since it allows SSD to operate at sequential rather than random write speeds.
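A back-of-the-envelope calculation shows how a per-spindle speedup translates into fewer spindles. This is a sketch under stated assumptions: the 20,000 IOPS target and 150 IOPS per spindle are illustrative figures, not measurements, and the 3x factor is simply the low end of the range quoted above. The function `spindles_needed` is a hypothetical helper.

```python
import math


def spindles_needed(required_iops: float, iops_per_spindle: float,
                    speedup: float = 1.0) -> int:
    # Spindles required when each one sustains iops_per_spindle * speedup.
    return math.ceil(required_iops / (iops_per_spindle * speedup))


baseline = spindles_needed(20_000, 150)                 # no write log
with_log = spindles_needed(20_000, 150, speedup=3.0)    # assumed 3x speedup

reduction = 1 - with_log / baseline
print(baseline, with_log)       # 134 45
print(round(reduction * 100))   # 66
```

This sketch sizes for performance only. In practice, capacity requirements and RAID overhead also set a floor on spindle count, which is one reason real-world reductions can be smaller than the raw speedup suggests.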

This capability goes by the name of virtual storage optimisation technology. It can be used in a complementary manner with the other storage technologies mentioned, is transparent to applications, and can be used with any heterogeneous, block-based storage. Much like the way server virtualisation technology allowed organisations to get higher utilisation out of their existing server hardware, virtual storage optimisation technology does the same thing for storage hardware.

For cloud providers, cost is critical

When selling cloud-based services, the set of performance, scalability, and availability requirements is relatively clear, and building the storage infrastructure to meet those needs will likely account for at least 40% of the overall cost of your virtual infrastructure. But there is a big difference in how each cloud provider chooses to get there, and how each leverages available storage technologies to meet those requirements. The functionality of cloud service offerings for specific markets may be the same across providers that address those markets, but the one who meets those requirements with the most cost-effective virtual infrastructure has a significant leg up on the competition. The storage technologies available in today’s market offer the savvy cloud provider the tools to achieve this advantage.

Eric Burgener is VP of product management at Virsto Software, a leading provider of virtual storage architecture:
www.virsto.com
