Virtualisation is an accepted part of server and data centre strategy. One of the biggest challenges is optimising the storage infrastructure to keep up with the changing workloads and storage demands. What are some of the key considerations to think about when investing in storage for the virtual data centre?
Optimising storage assets for virtual workloads
Storage flexibility is often cited as one of the biggest challenges with a large-scale migration to server virtualisation. Moving to a dynamic storage architecture that can move and cache data may help to deliver performance where it is needed while keeping the costs in check.
Caching has long been used as a means of improving performance in computing. Any time there is a step change in price, performance or features, there are choices and trade-offs to be made that can dramatically affect the final implementation. One of the most common caching systems, and one that works very effectively, is the CPU and memory system of modern PCs and servers. The CPU is where the work happens, and it needs fast access to data wherever possible.
Early CPUs ran at about the same speed as memory so loads did not take many cycles. CPU speeds soon increased and now far outpace the speed of memory. The result is that today there are large delays when reading from and writing to memory.
If not addressed, processing would be held up for long periods and performance would suffer. For this reason, CPUs have cache memory built in that can hold recently used data so that it is almost instantly available.
At first, CPUs incorporated a single level of cache, which was small because die space was at a premium. As memory sizes grew and applications used more resources, more cache memory was added to the CPU. But simply adding more was not the ideal solution: access times grew with increasing cache size, while constraints such as cost and die size mean that cache memory remains a very limited resource. So today we have first, second and even third levels of cache, with the different levels working together to get the best trade-off of size, performance and cost.
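The trade-off between cache levels can be made concrete with a rough expected-latency calculation. The sketch below is illustrative only: the function name, the hit rates and the cycle counts are assumptions chosen for the example, not measurements of any real CPU.

```python
def average_access_time(levels, backing_latency):
    """Expected cost of one access through a cache hierarchy.

    levels: list of (hit_rate, latency) pairs, fastest level first.
    backing_latency: cost of going all the way to main memory.
    All figures are hypothetical and in arbitrary "cycles".
    """
    expected = 0.0
    reach_probability = 1.0  # chance that a request gets as far as this level
    for hit_rate, latency in levels:
        # Every request that reaches a level pays that level's lookup cost.
        expected += reach_probability * latency
        # Only the misses carry on to the next level down.
        reach_probability *= 1.0 - hit_rate
    return expected + reach_probability * backing_latency


# One small, fast cache in front of slow memory (assumed figures).
single_level = average_access_time([(0.9, 1)], backing_latency=100)

# The same first-level cache plus a larger, slower second level.
two_level = average_access_time([(0.9, 1), (0.8, 10)], backing_latency=100)
```

With these assumed figures the single-level design averages about 11 cycles per access and the two-level design about 4, which is why a second, larger but slower level can pay off even though it is not as fast as the first.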
Different CPUs, even from the same generation and vendor, have quite different cache setups depending on the expected usage model and price point. The trick to getting the performance is to balance the way the data is put into the cache and then replaced based on the usage history. Advanced techniques can also help the cache to perform well – for example, pre-fetching data can bring data into the cache and close to the CPU before it is necessary so that it is readily available when needed.
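As a minimal sketch of replacement based on usage history, the example below uses a least-recently-used (LRU) policy with an explicit prefetch call. LRU is one common policy chosen here for illustration; real CPU caches use hardware-specific variants, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity, load):
        self.capacity = capacity
        self.load = load            # function that fetches from the slow store
        self.entries = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # A hit refreshes the entry's position in the usage history.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.load(key)
        self._insert(key, value)
        return value

    def prefetch(self, key):
        """Bring data in before it is requested, so the later get() is a hit."""
        if key not in self.entries:
            self._insert(key, self.load(key))

    def _insert(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            # Drop the entry that has gone longest without being used.
            self.entries.popitem(last=False)


# Hypothetical slow store: the "data" for a key is simply key * 10.
cache = LRUCache(capacity=2, load=lambda key: key * 10)
cache.get(1)
cache.get(2)
cache.get(1)       # hit: key 1 becomes the most recently used
cache.get(3)       # cache is full, so key 2 (least recently used) is evicted
cache.prefetch(4)  # loaded ahead of demand; evicts key 1
cache.get(4)       # hit, thanks to the prefetch
```

Prefetching is not free: in this sketch the prefetch of key 4 pushes key 1 out, so a poorly chosen prefetch policy can displace data that is still in use.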
That’s great, but how and why is it relevant to the future of storage? Storage is an integral and expensive part of the computing infrastructure. It is also built from a range of technologies that differ vastly in price, features and performance, making it an ideal candidate for caching.
Another major issue is that as demand increases for yet more storage, applications are demanding high performance to satisfy the always-on generation of users. Many IT managers find that the storage infrastructure struggles to keep up with the demands of new technologies such as server virtualisation, as we can see below.
Mass storage is plentiful and cheap, at least to acquire if not to manage, but its performance and reliability leave much to be desired for high-end workloads. Enterprise drives give good performance and reliability, but capacities are smaller and prices higher. At the top end, solid state drives give excellent performance, but their price points mean that they are generally unsuited to all but the most demanding applications.
It is possible to architect the storage system into various tiers, and allocate storage for applications on the appropriate tier to provide the performance and reliability needed, but it is a pretty blunt approach. Usage patterns may change by month, week, day or even hour. Virtualised workloads may ramp up or down unpredictably, which may leave the storage struggling to adapt. Trying to cope with this by optimising manually may result in over-provisioning certain tiers of the storage system, rather than making best use of the overall capabilities of each tier.
So this is where the CPU analogy comes into play. Caching can help to optimise the storage tiers dynamically, because the tier in which data resides no longer has to be fixed and static. Instead, a common storage controller acts as the front end to all the storage tiers, and is able to determine in real time which data is used by various applications, and what operations are performed on it. It is able to move data between tiers, ensuring that the data that is most frequently used or modified is placed in the higher performing tiers automatically, increasing utilisation of the most expensive, highest-performance storage tiers and lifting performance across the board.
Taking another leaf from CPU caching, prefetching data based on policy can move it into the top performing tiers before it is accessed, so that it is immediately available with high performance when needed, without taking up valuable space at times when it is not critical. An example is payroll processing, an activity that takes place in a short window every month and carries high business impact and risk while running, but that is generally inactive for the rest of the month.
Of course, this should not affect the ability to decide which data should always reside in a particular tier. Data can be pinned in place so that vital business applications get the performance guarantees they need, while the least used or least “important” data can be prevented from polluting the cache in the higher level tiers.
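The controller behaviour described above (promotion by access frequency, policy-driven prefetching, pinning and exclusion) can be sketched as a toy model. This is a simplified illustration under stated assumptions, not a description of any vendor's product: there are only two tiers, heat is measured by a simple access count, and every class, method and volume name is hypothetical.

```python
from collections import Counter


class TieredStore:
    """Toy controller: a small fast tier in front of a large slow tier."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = set()               # blocks currently in the fast tier
        self.pinned = set()             # blocks that must stay in the fast tier
        self.excluded = set()           # blocks barred from the fast tier
        self.access_counts = Counter()  # usage history per block

    def pin(self, block):
        # Assumption: the administrator never pins more than fits.
        self.pinned.add(block)
        self.fast.add(block)

    def exclude(self, block):
        self.excluded.add(block)
        self.fast.discard(block)

    def read(self, block):
        """Record the access and report which tier served it."""
        self.access_counts[block] += 1
        return "fast" if block in self.fast else "slow"

    def prefetch(self, blocks):
        """Promote blocks ahead of a known busy window, e.g. a payroll run."""
        for block in blocks:
            if block not in self.excluded:
                self._promote(block)

    def rebalance(self):
        """Refill the unpinned part of the fast tier with the hottest blocks."""
        room = max(self.fast_capacity - len(self.pinned), 0)
        hottest = [block for block, _ in self.access_counts.most_common()
                   if block not in self.pinned and block not in self.excluded]
        self.fast = set(self.pinned) | set(hottest[:room])

    def _promote(self, block):
        if block in self.fast:
            return
        if len(self.fast) >= self.fast_capacity:
            victims = [b for b in self.fast if b not in self.pinned]
            if not victims:
                return  # the fast tier is entirely pinned
            # Demote the coldest unpinned block to make room.
            coldest = min(victims, key=lambda b: self.access_counts[b])
            self.fast.discard(coldest)
        self.fast.add(block)


store = TieredStore(fast_capacity=2)
store.pin("ledger")            # vital application: always in the fast tier
for _ in range(5):
    store.read("reports")
for _ in range(2):
    store.read("logs")
store.read("archive")
store.exclude("archive")       # never allowed to pollute the fast tier
store.rebalance()              # "reports" is hottest, so it takes the free slot
store.prefetch(["payroll"])    # month-end policy: payroll displaces "reports"
```

In this sketch a prefetched block stays in the fast tier only until the next rebalance unless it has become hot in the meantime, which mirrors the idea that data such as payroll should not occupy expensive storage outside its processing window.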
The value of investing in a caching architecture is best realised if it can encompass the entire storage stack, moving to a virtualised model in which data is not tied to physical storage and management tools and automation take centre stage. This may take some time to come to full fruition, as there is a large amount of installed storage that will remain in place and in active use for many years. Depending on the caching solution, it may not be possible to use this existing storage kit. Ripping it all out and replacing it is unlikely to be an option, so it will make sense to phase any caching implementation in gradually as new investments are made at the highest tiers of the storage hierarchy, and to incorporate lower tiers as they are modernised.