
Stale Data Backup

By Tony Ashley, SNIA Europe member, HDS.

Date: 7 May 2012

Compulsive hoarding (or pathological collecting) is a pattern of behavior characterized by the excessive acquisition of, and inability or unwillingness to discard, large quantities of objects that would seemingly qualify as useless or without value.

When it comes to data, it would appear that we are all hoarders.

We often read tabloid articles about the likes of Mrs. Smith, who at the tender age of 75 hadn’t thrown anything away for the past 20 years. The “useful items” she has been storing (which she insists will be put to good use at a later date) now fill her entire home, so everyday tasks take umpteen times longer and important items are almost always impossible to find. Ultimately the local council or a similar public body gets involved and tries to restore some measure of normality by clearing and disposing of the worthless clutter.

I often wonder whether, if we could physically see data in the way we see the newspapers and empty tins that Mrs. Smith stores, people would store and use their hoarded data any differently. Would they still consider it wise to store and systematically protect a file created in 2008 that will likely never be read or reviewed again?

Speaking to various organizations, one consistent pain point is evident: the backup and recovery of this accumulated data. Whether it is an ever-shrinking backup window or the sheer amount of content that requires protection, one thing is clear: we can’t continue down this well-trodden path. We are unlikely to wake up tomorrow and decide to thin out or weed our file shares, and we won’t decide to delete all emails older than 2009. We’re all hoarders.

There have been various advances in backup technology aimed at reducing backup and recovery pain points: centralized storage and backup infrastructure, faster and higher-capacity tape devices, and tape-emulating disk solutions (virtual tape libraries or disk-to-disk-to-tape solutions). Backup and recovery software vendors also attack the problem in various ways. Incremental-forever solutions can potentially solve the problem of a shrinking backup window, because they only back up content that has changed, but they can hugely complicate recovery, since a restore may have to reassemble a base backup and every incremental taken since. Traditional master and media server architectures may cope with backing up huge amounts of content, though they can prove difficult to manage and costly in terms of licensing and support. Ultimately, none of these addresses the root cause of the problem, which is the explosion in accumulated data. We don’t delete data, so the content will always require storage and protection. We’re constantly playing catch-up, deploying ever larger (and more costly) backup and recovery solutions to solve a problem that will likely never go away.
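To make the recovery trade-off concrete, here is a minimal sketch assuming a toy model in which a file set is a dictionary of path to content, and each incremental backup captures only the entries that changed since the previous state. The names `Backup`, `incremental` and `restore` are hypothetical, deletions are ignored for brevity, and real backup products track far more metadata than this.

```python
from dataclasses import dataclass
from typing import Dict, List

FileSet = Dict[str, str]  # toy model: path -> file content


@dataclass
class Backup:
    label: str
    files: FileSet  # only the entries captured by this particular backup


def incremental(previous: FileSet, current: FileSet, label: str) -> Backup:
    """Capture only files that are new or changed since the previous state."""
    changed = {path: data for path, data in current.items() if previous.get(path) != data}
    return Backup(label, changed)


def restore(chain: List[Backup]) -> FileSet:
    """Rebuild the latest state by replaying the base backup and every incremental, in order."""
    state: FileSet = {}
    for backup in chain:
        state.update(backup.files)
    return state


# Usage: one full backup followed by two small incrementals.
day0 = {"a.txt": "v1", "b.txt": "v1"}
full = Backup("full", dict(day0))
day1 = {"a.txt": "v2", "b.txt": "v1"}
inc1 = incremental(day0, day1, "inc1")  # captures only a.txt
day2 = {"a.txt": "v2", "b.txt": "v2", "c.txt": "v1"}
inc2 = incremental(day1, day2, "inc2")  # captures b.txt and c.txt

print(restore([full, inc1, inc2]) == day2)  # True: the complete chain is required
print(restore([full, inc2]) == day2)        # False: a missing link leaves a stale a.txt
```

Each nightly job stays small, but a correct restore depends on every link in the chain being present and readable.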

Looking elsewhere in the stack, away from backup and recovery and down to the storage layer, there are technologies that can help. Snapshot and cloning solutions can quickly and easily create point-in-time copies of data, allowing users to recover their own files. On the flip side, snapshots are often stored on the same physical spindles as the live data, so a third copy (ideally on tape) is still needed. Data deduplication and compression technologies help to thin out the data being stored, which can shrink backup windows, though these gains are usually undone when the data is backed up, because it is typically rehydrated (expanded back to its full size) as the backup application reads it. More interestingly, a relatively old methodology is being revived: intelligent data tiering, which assigns a value to the data being stored and so lets the administrator determine whether content actually requires such rigorous protection.
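The rehydration point can be illustrated with a minimal sketch of block-level deduplication, assuming fixed-size 4 KiB chunks identified by SHA-256 digests (many real products use variable-size chunking instead). The function names are hypothetical. Unique chunks are stored once on primary storage, but anything that reads the file back, such as a backup application, receives the full-size data.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; many products use variable-size chunking


def dedup_write(data: bytes, store: Dict[str, bytes]) -> List[str]:
    """Store each unique chunk once (keyed by SHA-256) and return the file's chunk 'recipe'."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a duplicate chunk costs only a reference
        recipe.append(digest)
    return recipe


def rehydrate(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Reassemble the full file, as a backup application reading it would see it."""
    return b"".join(store[digest] for digest in recipe)


store: Dict[str, bytes] = {}
data = b"A" * CHUNK_SIZE * 8 + b"B" * CHUNK_SIZE * 2  # highly redundant 40 KiB file
recipe = dedup_write(data, store)
stored = sum(len(chunk) for chunk in store.values())
restored = rehydrate(recipe, store)
print(len(data), stored, len(restored))  # 40960 8192 40960
```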

Did you know that in a “typical organization” roughly 75% of the files stored on “Tier 1” storage (e.g. costly Fibre Channel or Serial Attached SCSI drives) have not been read, reviewed or used in six months? Of that 75%, a massive 65% hasn’t been accessed in over a year. Why, then, is this content treated the same as business-critical data, when it is never read and provides little or no value to the business? In fact, storing old content and religiously backing it up simply increases operating expenditure. Conversely, data that is new or consistently accessed has high value to the business: it is mission-critical content which, if lost, would reduce revenue or cost many man-hours to recover or recreate. And yet today most corporates feverishly back up all their data as one, with no regard for its true value.
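Administrators who want to check these proportions on their own file shares could start from a minimal sketch like the one below, which sums file sizes into buckets by last-access time using the same six-month and one-year thresholds. It assumes a POSIX-style file system where `st_atime` is maintained: on volumes mounted with `noatime` the access time is not updated and the result is meaningless, while `relatime` updates it lazily, which is adequate at month granularity. The function names and bucket labels are hypothetical.

```python
import os
import stat
from typing import Dict, Iterable, Iterator, Tuple

DAY = 86400
SIX_MONTHS = 182 * DAY  # approximation
ONE_YEAR = 365 * DAY


def scan(root: str) -> Iterator[Tuple[int, float]]:
    """Yield (size_in_bytes, last_access_time) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):  # ignore symlinks, sockets, etc.
                yield st.st_size, st.st_atime


def profile(entries: Iterable[Tuple[int, float]], now: float) -> Dict[str, int]:
    """Sum file sizes into buckets using the 6-month and 1-year idle thresholds."""
    buckets = {"active": 0, "idle_6_to_12_months": 0, "idle_over_1_year": 0}
    for size, atime in entries:
        age = now - atime
        if age >= ONE_YEAR:
            buckets["idle_over_1_year"] += size
        elif age >= SIX_MONTHS:
            buckets["idle_6_to_12_months"] += size
        else:
            buckets["active"] += size
    return buckets


# Synthetic example; on a real share you might call profile(scan(path), time.time()).
now = 1_000_000_000.0
sample = [(100, now - 10 * DAY), (200, now - 200 * DAY), (300, now - 400 * DAY)]
print(profile(sample, now))  # {'active': 100, 'idle_6_to_12_months': 200, 'idle_over_1_year': 300}
```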

To start to solve the problem of backup and recovery we must first understand one important point. Not all data is equal. Some data holds more value than other data.

Hierarchical Storage Management (HSM) is nearly as old as the problem of backup and recovery itself. Originally designed for mainframe environments, the technology migrated old or stale content onto tape, freeing up costly hard disks to store newer and more valuable data. As hard disk capacities increased over time, purchase costs were driven further down and hard disks came to be seen as commodity items, to be discarded when newer, higher-capacity drives became available. The ability to store larger amounts of data online (as opposed to offline on tape) spelt the death knell for HSM. But the trend of ever-increasing capacity and ever-falling cost is unlikely to last forever, and once again the industry is turning to HSM as a means to reduce cost and waste.

Many storage gurus will fondly remember the concept of Information Lifecycle Management (ILM). Conceived around 2004, it promised to move data easily and automatically between different tiers of storage, with little or no administrator overhead, based entirely on the value of the data. The ILM mantra was “the right data stored in the right place at the right time”. While it was a great idea in principle, no real technology or solutions existed at the time that could manage the transparent migration of data between tiers. Potential customers were worried (for good reason) that recovering migrated content would be complex, or that their users would be confused by new icons indicating that their files had been moved or migrated. The industry forgot the benefits of assigning a value to data and moved on to the next big thing.

As always, technology has advanced, and it has recently surpassed the capabilities of the historic HSM and ILM solutions. Data can now be staged onto faster media (Serial ATA hard disk drives) instead of tape, in a completely transparent and non-disruptive fashion. Whereas historically users would notice a distinct difference in speed and user experience when accessing content that had been archived (to tape, for example, with HSM), the latest data tiering technologies mean that users are unaware that their data is now stored on cheaper and more appropriate storage. The migration is virtualized, and is therefore transparent to users and applications alike.

The newer Data Migrator technologies use stubs and pointers in much the same way as the older HSM solutions did, allowing storage administrators to migrate specific content types, according to policies, to several different locations: cheaper, higher-capacity Serial ATA drives, older and less utilized centralized storage devices (NAS), or even object stores. The migrated data can, in most instances, still benefit from local disaster recovery mechanisms such as snapshots and cloning.
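The stub-and-pointer idea can be approximated with a minimal sketch, assuming a Unix-like system where both tiers are mounted as ordinary directories. It is not how Data Migrator or any particular HSM product is implemented: real products use file-system-level stubs, whereas this sketch substitutes a plain symbolic link as the pointer so that the original path keeps working. The policy here (idle longer than `max_idle_days`, judged by `st_atime`) and the function name `migrate_stale` are hypothetical.

```python
import os
import shutil
import time
from typing import List, Optional

DAY = 86400


def migrate_stale(tier1: str, tier2: str, max_idle_days: float,
                  now: Optional[float] = None) -> List[str]:
    """Move files idle for longer than max_idle_days from tier1 to tier2.

    A symbolic link is left at the original path as a stand-in for a stub,
    so users and applications keep opening the same path.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * DAY
    migrated = []
    for dirpath, _dirnames, filenames in os.walk(tier1):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or os.stat(src).st_atime >= cutoff:
                continue  # already a stub, or still in active use
            rel = os.path.relpath(src, tier1)
            dst = os.path.join(tier2, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)                  # data now lives on the cheaper tier
            os.symlink(os.path.abspath(dst), src)  # pointer left behind on tier 1
            migrated.append(rel)
    return migrated
```

A call such as `migrate_stale("/srv/tier1/share", "/srv/tier2/share", max_idle_days=182)` (hypothetical paths) would move files untouched for roughly six months while leaving their original paths readable.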

Now that the stale and less valuable data has been migrated, what’s next? Well, here’s the kicker: the migrated data, deemed stale or old by the migration policy, can be removed from the regular backup cycle. Instead of a full backup of 20 TB (remembering that typically 75% of the stored content will not have been accessed in six months), the full backup shrinks to 5 TB, leading to huge savings in time and money, spent not only on backup but also on continually buying ever larger Tier 1 (FC and SAS) disk capacity. Admittedly, the migrated content still needs to be backed up separately, but given its apparent value (it hasn’t been accessed in six months) a much less frequent protection schedule could be applied. Alternatively, if the migration tier resides on an object store, you could even argue that, given the object store’s self-healing, error-correction and versioning capabilities, the data would never need to be backed up again.
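The 20 TB to 5 TB arithmetic can be extended over a backup schedule. The sketch below uses the article's figures (20 TB total, 75% stale) but assumes, purely for illustration, a weekly full backup before tiering and, after tiering, a weekly full of the active Tier 1 data plus a full of the migrated tier every four weeks; neither schedule comes from the article, and `full_backup_volume` is a hypothetical helper.

```python
from typing import Tuple


def full_backup_volume(total_tb: float, stale_fraction: float, weeks: int,
                       stale_interval_weeks: int) -> Tuple[float, float]:
    """TB read by full backups over a period, before and after tiering.

    Assumes a weekly full of everything before tiering; afterwards a weekly
    full of the active Tier 1 data plus a full of the migrated tier every
    stale_interval_weeks weeks.
    """
    active_tb = total_tb * (1 - stale_fraction)  # 20 TB * 0.25 = 5 TB per Tier 1 full
    stale_tb = total_tb * stale_fraction         # 20 TB * 0.75 = 15 TB on the migrated tier
    before = total_tb * weeks
    after = active_tb * weeks + stale_tb * (weeks // stale_interval_weeks)
    return before, after


print(full_backup_volume(20, 0.75, weeks=4, stale_interval_weeks=4))  # (80, 35.0)
```

Under these assumptions, four weeks of full backups drop from 80 TB read to 35 TB; a longer interval for the migrated tier, or none at all on a self-protecting object store, reduces the figure further.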

So maybe I don’t have to start thinning out my file stores quite yet; perhaps the data is valuable after all. My expense report from 2008 may yet become useful again. I’ll be able to search and index all my hotel bookings, and that’ll help make the case for a better corporate rate. Even better, it’ll help me keep track of all my loyalty points.

I’ll admit it. I’m a hoarder, but it’s only because I can’t throw away data. It could be useful one day.

Tags: BC/DR, Compliance, Deduplication, Disk/RAID/Tape/SSDs, Ethernet Storage, SAN/NAS, Tiered Storage
