ROT Data Redefined: Risky is the New Redundant.
Why the term ROT should be redefined as “Risky, Obsolete & Trivial”
An opinion piece by Brendan Sullivan, CEO
—
Within the realm of Information Governance, the term ROT data is generally accepted as an acronym for “Redundant, Obsolete & Trivial” data. Records managers seek to remediate such ROT data, presumably to save on the cost of storing or managing it.
“Redundant” is the operative word in this acronym: it refers to duplicate copies of the same data. Horrifying, right? Well, not really.
A lot has changed over the past 15 years in the data storage world, so much so that we’d argue redundancy alone no longer justifies any action. Back in the day, backups might have been daily, weekly, or monthly. Full backups would quickly create hundreds of instances of the same data in a matter of just a few years. Differential, incremental, and synthetic full backups reduced the redundancy significantly, followed by single-instance “on-the-fly” de-duplication applications and appliances for email archives.
Further down the road came big data storage. I don’t mean the cloud; I mean tape, which I strongly suspect many of the big clouds back up to. An LTO-9 cartridge stores 18 TB native (up to 45 TB compressed) and costs roughly $100, meaning that a dollar buys about 180 GB of native capacity! Does anybody care about redundant data at these storage costs?
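The gigabytes-per-dollar figure is simple division, so it is easy to rerun with your own numbers. Here is a minimal sketch, assuming decimal units (1 TB = 1000 GB) and a roughly $100 cartridge; the function name `gb_per_dollar` and the list of capacities are illustrative inputs, not vendor pricing.

```python
def gb_per_dollar(capacity_tb: float, price_usd: float) -> float:
    """Return how many gigabytes one dollar buys (decimal units: 1 TB = 1000 GB)."""
    if price_usd <= 0:
        raise ValueError("price_usd must be positive")
    return capacity_tb * 1000 / price_usd


# Illustrative inputs only: a cartridge priced at roughly $100,
# evaluated at several assumed capacity figures.
for capacity_tb in (18, 30, 45):
    print(f"{capacity_tb} TB at $100 -> {gb_per_dollar(capacity_tb, 100):.0f} GB per dollar")
```

Whichever capacity figure you plug in, a dollar buys hundreds of gigabytes, which is the point: at that price the duplicate copies are not where the cost is.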
Then there’s the cloud, of course: great for archiving, no doubt, but for backup or disaster recovery, not so much. Would anybody seriously entertain a project to eliminate redundant data on a mass scale when bandwidth limits throughput and the not-so-lovely egress charges would be higher than the value of the redundant data itself? I think not.
And now the final nail in the coffin for redundant data: AI. A long time ago, while attending ARMA (“Association of Records Managers and Administrators”) and RIM (“Records and Information Management”) meetings, I wondered whether we were doing records management the wrong way around by trying to organize data in such a way that you can understand and find it at some stage in the future. Index the lot and you find what you are looking for. I get the cost of power, cooling, GPUs, search engines, etc., but I don’t think we are catching and organizing this water halfway down the waterfall either.
So, to me, the word “Redundant” in ROT is already both trivial and obsolete, and it needs to be changed, because deletion is still a good thing. The world’s data centers need to be constantly clearing the attic out, and the term ROT data still kind of fits.
What about replacing “Redundant” with “Risky”? It’s risky because “legal” doesn’t want it around: it is a growing data danger concern. Yes, that fits. We suggest that ROT be redefined to mean “Risky, Obsolete & Trivial” and therefore still worth defensibly deleting. Come to think of it, we might not need to worry about the O or T either! (But we are absolutely worrying about the R!)
How does your organization currently assess the ‘risk’ in its ROT data, and are you confident that your approach aligns with evolving legal, privacy, and security challenges?