Bitten by SSD Bit Rot

Back in 2012, I wrote a post titled Looking at External Disk Performance using USB 3.0 and eSATA with SSD, where I tested a number of external drive caddies with SSDs that I had replaced and just had sitting around. Ultimately I started to use those SSDs for storing information that I didn’t really need to have on my laptops, and when they were full, they ended up in my desk drawer, where they have sat, unplugged and “safe”, for the last 7-8 years. Or so I thought.

With cheer competition season in full swing, one of the things I love to do is shoot photos of my kids and their teammates competing. RAW files that are 25-35MB per photo start to take up a lot of space when you shoot 1000+ photos in a weekend across four different teams, so I figured I would pull out the old SSDs, see what was on them that was worth keeping, delete what wasn’t, and then move last year’s RAW files over to them for safekeeping.

WRONG!!! Of the four SSDs I had stored data on, 100% of them had data loss due to a phenomenon known as bit rot. One of them wouldn’t even show up in Disk Management in Windows and had to be wiped and reset using diskpart’s clean command due to partition table corruption.

TL;DR

You can’t write data to an SSD, disconnect it, store it in a “safe” location for years on end, and expect the data to remain accessible. SSDs require power and routine usage to allow the onboard controller to check cells and perform reprogramming to reduce the likelihood of bit rot.

What is bit rot?

Quite simply put, bit rot is when the bits stored on a storage medium degrade over time to the point where they are no longer readable or reliable. In traditional hard disks, floppy disks, and even magnetic tapes used for backups, the bits lose their magnetic orientation. In CD-R and DVD-R disks the storage medium is UV sensitive and prone to decay over time. For solid state media, the NAND cells store data as electrical charge, which slowly leaks away over time. Periodically refreshing/rewriting the data is one way to guard against bit rot on all forms of media, but suffice it to say that nothing lasts forever.
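To make the refreshing/rewriting idea concrete, here is a minimal Python sketch of rewriting a single file so that its contents are stored again as a fresh write. The helper names sha256_of and refresh_file are hypothetical, and whether a file-level rewrite is enough to refresh the cells on any particular drive is an assumption on my part, not something the drive vendors document. It also cannot repair data that has already degraded; it only copies what can still be read.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_file(path):
    """Rewrite a file so its data is written to the medium again.

    Hypothetical helper: copies the file to a temp file in the same
    directory, checks the copy against the original, then swaps the
    copy into place. Returns the file's SHA-256 digest.
    """
    before = sha256_of(path)
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as target, open(path, "rb") as source:
            shutil.copyfileobj(source, target)
            target.flush()
            os.fsync(target.fileno())  # push the new copy out to the drive
        if sha256_of(temp_path) != before:
            raise IOError("copy of %s does not match the original" % path)
        shutil.copystat(path, temp_path)  # keep timestamps and permissions
        os.replace(temp_path, path)
    finally:
        # Only still present if something failed before the swap.
        if os.path.exists(temp_path):
            os.remove(temp_path)
    return before
```

Running something like this over an archive drive once or twice a year would be one way to act on the advice, at the cost of extra write wear on the drive.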

What affects bit rot?

While doing some research on this issue I learned there are actually a huge number of variables that affect the rate of decay, the most critical being the storage medium itself. After that, the next most important variable is the storage conditions, with dark, cool, low humidity storage being ideal for disks that hold backup data. However, I live in Florida, which is anything but cool and low humidity, and to make matters worse, I moved four years ago and my desk, and my external SSDs, spent a little over a month in a storage unit while I was between selling my house and closing on the purchase of my new place. That was not a good place to store them, as the storage unit wasn’t climate controlled, and I wasn’t really thinking about this with everything else I was juggling at the time.

Does this really matter?

As a data professional, I was actually surprised that I hadn’t encountered this previously, and I found the reading online fascinating, confusing, and contradictory all at once. There are plenty of sources that say it is a huge consideration, and plenty of others that downplay the effects and likelihood of it occurring. I recognize that my situation is an extreme exception and is not typical usage of the devices. I also had other backups of practically everything that was on the SSDs on other media: DVD copies of photos (which I am now in the process of testing/rewriting), photo backups on Facebook and Instagram in albums, along with copies on Amazon Glacier and Dropbox. I am that paranoid about losing memories of my kids. Not only that, but I had JPEG files that had been rendered by Lightroom after touch-up edits to the original RAW files, and those remained on my local machine because they are significantly smaller at about 3-5MB per image. So while it is a lossy format, I can still go back and get the images.

Detecting bit rot

So how do you go about checking for and detecting bit rot in SSD devices? One of the ways that I found mentioned multiple times was using a tool called HD Tune, which can perform an Error Scan of the device. Here are the results of a Quick Scan of one of my Samsung 840 EVO drives that is used every day in my M6700. The quick scan only takes a few seconds even for 1TB of space when nothing is wrong.

And here is a quick scan of one of the SSD’s that I left disconnected in my desk drawer for the last seven years:

I stopped it at two minutes and forty-seven seconds because it was so incredibly slow.  There is also a feature to do a full scan, which will take a long time even on a good device because it is reading every block of the storage to check for CRC/ECC failures during the read.  For my 1TB drives with no errors it took almost an hour to perform the full test, scanning at around 280MB/sec.  For a bad 250GB drive I gave up at six minutes.
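The full scan timing above is easy to sanity check with a little arithmetic. This is just a sketch: the function name full_scan_minutes is made up for illustration, and it assumes decimal units (1 GB = 1000 MB) and a steady read rate from start to finish.

```python
def full_scan_minutes(capacity_gb, read_mb_per_sec):
    """Estimate minutes needed to read a whole drive at a steady rate.

    Assumes decimal units (1 GB = 1000 MB), as drive capacities are sold.
    """
    total_mb = capacity_gb * 1000
    return total_mb / read_mb_per_sec / 60


print(round(full_scan_minutes(1000, 280), 1))  # 1TB drive at 280MB/sec -> 59.5
print(round(full_scan_minutes(250, 280), 1))   # 250GB drive, same rate -> 14.9
```

So “almost an hour” for 1TB lines up with 280MB/sec, and a healthy 250GB drive reading at that same rate would be done in about 15 minutes.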

This is the same exact device as the bad quick scan above and shows that you really need the full scan to verify that there isn’t bit rot happening.  The scanning speed zeros out when it encounters an error and has to retry the operation in an attempt to correct the problem. 
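If you don’t have HD Tune handy, the same idea can be roughly approximated at the file level. The sketch below is not how HD Tune works internally (it scans the raw device, and I don’t know its implementation); it simply reads one file sequentially, times each chunk, and records chunks that raise a read error or come back slower than a threshold. The name scan_file, the 4MB chunk size, and the 20MB/sec threshold are all assumptions picked for illustration.

```python
import time


def scan_file(path, chunk_size=4 * 1024 * 1024, slow_mb_per_sec=20.0):
    """Read a file sequentially and report slow or unreadable chunks.

    Returns a list of (offset, status, mb_per_sec) tuples, one per
    problem chunk. Chunk size and threshold are arbitrary assumptions.
    """
    problems = []
    offset = 0
    with open(path, "rb", buffering=0) as handle:
        while True:
            started = time.perf_counter()
            try:
                data = handle.read(chunk_size)
            except OSError:
                # The drive gave up on this region: note it and skip ahead.
                problems.append((offset, "read error", 0.0))
                offset += chunk_size
                handle.seek(offset)
                continue
            elapsed = time.perf_counter() - started
            if not data:
                break  # end of file
            rate = len(data) / (1024 * 1024) / max(elapsed, 1e-9)
            # Only judge full chunks; a short final chunk gives a noisy rate.
            if len(data) == chunk_size and rate < slow_mb_per_sec:
                problems.append((offset, "slow", rate))
            offset += len(data)
    return problems
```

One caveat: files that were read or written recently may be served from the operating system’s cache rather than the drive, so this is only meaningful on a freshly connected drive or on files larger than available memory.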

Does this mean the device is bad?

No, it doesn’t mean the actual SSD is bad. In fact, I have been able to use diskpart’s clean command to clear the partition table, initialize the device with a new partition table, and format it, after which the HD Tune full error scan reports no damaged blocks.

The SMART data also doesn’t report any write failures of the cells, and the devices all have plenty of life left in them. This unfortunately is just a case of bit rot due to poor storage conditions and the devices going without power for an incredibly long period of time.

Lessons Learned

First and foremost, it is always important to have a backup of your data stored in a safe, cool, dark, low humidity place. Years ago I switched from rotating external HDDs to SSDs because, well, there are no moving parts, so I wasn’t going to lose data because of a mechanical failure like a head crash due to vibration/impact, or the drive refusing to spin up because the lubricant on the spindle had hardened. While I use DVD-Rs for backing things up, I have had enough scratched disks to know that those aren’t sufficient alone, so I still keep offsite cloud based backups of things that matter to me, even if there is a higher cost associated with this.

What I didn’t know, and have now learned, is that it’s important to keep flash based storage not only safe but also powered up so that it doesn’t suffer from bit rot. What I still don’t know is whether just keeping the device powered up is sufficient or whether it also has to be connected to the SATA bus. There are lots of differing opinions on this, and the articles contradict each other.

For the foreseeable future I plan to continue to rely on multiple copies of everything on different storage media, in different formats, and in the cloud as a means of backing up the precious moments with my kids that I will never be able to get back. If you are a SQL Server professional, hopefully this makes you think about your backup storage and longevity.
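Keeping multiple copies only helps if you can tell which copy is still good. One simple approach, sketched here in Python, is to record a checksum for every file when the backup is made and then re-verify the drive against that record later. The function names file_digest, build_manifest, and verify_manifest are hypothetical, and SHA-256 is just my choice of hash for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            manifest[os.path.relpath(full_path, root)] = file_digest(full_path)
    return manifest


def verify_manifest(root, manifest):
    """Return sorted relative paths that are missing, unreadable, or changed."""
    damaged = []
    for relative_path, expected in manifest.items():
        full_path = os.path.join(root, relative_path)
        try:
            actual = file_digest(full_path)
        except OSError:
            # Missing file or a read error from the drive.
            damaged.append(relative_path)
            continue
        if actual != expected:
            damaged.append(relative_path)
    return sorted(damaged)
```

The manifest is a plain dictionary, so it can be saved with Python’s json module. It should live somewhere other than the drive it describes, otherwise it can rot right along with the data.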

9 thoughts on “Bitten by SSD Bit Rot”

  1. If you have a Samsung EVO drive, make sure you’ve installed Samsung Magician which protects your SSD (at a certain level)

    1. The EVO 840 actually got multiple firmware updates to address the performance degradation for reads due to cell charge voltage drops over time.

      https://www.anandtech.com/show/8617/samsung-releases-firmware-update-to-fix-the-ssd-840-evo-read-performance-bug

      https://www.anandtech.com/show/9158/new-samsung-ssd-840-evo-read-performance-fix-coming-later-this-month

      https://www.anandtech.com/show/9196/samsung-releases-second-840-evo-fix

      I keep the firmware updated on all my devices, including the BIOS updates, as a best practice, so I haven’t been affected by that issue, and I would know, as those are my main workhorse machines’ disks.

    2. It would be useful to know the model numbers of the SSDs that failed in this manner. NAND plays a role here, but it still should not have failed in this way after just 6 years when it’s been sitting in a desk. I recently found an old 256MB CF card (old Kodak camera card) that hasn’t seen power in 18 years and everything appeared to be just fine with it. That card was made by Dane-elec and was using SLC NAND.

      If you see this and don’t mind sharing, I’d really like to know. Thanks

      1. On one of the Samsung SSD stickers it has:

        Model: MZ – 5PA2560/0D1

        Is that the number you are specifically looking for? I can go open the other drive enclosures and get the numbers from them if that’s what you wanted.

  2. Kind of makes you wonder:
    If you use Azure Storage/DataLake
    Does MS occasionally scan the drives as a maintenance pass?
    10 Years in the cloud and…. it’s all blank out there too?

  3. Bit rot is a symptom of bad / buggy firmware. If updating your firmware does not solve bit rot issues, the drive’s firmware is badly supported by the manufacturer and the storage devices should not be used for anything important.

    Samsung has had major bit rot issues over the years, and drives supplied to OEMs (even for business products such as Dell) often never received the fixed firmware Samsung eventually released for their retail products and for OEMs who hounded them for it (Lenovo is very good for firmware support).

    Drives with early SMI controllers also had a lot of bit rot issues, some of which were fixed (Eg: Intels using them, some Plextors) but a lot were ignored such as the ADATA SP550. Drives which have not been fixed can maintain usability by monitoring read performance of existing data, and rewriting all of the existing data periodically before the speed degrades to the point of unreadability.

    1. You’ll see in my previous comment replies that the Samsung SSDs did receive the firmware updates to address the performance degradation. It doesn’t do anything if the devices aren’t plugged in at all, or spend a year in storage that is not temperature controlled in FL where it can easily exceed 100 degrees. It’s just a warning and example of what actually happened due to the conditions that were described in the post. YMMV depending on different circumstances.

  4. As a side note, this mainly became an issue from the TLC flash era and beyond. HD Tune’s full drive read benchmark is a good method of visually checking for speed degradation.
