Recently, one of our developers came to us with a failed “server”, and requested the IT department assist in recovering data from it. Powering the box on revealed the click of death. Ouch. Let’s give this a shot.
Outage report
Just a word of warning. This might be a long read, but I had to share the entire story, from the initial “outage”, to the recovery, and finally working with an outside recovery company.
This exercise has fail written all over it, but I am willing to give it a shot. My first attempt would be to simply restore the necessary data from backup. Unfortunately, this particular machine was built long before I came on board, and not by IT. The machine was set up outside the datacenter, and never made it into the backups. Nobody even remembered who built the machine. Luckily, we found that the drives were configured in a RAID1 mirror. Score! We can just restore from the surviving mirror set member. My boss (who was working this ticket) was able to successfully recover a vast quantity of data from the machine, and restore it to a virtual machine inside the datacenter. Crisis averted, right? Wait, there’s more!
Bucket of WTF
I have to make a note here: we found a really weird configuration going on inside this RAID. You’d think “Hmmm, RAID1. It ain’t that difficult to grasp.” But dig this. We all know how a RAID1 works. There are two identical drives that are set up with an upstream controller that sends identical writes to both devices. This way you get two identical copies of the data, hence the “mirror” moniker. The resultant volume of space can be used like any other disk. It can be partitioned, defragged, etc. I am not sure why you would partition a mirror, but ok. You do what you want.
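For anyone who wants the mirror idea spelled out, here is a toy model in Python. It is only a sketch: `MirroredVolume` and its methods are made-up names for illustration, and a real controller also deals with rebuilds, error handling, and caching.

```python
class MirroredVolume:
    """Toy RAID1: every write goes to both members, reads use any healthy one."""

    def __init__(self, size: int) -> None:
        # Two in-memory "drives" of identical size stand in for the mirror set.
        self.members = [bytearray(size), bytearray(size)]
        self.healthy = [True, True]

    def write(self, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > len(self.members[0]):
            raise ValueError("write falls outside the volume")
        # The "controller" sends the identical write to every healthy member.
        for index, disk in enumerate(self.members):
            if self.healthy[index]:
                disk[offset:offset + len(data)] = data

    def fail(self, index: int) -> None:
        # Simulate a dead drive (the click of death).
        self.healthy[index] = False

    def read(self, offset: int, length: int) -> bytes:
        # Any surviving member can satisfy the read on its own.
        for index, disk in enumerate(self.members):
            if self.healthy[index]:
                return bytes(disk[offset:offset + length])
        raise IOError("no healthy mirror member left")


volume = MirroredVolume(16)
volume.write(0, b"payroll")
volume.fail(0)
print(volume.read(0, 7))  # b'payroll'
```

The survivor holds everything that was written while both members were healthy, which is why losing one drive of a mirror should not, by itself, cost you any data.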
So the rocket scientist that built this box didn’t partition the drives. Nope, he went one better. He built two Virtual Hard Disk (VHD) files on the mirrored volume, then mounted those VHDs as drives. A volume inside a volume. Let that sink in.
The only reason that I can figure that he did this was maybe he was at one point using VSS to snap off copies of the VHD files, storing them in some mystery location. Nobody will ever know what was going on there. Maybe he was a real big fan of the Inception movie a few years back.
In any event, as I mentioned previously, the recovered data was placed into a shiny new virtual machine, and placed into backup rotation. The problem is, the data that was recovered was six months out of date.
Management wants more
We were able to recover the contents of the drives with no problems. However, management wants those missing six months back. Basically, we told them that the drives would have to be sent out to a file recovery service in order to recover the missing data, since the data was not currently visible on the drive, and we couldn’t see any VSS snapshots available on the drive. Much discussion was had, in which it was explained multiple times to multiple people that it would most likely be a fruitless endeavour, and most likely would not resurrect the missing data. Further, the items that were missing were already nearly rebuilt by the dev folks. Apparently there wasn’t much change on that machine over the years.
Data Recovery “Specialists”
I ended up retaining the services of fileretrieval.com, a data recovery company based in Houston, TX, with offices around the country including one in Seattle, about half a block from my office. Well, I call them locations, but they appear to be simply shared spaces for customers to drop off media. I think that very little actual recovery, outside maybe initial diagnosis, is performed at each location. I could be wrong on this, but at least one other person who has used them noticed that their media was shipped off to another country to be processed, without notice or regard to data sensitivity. The company appeared to be highly rated otherwise, so I thought I would give them a try.
Opening the case was pretty straightforward, and I was able to drop off the media on 11/11. Note that they wanted both drives, even though one drive of the mirror was physically borked (click of death). No problem, we paid a few extra dollars to get an express diagnosis, and get recovery options.
I have to say here, this company is very hard sell. Before I had even signed off the documents authorizing them to work on the drives, before they had even cracked open the box that I had dropped off, I received 2 emails and a phone call wanting me to upgrade my diagnosis and recovery options. It seems like that’s all I heard out of these guys during this whole process; buy buy buy.
The Diagnosis
Eventually, on 11/14, I get an email from them:
Our RAID 1 data recovery specialist technicians have concluded the diagnostic process, including a preliminary battery of tests and inspections to determine the functionality of your RAID, and can confirm that physical damage present on one of the drives (failed read/write head and bad sectors) has resulted in array’s failure to correctly record and access data critical to its operation.
Due to the critical failure in operation as a result the failed read/write heads, the drives must be serviced by our certified RAID specialist technicians in our cleanroom laboratory and repaired using exact match donor parts where and if necessary (to repair such damage as evident; failed read/write heads and subsequent bad sectors interfering with the normal detection/operation of the array).
Once the read/write head physical issue has been addressed and the array has been restored to operating status long enough for the volume of data present to be scanned, it will be rebuilt in a virtual environment. Since damage to the components has resulted in the detection of data corruption (evidenced by the presence of bad sectors on each drive) due to intermittent failure during operation, our technicians will apply proprietary extraction/recovery software to rebuild/restore lost data and to repair corrupted data to complete working integrity.
I have a problem with this diagnosis. First, I told them one of the drives was borked. The second drive was fine. Since this was a mirror, the data can be pulled off the second drive. That’s why we use mirrors. Why do we need to have the drive sent off to the clean room? Something was starting to smell funny, and my bullshit detector was starting to indicate a problem. If their “RAID specialist” couldn’t figure out what a RAID1 was about, why would I leave my media in their hands?
Along with the diagnosis, I received another sales pitch for expedited recovery. Cost? They wanted a total of $3975, with $899 due up front in order to begin work. I knew the managers wouldn’t go for this, but I asked them if they wanted to go forward with the recovery, with no guarantee that any data would be recovered.
The managers agreed that the “lost” data was not worth the expense, and requested that I close the case. This is where the really hard sell comes into play. I not only got the press from Phil, but after he agreed to close the case, his manager, Ronald, gave me a call (and email) as well, with more price quotes and sales pitches. I finally got them to close the case, and agree to release my media back to me for proper destruction.
I finally received a message letting me know that my media would be ready for pickup in several days. I assume this was done in order to allow time for shipping my media back from wherever they had shipped it off to.
We have acknowledged your request to pick up your media from our office. Please allow several days for your media to be safely released from our laboratory. Once your media has undergone the security release procedure it can be shipped or picked-up. Please note that no media may be released to anyone until you have received a release confirmation.
So I wait. And wait some more. I don’t like waiting.
Since I had declined to move forward with this recovery, I apparently didn’t rate the time of day from them any more. After 5 days, I left a voicemail and dropped an email for Phil, requesting status. To this inquiry, I received crickets.
On 12/4/2014, I dropped the same communications again, again met with silence. Again on 12/8/2014, I left voicemail for Phil, requesting status. No response.
Finally, on 12/12/2014, I left voicemail again, as well as email. This time, however, I expanded my email audience. Also, I mentioned twice in the email that I would be turning the matter of the media over to my legal team if I didn’t hear back from them by the end of the business day. Apparently, the threat of having to deal with a law-talking dude puts a little fear into these guys. Two hours after mentioning the legal team, I got a message from Ronald saying that my drives could be picked up anytime. Great! Finally, I can get my media back.
Security? What’s that?
On the afternoon of 12/12/2014, just a few short minutes after getting notification that I could pick up my media at any time, I arrived on-site at the fileretrieval.com drop site. Nobody was in the lobby, with the exception of two employees behind the counter. One was the receptionist that I had met before, when dropping off the drives. It was readily apparent that the other one was a new employee. I know this because I stood there waiting for someone to acknowledge my existence, listening to their conversation. It was apparently more important that this person ask the same question three times regarding the outfitting of his new cubicle.
When the receptionist finally deigned to address me, I had been waiting for ten minutes, with all the necessary paperwork on the counter, while these jokers had a conversation about pens and paper. I am pretty sure that if a customer comes into your area, you at least want to acknowledge they are there, and that you will be with them momentarily.
He finally got around to asking for my case number, which I gave. He returned from the back with an opened box, containing my two drives. One was wrapped in a static bag, as I had dropped it off, while the other one was simply lying in the bubble wrap. Nice.
He set the box containing the drives on the counter, and asked if there was anything else he could do for me. I asked if he needed to see ID or paperwork, or needed me to sign off on receipt of the drive. He said no, and that I could go, if there was nothing else I needed.
Is it me, or is this a security breach? I find it interesting that I could walk into this joint, spout off a case number, and be presented with a hard drive or some other media. If I were the nefarious sort, knowing that a potential target had a drive out for recovery, all I would have to do is dumpster dive or access the target’s email to find out what the case number is. Then it would be a simple matter of finding out when the media is going to be on the premises. I waltz in before the target gets there, sing “my” case number, and dance the watusi right out the door, data in hand.
Further Investigation
With the BS detector now well into the yellow zone, and approaching the red border, I decided to do a little forensics of my own.
I knew one of the drives was totally gone, being a victim of the dreaded click of death. I grabbed the first one, which was not in a static bag anyway, and slid it into the USB/SATA drive adapter. Upon powering the drive on, I heard the initial startup, and then the tell-tale click … click … click of the dead drive. After powering off, I removed the drive and labelled it as such, and set it aside. I now knew which one was the right one.
After plugging in the second drive, I was greeted with the “New Hardware Found” sound from Windows. Hmmm, sounds promising. I open Windows Explorer, and sure enough, there’s the drive, right where it’s supposed to be. The only thing is, I can’t access it. Double-clicking the drive letter gets me a nice little “Access Denied” error. This tells me that someone actually accessed this drive, and reset the permissions. So I set forth to reset those permissions.
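The exact steps aren't recorded here, but a common way to do this on Windows is to take ownership with `takeown` and then grant access with `icacls`, both from an elevated prompt. (For what it's worth, “Access Denied” is also common when an NTFS disk is simply moved between machines, because its permissions refer to accounts the new machine doesn't know.) Below is a minimal Python sketch of that approach. The drive letter `E:\` and the helper names `build_reset_commands` and `reset_permissions` are assumptions for illustration, and it defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def build_reset_commands(drive_root: str,
                         principal: str = "Administrators") -> List[List[str]]:
    """Return the takeown/icacls command lines for a drive (hypothetical helper)."""
    return [
        # Recursively give ownership to the Administrators group, answering "Y"
        # to any prompts about folders we cannot currently list.
        ["takeown", "/F", drive_root, "/R", "/A", "/D", "Y"],
        # Grant full control, inherited by subfolders and files; /T walks the
        # whole tree and /C keeps going past individual errors.
        ["icacls", drive_root, "/grant", f"{principal}:(OI)(CI)F", "/T", "/C"],
    ]


def reset_permissions(drive_root: str,
                      principal: str = "Administrators",
                      dry_run: bool = True) -> List[List[str]]:
    """Print the commands, or run them when dry_run is False (Windows only)."""
    commands = build_reset_commands(drive_root, principal)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


# Dry run against an assumed drive letter; nothing is changed on disk.
reset_permissions("E:\\")
```

On a large volume both commands touch every file, so expect them to run for a while. If you care about forensic integrity, work on an image of the drive instead, since this rewrites ownership and ACLs on the original.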
Once the permissions were reset (it took a while), I was able to browse the file system straight away.
Summary
I was able to cruise around the file system with no issues. Booting from my Linux-based recovery disk, I was also able to examine the disk directly, and easily view and recover deleted files. The missing six months of data was not on the disk, though. Surprise.
If you’ve read this far along, you might as well read a bit further. You may be asking yourself, “Why did he turn over the drives for recovery if he knew the data wasn’t there?”. I am playing the drone role here. I spoke for days, until I was blue in the face, stating that the data was most likely gone. Management wanted to go forth. Basically, I got tired of trying to prevent the waste of money.
What spurred me to write this long, long post, is the overall impression I came away with of fileretrieval.com.
- I was constantly pressured to upgrade services, even before diagnostics were done. Pricing information did not match between the website, hardcopy, and what the agent told me. Even what the agent told me changed once.
- Either they need a new RAID specialist, or they need to re-examine their business practices. Telling a customer that data in a RAID1 volume is not recoverable without a clean room visit, because one of the drives is pooched, is not right. That’s what RAID1 is! If one drive goes bad, the other drive is still fully accessible!
- Why do I have to threaten legal action to get my media back?
There are two things that I hope to get across in this story. The first is to NOT let folks outside your department bully you. Let your boss fight that battle. Especially in software companies, developers and the like love to set up random machines under their desks, and make that particular machine a kluge of huge importance. Suddenly, the whole company might depend on that one box, and IT has no knowledge of it, much less any way to back it up or make it redundant. If they insist on building a machine in their desk drawer, then let them know it’s not supported by IT, and they are responsible for the hardware and software therein.
If you have been snoozing for the last 10 years, much hubbub has been running around the IT community about this situation. It’s called Shadow IT. Shadow IT in itself is not necessarily a bad thing, and can be a learning experience for IT and the customers of IT. But the above described situation is just plain bad juju.
Second, do your due diligence when dealing with outside vendors. I searched around the interwebs for information on this company, and didn’t find much except for glowing reviews. There was one guy that had a problem, and he posted everywhere. A second look later on (too late, actually) showed that most of the customers appeared to be consumer or home users, usually the “Dear old Aunt Claudia accidentally deleted my dog’s pictures” type of customer. Not many were enterprise customers. In a nutshell, dig deep into the company history. Contact the past clients, if you can. The information you find might save you heartburn later.