Saving an ESXi Datastore from a Foreign Config
My third day of class started out with the unpleasant realization that my server had been offline since 4:30 that morning. I got the notice from Uptime Robot, which monitors my web server. I assumed it was either my nginx frontend failing to start automatically after a VM crash or a loss of internet at the house. Since I didn't notice the problem until I was in class, there wasn't much I could do. My VPN has been a work in progress, and I now realize that particular project needs to get finished soon.
I couldn't get into my UniFi controller to manage my firewall, so I assumed the internet was down. I called home and found out from my roommate that the internet was working fine, so my suspicion turned to the server itself. When I got home 12 hours later, I confirmed it.
After hacking my way into my DRAC with a completely unnecessary amount of security downgrading, I was greeted by a welcoming message on the BIOS screen telling me a drive had failed.
Fortunately it was not one of my NAS drives, just the drive that holds my virtual machine installs. Great.
So I pulled the drive out, blew on it like the '90s kid I am, and shoved it back in. Now it was showing up as a foreign drive in the PERC. With no small amount of bravery, I cleared the old config in the PERC for the RAID 0 drive and created a new virtual disk for it without reinitializing.
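I did this in the PERC's own configuration utility, but for anyone who'd rather do it from a shell: the PERC is a rebranded LSI controller, so MegaCli can do the same dance. A rough sketch, assuming adapter 0 and a drive in enclosure 32, slot 0 (both placeholders for your own hardware):

MegaCli -CfgForeign -Scan -a0      # check whether the controller sees a foreign config
MegaCli -CfgForeign -Clear -a0     # throw away the stale foreign config
MegaCli -CfgLdAdd -r0 [32:0] -a0   # recreate the RAID 0 virtual disk on that drive

The important part is what you don't run afterwards: no MegaCli -LDInit, since initializing the new virtual disk would wipe the very VMFS data you're trying to save.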
But I still didn't know if I'd be able to get my VMs off of the drive.
The BIOS was now happy and showed my normal virtual disk configurations from the PERC. I booted into ESXi and wasn't really surprised to see that none of my VMs were found.
Now, I'm by no means trained on ESXi; I only know what I've figured out so far. So I SSHed in and listed the volumes:
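(For anyone following along, listing the volumes looks something like this:)

ls -lh /vmfs/volumes/            # datastores appear here as symlinks to their UUIDs
esxcli storage filesystem list   # every filesystem, its mount state, and its UUID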
After listing the contents, I determined that the data on my VM disk had survived to some degree. Yay!
However, nothing showed up in the web console.
So after searching around, I found some useful commands that helped me get a grip on the situation:
esxcfg-volume -l
It says the drive can mount and resignature. Awesome.
Here's what's going on: every VMFS volume carries a UUID, which acts as a 'signature' for the drive, and the VMs in my web console are mapped to that signature. When ESXi rediscovers a volume on a device it doesn't recognize, like a snapshot or a copy of a disk, it won't mount it automatically; when you connect a backup snapshot LUN or a copy of the disk, you oftentimes have to resignature it, which assigns a new UUID.
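For reference, the resignature route (which I ended up not needing) looks roughly like this; 'datastore1' is just a placeholder for your volume's label:

esxcfg-volume -l               # list volumes detected as snapshots/copies of another device
esxcfg-volume -r datastore1    # resignature: mount the copy under a brand-new UUID

The catch is that after a resignature the datastore comes back under a new name (with a snap- prefix), and every VM on it has to be re-registered, since its inventory entry points at the old UUID.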
That wasn't my situation, though. The old UUID appeared to be working just fine; I just needed to give ESXi the go-ahead to mount it. Finding the command to do that was a pain.
If I had ESXi 5.5, the steps I kept finding would have worked, but I don't. After searching and searching, I finally found the command:
esxcfg-volume -M <VMFS UUID or label>
(The -M tells it to mount this volume persistently, so it comes back on every boot; a lowercase -m would mount it only until the next reboot.)
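Putting it all together, the entire fix from the ESXi shell boiled down to something like this (again with 'datastore1' standing in for my volume's label):

esxcfg-volume -l              # confirm the volume reports "Can mount: Yes"
esxcfg-volume -M datastore1   # persistently mount it under its existing UUID
vim-cmd vmsvc/getallvms       # sanity check: the registered VMs should list again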
And it worked!
I was able to load up all of my VMs and boot them normally, without having to reconnect the datastore or remap my VMs in the browser.
Although I didn't know what I was doing most of the time, I learned a lot about how ESXi manages LUNs and the extents on them. I'm confident that if this same issue happened on a RAID 5 array I'd be able to restore it (although I'm not looking forward to testing that anytime soon!)
Now I'm off to the long and arduous process of replacing that drive. What fun! It'll probably end up here as another adventure!
If you have any questions or suggestions let me know and I’ll do my best to help out!
Regards,
Cronocide