Server Recovery
I recently completed a small change on the home server and followed it by a reboot. It never came back up…
I was running out of space on the OS partition. It has been my practice to build volumes that do not fully utilize the available physical space. That way, should I run low on space, I can expand the volume to buy some time before a purchase is necessary. In this case I expanded the volume while keeping 5GB unallocated on the drive. On reboot the server hung prior to POST with a flashing cursor but no splash screen. I had taken a shortcut when assembling this box and did not connect the speaker, so I could not hear the POST beep codes. This server has been running for 4 years without any hardware issues so a failure at this point was unfortunate - but unsurprising.
Isolating the problem was a matter of minimizing the variables. I disconnected all external USB, sound and network cables (leaving the KVM attached) with no success. I then disconnected all of the drives (4 HDD, 1 SSD, 1 DVD) and got the BIOS splash screen. At this point I suspected either the power supply or one of the drives. I didn’t have a spare power supply so I focused on narrowing down the drives. I mirror my data drives in RAID 1, so I first tried booting with just one drive from each mirror set plus the SSD holding the OS volume. Still no success. Next I removed the data drives, leaving just the OS SSD attached. Still no success. A final double check confirmed that it boots with the SSD disconnected but doesn’t with only the SSD connected. Yep, I have a bricked OCZ Vertex 2 on my hands.
I had a spare 1.5TB HDD on hand and so put that in the system. I fired up the Server 2008 x64 boot disk and launched a repair session. The Complete PC recovery wizard found my most recent backup right away but would not proceed with a system restore, complaining 'The data is invalid 0x8007000d'. Assuming this generic message meant something beyond the obvious, I hit the 'net. The first hint was a reminder that the target disk for a restore needs to be bigger than the disk that was the source of the backup. In my case the backup source spanned 2 volumes on 2 physical disks: Drive C with the OS at 105GB and Drive D with the pagefile and applications at 900GB. I wasn't certain this was the cause, but adding a second physical disk wouldn't hurt. It still didn't work but did give a very slightly more descriptive error: 'There is not enough disk space on the system to perform the restore operation 0x80042408'. At this point it appeared the wizard wasn't going to give me the results I needed.
I followed the set of instructions provided here (modified for my environment - see the links at the bottom to explore the command syntax fully before applying these instructions to your scenario):
http://social.technet.microsoft.com/Forums/en/winserverfiles/thread/f15bfe2f-e265-479a-afa3-f055530c97f5
This stepped through the manual creation of the first volume on the first disk via diskpart and the subsequent restoration of that volume's data from the backup via wbadmin. As pointed out in a further post on the same thread, the Server 2008 x64 boot CD does not have a Startup Repair option, so I needed to use the bootrec and bootsect commands to make the system bootable again.
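The exact commands from the thread are not reproduced here, so what follows is only a rough sketch of the general shape of that sequence, built from the four tools named above. It is a small Python helper that prints the commands for review rather than running them. The disk number, drive letters, backup location (E:) and version identifier are all placeholders I have assumed for illustration, and the function names are hypothetical. Check each command against the syntax references at the bottom before typing anything into a recovery console, since the diskpart `clean` step wipes the selected disk.

```python
# Sketch only: prints the commands for review; it does not run them.
# Disk number, drive letters, backup location and version id are
# placeholders, not values from the original recovery.

def diskpart_script(disk=0, letter="C"):
    """Lines for a diskpart script that wipes `disk` and creates one
    active, NTFS-formatted primary partition mounted at `letter`."""
    return [
        f"select disk {disk}",
        "clean",                      # destroys everything on the disk
        "create partition primary",
        "format fs=ntfs quick",
        f"assign letter={letter}",
        "active",                     # mark the partition as active
    ]

def restore_commands(version, backup_target="E:", volume="C:"):
    """List the backups, restore one volume, then repair the boot code."""
    return [
        f"wbadmin get versions -backupTarget:{backup_target}",
        "wbadmin start recovery"
        f" -version:{version} -itemType:Volume -items:{volume}"
        f" -backupTarget:{backup_target} -recoveryTarget:{volume}",
        "bootrec /fixmbr",
        "bootrec /fixboot",
        "bootrec /rebuildbcd",
        f"bootsect /nt60 {volume}",
    ]

if __name__ == "__main__":
    print("\n".join(diskpart_script()))
    print()
    # The version id comes from the output of `wbadmin get versions`.
    print("\n".join(restore_commands("MM/DD/YYYY-HH:MM")))
```

The diskpart lines can be typed interactively or saved to a text file and passed to `diskpart /s`. The version identifier has to be copied from the output of `wbadmin get versions`, which is why that command comes first in the list.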
At this point the server was fully recovered. Easy in the end, but deciphering the error messages presented by the GUI was not trivial.
Reference:
WBadmin syntax
DiskPart syntax
Bootrec syntax
Bootsect syntax