Wednesday, July 30, 2008

This set of scripts saved my ass 100% just now. We've had a few issues with VMware servers being disconnected by network incidents, and not just one host: the whole farm goes down. Every host thinks it is isolated and, appropriately, shuts down its VMs, expecting one of the other HA cluster members to power them back up. However, all the hosts are isolated, so there's no cluster left for the VMs to be powered up on.

Anyway, after the second time that happened, I wrote some scripts: the first logs which VMs are running on a host each night, and the second powers those VMs back up. I also wrote a third that gracefully shuts down all the VMs on a host. It's handy, so I've posted it third.

It saved me because I had just had to hard reboot a completely unresponsive ESX host, there are a couple of servers on it that weren't powered on for various reasons, and I forgot to write down which ones beforehand. Yay for pre-emptive scripting!

vmstate script:
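#!/bin/bash
# Save the registered VMs, split into powered-on and powered-off lists,
# for later startup/shutdown operations.
echo "This script exports the VMs that are running to a text file for later startup/shutdown operations"
# Absolute paths everywhere, so the script behaves the same from cron or any working directory
VMLIST=/root/vmlist
VMONLIST=/root/vmonlist
VMOFFLIST=/root/vmofflist
# vmware-cmd -l prints one .vmx path per line; this overwrites any old list
vmware-cmd -l > "$VMLIST"
# Start the on/off lists empty
: > "$VMONLIST"
: > "$VMOFFLIST"
# Read one path per line so VM names containing spaces stay intact
while IFS= read -r vm; do
    # </dev/null keeps vmware-cmd from reading the rest of the list from stdin
    state=$(vmware-cmd -q "$vm" getstate </dev/null)
    if [ "$state" = "on" ]; then
        echo "$vm" >> "$VMONLIST"
    else
        echo "$vm" >> "$VMOFFLIST"
    fi
done < "$VMLIST"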
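The situation that saved me, working out which VMs should be running after a hard reboot, can also be checked directly against the saved list. Below is a rough sketch, untested against a live host, of a bash function that reads /root/vmonlist and prints every VM whose current state isn't "on". The function name vms_not_running is made up for illustration, and it assumes "vmware-cmd -q <vmx> getstate" prints "on" for running VMs, which is the same thing the vmstate script relies on.

```bash
#!/bin/bash
# Sketch: report VMs that were powered on when the list was saved
# but are not powered on now. The list defaults to /root/vmonlist.
# vms_not_running is a hypothetical helper name.

vms_not_running() {
    local list="${1:-/root/vmonlist}"
    local vm state
    # One .vmx path per line; IFS= and -r keep spaces and backslashes intact
    while IFS= read -r vm; do
        [ -z "$vm" ] && continue
        # </dev/null stops vmware-cmd from reading the rest of the list from stdin
        state=$(vmware-cmd -q "$vm" getstate </dev/null)
        if [ "$state" != "on" ]; then
            echo "$vm"
        fi
    done < "$list"
}

# Example: vms_not_running /root/vmonlist
```

After sourcing the file, anything "vms_not_running /root/vmonlist" prints was on at 5:30 PM but isn't now, which is exactly the list I didn't have written down.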

vmstart script:
#!/bin/bash
echo "This script starts all VMs listed in /root/vmonlist"
echo "If you are recovering from an incident, this list was generated at 5:30 PM"
echo "If you are unsure, please quit and ask someone else"
OPTIONS="Proceed Quit"
select opt in $OPTIONS; do
    if [ "$opt" = "Quit" ]; then
        exit
    elif [ "$opt" = "Proceed" ]; then
        # One .vmx path per line; quoting keeps paths with spaces intact
        while IFS= read -r vm; do
            # </dev/null keeps vmware-cmd from consuming the rest of the list
            vmware-cmd -q "$vm" start </dev/null
        done < /root/vmonlist
        exit
    else
        echo "bad option"
    fi
done
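If you end up running vmstart after already starting a few VMs by hand, it will still issue a start for every VM in the list, including the ones that are already up. Here's a sketch of a variant that checks each VM's state first and skips anything already on, again assuming getstate prints "on" for running VMs. The function start_listed_vms and its optional second argument (seconds to pause between starts, to spread out the boot load on the host) are hypothetical additions, not part of the original script.

```bash
#!/bin/bash
# Sketch: start every VM in a saved list, skipping any that are already on,
# so it is safe to run more than once.
# start_listed_vms and the delay argument are hypothetical; delay defaults to 0.

start_listed_vms() {
    local list="${1:-/root/vmonlist}"
    local delay="${2:-0}"
    local vm state
    while IFS= read -r vm; do
        [ -z "$vm" ] && continue
        state=$(vmware-cmd -q "$vm" getstate </dev/null)
        if [ "$state" = "on" ]; then
            echo "already on: $vm"
            continue
        fi
        echo "starting: $vm"
        vmware-cmd -q "$vm" start </dev/null
        # Optional pause between power-ons
        sleep "$delay"
    done < "$list"
}

# Example: start_listed_vms /root/vmonlist 30
```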


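vmstop script:
#!/bin/bash
echo "This script will shut down all VMs listed in /root/vmonlist"
echo "ARE YOU SURE YOU WANT TO DO THIS?"
OPTIONS="YES NO"
select opt in $OPTIONS; do
    if [ "$opt" = "NO" ]; then
        exit
    elif [ "$opt" = "YES" ]; then
        # First pass: trysoft tries a soft (guest OS) shutdown first
        while IFS= read -r vm; do
            echo "shutting down $vm"
            vmware-cmd -q "$vm" stop trysoft </dev/null
        done < /root/vmonlist
        echo "Waiting 5 minutes, then forcing shutdown"
        sleep 5m
        # Second pass: hard power-off only for VMs that are still on
        while IFS= read -r vm; do
            if [ "$(vmware-cmd -q "$vm" getstate </dev/null)" = "on" ]; then
                vmware-cmd -q "$vm" stop hard </dev/null
            fi
        done < /root/vmonlist
        # Exit here; otherwise select would prompt again
        exit
    else
        echo "bad option"
    fi
done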
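The fixed "sleep 5m" in vmstop always waits the full five minutes, even if every guest has shut down within a minute. A sketch of a polling alternative: it sends trysoft to each listed VM, checks every few seconds how many are still on, and hard-stops only the stragglers once the timeout runs out (300 seconds by default, matching the original five minutes). The function names count_on and stop_listed_vms and the timeout/interval arguments are hypothetical, and it assumes getstate prints "on" for running VMs.

```bash
#!/bin/bash
# Sketch: poll instead of sleeping a fixed 5 minutes, then hard-stop
# whatever is still on. Function names and arguments are hypothetical.

# Prints how many VMs in the list currently report "on"
count_on() {
    local list="$1" vm n=0
    while IFS= read -r vm; do
        [ -z "$vm" ] && continue
        [ "$(vmware-cmd -q "$vm" getstate </dev/null)" = "on" ] && n=$((n + 1))
    done < "$list"
    echo "$n"
}

# Usage: stop_listed_vms [list] [timeout_seconds] [poll_interval_seconds]
stop_listed_vms() {
    local list="${1:-/root/vmonlist}"
    local timeout="${2:-300}" interval="${3:-15}"
    local vm waited=0
    # trysoft, as in the original script, tries a guest OS shutdown first
    while IFS= read -r vm; do
        [ -z "$vm" ] && continue
        echo "shutting down $vm"
        vmware-cmd -q "$vm" stop trysoft </dev/null
    done < "$list"
    # Poll until everything is off or the timeout is used up
    while [ "$(count_on "$list")" -gt 0 ] && [ "$waited" -lt "$timeout" ]; do
        sleep "$interval"
        waited=$((waited + interval))
    done
    # Force off only the VMs that are still running
    while IFS= read -r vm; do
        [ -z "$vm" ] && continue
        if [ "$(vmware-cmd -q "$vm" getstate </dev/null)" = "on" ]; then
            echo "forcing $vm off"
            vmware-cmd -q "$vm" stop hard </dev/null
        fi
    done < "$list"
}
```

The trade-off is more vmware-cmd calls while waiting; a longer poll interval cuts that down at the cost of reacting a little later.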

I run a cron job at 5:30 PM every day to write the powered-on VMs out to the file. That's mostly because VMs move around during the day, when a couple of the junior admins might be relocating VMs or powering up new ones, so the end-of-day list picks up those changes. Hope this helps someone!
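For the cron side, 5:30 PM daily is "30 17 * * *" in standard crontab syntax. A sketch of a small helper that appends that entry to root's crontab only if it isn't already there; the script path /root/vmstate.sh and the names CRON_ENTRY and add_cron_entry are assumptions, so point the path at wherever you saved vmstate.

```bash
#!/bin/bash
# Sketch: add a nightly 5:30 PM run of the vmstate script to a crontab,
# skipping it if the exact line is already present.
# /root/vmstate.sh is an assumed path.

CRON_ENTRY='30 17 * * * /root/vmstate.sh'

# Reads the current crontab on stdin and prints it back,
# with the entry appended only if it isn't there already.
add_cron_entry() {
    local entry="$1" existing
    existing=$(cat)
    [ -n "$existing" ] && printf '%s\n' "$existing"
    # -x: whole-line match, -F: treat the entry as a fixed string
    if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Usage (as root):
#   crontab -l 2>/dev/null | add_cron_entry "$CRON_ENTRY" | crontab -
```

Running it a second time leaves the crontab unchanged, so it's safe to keep in a host build checklist.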
