Thursday, July 31, 2008

VMware snapshots 160GB!

One of our field ESX servers had a snapshot from Ranger that didn't get deleted and grew to 160GB, freezing the VM and leaving people at the site unable to log in. I set the snapshot to delete, but it's been deleting for more than 24 hours.

Turns out the only way to watch a snapshot being deleted is while the VM is powered on: run esxtop, press "e", and enter the group ID for the VM. This expands the process group for that VM, and you can look for a SnapshotVMXCombo process under that group ID. If it's there, the snapshot process is still running.

If the VM is off, there's apparently no way to tell whether it's still deleting. Wow.

Wednesday, July 30, 2008

This set of scripts saved my ass 100% just now. We've had a few issues with VMware servers being disconnected due to network incidents, and not just one host but the whole farm goes down. (They all think they are isolated and, in appropriate fashion, shut down their VMs, expecting one of the other HA cluster members to power them up. However, all the hosts are isolated, so there's no cluster member left for the VMs to be powered up on.)

Anyway, after the second time that happened, I wrote some scripts: the first logs which VMs are running on a host each night, and the second powers those VMs back up. (I also wrote a third, which gracefully shuts down all the VMs on a host. It's handy, so it's posted third.)

The reason it saved me is that I had just had to hard reboot a completely unresponsive ESX host, a couple of servers on it weren't powered on for various reasons, and I forgot to write down which ones beforehand. Yay for pre-emptive scripting!

vmstate script:
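#! /bin/bash
echo "This script exports the VMs that are running to a text file for later startup/shutdown operations"
# Start with empty list files; absolute paths so the script works from any directory.
rm -f /root/vmlist /root/vmonlist /root/vmofflist
touch /root/vmlist /root/vmonlist /root/vmofflist
ON="on"
# Split command output on newlines only, so config paths containing spaces stay intact.
IFS=$'\n'
# Record every VM registered on this host.
for vm in $( vmware-cmd -l );
do
echo "$vm" >> /root/vmlist
done
# Sort each VM into the on or off list by its power state.
for vm2 in $( cat /root/vmlist );
do
state=$( vmware-cmd -q "$vm2" getstate );
if [ "$state" = "$ON" ]
then echo "$vm2" >> /root/vmonlist
else echo "$vm2" >> /root/vmofflist
fi
done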

vmstart script:
#! /bin/bash
echo "This script starts all VMs listed in /root/vmonlist"
echo "If you are recovering from an incident, this list was generated at 5:30 PM"
echo "If you are unsure, please quit and ask someone else"
OPTIONS="Proceed Quit"
# Ask for confirmation before powering anything on.
select opt in $OPTIONS;
do
if [ "$opt" = "Quit" ]; then
exit
elif [ "$opt" = "Proceed" ]; then
# Split the list on newlines only, so config paths containing spaces stay intact.
IFS=$'\n'
for vm in $( cat /root/vmonlist );
do
vmware-cmd -q "$vm" start
done
exit
else
echo "bad option"
fi
done
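
Since the vmstart prompt takes the 5:30 PM timestamp on faith, it can help to print when the list was actually last written before proceeding. A minimal sketch, assuming GNU date is available (as on a Linux-based service console); the function name show_list_age is made up for illustration and is not part of the scripts above:

```shell
#!/bin/bash
# show_list_age is a hypothetical helper, not part of the original scripts.
# Prints when a VM list file was last modified and how many entries it holds.
show_list_age() {
    local list="${1:-/root/vmonlist}"
    if [ ! -f "$list" ]; then
        echo "$list not found"
        return 1
    fi
    # GNU date: -r FILE prints the file's last modification time.
    echo "$list last written: $(date -r "$list")"
    # Count non-empty lines, one VM config path per line.
    echo "entries: $(grep -c . "$list")"
}
```

Calling show_list_age with no argument just before the Proceed/Quit menu would show whether the list really dates from the last nightly run.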


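vmstop script:
#! /bin/bash
echo "This script will shut down all running Virtual Machines"
echo "ARE YOU SURE YOU WANT TO DO THIS?"
OPTIONS="YES NO"
select opt in $OPTIONS;
do
if [ "$opt" = "NO" ]; then
exit
elif [ "$opt" = "YES" ]; then
# Works from the list written by vmstate, so VMs powered on since its last run are not included.
# Split the list on newlines only, so config paths containing spaces stay intact.
IFS=$'\n'
# First pass: try a soft shutdown of each VM.
for vm in $( cat /root/vmonlist );
do
echo "shutting down $vm"
vmware-cmd -q "$vm" stop trysoft
done
echo "Waiting 5 minutes, then forcing shutdown"
sleep 5m
# Second pass: hard stop for each VM in the list.
for vm in $( cat /root/vmonlist );
do
vmware-cmd -q "$vm" stop hard
done
# Exit here so the menu is not shown again after the shutdown completes.
exit
else
echo "bad option"
fi
done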
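
The fixed five-minute sleep could be replaced with a loop that polls getstate and moves on as soon as every VM in the list reports off. This is only a sketch under assumptions: the function name wait_for_off, the VMCMD override variable, and the 10-second default poll interval are all invented here, and it relies on vmware-cmd -q with getstate printing a bare state such as on or off, which the vmstate script already assumes.

```shell
#!/bin/bash
# Hypothetical helper: wait until no VM in the list reports "on", or until the timeout.
# VMCMD defaults to vmware-cmd; the override exists only so the loop can be tried with a stub.
VMCMD="${VMCMD:-vmware-cmd}"

wait_for_off() {
    local list="$1" timeout="${2:-300}" interval="${3:-10}"
    local waited=0 running vm
    while :; do
        running=0
        while IFS= read -r vm; do
            [ -n "$vm" ] || continue
            # Anything other than "on" is treated as not running, as vmstate does.
            if [ "$("$VMCMD" -q "$vm" getstate < /dev/null)" = "on" ]; then
                running=$((running + 1))
            fi
        done < "$list"
        # Return 0 once everything is off, 1 if the timeout is reached first.
        [ "$running" -eq 0 ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

In vmstop, a call like wait_for_off /root/vmonlist 300 would stand in for the sleep, with the hard-stop loop still running afterwards for anything left on.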

I run a cron job at 5:30 every night to spit out the powered-on VMs into the file. That's because VMs mostly move around during the day, when a couple of the junior admins might be relocating or powering up new ones. Hope this helps someone!
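
For reference, a root crontab entry for that schedule might look like the lines below. The path /root/vmstate is an assumption; the post does not say where the script is saved.

```
# min hour day-of-month month day-of-week  command
30 17 * * * /root/vmstate > /dev/null 2>&1
```
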
I bought a Wii Fit yesterday, and am going to track my progress on wiifits.blogspot.com. I'm going to try to convince Stevie to do it as well.

Monday, July 28, 2008

I hate email users

We got this string of emails forwarded to our company. I have removed the people on the to: list because I have no evidence that they forwarded it, unlike the people in the from fields.

Normally this sort of thing wouldn't concern me; we've locked down our top-level distribution lists to remove the possibility of anyone forwarding this to mass quantities of people. However, some --enterprising-- (read: we've locked down the lists to keep you from doing this) user put EVERY OTHER MAILING LIST in a single email, and forwarded it to the WHOLE COMPANY. And then OTHER USERS STARTED RE-FORWARDING IT TO THE WHOLE COMPANY. And then OTHER USERS STARTED REPLYING TO THE WHOLE COMPANY TO PLEASE STOP FORWARDING THIS. And then finally, the last guy sent "ditto", which of course reminds us of the Dilbert cartoon about this exact subject.

The help desk "spoke directly to, and reminded each user of our email acceptable user policy". Glad I wasn't doing it, because there would have been some crying going on, and I probably would have got fired.

From: Goddard, Shando [mailto:SGoddard@petro-canada.ca]
Sent: Monday, July 07, 2008 4:50 PM
To: removed
Subject: FW: I DON'T KNOW HOW IT WORKS, BUT IT DOES.

--------------------------------------------------------------------------------
From: Appleton, Lee
Sent: Monday, July 07, 2008 7:10 AM
To: removed
Subject: FW: I DON'T KNOW HOW IT WORKS, BUT IT DOES.

Best Regards,

D.L. (Lee) Appleton

--------------------------------------------------------------------------------
From: McKinnon, Jim
Sent: Sunday, July 06, 2008 9:02 PM
To: removed
Subject: FW: I DON'T KNOW HOW IT WORKS, BUT IT DOES.

--------------------------------------------------------------------------------
From: Robert Lawson [mailto:palawson@telusplanet.net]
Sent: Sunday, July 06, 2008 2:34 PM
To: removed
Subject: FW: I DON'T KNOW HOW IT WORKS, BUT IT DOES.


Best of luck to everyone.



-----Original Message-----
From: Lawson, Robert [mailto:BoLawson@petro-canada.ca]
Sent: Sunday, July 06, 2008 2:43 PM
To: Lawson, Robert
Subject: FW: I DON'T KNOW HOW IT WORKS, BUT IT DOES.

--------------------------------------------------------------------------------

From: Campbell, Dennis
Sent: Thursday, July 03, 2008 8:41 AM
To: removed
Subject: FW: I DON'T KNOW HOW IT WORKS, BUT IT DOES.

This is a just in case it does work.

--------------------------------------------------------------------------------

From: Marty Price [mailto:Marty.Price@Halliburton.com]
Sent: Thursday, July 03, 2008 6:39 AM
To: removed
Subject: FW: I DON'T KNOW HOW IT WORKS, BUT IT DOES.

Blackberry post!

My first mobile post. My new BB is easier to type on, full keyboard
and all. My old one suffered a "Shuswap water" treatment and stopped
functioning, surprise surprise.

Anyway, Fragapalooza is coming up. Paul, Brad, Brent, Igor and Laz are
all attending, should be fun. Brad sent me a link to his tech blog,
www.bradstechblog.com, and I remembered that SCCM makes baby jesus cry.


Ack I should do more regular updates

I figured out how to post to the blog remotely using my blackberry, so hopefully more posts will come, and more frequently.

We upgraded our NetApp filers to 7.2.4 last week, and one of our LUNs had a duplicate name mapping. How the hell does that happen? Anyway, all the VMs on that LUN had to be moved after the outage. Thankfully nothing serious happened, but it just proves why we have jobs: IT stuff will always be ahead of the knowledge curve of normal users.

Sure they're catching up to IT workers in knowledge, but we'll always be ahead because IT will continue to become more complicated. Users can do a lot of the things for themselves that administrators used to have to do 10 years ago, but now we have more complicated tasks (like managing VMware infrastructure).

Stevie went to Kate's stag party on Saturday night, had to pick her drunk ass up at a bar :) Hopefully Ken's will be just as much fun. Wait, didn't they do these things the first time they got married? I'm glad I didn't know them the first time, because if I'd bought them a gift then, I would have stolen it back and re-gifted it to them.

Got into the Warhammer Online beta - can't discuss further, NDA etc...