IT Angst

Monday, August 31, 2015

Resurrection!

Back from the Dead!

I haven't posted here since I left Canadian Natural. A lot has happened since then; here's the quick recap:

I left CNRL to be the team lead at Canadian Pacific for their Vblock implementation. It was so busy that I kind of dropped this blog off the edge of the world and never picked it back up. Anyway, the Vblock there was interesting. It was originally intended for vCloud Director and VDI for staff and IT, but as is often the case, SAP showed up and consumed all available resources.

After CP I designed a pseudo-FlexPod (all the parts, none of the overhead) for Autopro Automation for a big oil and gas client, and then implemented it. A lot of documentation went into that. A LOT. Designing the whole project in advance, on emulators and from memory and old documentation, because the engineers wanted all their documentation up front, was also interesting.

Then I took a solutions architect role at Hatsize Education. It wasn't a good fit; I was writing bash scripts 8 hours a day.

Now I'm back at CP as a Converged Infrastructure Specialist for their Positive Train Control initiative. I also recently completed my CCNA-DC certification and renewed my VCP at 5.5. It's been very interesting so far; hopefully I'll have more posts in the future.

Personal notes - I bought a new house!

Max.

Tuesday, January 31, 2012

VCP 5 exam thoughts

Just passed my VCP 5 this morning. It was pretty tough for me, as there were quite a few questions pertaining to vDSwitches, and my experience is almost entirely Nexus 1000v-based, so I'm sure I got those wrong.

Almost no configuration maximum questions (the only one I remember is the maximum number of LUNs presentable to a single host), but lots of vSwitch scenario and policy questions. Quite a few on memory limits and reservations, and lots of CPU performance troubleshooting. A number of questions on the DCUI and what you can change/modify there, and on what items are downloadable from the ESXi host homepage.

All around a tough but fair exam, I think. It's pretty obvious you couldn't just bootcamp it; you need intimate familiarity with the UIs.

Friday, January 20, 2012

2012 New Year, New Job

So I moved from CNRL to LongView Systems at the beginning of the year. I had 6 years at CNRL, which is a long time for me. It was fun, with lots of great learning, but in the end it was just too static to keep me interested and engaged.

Joining the cloud practice at Longview should be really interesting. I'm already on-site at a client that's about to migrate from an f-block (a reference architecture for a Vblock, as a POC) to a couple of Vblocks.

Monday, April 4, 2011

More OTV, Jumbo Frames and VMware fun!

So we're moving datacenters, with primary and backup both moving in the same short timeframe. Fun.

Our new VMware design relies heavily on OTV from Cisco, with 4 blades in each datacenter running the same cluster of virtual machines. We had it tested and working with the secondary DC moved to its new location, but over the weekend we fired up the new primary datacenter and moved the Nexus 7ks to it, while keeping the 6509s in our old primary DC. No VC-to-ESX communication worked after that.

Now, last week we discovered that OTV adds some packet overhead to communications (we knew it did, but didn't realize the repercussions). VMware's secure communications with the VC server already use packets close to the maximum size (1500 bytes by default). When we tried to add a host to the cluster while the VC was connected to a VLAN using OTV, the host would send its SHA thumbprint info, but the communication would time out after that. That's because the OTV encapsulation pushes those full-size packets over the MTU. Regular pings even worked normally, but using the size option (-s or -l, depending on the client) we found that pings with a 1430-byte payload worked and 1431-byte ones didn't, about 70 bytes short of the 1500 default.
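The 1430/1431 result can be turned into a concrete overhead figure with some header arithmetic. This is a minimal sketch, assuming IPv4 pings with no IP options: the -s/-l option sets only the ICMP payload, so each echo on the wire is the payload plus an 8-byte ICMP header and a 20-byte IP header. A 1430-byte payload is therefore a 1458-byte packet, which leaves 42 bytes of encapsulation overhead inside a 1500-byte MTU (the figure commonly cited for OTV). In other words, the roughly 70-byte gap is 28 bytes of ordinary IP/ICMP headers plus 42 bytes of OTV. The helper names are hypothetical, made up for illustration.

```python
# Sketch: relate ping payload sizes, encapsulation overhead and MTU.
# Assumes IPv4 with no IP options; helper names are hypothetical.
IPV4_HEADER = 20  # bytes, minimum IPv4 header
ICMP_HEADER = 8   # bytes, ICMP echo header


def max_ping_payload(path_mtu: int, encap_overhead: int = 0) -> int:
    """Largest ICMP echo payload (the -s / -l value) that fits unfragmented."""
    return path_mtu - encap_overhead - IPV4_HEADER - ICMP_HEADER


def implied_overhead(path_mtu: int, largest_working_payload: int) -> int:
    """Back out the encapsulation overhead from the largest ping that got through."""
    return path_mtu - IPV4_HEADER - ICMP_HEADER - largest_working_payload


def required_transport_mtu(host_mtu: int, encap_overhead: int) -> int:
    """Transport MTU needed to carry full-size host packets plus the overlay header."""
    return host_mtu + encap_overhead


if __name__ == "__main__":
    overhead = implied_overhead(1500, 1430)
    print("implied overhead:", overhead)                                     # 42
    print("max payload, no overlay:", max_ping_payload(1500))                 # 1472
    print("max payload, overlay:", max_ping_payload(1500, overhead))          # 1430
    print("transport MTU needed:", required_transport_mtu(1500, overhead))   # 1542
```

The last helper shows the switch-side fix: rather than shrinking the MTU on the hosts, raise the MTU on the transport path so a full 1500-byte packet still fits after encapsulation.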

After discovering this, we played around with resizing the MTU on the VMware hosts and the VC, but decided instead that the switches should all have their MTU fixed. The network team fixed the MTU size on the 7ks, but the 6509s will unfortunately cause OSPF errors if the MTU isn't the same on all the switches. That means a big outage, so we're scheduling it.

So, why did it work during testing? Because the 7ks could talk directly to each other without a router (6509) prior to the move, and afterwards they couldn't. Doh.

So why couldn't the VC server, which was hosted in the backup DC, even communicate with the ESX host it resided on? Because of OTV domain ownership. The 7ks in the primary datacenter own all the OTV VLANs, and because the 7ks can't talk to each other directly anymore, the OTV VLANs in the backup DC are broken until the 6509 reboots. Big d'oh.

Wednesday, February 23, 2011

VMware HA and Cisco OTV

So we're moving from our existing 2 data centers to 2 new ones in the next little while. The new ones will eventually be active/active, once we get whichever ONTAP 8.x release supports MetroCluster.

So in designing our new HA architecture, I had to take into account that we're using HP blades and OTV from Cisco, which allows us to have a flat network with VMs that are portable across the 2 data centers.

That seems like a great idea. We'll have Cisco Nexus 7000 10G switches connecting the ESX hosts in each location, and the 6509s will handle OTV domain ownership since they also control the links between locations. Links are redundant, ESX clusters have capacity in both locations, and eventually we'll have MetroCluster for the filers, so storage will be available in both places.

HOWEVER, I was talking to our network team about the OTV VLANs today. It seems that in the event of a data center outage, an OTV domain owned by the switch in the failed data center would be unavailable everywhere, even on the 7000s in the opposite DC. So the HA response for ESX would be to shut everything down, because the hosts' heartbeats would fail everywhere.

The solution is a second service console network for the whole cluster, with its OTV domain owned by the 6509 in the other DC, so each DC owns one of the two SC networks. That way, in the event of a DC failure, the surviving hosts could still communicate on their secondary SC network, so they wouldn't fail, and they would properly power on the VMs from the failed DC, which is what we want.
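The failure logic above can be sketched as a tiny model. This is a deliberately simplified sketch, assuming the behaviour the network team described (a VLAN whose OTV domain is owned by a failed DC goes down in every DC) and treating "at least one working SC network" as "HA keeps heartbeating". The network and DC names are hypothetical, and real HA isolation detection has more to it (isolation addresses, timeouts).

```python
# Simplified model of OTV VLAN ownership vs. HA heartbeats.
# Network and DC names are hypothetical; real OTV/HA behaviour has more nuance.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScNetwork:
    name: str
    owner_dc: str  # DC whose switch owns this VLAN's OTV domain


def surviving_sc_networks(networks: list[ScNetwork], failed_dc: str) -> list[str]:
    # Per the network team: if the owning DC fails, the VLAN is down
    # everywhere, including in the surviving DC.
    return [n.name for n in networks if n.owner_dc != failed_dc]


def ha_outcome(networks: list[ScNetwork], failed_dc: str) -> str:
    if surviving_sc_networks(networks, failed_dc):
        return "surviving hosts keep heartbeating and restart the failed DC's VMs"
    return "surviving hosts lose all heartbeats and trigger the isolation response"


if __name__ == "__main__":
    single = [ScNetwork("sc-1", "DC-A")]
    dual = single + [ScNetwork("sc-2", "DC-B")]
    for label, nets in (("single SC network", single), ("dual SC networks", dual)):
        for failed in ("DC-A", "DC-B"):
            print(f"{label}, {failed} fails: {ha_outcome(nets, failed)}")
```

With a single SC network, losing the DC that owns it takes heartbeats down everywhere; with one SC network owned by each DC, either DC can fail and the survivors still have a heartbeat path.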

We also have to be sure to add hosts to the HA cluster in a specific order. The first 5 hosts added become HA primary nodes, so we need at least 1 primary in each location, preferably 2. That makes adding 3 hosts from one DC and then 2 from the other important.
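The add-order rule can be checked mechanically before touching the cluster. This is a minimal sketch, assuming vSphere 4.x-style HA where the first five hosts added become primaries, and ignoring any later re-election (maintenance mode, host removal). The host names, the 4-hosts-per-DC layout, and the helper functions are hypothetical.

```python
# Sketch: check where HA primary nodes land for a given host add order.
# Assumes the first five hosts added become primaries; ignores later
# re-election. Host and DC names are hypothetical.
from __future__ import annotations

from collections import Counter

PRIMARY_SLOTS = 5


def primaries_per_site(add_order: list[str], site_of: dict[str, str]) -> Counter:
    return Counter(site_of[h] for h in add_order[:PRIMARY_SLOTS])


def order_is_safe(add_order: list[str], site_of: dict[str, str],
                  min_per_site: int = 2) -> bool:
    counts = primaries_per_site(add_order, site_of)
    # Counter returns 0 for a site that got no primaries at all.
    return all(counts[site] >= min_per_site for site in set(site_of.values()))


if __name__ == "__main__":
    site_of = {**{f"a{i}": "DC-A" for i in range(1, 5)},
               **{f"b{i}": "DC-B" for i in range(1, 5)}}
    naive = ["a1", "a2", "a3", "a4", "b1", "b2", "b3", "b4"]
    planned = ["a1", "a2", "a3", "b1", "b2", "a4", "b3", "b4"]
    for name, order in (("naive", naive), ("planned", planned)):
        print(name, dict(primaries_per_site(order, site_of)), order_is_safe(order, site_of))
```

Adding one DC's hosts first leaves only one primary in the other DC, which fails the "preferably 2" check; the 3-then-2 order puts 3 and 2 primaries in the two DCs. Passing `min_per_site=1` checks the bare minimum instead.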

Monday, May 3, 2010

So I'm writing this down here to remind myself to think it through: I think the internet will be the downfall of democracy.

I watched a video from the TED conference with a speaker (a journalist; he was familiar, but I forget who he was) who was talking about science denial, and how everyone is entitled to an opinion, but not their own facts. I think the internet is great, but it does a terrible job (a good job, with a terrible effect) of empowering people who have an opinion counter to science. Vaccines are a great example of this (although I think we seem to have a handle on that one), but now there's a furor over high-fructose corn syrup. Our soda in Canada has sugar, not HFCS, in it, and we still get fat from drinking it. I think it's a victim attitude.

Maybe I shouldn't say the internet; maybe I should say Facebook will be the downfall, although the internet has to exist for Facebook to exist. People with some political agenda decide product X is evil, with no scientific background whatsoever, make a Facebook page to convince others it's evil, and those people think it's right! No science involved whatsoever.

Anyway, when we stop listening to experts, our system is going to fall into chaos, with only the loudest or most popular voices being heard. Sean Penn will be president! Or something equally bad.

Friday, March 19, 2010

While enabling mailboxes in Symantec Enterprise Vault (software that archives most of your mail except for the last 30 days, put in place because we're too nice to tell users to FUCKING DELETE THEIR SHITTY 15-year-old emails), this error popped up. Funny.