Kristian Glass - Do I Smell Burning?

Mostly technical things

Ubuntu Enterprise Cloud

My first encounter with Ubuntu Enterprise Cloud was while throwing together a quick dev server at work. I booted from an Ubuntu server ISO and saw the words “Enterprise” and “Cloud” together; I promptly dismissed it as some form of bloaty-buzzwordy-junk. It turns out the word “Ubuntu” trumps the “Enterprise Cloud” bit, and it’s quite awesome.

When I want to make someone’s eyes glaze over, I describe it as an Open Source Software Infrastructure-as-a-Service Solution, or OSSIaaSS. When I actually want to talk sensibly about it, I describe it as “Amazon Web Services for your own hardware”. At its core, it’s a software platform called Eucalyptus, which gives you your own private IaaS setup. Crucially, it exposes this via the AWS API, so there’s a wealth of existing tools that work with it, and it makes later migration to real AWS easier.

(I should clarify: I’m doing AWS a huge disservice here. Really, “all” I’m talking about is EC2 and S3, but they’re my favourite and most-used subsystems, and I’d guess they’re the two things people think of first when someone mentions AWS.)
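Because Eucalyptus speaks the EC2 API, existing AWS client libraries can be pointed at it instead of at Amazon. Here is a minimal sketch using the Python boto 2 library. It assumes the `EC2_URL`, `EC2_ACCESS_KEY` and `EC2_SECRET_KEY` environment variables exported by the `eucarc` file in UEC’s downloadable credentials, and the helper names are hypothetical. The URL parsing is kept separate so it can be checked without a cloud to hand.

```python
import os
from urllib.parse import urlsplit


def euca_endpoint(ec2_url):
    """Split an EC2_URL such as http://host:8773/services/Eucalyptus into boto-style parts."""
    parts = urlsplit(ec2_url)
    is_secure = parts.scheme == "https"
    # Fall back to the scheme's default port if the URL doesn't give one.
    port = parts.port or (443 if is_secure else 80)
    return {
        "host": parts.hostname,
        "port": port,
        "path": parts.path or "/",
        "is_secure": is_secure,
    }


def connect_ec2_from_env():
    """Open a boto 2 EC2 connection using the (assumed) eucarc environment variables."""
    # Imported here so euca_endpoint() can be used without boto installed.
    from boto.ec2.connection import EC2Connection
    from boto.ec2.regioninfo import RegionInfo

    ep = euca_endpoint(os.environ["EC2_URL"])
    return EC2Connection(
        aws_access_key_id=os.environ["EC2_ACCESS_KEY"],
        aws_secret_access_key=os.environ["EC2_SECRET_KEY"],
        is_secure=ep["is_secure"],
        # boto needs a region object to know which host to talk to.
        region=RegionInfo(name="eucalyptus", endpoint=ep["host"]),
        port=ep["port"],
        path=ep["path"],
    )
```

With `eucarc` sourced, something like `connect_ec2_from_env().get_all_zones()` should list the same availability zones as `euca-describe-availability-zones`.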

Generally it was remarkably easy to set up. I did make a few silly mistakes along the way, which, if I’m honest, took an embarrassing amount of time to identify despite being relatively obvious:

Don't forget to add a Node Controller

Node controllers run your VM instances, so you’ll want at least one. Obvious, right? Well, I managed to forget (multitasking at the time) and spent a little too long wondering why I couldn’t start any VMs.

$ sudo euca_conf --list-nodes
registered nodes:  llama   i-2B980609  i-38940671  i-3B250746  i-3D2B07A3  i-44F408EA  i-4A7B08DD  i-555C09E0  llama
$ euca-describe-availability-zones verbose
AVAILABILITYZONE	|- vm types	free / max   cpu   ram  disk
AVAILABILITYZONE	|- m1.small	0025 / 0032   1    192     2
AVAILABILITYZONE	|- c1.medium	0025 / 0032   1    256     5
AVAILABILITYZONE	|- m1.large	0012 / 0016   2    512    10
AVAILABILITYZONE	|- m1.xlarge	0012 / 0016   2   1024    20
AVAILABILITYZONE	|- c1.xlarge	0006 / 0008   4   2048    20
That output is from a working setup. If you see nothing under --list-nodes and all the "max" numbers for your availability zones are 0, you've probably forgotten to add a Node Controller (yes, this was slightly embarrassing).
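That check is easy to script. A minimal sketch in Python that reads the text printed by `euca-describe-availability-zones verbose` and reports whether every VM type has a "max" of 0; the function names are hypothetical, and the parsing assumes the exact column layout shown above.

```python
import re

# Matches rows like "AVAILABILITYZONE\t|- m1.small\t0025 / 0032   1    192     2";
# the header row ("|- vm types\tfree / max ...") has no digits there, so it is skipped.
ROW = re.compile(r"\|-\s+(\S+)\s+(\d+)\s*/\s*(\d+)")


def vm_capacity(zone_output):
    """Return {vm_type: (free, max)} parsed from the verbose zone listing."""
    capacity = {}
    for line in zone_output.splitlines():
        match = ROW.search(line)
        if match:
            vm_type, free, maximum = match.groups()
            capacity[vm_type] = (int(free), int(maximum))
    return capacity


def probably_missing_node_controller(zone_output):
    """True if VM types are listed but every one of them has a max of 0."""
    capacity = vm_capacity(zone_output)
    return bool(capacity) and all(maximum == 0 for _, maximum in capacity.values())
```

Feed it the captured output of the command (for example via `subprocess.run(..., capture_output=True, text=True).stdout`); an empty listing returns False rather than guessing.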

Don't forget to enable virtualisation in the BIOS

Once I’d added my Node Controllers, I went to start some instances, only to watch them sit in the “pending” state and then terminate. When I eventually found my way to the Node Controller logs, I was presented with:

libvirt: internal error no supported architecture for os type 'hvm' (code=1)

At this point, kvm-ok (on Ubuntu Lucid, in the qemu-kvm package) is your friend. The machines I was using as Node Controllers (Dell R710s) all have a “Virtualization Technology” setting in the BIOS (under “Processor Settings”). On all of our machines this was set to Disabled (and I gather this is the standard default). Rebooting into the BIOS and enabling it was all that was needed:

$ kvm-ok
INFO: Your CPU supports KVM extensions
INFO: /dev/kvm exists
KVM acceleration can be used
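For the curious, the core of that check can be approximated: look for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo`, then check that `/dev/kvm` exists. This is a rough sketch, not kvm-ok’s actual logic (the real script does more, e.g. reading CPU registers to spot a feature disabled in the BIOS); the function names and messages are hypothetical, and a missing `/dev/kvm` can also simply mean the kvm modules aren’t loaded.

```python
import os


def cpu_virt_flags(cpuinfo_text):
    """Return the hardware-virtualisation flags ("vmx"/"svm") found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            found.update({"vmx", "svm"} & set(value.split()))
    return found


def kvm_diagnosis(cpuinfo_text, dev_kvm_exists):
    """Rough approximation of a kvm-ok style verdict (messages are illustrative only)."""
    if not cpu_virt_flags(cpuinfo_text):
        return "CPU does not advertise VT-x/AMD-V; KVM acceleration unavailable"
    if not dev_kvm_exists:
        return ("CPU supports virtualisation but /dev/kvm is missing: "
                "check the BIOS setting and that the kvm modules are loaded")
    return "KVM acceleration can be used"


def check_this_machine():
    """Run the approximation against the local machine (Linux only)."""
    with open("/proc/cpuinfo") as f:
        return kvm_diagnosis(f.read(), os.path.exists("/dev/kvm"))
```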

As an aside, I fully support this “bug” entry about disassociating kvm-ok from kvm and putting a “You have virtualisation support but it is disabled” pseudo-warning into the motd - bring on the next Ubuntu LTS release!

Potentially important "footnote"

One bit of weirdness I did find was immediately after installing (once I’d remembered to create a Node Controller): despite the Node Controller existing, and claiming to have detected the Cluster Controller at install time, the Cluster Controller couldn’t find it. Prodding euca_conf gave nothing in --list-nodes, and --discover-nodes found nothing. However, I was sort of able to cajole things manually with --register-nodes: keys were at least copied to the Node Controller, but I had little success beyond that.

I then found this thread on the Eucalyptus forum from a user with an essentially identical issue (no NC discovery, manual NC registration appearing to work but not actually working, et cetera), with a follow-up post saying that a solution reported in another (slightly longer-winded) thread had fixed things.

To repeat for posterity and archiving / Google reasons, the solution, with my own notes, was:

  • Deregister all Node Controllers - euca_conf --deregister-nodes
  • Deregister the Cluster - I used the WebUI for this; I don't believe that using the CLI is necessary
  • Restart the Cluster Controller - I'm afraid I can't remember whether or not I did this
  • Register the Cluster again - At this point, using the CLI is important
  • Discover the Node Controllers - euca_conf --discover-nodes
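For anyone wanting to script the CLI parts of that sequence, here is a minimal Python sketch. It assumes the Eucalyptus 1.6-era euca_conf options --deregister-nodes, --deregister-cluster, --register-cluster <name> <host> and --discover-nodes (check euca_conf --help on your version); the cluster name, host addresses and helper names are hypothetical. Unlike the steps above, it deregisters the Cluster via the CLI rather than the WebUI, and it leaves restarting the Cluster Controller as a manual pause. By default it only prints the commands.

```python
import shlex
import subprocess


def recovery_steps(cluster_name, cc_host, node_hosts):
    """Return (before_restart, after_restart) lists of euca_conf commands.

    The option names are assumptions based on Eucalyptus 1.6-era euca_conf.
    """
    before_restart = [
        # 1. Deregister every Node Controller (hosts as one space-separated argument).
        ["sudo", "euca_conf", "--deregister-nodes", " ".join(node_hosts)],
        # 2. Deregister the Cluster (the original steps used the WebUI instead).
        ["sudo", "euca_conf", "--deregister-cluster", cluster_name],
    ]
    after_restart = [
        # 4. Register the Cluster again, this time via the CLI.
        ["sudo", "euca_conf", "--register-cluster", cluster_name, cc_host],
        # 5. Ask the Cluster Controller to find the Node Controllers.
        ["sudo", "euca_conf", "--discover-nodes"],
    ]
    return before_restart, after_restart


def run(commands, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


def recover(cluster_name, cc_host, node_hosts, dry_run=True):
    before, after = recovery_steps(cluster_name, cc_host, node_hosts)
    run(before, dry_run)
    # 3. Restarting the Cluster Controller is left manual; the method depends on the release.
    input("Restart the Cluster Controller, then press Enter to continue...")
    run(after, dry_run)
```

Calling `recover("cluster1", "10.0.0.1", ["10.0.0.2"])` just prints the commands; pass `dry_run=False` only once they match your setup.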

For whatever reason, the Cluster created at install time, and any created afterwards via the WebUI, had some sort of issue. I haven’t yet been able to diagnose much, or find a canonical bug report, but this seems like a potentially significant issue that may trip people up!

The above issue aside, my experience with it so far has been great. Now to get boto talking to it!