UPDATE: Geoff helpfully pointed out this morning that you can set the Warden netmask via the bare-metal.yml.example file. But that's no fun.
With BOSH spinning, I put together a small testing BOSH release (which you can find here), and deployed it. The deployment worked fine, and soon I had some test VMs running as Warden containers on the server.
Problem is, the networking was all mismatched.
Anyone familiar with BOSH-lite is accustomed to the 10.244.0.0/16 network space that it uses by default. This convention-over-configuration approach has led lots of BOSH releases to ship a Warden configuration out of the box that assigns static IPs in the 10.244 network. No problem; BOSH-lite is intended to be used for development.
For reasons I'll not go into right now (maybe a future blog post), I want to be able to run several bare-metal BOSH servers using Warden containers. I don't want to invest the money in vSphere, and I don't have the patience to stand up OpenStack. That leaves AWS, which is too expensive for my ambitions, and a bit overkill.
Which is why I turned to BOSH + Vagrant in the first place.
The problem I ran into had to do with routing. Here's a highly simplified view of the network topology inside of my lab:
All wireless clients (laptops and phones included) live on the Unprivileged Access Network (the pink cloud), which routes everything to the Core Network (green) via a consumer-grade wireless router at 10.0.1.1.
The core network router is a Linksys WRT54G with a custom build of OpenWRT that allows it to manage routing, firewall duty, and termination of a handful of OpenVPN point-to-point VPNs that stitch my network into a larger one distributed across the Internet.
The important part of the diagram, however, is the 10.4.0.0/16 (yellow) network labeled "Service Network 1". This network exists entirely inside of the beefy server hardware that I am running BOSH-lite on. The Core Network OpenWRT router has static routes for the 10.4/16 subnet.
The Easy Solution
The easy solution, of course, is to just change the Service Network range from 10.4/16 to 10.244/16. Problem solved. Off to the deployments!
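Unfortunately, that masks a not-so-subtle problem with the default configuration of BOSH-lite (one that, for a development platform, isn't important enough to warrant discussion): you can't run more than one! Every BOSH-lite box wants the same 10.244/16, and a static route on the core router can only send that range to one of them.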
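To make the collision concrete: two BOSH-lite boxes left on the default both claim 10.244.0.0/16, so their ranges overlap completely, while giving each box its own /16 (say 10.4/16 and 10.5/16) keeps them disjoint and separately routable. Here's a minimal sketch of that overlap check in bash. The helpers ip_to_int, cidr_bounds and cidr_overlap are hypothetical, my own rather than anything from BOSH or Warden, and 10.5/16 is just an example second box.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of BOSH or Warden) for comparing IPv4 CIDR blocks.

# Convert a dotted-quad IPv4 address (e.g. 10.244.0.0) to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the first and last addresses of a CIDR block, as integers.
cidr_bounds() {
  local base mask lo
  base=$(ip_to_int "${1%/*}")
  mask=$(( (0xFFFFFFFF << (32 - ${1#*/})) & 0xFFFFFFFF ))
  lo=$(( base & mask ))
  echo "$lo $(( lo | (~mask & 0xFFFFFFFF) ))"
}

# Exit 0 if the two CIDR blocks share at least one address.
cidr_overlap() {
  local a_lo a_hi b_lo b_hi
  read -r a_lo a_hi <<< "$(cidr_bounds "$1")"
  read -r b_lo b_hi <<< "$(cidr_bounds "$2")"
  (( a_lo <= b_hi && b_lo <= a_hi ))
}

# Two default BOSH-lite boxes vs. two boxes with their own /16 (example values).
for pair in "10.244.0.0/16 10.244.0.0/16" "10.4.0.0/16 10.5.0.0/16"; do
  read -r net1 net2 <<< "$pair"
  if cidr_overlap "$net1" "$net2"; then
    echo "$net1 and $net2 overlap"
  else
    echo "$net1 and $net2 are disjoint"
  fi
done
```

The first pair reports an overlap and the second reports disjoint ranges, which is the whole argument for carving a distinct /16 per box.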
The Hacker Solution
Just for fun, let's try changing the BOSH deployment manifest to use networks in 10.4/16, and see if it works!
```
jumpbox $ ./test-dev manifest warden
jumpbox $ sed -i -e 's/10.244/10.4/' manifests/test-warden-manifest.yml
jumpbox $ bosh -n deploy
Acting as user 'admin' on deployment 'test-dev' on 'Bosh Lite Director'
Getting deployment properties from director...

Deploying
---------

Director task 59
  Started unknown
  Started unknown > Binding deployment. Done (00:00:00)

  Started preparing deployment
  Started preparing deployment > Binding releases. Done (00:00:00)
  Started preparing deployment > Binding existing deployment. Done (00:00:00)
  Started preparing deployment > Binding resource pools. Done (00:00:00)
  Started preparing deployment > Binding stemcells. Done (00:00:00)
  Started preparing deployment > Binding templates. Done (00:00:00)
  Started preparing deployment > Binding properties. Done (00:00:00)
  Started preparing deployment > Binding unallocated VMs. Done (00:00:00)
  Started preparing deployment > Binding instance networks. Done (00:00:00)

  Started preparing package compilation > Finding packages to compile. Done (00:00:00)

  Started preparing dns > Binding DNS. Done (00:00:00)

  Started creating bound missing vms > small_z1/0. Failed: Creating VM with agent ID \
    'aacf0752-d306-4788-b6ee-aabbfc338d4b': Creating container: network already acquired: \
    10.4.2.8/30 (00:00:01)

Error 100: Creating VM with agent ID 'aacf0752-d306-4788-b6ee-aabbfc338d4b': \
  Creating container: network already acquired: 10.4.2.8/30

Task 59 error

For a more detailed error report, run: bosh task 59 --debug
```
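The sed one-liner in that transcript is fine for a quick test, but it's loose. The unescaped dots in 10.244 match any character, and without the g flag only the first match on each line changes, so a line holding a range such as 10.244.2.2 - 10.244.2.3 would come out half-converted. A slightly stricter sketch follows. Note that retarget_manifest is a hypothetical helper of my own, and the .bak backup is my choice, not something BOSH requires.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rewrite every IPv4 address starting with OLD_PREFIX in a
# manifest so it starts with NEW_PREFIX instead (e.g. 10.244 -> 10.4).
retarget_manifest() {
  local file=$1 old=$2 new=$3 old_re
  # Escape the dots so they match literally instead of "any character".
  old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
  # (^|[^0-9.]) stops the match inside longer numbers such as 110.244.x.x;
  # \1 puts that leading character back (sed back-references are one digit,
  # so "\110.4." means group 1 followed by "10.4."). The g flag rewrites every
  # match on a line, not just the first. -i.bak keeps a backup copy.
  sed -E -i.bak "s/(^|[^0-9.])${old_re}\./\1${new}./g" "$file"
}

# Example, using the manifest path from the transcript above:
#   retarget_manifest manifests/test-warden-manifest.yml 10.244 10.4
```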
You can try any number of different networks, but if they aren't in 10.244/16, BOSH (or more specifically, Warden) will fail to provision the network, claiming that it is "already acquired".
Luckily, while the decision to use 10.244/16 is hard-coded into the BOSH-lite/Warden distribution, it is spelled out in exactly one place: the startup script that runs the Warden supervisor.
```
boshbox # grep -nC6 10.244.0.0 /var/vcap/jobs/warden/bin/warden_ctl
29-    exec /var/vcap/packages/warden-linux/bin/warden-linux \
30-      -disableQuotas=true \
31-      -listenNetwork=tcp \
32-      -listenAddr=0.0.0.0:7777 \
33-      -denyNetworks= \
34-      -allowNetworks= \
35:      -networkPool=10.244.0.0/16 \
36-      -depot=/var/vcap/data/warden/depot \
37-      -rootfs=/var/vcap/packages/rootfs_lucid64 \
38-      -overlays=/var/vcap/data/warden/overlays \
39-      -bin=/var/vcap/packages/warden-linux/src/github.com/cloudfoundry-incubator/warden-linux/linux_backend/bin \
40-      -containerGraceTime=5m \
41-      1>>$LOG_DIR/warden.stdout.log \
```
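Line 35 is the culprit. My reading, which is an assumption inferred from the /30 in the error message rather than something I've verified in Warden's source, is that Warden hands each container a /30 carved out of -networkPool. Under that reading, a request for 10.4.2.8/30 can't be satisfied from 10.244.0.0/16 and surfaces as the misleading "already acquired" error. The bash sketch below makes that containment check explicit using hypothetical helpers of my own (cidr_contains and friends, not part of Warden), and compares the default pool with 10.0.0.0/8.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Warden): does one IPv4 CIDR block fit inside another?

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the first and last addresses of a CIDR block, as integers.
cidr_bounds() {
  local base mask lo
  base=$(ip_to_int "${1%/*}")
  mask=$(( (0xFFFFFFFF << (32 - ${1#*/})) & 0xFFFFFFFF ))
  lo=$(( base & mask ))
  echo "$lo $(( lo | (~mask & 0xFFFFFFFF) ))"
}

# Exit 0 if CIDR block $2 lies entirely inside CIDR block $1.
cidr_contains() {
  local pool_lo pool_hi net_lo net_hi
  read -r pool_lo pool_hi <<< "$(cidr_bounds "$1")"
  read -r net_lo net_hi <<< "$(cidr_bounds "$2")"
  (( net_lo >= pool_lo && net_hi <= pool_hi ))
}

# The /30 from the failed deploy, checked against the default and a wider pool.
for pool in 10.244.0.0/16 10.0.0.0/8; do
  if cidr_contains "$pool" 10.4.2.8/30; then
    echo "$pool contains 10.4.2.8/30"
  else
    echo "$pool does not contain 10.4.2.8/30"
  fi
done
```

The default pool rejects the 10.4.2.8/30 request, while 10.0.0.0/8 covers it (and every other /30 in 10/8).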
If you change line 35 to set the -networkPool option to 10.0.0.0/8 and then restart the Warden supervisor, you can provision against any subnet of 10/8, and even use different nets on different boxen.
```
boshbox # sed -i -e 's@10.244.0.0/16@10.0.0.0/8@' \
    /var/vcap/jobs/warden/bin/warden_ctl
boshbox # monit restart warden
```
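If you'd rather not depend on the old value matching exactly, a slightly more defensive variant keys on the -networkPool= flag itself, keeps a .bak copy, and checks that the edit took before you restart anything. This is a sketch only, and set_network_pool is a hypothetical helper of my own. Also, files under /var/vcap/jobs are rendered from the Warden job's templates, so a hand edit like this may not survive the job being re-rendered. The bare-metal.yml.example route from the update at the top is the sturdier fix.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point Warden's -networkPool flag at a new CIDR block.
# Usage: set_network_pool /var/vcap/jobs/warden/bin/warden_ctl 10.0.0.0/8
set_network_pool() {
  local ctl=$1 pool=$2
  # '@' as the s/// delimiter avoids escaping the '/' in the CIDR suffix.
  # -i.bak keeps the original script next to the edited one.
  sed -i.bak -E "s@-networkPool=[0-9./]+@-networkPool=${pool}@" "$ctl" || return 1
  # Fail unless the flag now carries the requested pool.
  grep -qF -- "-networkPool=${pool}" "$ctl"
}

# On the BOSH box, follow a successful edit with a Warden restart, e.g.:
#   set_network_pool /var/vcap/jobs/warden/bin/warden_ctl 10.0.0.0/8 && monit restart warden
```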