Why I ditched OpenStack, and tread lightly with Docker

Every morning on the way to the office (on the mornings I bother driving to the office) my route takes me past an odd building. It belongs to a drafting school, and what’s interesting about it is that it looks like some beautiful towering piece of Roman architecture. However, as you round the bend which passes directly in front of it the truth becomes much more apparent – the building has been entirely encased in a carefully painted sheet. It’s a 10-storey-high tent with a hideous concrete monstrosity from the 1970s inside, and it’s quite convincing until you’re about 10 meters away. Considering what the building is used for I’m sure there’s some deliberate tongue-in-cheek meaning going on, and I like that it gives me a chuckle every time I drive past, but it also serves as a great reminder of our duty as engineers.

OpenStack was probably one of the most promising things to happen in the last two years – until, of course, I actually tried to use it. At the time I was field testing various virtualisation ideas for a private cloud, and OpenStack was definitely at the top of the list of things which fit the bill.

The reason it completely failed is much the same reason I’m not buying into the Docker hype just yet – it was a fragile, unstable house of cards which constantly felt like it was about to crumble beneath my feet. Virtual machines would constantly get stuck in odd states the API had no idea what to do with, and I had to keep killing off orphaned libvirt processes and fiddling with the database to force machines into a stopped state so they could be restarted – and then hope the data was still intact. Granted, most of this is probably because KVM is also quite horrible and fragile, but hopefully new versions would be better, right? Of course they weren’t, because the upgrade path was so convoluted it wasn’t worth the time. Then they introduced a new networking layer which was ludicrously over-complicated, architecturally expensive to implement, generally ill-conceived, and (as I was starting to expect from OpenStack by this point) didn’t even work properly no matter which conflicting piece of documentation was followed. I even got a second set of eyes on it, and the conclusion we both reached was quite simply: “Yeah, this stuff is just completely broken.” If many more hours had been spent debugging it I’m sure we could have found the issue, logged a ticket with the relevant project and maybe (maybe) it would be fixed. But why bother when there are other options?

The problem is mostly that, for some completely misguided reason, people think implementing a cloud framework is all about emulating Amazon’s API and architecture. While I can sort of understand the starting point of this logic (taking advantage of client libraries like Boto), the reality is these libraries are not actually good enough to care so much about, nor difficult enough not to simply reimplement if the foundation were solid enough for people to adopt. Amazon’s API is a horrible nightmarish mess of bloated XML, so why would anyone want to recreate it? The problem is that OpenStack is now so busy chasing compatibility with a meaningless entity that little time goes to addressing the mounting piles of technical debt and bugs which underpin the entire thing. Maybe it is better now, maybe it will get better in future, but unfortunately it wasted enough of my time then for me not to care much about its future.

This is the precise feeling I get with all the Docker hype. I do agree that containers are a good idea for many deployment scenarios and use cases, but in other areas people are solving problems which don’t exist with the wrong tools – like coming up with outrageous comparisons between Docker and Puppet, which is about as useful as comparing the International Space Station to an onion.

Containers are not a replacement for, or an alternative to, configuration management, and if you think they are then you’re either seriously wrong or the work you’re doing is irrelevant to the situation in the first place. Either way, don’t send me your CV.

Misuse of Docker isn’t its biggest problem though; its production readiness is. Good Linux distros lag pretty darn far behind new stuff, and this conflicts with anything which requires manipulation of that stack. Even Ubuntu, which releases quite often, is too broken for me to run in a production server environment. LTS releases are the sensible choice (and my current choice), and even those are always pretty ancient compared to the present and still sparkling new compared to RHEL, CentOS or Debian stable.

And so this is how it ends up: a growing pile of bleeding-edge external repositories to keep up with nightly bug fixes and nightly bug introductions, and eventually this is your job as a system administrator. Chasing down hilarious bugs – like the machine I was testing LXC and Docker on somehow trying to install a funny version of grub2 and winding up hilariously broken and unbootable, or Docker mucking about with iptables so much that my existing iptables scripts get confused or blow away the Docker rules, leaving containers broken (something OpenStack annoyed me with too). When you hit these issues almost no one has encountered them before, because the domain of use is actually so pitifully small, testing is conducted in a tiny vacuum with nowhere near enough different system configurations, and the early adopters are pretty much people who just got lucky and don’t actually understand what’s going on.

Many of these things work great in a perfect world, where you have piles of money, get to pick whatever server hardware or cloud provider you like, and have a clean slate to re-arrange your network. Any experienced person knows that production environments are never this. They’re a hodgepodge of compromise, budget cuts, lack of people and lack of time, and the last thing people working on them need is something that could go horribly wrong at the drop of a hat with no one experienced enough around to fix it quickly at 3AM.

This is why I personally don’t build production systems on luck – I build them with well-tested software which has shown itself to be reliable in even the most seemingly outrageous configurations. I especially don’t choose software based on vague testimony from fancy beanbag start-up companies, because god only knows what they’re really dealing with there or how dark the rings are under their admins’ eyes – all you have to do is drive a bit closer and see if it’s actually a solid building or just a tent.

The Python cargo cult: Nothing invented anywhere.

I’ve worked with a wide variety of software teams in the past; lately, however, I’ve experienced something stranger. Software teams who, as it appears, can’t actually build software – and it’s a growing problem.

It’s trivial, for example, to pick up Django and then dig through GitHub for plugins and libraries for doing almost anything. Most of this is alright, and some of it is good in principle, but the net result can be a sort of cargo cult programming. On the surface companies can seem great at churning out functional systems for their clients; the proof of the pudding, however, is when it comes to moderately more complex integrations, or dealing with things when they go wrong.

Dependency bloat

As someone in charge of build systems and releases for a number of projects, the most annoying thing to see is useless dependencies. There are cases where the Python standard library isn’t great – when you’re doing complex things many times over in critical portions of a project. On the other hand there are hundreds of basement library replacements written by some guy who was building an image gallery for his pet cat – and somehow this 10,000 line behemoth, which requires Matplotlib and god knows what else, has been selected to be inherited into every database model in a system earmarked for a stock trading website or something.

Now you’ve got something poorly made, and no one knows how it actually works, whether it’s thread safe, whether it will deadlock under high concurrency, whether it’s secure, or whether it’s emailing your /etc/passwd file to a Russian hacker group. Better yet, it’s already tightly coupled to everything, so removing it is going to be expensive. It’s like cancer: once you have it you’re stuck with it – and much like smoking cigarettes filled with asbestos, people are doing it to themselves, deliberately. But hey, “we’ve always done it like that…”

Why God, why?

Pre-empting feature creep

One reason is that people try to pre-empt feature creep by picking the largest, most flexible libraries they can find, on the invalid assumption that this could make other things easier at a later point.

There can be some truth to that, but it’s usually not the case. It’s impossible to entirely predict where a product will go, but adding bloat early on is not the solution – ensuring you have enough loose coupling to deal with replacing any component is the far more elegant solution.

Think of it like this: at all times, any dependency (granted, not something like Django or Rails itself) should be replaceable with minimal modification to the core logic of your code.
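
As a minimal sketch of what that seam can look like (thumbs.py and make_thumbnail are hypothetical names for illustration), keep the library behind one small wrapper so the core logic never imports it directly:

# thumbs.py - the only module allowed to import the imaging library.
# Swapping PIL/Pillow for something else means changing this one file.
from PIL import Image

def make_thumbnail(src_path, dest_path, size=(128, 128)):
    # Resize the image at src_path and write the result to dest_path.
    im = Image.open(src_path)
    im.thumbnail(size)
    im.save(dest_path)

The rest of the codebase calls make_thumbnail() and never touches PIL, so replacing the dependency later is a one-file change rather than surgery on every model.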

Not-invented-here syndrome.

As with everything since the advent of language-related conferences, themes develop from time to time, and one of them was a vast campaign against NIH, with almost everyone going on about it.

The campaign against NIH is spot on when it comes to certain things, like crypto – you would (and should) be terrified to see what goes on inside banks. It starts like this: Alice goes to university and does an intro to crypto course, then gets a job at a bank. Alice knows her stuff – I mean, she spent $100,000 on her education, so surely she can write a hashing library for her boss. This is of course idiotic, as there are many well-tested and established crypto libraries, and unless you’re the kind of genetic freak who devotes their life to crypto it is something to avoid at all costs, along with similar things like DIY neurosurgery. So then Alice’s bank gets hacked and leaks customer information to the world, and everyone turns around and over-compensates by inventing nothing ever, so they can forward the blame to an external party.
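
For the record, the sane version of Alice’s task is a few lines of standard library. A sketch only – PBKDF2 via hashlib, available from Python 2.7.8 / 3.4; the iteration count is illustrative, not a recommendation:

import hashlib
import os

def hash_password(password, salt=None, rounds=100000):
    # Derive a password hash with PBKDF2-HMAC-SHA256 from the stdlib,
    # using a fresh random salt per password.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, rounds)
    return salt, digest

No invented maths, and nothing for an external party to take the blame for.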


The war on NIH has another side effect – things which are crummy never get rewritten. PIL (the imaging library for Python), for example, is a disaster. Pillow fixes many aspects, but the API is ugly, clumsy and slow, the installation process is terribly fragile, and it has a really large number of tightly coupled dependencies. Thanks to this attitude of “if it exists, I should use it”, few people have embarked on building a ground-up replacement.

This, sadly, is the resulting culture: programmers for whom “good enough” is good enough, and when it isn’t, just apply tape. The argument is that if something at least exists, it should be modifiable for your use case. The complete failing of this argument is that the net result is fragmented forks of poorly architected systems littering the internet, with no possible way to merge them and no way to reasonably maintain the systems unfortunate enough to use them. When I reach the limitations of any library I do a careful analysis and ask myself a tough question: “Is it worth it to fix this and go through the hassle of getting a patch accepted, or will it actually be cheaper to build an alternative which matches my requirements more exactly?”

And I’m not even talking about something as difficult as writing your own SSL library or building a database ORM; I see this being done all over the place with basic image manipulation and POSIX path handling. Seriously? When did appending to strings become so difficult and time-consuming that people felt it necessary to import Unipath just to concatenate two paths on a single line of settings.py?
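
For comparison, the standard library version of that settings.py case (BASE_DIR being the usual Django-style convention):

import os

# settings.py - no third-party path library required
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
STATIC_ROOT = os.path.join(BASE_DIR, 'static')

One import, zero dependencies.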

Swapping NIH syndrome for cargo cult programming and dependency bloat, in some sort of systematic program of laziness and cluelessness, has created an entire market of developers who have decided they’re all too busy to understand the systems they’re using, and instead just rely on supergluing a random collection of GitHub projects together – then, when it all falls in a heap, they shrug.

Just because you shouldn’t build a crypto library yourself and use it in a production system doesn’t mean you should give up on ever learning how to.

Testing separates products from experiments

It’s the internet, that wondrous place where for any human dilemma there is a pile of subjective nonsense out there to help you assuage your dissonance or just validate whichever opinion you already chose.

Don’t use Java, use Python. Don’t use Python, use Node.js. Be more agile. Sleep less, no wait sleep more, no wait sleep polyphasically. Ditch asynchronous event driven systems for synchronous ones because you’re too busy or just not smart enough to architect software that way.

I still remember when it was all about Extreme Programming and Pair Programming (something I have enjoyed in rare contexts, but which would drive me crazy if I had to do it every day). Now it’s about Agile and wasting time with daily Scrum meetings, and in a few years’ time there will be another new fad. Perhaps we’ll all have to type with our feet, or realise that at the machine code level a method call is still basically just a GOTO statement.

Of course, people try to stubbornly push development and business methodologies all the time without any evidence, and they’re usually ignorant regardless of what they’re pushing, because for them it’s usually about selling training workshops, not helping you to write better software. If you look at any real business, what you find is inevitably a complete blur of cherry-picking what works from each of them, and it differs widely between projects and teams.

We have frameworks for “people with deadlines”, people who like square things, people who like round things. One has to wonder where the science part of computer science went – of course, we know exactly where it went; the fact of the matter is that programming is now easy. More people can do it. You don’t need to know anything about SQL, atomicity, concurrency or scalability to build a large service – and once you’re successful you can just throw money at the problem.

When I was in school it was difficult to produce anything even slightly complex – without OOP, without even an Internet to download frameworks and tools from, never mind the level we produce things at now. It was inspiring and magical though, and I guess that’s why it seems increasingly difficult to find passionate developers. It’s a good thing that everyone can write software easily. It makes the world a better place and helps everyone to be productive through technology, but it also means that if you’re looking for a real software developer you have to look a lot further than someone who can write code that “just works”.

That benchmark is knowing how it works, understanding precisely why it works, and more importantly how to test and prove that.

This gets more to the crux of the matter: if we keep listening to people who are basically just thumb-sucking something different, rather than something sensible, tested and proven, then we’re at the mercy of blind luck rather than science. I’m referring, of course, to TDD, or Test-Driven Development. It seems some people think this is just a new religion, and base their argument against using it on the people who spout long-winded diatribes about how everyone should do it and how, if you don’t, you’re going to the dark place below ground. Everyone forgets that an alternative to hand-waving is reading actual studies from people who’ve done actual research – because they’re scientists, not salesmen or people suffering from sunk cost fallacy.

Having an empirical way of testing the functionality and completeness of programs is pretty much mandatory for producing software which actually works, no matter how sophisticated that mechanism is. I’ve written a lot of bad software with few tests, software that was too difficult to test sensibly anyway, and software I was just too lazy to write tests for. That doesn’t make me a hypocrite; none of that software has seen the light of day or gone further than an experiment.

Sometimes an experiment sets out to prove an outcome; sometimes it starts as a hypothesis and then we measure the outcome at the end, even if sometimes that measurement happens in production environments. Products are not experiments though: they begin with objectives based on the outcome of experiments, and end at a functional implementation which passes those tests. And since the days of compiling and linking are mostly over for 90% of developers out there, tests are a lot more important for protecting against regression.

It’s impossible to argue, then, that when you set out to build anything you have no measurements for evaluating when you’re done – and that’s all TDD is: formalising those measurements up front, something which is crucial to communicating objectives precisely to multiple people working on the same task. Anything less is just an experiment.
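
As a minimal sketch of “formalising that up front” (slugify is a made-up example; the point is that the tests state the objective before the implementation exists):

import unittest

def slugify(title):
    # Turn a title into a lowercase, hyphen-separated URL slug.
    return '-'.join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    # Each test is a measurement of "done", written first.
    def test_spaces_become_hyphens(self):
        self.assertEqual(slugify('Hello World'), 'hello-world')

    def test_lowercases_input(self):
        self.assertEqual(slugify('DONE'), 'done')

if __name__ == '__main__':
    unittest.main()

Write the test class, watch it fail, then write slugify until it passes – that’s the whole religion.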

Writing client plugins with twistd

Lurking and occasionally trying to help people in #twisted has taught me one thing – no one ever reads the documentation; I mean really reads it, cover to cover. Seriously, if you’re not used to all the asynchronous design patterns and you just start trying to bulldoze your way through writing a Twisted application, you’re going to get very frustrated and fail. The other thing I’ve realised is that people stop at calling reactor.run() and building a crap pile around it. Seriously, use plugins.

But if you’re writing some kind of client, plugins are less obvious. Fortunately, writing something that implements IService is not difficult, and it’s very rewarding in terms of clean code which is properly testable.

So let’s start by creating twisted/plugins/ticktock_plugin.py

from zope.interface import implements
 
from twisted.python import usage
from twisted.plugin import IPlugin
from twisted.application.service import IServiceMaker
 
import ticktock
 
class Options(usage.Options):
    optParameters = []
 
class TickTockServiceMaker(object):
    # Implementing IServiceMaker and IPlugin is what lets twistd
    # discover this as the "ticktock" subcommand.
    implements(IServiceMaker, IPlugin)
    tapname = "ticktock"
    description = "A clock"
    options = Options
 
    def makeService(self, options):
        return ticktock.makeService()
 
serviceMaker = TickTockServiceMaker()

Now let’s write a service, called ticktock.py

from twisted.application import service
from twisted.internet import task
 
class LoopingService(service.Service):
    def __init__(self):
        # LoopingCall will invoke self.tick repeatedly once started
        self.t = task.LoopingCall(self.tick)
        self.n = True
 
    def tick(self):
        if self.n:
            print "Tick"
        else:
            print "Tock"
 
        self.n = not self.n
 
    def startService(self):
        self.running = 1
        self.t.start(1.0)
 
    def stopService(self):
        self.running = 0
        self.t.stop()
 
def makeService():
    return LoopingService()

Now you can run your service with twistd, which takes care of logging and all sorts of other things you shouldn’t try doing yourself.

$ twistd -n ticktock
2014-04-13 18:50:36+0200 [-] Log opened.
2014-04-13 18:50:36+0200 [-] twistd 13.2.0 (/home/colin/ticktock/ve/bin/python 2.7.3) starting up.
2014-04-13 18:50:36+0200 [-] reactor class: twisted.internet.epollreactor.EPollReactor.
2014-04-13 18:50:36+0200 [-] Tick
2014-04-13 18:50:37+0200 [-] Tock
2014-04-13 18:50:38+0200 [-] Tick
2014-04-13 18:50:39+0200 [-] Tock
2014-04-13 18:50:40+0200 [-] Tick
2014-04-13 18:50:41+0200 [-] Tock
2014-04-13 18:50:42+0200 [-] Tick

Obviously instead of a looping call you can throw in an IRC client, DDoS your enemies, or whatever you were planning to do with it.

And yes, there is already a timer service class available in Twisted.
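
For completeness, a sketch of ticktock.py using that built-in class, twisted.application.internet.TimerService – same behaviour, rather less code:

from twisted.application.internet import TimerService

def makeService():
    # TimerService calls the given function every `step` seconds and
    # handles startService/stopService for us.
    state = {'n': True}

    def tick():
        print "Tick" if state['n'] else "Tock"
        state['n'] = not state['n']

    return TimerService(1.0, tick)

The hand-rolled LoopingService above is still worth understanding, since most real services won’t map onto a single timer.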

CCP and customer support

A while ago Eve Online released a convenient ‘launcher’ for their game, which I’ve played on and off for almost a decade. Except it wasn’t very convenient: it tried to wrap the update process, and from the moment it was released it has never worked properly – I’ve experienced the same bug with it since day one, resulting in a highly tedious process to update the client. Sadly I only have some ineffective logs to go by, so I can’t speculate on why it fails every single time, but I filed bug report after bug report with as much information as possible, saying this was seriously impacting my ability to play. I reported it on their forums to no avail, and I filed support issues which were never replied to. I found workaround after workaround (which I also posted on their message boards for others), and then my last workaround to the broken launcher stopped working too. I could no longer play the game I was paying for, so I gave up and cancelled my two subscriptions.

While cancelling my subscription I was asked for a reason, so I wrote pretty much the above account. All I got in response was a mail reminding me to resubscribe, and then (in reverse order)…

From: Colin Alston
Date: Sun, Nov 3, 2013 at 1:23 PM
Subject: Re: Re: EVE Online - Subscription Renewal Reminder 
To: EVE Online Customer Support <support@eveonline.com>

Your reply isn't "Can we fix our launcher and get you back as a customer"? Wow...

On Sun, Nov 3, 2013 at 1:20 PM, EVE Online Customer Support <support@eveonline.com> wrote:
Hello, Senior GM Huginn here,

I'm sorry to hear that you're still experiencing issues with the EVE launcher.

What are the user names of the accounts you want us to cancel the subscription for?

Best regards,
Senior GM Huginn
CCP Customer Support | EVE Online | DUST 514
-----------------------------------------------------------------------
MESSAGE HISTORY:
Original ticket @ 2013-11-03 11:19:

Your records should indicate that I deliberately cancelled my subscription
because CCP aren't interested in fixing their broken launcher which I've
filed multiple bug reports about.

On Sun, Nov 3, 2013 at 12:03 PM, EVE Online support
wrote:

> [image: EVE Online] 
>
>
> We are contacting you to remind you that according to our records you have
> 6 days left on your non-recurring subscription to EVE Online.

 

Wow…

SSH ports, the great obscurity debate

So there are two posts: http://www.danielmiessler.com/blog/putting-ssh-another-port-good-idea and http://www.adayinthelifeof.nl/2012/03/12/why-putting-ssh-on-another-port-than-22-is-bad-idea/

Well, clearly we have some very different arguments here, which are of course all nonsense. Security by obscurity is still valid security, as long as it isn’t your only security – that much is certainly valid. As for what port SSH runs on, it doesn’t matter in the slightest if your SSH daemon is insecure; it’s outright trivial to detect an SSH daemon listening on any port. What people seem to fail at most with security is that it hinges on the principle of risk, and with so much software out there people are overwhelmed and paranoid about all these cumulative risks, causing them to make rash decisions which don’t improve security but do make those systems more difficult to work with. The reality is that there has not been a serious SSH exploit in the last 15 years, other than the great Debian faux pas with SSH key generation.

Personally I can’t be bothered to move SSH to a different port; it only provides a minimal level of cover against the unlikely possibility of an SSH zero-day, but brings with it a substantially greater amount of inconvenience. What bothers me more is that both these articles fail to explain how to really secure SSH, since if your daemon is open to the public internet in any form you’re already in a difficult position. It’s simple enough, though, to achieve a reasonably high barrier against the most common means of exploitation – enforce the use of sudo for root access and the use of SSH keys for login:

PermitRootLogin no
PasswordAuthentication no

Of course if the keys to accounts are not secured by users in some way (their machines are exploited) then you’re quite screwed no matter what you’ve done. If you require security over and above that, a bastion host is a reasonable option, with SSH traffic restricted to that IP address, and/or a VPN with two-factor authentication into a DMZ with SSH access (you can do this with OpenVPN).

Security is about the wider scope of your architecture, and changing service ports is a serious waste of time – as is even debating something so silly. The only secure machine is one which is not powered on.

Amazon Auto Scaling with Puppet, PuppetDB and Haproxy

In a previous post I talked about how we bootstrap EC2 instances into Puppet with an rc.local script inside a stock AMI. This has worked great, but it turns out to be somewhat difficult when using something like Auto Scaling groups. Auto Scaling sadly provides very little metadata to work with, the entire idea really hinging off preconfigured AMIs – something I absolutely hate. The other issue is that you’re back in the same place, trying to organise hosts off the bat by their hostnames.

After days of researching all the angles I finally stumbled on a presentation by Pinterest (people like Ryan Park who publish their ops work are seriously awesome) which had some clues about evolving my rc.local setup. A while ago I changed our AMI to rather fetch the bootstrap script from a webserver on the Puppet master and keep it in the same repo as our Puppet modules; this saves rebuilding the AMI if we need to change the Puppet host or anything like that – or, in this case, add crazy hacks. This is kinda similar to how the Pinterest stuff was handled, with the exception of not requiring the EC2 utils, which would mean putting keys inside an AMI – and that would make me pretty uncomfortable.

So the first step is a new bootstrap script, called from rc.local with something like wget http://puppet.acme.com/ec2-bootstrap.sh -O /tmp/ec2-bootstrap.sh; bash /tmp/ec2-bootstrap.sh – obviously wherever you host your scripts could be totally different.

#!/bin/bash
 
# Instance ID (minus the "i-" prefix), hostname template and private IP
# all come from the EC2 metadata service.
id=`/usr/bin/curl -s http://169.254.169.254/latest/meta-data/instance-id | cut -c 3-`
FQDN=`/usr/bin/curl -s http://169.254.169.254/latest/user-data | grep hostname | sed 's/.*=//' | sed 's/ //'`
IP=`/usr/bin/curl -s http://169.254.169.254/latest/meta-data/local-ipv4`
 
# Evaluate the template so ${id} in the user-data expands to this instance's ID
FQDN=`eval echo $FQDN`
HOSTNAME=`echo ${FQDN} | awk -F"." '{print $1}'`
 
if [ "$FQDN" == "" ]; then
   echo "No hostname found in user metadata"
   exit 0
fi
 
echo $FQDN > /etc/hostname
 
cat<<EOF > /etc/hosts
# This file is automatically generated by the ec2-hostname script
127.0.0.1   localhost
$IP  $FQDN $HOSTNAME
 
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
EOF
 
hostname $FQDN
 
# Install puppet
dist=`lsb_release -cs`
echo "deb http://apt.puppetlabs.com/ $dist main dependencies" > /etc/apt/sources.list.d/puppet.list
/usr/bin/apt-key adv --keyserver keys.gnupg.net --recv-keys 4BD6EC30
/usr/bin/apt-get update
/usr/bin/apt-get -y --force-yes install puppet
 
wget http://puppet.acme.com/puppet.conf -O /etc/puppet/puppet.conf
 
/usr/bin/puppet agent --onetime --no-daemonize --logdest syslog
 
echo "#!/bin/sh -e" > /etc/rc.local
echo "exit 0" >> /etc/rc.local

What manner of witchcraft is this? Well, we grab “hostname=blah” from the user-data field on the EC2 host but, notably, we evaluate it in the context of this script’s scope. Then we get the IP, build a hosts file and set the hostname up right, after which the script goes through the motions of installing Puppet and kicking off a manual Puppet agent run (these days I cron Puppet; running the agent as a daemon sucks), and finally blanks out the rc.local script so we never run it again.

Now we set up an Auto Scaling group using a VPC subnet (you should use VPC… it saves a pile of headaches):

$ as-create-launch-config vpclc --image-id ami-YOURAMI \
    --instance-type m1.large --region eu-west-1 --key YOURKEY \
    --user-data 'hostname=prd-web-${id}.aws.acme.net'
 
$ as-create-auto-scaling-group vpcasgroup --launch-configuration vpclc \
    --availability-zones "eu-west-1c" --min-size 1 --max-size 5 \
    --desired-capacity 1 --vpc-zone-identifier "subnet-SOMEVPCSUBNET" \
    --region eu-west-1 --tag "k=Name, v=prd-web-N.aws.acme.net, p=true"

Yup! We stuck a piece of shell script in the user-data, and now the bootstrap script will dynamically replace it with the unique part of the instance ID that we got from the EC2 API – and it still works for other hosts too.

Of course, out in the real world, where things are almost never done properly and never really in our control, using ELB is unfortunately a pain in the ass. Half the time I ask people to add a CNAME to one of our servers or a load balancer, they dig the IP address and add an A record instead – *sigh*. On top of this a CNAME can’t exist on a domain apex, which throws ELB out the window entirely. Route 53 can deal with this, but if you think a giant corporate is going to delegate me their entire domain name you’re dreaming.

Enter haproxy, an amazing piece of software that we use in a thousand places at Praekelt. So the question is: with no DNS above (adding automatic updates to Route 53 is something I have yet to bother with), how do we get these instances into a load balancer – one which updates dynamically with Auto Scaling – without using ELB? Well, using PuppetDB, like I also previously wrote about.

A sample init.pp

class loadbalancer {
   # Every node in PuppetDB whose hostname matches prd-web-, as a list of IPs
   $myhosts = query_nodes('hostname~"prd-web-"', 'ipaddress')
 
   file {'/etc/haproxy/haproxy.cfg':
      ensure  => present,
      content => template('loadbalancer/haproxy.cfg.erb')
   }
 
   service {'haproxy':
      ensure    => running,
      subscribe => File['/etc/haproxy/haproxy.cfg']
   }
 
   package {'haproxy':
      ensure => latest
   }
}

Sample haproxy.cfg.erb

listen webstuff
   bind *:80
   mode http
   option httpchk GET / HTTP/1.1\r\nHost:\ www.acme.com
   appsession JSESSIONID len 32 timeout 3600
<% myhosts.each_with_index do |ip, i| %>
   server web<%= i %> <%= ip %>:80 check port 80 weight 1 maxconn 1500 inter 10000
<% end %>

And there it is, a rough construct of using Auto Scaling somewhat sanely with Puppet, and your own load balancer.

Open Source in government

On several occasions I’ve seen the Open Source community (more usually, those fanatical about the concept rather than authors within the space) petition for government to stop spending money on software licenses. At its core this is a good idea, but along the way I’ve witnessed many dos and don’ts with this approach. What has worked for me – and has gotten not just several bits of open source into government systems, but the funding to do it – is the ethos I document here.

Rule number 1: Don’t be a bully

And it’s the only rule. Don’t be a jerk. Yes, it’s your TAX money; no one cares. The government does actually want your help, and in my experience they will take it every time – but some people there are just as afraid of losing their jobs as anyone else out there, and extremely afraid of being made to look incompetent. Don’t tell them what to do; guide people to your way of thinking. Get people in your corner and then run with it. Lobbying and making a huge fuss right out of the gate will make people oppose you and think you’ve got some agenda, but if they think you’re going to make them look good – you’re golden.

Do NOT try to push Linux on the desktop…

The first thing people want is for every government workstation to run Linux. As someone who writes open source software and works with it every day, I don’t use Linux on my workstation, for a bunch of reasons mostly out of my control. The first is that I usually work on my MacBook, and OS X is great. The second is that X doesn’t cooperate with my desktop. I’m sure someone with more patience could hack it to pieces and make it work – I’m sure I could if I could be bothered – but I don’t, because I actually still require Windows for a few things, like playing games and managing servers with their really poorly written management interfaces, one of which only works in IE8. My point is this: attacking the Windows desktop will get you shot down.

   … Because compatibility

Through many years of “my nephew James knows computers”, government has accumulated systems which are only compatible with IE6, Adobe-based forms systems, Access databases, VBScript tied into Excel-based tools – the list goes on. As someone who spent many years working with government to design and implement open source systems (yes, they really do use some stuff already), I can tell you that these nightmarish systems, the kind most of us only read about on The Daily WTF, exist in abundance. We’re talking about the stuff of nightmares here, and very solidly entrenched stuff.

… Because cost

If you know OEMs you’ll know that somehow they can build cheaper computers than it seems we can build ourselves while paying through the nose for a standalone copy of Windows. Genuinely, through various schemes, subsidies and discounts, government pays extremely little in the grand scheme of things for licensing Windows. It’s a nominal cost, made more nominal through Microsoft bulk licensing programmes. It would cost them a whole lot more in man hours to replace Windows on their desktops than it would to just continue using it, never mind re-training a few million staff members who are less technically competent than your grandmother.

Build it and they will come

I propose a different tactic: focus on the backend and people-facing systems. There is benefit to open source in government, but it’s open source between governments. The UK, for example, has a pretty good system for managing driver’s licenses – but South Africa’s is terrible; they spent billions getting ‘eNaTiS’ running and it is a pathetic failure. That’s not entirely the government’s fault; it’s also because of the incompetent fraudsters who claim they can build these systems in the first place – the Johannesburg municipal billing crisis being another great example. And then there are the DNA sequence databases used around the world for crime fighting – why the hell are they all proprietary monsters that can’t talk to one another? Do we not want to solve crime? Why couldn’t we just go, “Hey, England, how about you give us a copy of that system you’ve got?” Why did we have to pay Germany 2 billion euros to get a license plate recognition system? Well, it’s not as if there’s an open source system for that.

The problem is we can’t just tell government they have to use open source, come up with a list of reasons why, and then not actually show them how. Government represents an arbitrage opportunity for dishonest cowboy businesses which have never built anything before and then disappear into the night, leaving increasingly terrible systems in their wake; those are the people we really need to fight and publicly denounce. So pitch in a tender with an R1.00 price tag if you want them to go open source. These are actually fairly easy systems to build if they’re well thought out, but we first need the right people to start building them.

If we investigate the requirements of these systems and build well designed open source alternatives, then we can actually start talking about asking our governments to use them. I think at that point though we won’t have to do much talking.

Why touch screen information boards annoy the heck out of me

We’ve all seen these things lately, right?

[Photo: Woolies store]

No, not people shopping at Woolworths. I’m talking about the giant white obelisk-like thing poking out of the floor that everyone has to navigate around. These touch screen information directory nightmares which “help” you find a store – I hate them, I wish they would die, and I wish the people making them and selling them to shopping centers would do something useful for humanity, like jump off a bridge.

1. They are trying to solve a problem which did not exist
And now their only purpose in life is to keep the companies which make them in business. A cheap piece of paper stuck onto a board with shop labels and an index of shop names was more than simple enough. It worked fine. There was a sticker showing you where you were, and a map to show you where you were going. If that was honestly too complicated for you, then I’m not sure how you arrived at the store, or how you obtained money to purchase anything from it, and I suggest you visit an optometrist.

2. They’re an invasion of privacy
Sure, that’s going to sound nuts at first, but think about it: these things are massive, bright, and have crazy huge fonts, so the stores you’re looking for are displayed for anyone to see. That’s not necessarily dangerous, but it could be under some weird circumstances. In any event, I’m uncomfortable with the fact that someone I don’t know knows where I’m going in a big shopping center. They could follow me to the bank, or to a jewellery store – they could do that anyway, but now all they need to do is hang around the information board to pick their victims.

3. They don’t work without electricity
So neither do most shops these days, but still… they needlessly waste electricity.

4. Only one person can use it at a time
If there was some confused sod at the old dead tree of store locations, you could just lean over their shoulder and invade their personal space until they left or you managed to see what you needed to see. Now you have to wait for the guy to finish searching through the world’s worst case/hyphen/apostrophe-sensitive indexing system to find what he wants – or, worse, browse some colossal and stupidly specific nested index of store “genres”. Hmm, is Dion Wired under “electronics”, “home”, “gaming” or “appliances”? That sort of varies based on why you’re going there… Perhaps the guy in a hooded jumper lurking around can help.

5. They’re almost never calibrated correctly
Seriously, every one I’ve ever tried to use had the touch panel so poorly calibrated you had no idea where to press, or you basically had to smash your fist into the thing to get a response, or the software was just terrible. Plus I’m sure after a few thousand people have used them they’re ready for the trash.

Just what on earth was wrong with having a sheet of paper? I’m a tech guy, but seriously if it ain’t broke…

(Any reference to Nicolway center is coincidental, it’s a cool place, and these damn things are almost everywhere)