Every morning on the way to the office (on the mornings I bother driving to the office) my route takes me past an odd building. It belongs to a drafting school, and what's interesting about it is that it looks like some beautiful towering piece of Roman architecture. However, as you round the bend which passes directly in front of it the truth becomes much more apparent – the building has been entirely encased in a carefully painted sheet. It's a 10-story-high tent with a hideous concrete monstrosity from the 1970s inside, and it's quite convincing until you're about 10 meters away. Considering what the building is used for, I'm sure there's some deliberate tongue-in-cheek meaning going on, and I like that it gives me a chuckle every time I drive past, but it also serves as a great reminder of our duty as engineers.
OpenStack was probably one of the most promising things happening over the last two years – until, of course, I actually tried to use it. At the time I was field-testing various virtualisation ideas for a private cloud, and OpenStack was right at the top of the list of things which fitted the bill.
The reason it completely failed was much the same reason I'm not buying into the Docker hype just yet – it was a fragile, unstable house of cards which constantly felt like it was about to crumble beneath my feet. Virtual machines would get stuck in odd states the API had no idea what to do with, and I had to keep killing off orphaned libvirt processes, then fiddle with the database to force machines into a stopped state so I could restart them – and then hope the data was still intact. Granted, most of this is probably because KVM is also quite horrible and fragile, but surely new versions would be better, right? Of course they weren't, because the upgrade path was so convoluted it wasn't worth the time. Then they introduced a new networking layer which was ludicrously overcomplicated, architecturally expensive to implement, generally ill-conceived, and (as I was starting to expect from OpenStack by this point) it didn't even work properly no matter which conflicting piece of documentation was followed. I even got a second set of eyes on it, and the conclusion we both reached was quite simply "Yeah, this stuff is just completely broken." If many more hours had been spent debugging it I'm sure we could have found the issue, logged a ticket with the relevant project and maybe (maybe) it would be fixed. But why bother when there are other options?
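For anyone lucky enough never to have done this, the ritual went roughly like the sketch below. It assumes a Nova/KVM host with the nova database in MySQL; the instance UUID, the libvirt domain name and the credentials are all placeholders, and the schema almost certainly differs between releases – treat it as an illustration of the workflow, not a procedure.

```python
#!/usr/bin/env python
"""Rough sketch of the manual cleanup ritual described above, for a
Nova/KVM host with its database in MySQL. The UUID, the libvirt
domain name and the credentials are placeholders."""
import subprocess

INSTANCE_UUID = "00000000-0000-0000-0000-000000000000"  # placeholder
DOMAIN_NAME = "instance-00000001"                       # placeholder

# 1. Destroy the libvirt domain the API has lost track of.
subprocess.call(["virsh", "destroy", DOMAIN_NAME])

# 2. Kill any orphaned qemu process still holding the disk image.
subprocess.call(["pkill", "-f", INSTANCE_UUID])

# 3. Force the instance into a 'stopped' state directly in the nova
#    database, since the API refuses to do it itself.
sql = ("UPDATE instances SET vm_state='stopped', task_state=NULL "
       "WHERE uuid='{0}';".format(INSTANCE_UUID))
subprocess.call(["mysql", "-u", "nova", "-p", "nova", "-e", sql])

# 4. Cross your fingers and ask the API to start it again.
subprocess.call(["nova", "start", INSTANCE_UUID])
```

Needing to do this once is an incident; needing a script for it is a verdict.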
The problem is mostly that, for some completely misguided reason, people think implementing a cloud framework is all about emulating Amazon's API and architecture. While I can sort of understand the starting point of this logic (taking advantage of client libraries like Boto), the reality is these libraries are not actually good enough to care that much about, nor difficult enough to stop anyone simply reimplementing them if the foundation were solid enough for people to adopt. Amazon's API is a horrible nightmarish mess of bloated XML, so why would anyone want to recreate it? The problem is that OpenStack is now so busy chasing compatibility with a meaningless target that little time goes toward addressing the mounting piles of technical debt and bugs which underpin the entire thing. Maybe it's better now, maybe it will get better in future, but unfortunately it wasted enough of my time back then that I don't care much about its future.
This is the precise feeling I get with all the Docker hype. I do agree that containers are a good idea for many deployment scenarios and use cases, but in other areas people are using the wrong tools to solve problems which don't exist – like coming up with outrageous comparisons between Docker and Puppet, which is about as useful as comparing the International Space Station to an onion.
Containers are not a replacement or an alternative to configuration management, and if you think they are then you’re just seriously wrong or the work you’re doing is irrelevant to the situation in the first place. Either way, don’t send me your CV.
Misuse of Docker isn't its biggest problem though; its production readiness is. Good Linux distros lag pretty darn far behind the bleeding edge, and that conflicts with anything which needs to manipulate low-level parts of the stack the way containers do. Even Ubuntu, which releases quite often, is too broken for me to run in a production server environment; LTS releases are the sensible choice (and my current choice), and even those are always pretty ancient compared to the present – and still sparkling new compared to RHEL, CentOS or Debian stable.
And so this is how it ends up: a growing pile of bleeding-edge external repositories to keep up with nightly bug fixes and nightly bug introductions, until eventually that is your job as a system administrator. Chasing down ridiculous bugs, like the machine I was testing LXC and Docker on somehow trying to install a funny version of grub2 and winding up hilariously broken and unbootable, or Docker mucking around with iptables so much that my existing iptables scripts get confused, or they blow away the Docker changes and leave containers broken (something OpenStack annoyed me with too). When you hit these issues almost no one has encountered them before, because the domain of use is actually so pitifully small, testing is conducted in a tiny vacuum with nowhere near enough different system configurations, and the early adopters are mostly people who just got lucky and don't actually understand what's going on.
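To give a flavour of the iptables problem: after any firewall reload, the only way to know whether containers still had networking was to go poking for Docker's chains by hand, something like the sketch below. The chain names and the service restart are what worked on the versions I was testing, so treat them as assumptions for yours.

```python
#!/usr/bin/env python
"""Minimal sketch of the iptables tug-of-war: check whether Docker's
chains survived a firewall reload before assuming containers still
have networking. Chain names and restart command are assumptions
based on the versions I was testing."""
import subprocess

def chain_exists(chain, table="filter"):
    """Return True if the given iptables chain is still present."""
    return subprocess.call(
        ["iptables", "-t", table, "-n", "-L", chain],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0

# A firewall script that clears chains wholesale ('iptables -F' then
# 'iptables -X') removes Docker's rules, silently breaking container
# networking until the daemon rebuilds them.
if not (chain_exists("DOCKER") and chain_exists("DOCKER", table="nat")):
    print("Docker's chains are gone; restarting the daemon to rebuild them")
    subprocess.call(["service", "docker", "restart"])
```

That a check like this is needed at all is the point: two things are fighting over the same firewall, and neither knows the other exists.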
Many of these things work great in a perfect world, where you have piles of money, get to pick whatever server hardware or cloud provider you like, and have a clean slate to rearrange your network. Anyone with experience knows that production environments are never like this. They're a hodgepodge of compromise, budget cuts, too few people and too little time, and the last thing the people working on them need is something that could go horribly wrong at the drop of a hat, with no one around experienced enough to fix it quickly at 3AM.
This is why I personally don't build production systems on luck – I build them with well-tested software which has shown itself to be reliable in even the most seemingly outrageous configurations. I especially don't choose software based on vague testimony from fancy beanbag start-up companies, because god only knows what they're really dealing with there, or how dark the rings are under their admins' eyes – all you have to do is drive a bit closer and see whether it's actually a solid building or just a tent.