For some it is common knowledge. For some it is a black box. For some it is a mystery. What? Our OpenTTD Infrastructure!
When people see a website, they always assume it is a simple website, with simple code behind it. How hard can it be?! Well, in this post I will try to explain a few of its ins and outs.
History
When I took the job as System Operator, almost 8 years ago, all I had was a PHP website from the previous (crashed) server and the latest SVN checkout of a broken (unrecoverable) server. I volunteered to take over hosting, as back then I could offer free-of-charge hosting on my own personal server. There were only a few demands: never again should one crashed server bring openttd.org down for such a long time (days, maybe even weeks; my memory is not _that_ good). And, much more important, never again should the SVN be lost. How and why it got lost exactly is an interesting story of its own; suffice to say, people wanted money from OpenTTD, while we had no money.
Days later openttd.org launched on a new server, with a new SVN. As hosting servers has been my job for more than 10 years now, I of course started to add stuff... stuff that would make life easier for developers.
So along came dev-space: a place for developers to upload their patches etc. Mailing list? Sure! IRC bot? Let me start up Eggdrop, I am sure we can handle that. Owh, a bug tracker? Etc etc etc... things spiral out of control fast. Then came the big one. Up to then (I believe it was 2005 or something), we compiled the binaries on our own machines and sent them to a single place. This was highly inefficient. If someone was asleep, binaries could arrive days after a release. Often this made a release bumpy, and plain annoying. Nightlies? Ugh, don't get me started. So, in a wild mood I said: how about a compile farm? I am sure I can cook something up.
MiHaMiX jumped on the bandwagon: I will host the CPU for it. Days later, we had a compile farm which pooped out nightlies and releases on demand. And... the complexity increased even more.
Years later, when rewriting openttd.org (the website), I noticed the need for a centralized account system. Back then (2009) you had to sign up separately for FlySpray (bug tracker), MediaWiki, and on top of that the main website (BaNaNaS, WT3, ...). Omg. Euh... LDAP! Where are you? So... we now also have a centralized account system. Right... so how complex is our system by now?
A few months ago (years now, I guess), we got a XenServer. That made me extremely happy, as until then we were running on vserver, which is not as nice as I always hoped it would be. XenServer is.
For the last 3 years most of my life has been occupied with a lot of things, OpenTTD not being one of them. Rubidium took over most day-to-day issues, and I was only called in when something happened. Mostly I take care of things relatively quickly, but a few remain behind: new WT, new BaNaNaS, new website... I guess you all know that after logging in, you are not redirected anywhere, right? Bad TrueBrain! Well, as life goes on, it turns out you have less and less time for the stuff you want to do.
From time to time I roll out small updates, most of them unnoticeable... but nothing big. It also seems that the developers are out of new ideas to add. I still have plenty of things I would like to add, but my lack of free time kinda kills them.
I also know most ideas come very quickly. BaNaNaS, for example, was thought up and implemented in days. Not weeks. Days. That is how we roll, baby!
So, that brings us to today. A small piece of history. Lovely, ain't it?
Infrastructure
So, what is it that we have atm?
- XenServer (kindly sponsored by OVH.de) with the following VPSes:
  - 4 VPSes for the compile farm (well, 5, but one is offline)
  - Gateway (to filter all those bad people out)
  - LDAP server
  - MySQL server (poorly tuned; WTB: MySQL expert)
  - Django (http://www.openttd.org, ... all our normal webpages basically)
  - FlySpray (bug tracker)
  - MediaWiki (wiki)
  - Other (MasterServer, Content Service, ...; the more OpenTTD-client-specific services run here)
  - SSH Proxy
  - Web Proxy
- Mirror network
  - Several servers spread throughout the world, to help distribute our content from as close to you as possible.
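The post does not say how a visitor is matched to a mirror. One common approach, sketched here under the assumption that the visitor's IP has already been geolocated to a latitude/longitude, is to pick the mirror with the smallest great-circle (haversine) distance. The mirror hostnames and coordinates below are made up for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Mirror:
    name: str   # hypothetical mirror hostname
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance between two points, using Earth's mean radius."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point noise can never push asin() out of its domain.
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))


def nearest_mirror(client_lat: float, client_lon: float, mirrors: Sequence[Mirror]) -> Mirror:
    """Pick the mirror with the smallest great-circle distance to the client."""
    return min(mirrors, key=lambda m: great_circle_km(client_lat, client_lon, m.lat, m.lon))


# Hypothetical mirrors; the real locations are not listed in this post.
MIRRORS = [
    Mirror("mirror-eu.example.org", 52.37, 4.90),     # Amsterdam
    Mirror("mirror-us.example.org", 40.71, -74.01),   # New York
    Mirror("mirror-au.example.org", -33.87, 151.21),  # Sydney
]

if __name__ == "__main__":
    # A client geolocated to Berlin should be sent to the European mirror.
    print(nearest_mirror(52.52, 13.40, MIRRORS).name)
```

Distance alone is the simplest possible rule. A real balancer would probably also take mirror load and availability into account.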
So, let's talk briefly about each of them.
Compile Farm
For the Compile Farm we run Bamboo. This is very nice software. We have a controller, which hands out the jobs, prepares them, etc etc. As a nice extra it makes the 'docs' and 'source' targets, because... why not. We have 2 Linux slaves, which are clones of each other (you figure out how that is possible). They have 20 distributions installed (via jails): 4 Debian, 2 Gentoo, 2 generic, 1 Mac OS X, the rest Ubuntu. They produce both i686 and amd64 binaries. The last slave, of course, is Windows; it compiles the Windows binaries.
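How Bamboo hands out work internally is its own business, but the basic idea of a build controller can be sketched like this. Every job states what it needs (here just an OS label), every agent advertises what it offers, and the controller gives each job to the least-busy agent that qualifies. This is a toy model, not Bamboo's API. The agent names, job names and the least-loaded rule are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str                                  # hypothetical build slave name
    capabilities: frozenset                    # what this slave offers, e.g. {"linux"}
    queue: list = field(default_factory=list)  # job targets assigned so far


@dataclass(frozen=True)
class Job:
    target: str    # hypothetical job name, e.g. "ubuntu-amd64"
    requires: str  # capability an agent must offer to run this job


def dispatch(jobs, agents):
    """Give each job to the least-loaded agent that offers the required capability."""
    for job in jobs:
        capable = [a for a in agents if job.requires in a.capabilities]
        if not capable:
            raise LookupError(f"no agent can build {job.target}")
        # min() returns the first agent among equally loaded ones.
        agent = min(capable, key=lambda a: len(a.queue))
        agent.queue.append(job.target)
    return {a.name: a.queue for a in agents}


if __name__ == "__main__":
    agents = [
        Agent("linux-1", frozenset({"linux"})),
        Agent("linux-2", frozenset({"linux"})),  # clone of linux-1
        Agent("windows-1", frozenset({"windows"})),
    ]
    jobs = [Job(f"{distro}-{arch}", "linux")
            for distro in ("debian", "ubuntu") for arch in ("i686", "amd64")]
    jobs.append(Job("windows", "windows"))
    print(dispatch(jobs, agents))
```

Because the two Linux slaves are identical, the Linux jobs alternate between them, while the Windows job can only go to the Windows slave.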
After each compilation, the resulting files are sent to one of the other VPSes. From there an rsync to the mirrors is started, which puts the files online and makes them available.
Gateway
We serve our content over both IPv4 and IPv6. As we tend to be ahead, we implemented IPv6 in our client long ago, but even before that we added it to our websites. Internally, this is a bit of a nightmare. How do you route traffic safely while still keeping to the idea behind IPv6 (each service its own IP)? Well, this gateway machine takes care of that. It routes the traffic correctly, over both IPv4 and IPv6, so the other VPSes can handle it. They see the traffic as if it came directly from the outside world, not knowing it went via a gateway. This allows me to redirect any port and IP freely to whichever VPS I see fit. It saves a lot of downtime, I promise you.
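The actual forwarding happens in the kernel (see the ipvsadm note just below). The decision the gateway makes, though, boils down to a lookup table from a public destination to an internal VPS. This is a minimal sketch of only that lookup, assuming made-up VPS names and documentation-range addresses (192.0.2.0/24 and 2001:db8::/32). Moving a service to another VPS is just overwriting one entry.

```python
import ipaddress
from typing import Optional


class Gateway:
    """Toy model of the gateway's routing decision: public (IP, port, proto) -> internal VPS."""

    def __init__(self) -> None:
        self._routes: dict = {}

    @staticmethod
    def _key(ip: str, port: int, proto: str):
        # Normalise the address so "2001:db8::10" and "2001:0db8:0:0:0:0:0:10"
        # count as the same destination. Works for IPv4 and IPv6 alike.
        return (ipaddress.ip_address(ip), port, proto)

    def route(self, ip: str, port: int, vps: str, proto: str = "tcp") -> None:
        """Point a public destination at a VPS; calling it again re-points it."""
        self._routes[self._key(ip, port, proto)] = vps

    def lookup(self, ip: str, port: int, proto: str = "tcp") -> Optional[str]:
        """Return the VPS handling this destination, or None if nothing is routed there."""
        return self._routes.get(self._key(ip, port, proto))


if __name__ == "__main__":
    gw = Gateway()
    # Hypothetical services on documentation-range addresses.
    gw.route("192.0.2.10", 80, "web-proxy")
    gw.route("2001:db8::10", 80, "web-proxy")
    gw.route("2001:db8::22", 22, "ssh-proxy")
    print(gw.lookup("2001:0db8:0:0:0:0:0:10", 80))  # web-proxy
    # Moving a service is a single update; the client-facing address stays the same.
    gw.route("2001:db8::22", 22, "ssh-proxy-new")
    print(gw.lookup("2001:db8::22", 22))  # ssh-proxy-new
```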
For those who are interested: we mainly use 'ipvsadm' for it, in a DR (direct routing) setup. Look it up if you are curious; it is pretty smart s***.
LDAP / MySQL
What is there to say? Both services run in an almost vanilla setup, and serve us information (both are databases, FYI).
Django
Most of our website stuff is written in Django. I like Django. It is personal taste; there is not really any other reason we use it. Many would ask: why not PHP? Well... PHP is stateless (every request starts from scratch), which makes some things rather hard to do. Django runs as a long-lived process and can keep state between requests, which means I can do some nice tricks to make sites really, really quick. For example, the websites don't run on the same machine where the binaries are stored, so we need to cache the information we fetch. In a bit more detail: at the top of the website you see our latest stable version. This is fetched from http://finger.openttd.org/versions.txt. This file is updated by all kinds of scripts, mostly by the Compile Farm. Because Django is stateful, it only requests this file once every 5 minutes. With PHP, I would either have to cache the file locally (ugh, HACK!) or fetch it on every request. Reasons like this make me love Django. But again, personal taste. Not really any other reason.
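This is a minimal sketch of the "fetch at most once every 5 minutes" idea, written as plain Python rather than the actual OpenTTD code. The value lives in the memory of the long-running process, so only the first request after the 5 minutes are up pays for the HTTP fetch. `TimedCache` and `fetch_versions` are hypothetical names, and the contents of versions.txt are not parsed because the post does not describe their format. In a real Django project you would more likely use Django's own cache framework (`cache.set(key, value, timeout)`), which gives the same effect.

```python
import time
from typing import Callable, Optional
from urllib.request import urlopen


def fetch_versions(url: str = "http://finger.openttd.org/versions.txt") -> str:
    # Network call; the cache below only makes it when its copy is missing or expired.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


class TimedCache:
    """Keep one fetched value in memory and refresh it at most once per `ttl` seconds."""

    def __init__(self, fetch: Callable[[], str], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # function that retrieves a fresh value
        self._ttl = ttl      # 300 s = the 5 minutes mentioned above
        self._clock = clock  # injectable, so the expiry logic can be tested
        self._value: Optional[str] = None
        self._fetched_at = float("-inf")

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# One cache per worker process; each page view calls versions_cache.get().
versions_cache = TimedCache(fetch_versions)
```

Note that each worker process keeps its own copy, so with several workers the file is fetched once per worker per 5 minutes. That is still far less than once per page view.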
Stuff like BaNaNaS and WebTranslator 3 is also written in Django. Same reason. Highly complex sites. Crazy s***. BaNaNaS is in the SVN of OpenTTD; go check it out for s*** and giggles. WT3 is "closed source", mostly because it is absolute crap. Many people don't like me for this choice; I can't blame them. But I learnt the hard way that when you release software, people demand support for it, if only in the form of: what did you mean here? I have a patch, can you apply it? Etc etc. The code is in such a bad state, and I have so little time, that I said: no. That might change in the future... I might even rewrite it... if... I find time, of course.
FlySpray
Little to say. nginx + PHP(-fpm), serving FlySpray, an open source bug tracker. Sadly, it is not really supported anymore, although there was a new release (after 2 years) last month. It might pick up again? It does kinda need replacing with more modern software... possibly Jira.
MediaWiki
Owh, MediaWiki. How I hate you. It serves our wiki. I never did anything on this VPS; Rubidium set it up and maintains it, at least I think so. MediaWiki, for me, is like WordPress, Drupal, Joomla, ...: so widely used, so common, that there is a security problem every Thursday which requires you to update... Personal dislikes. You've got to love them. But it serves our wiki just fine, and fast. Same as FlySpray: nginx + PHP(-fpm), with a lot of memory, memcached, the s*** and the giggles. Boring VPS.
Other
We have a lot of custom services: ottd_master, ottd_content, balancer, mirror scripts, ... They should all be on this VPS... not all of them are there yet. Still, this VPS is the one serving them. The source of all these servers can be found in our SVN. What more to say... nothing, really.
SSH Proxy
One of my favourites. When building such a complex infrastructure, you very soon run into the question: how do I get into the different boxes? With some scripting and some clever changes, this VPS redirects SSH connections to the right server. For example, if I connect via SSH, I end up on a different server than you would, and our developers end up on yet another server. I can also tell it where I want to end up... for example, if I want to go to the Django VPS, I can tell it so. It works very nicely, with SCP and everything. Rather proud of it. Sadly, nobody sees it, as it is transparent. Sad. Sad sad sad. It also validates each user and checks what they do and do not have access to, etc etc. It was built mostly to allow what dev-zone does now, so there is little need for it today. But the infrastructure remains... who knows what we will need it for.
Web Proxy
nginx. Where the gateway routes incoming IPs + ports, web traffic is different. We have so many subdomains that we cannot give each of them its own IPv4 address (we can on IPv6, of course). So this is where the proxy comes in. It reads the HTTP Host header and forwards the request accordingly. Mostly transparent. Mostly, as it does take care of gzip compression, for example; no other httpd in our network does. But all pages served by openttd.org will be gzip'd when the client asks for it. Very nice stuff. It doesn't cache or anything. It runs in 10 MiB of RAM. I love nginx. Hmmm.....
Conclusion
Well, that sums it up, I guess. Of course this is just an overview of what we run. The devil is in the details, so to say. A lot of glue exists to make sure everything behaves nicely. But, as you might expect if you know me a bit, most parts communicate via some kind of API, allowing me to replace blocks and parts without damaging anything else. I keep updating and modernizing stuff where I see fit... mostly without the other devs knowing. They mostly find out when things go wrong; always fun things to explain.
Future work
At the moment I am working on a new website (http://www-test.openttd.org). You might notice there is no difference from the currently active website. That means you are blind, but it is mostly true: nothing changed in the design. Everything changed in the background. We now have multi-language support (in a working manner), very aggressive caching (look how fast it is), better navigation, clearer integration of all subdomains, and a few other nice additions (check the Server List, yo!).
You do this alone?
As a final part, and it deserves its own spot: yes, I do this alone. Well, Rubidium does help out. But mostly I do it alone. Why, you wonder? Simply put: because nobody qualified enough has ever shown up. No clue why, tbh... either people are afraid, think it is taken care of, or think I bite. Well, I sometimes do bite....
Lately I have Xaroth helping me out with the Django part. That already helps a lot. Andythenorth is going to help out with BaNaNaS, which is great. I still need, and always will I guess, more people to help out. Another qualified (trustworthy!) System Operator would be nice. Fresh air and ideas are always good. I guess the main issue would be: trustworthiness. It is hard to come by, sadly. We have a lot of visitors a day, and with that a lot of data being processed. We remove as much personal (trackable) information as we can, as quickly as we can, but that still leaves a lot of data... yeah, a System Operator needs to be trusted. A lot.
But much more important is what I really need, and people tend to walk past this a lot: feedback. Without feedback, I don't know if anything needs improvement. I know the bug tracker has some suggestions that have been ignored for months and months. But that mostly has 3 reasons: 1) it is something huge, 2) it is unrealistic, or 3), much more common, it is vague: one-liners. 10 words. No screenshot. No context. So, if you ever want to help out, spot something off or wrong, or see something weird... please do give feedback. Of course, compliments are also welcome (fishing much?). But be clear. Explicit. Rather too much info than too little. How nice, I managed to fiddle this into this thread too... hihi
A quick example: did you know that on all our domains with the OpenTTD logo in the header (go to http://www.openttd.org/), the logo is off by 1 pixel? Look closely. It looks wrong. Now go to the test site (http://www-test.openttd.org/). It is fixed there. This 'bug' has been there for more than a year now, and only 1 (!) person complained about it. Funny, ain't it?
Anyway, I am drifting off. Me? Never!
Want to help out?
If you want to help out, drop by. IRC is easiest, possibly required. I (read: we) am always looking for fresh blood to help out; there is enough to do. Drop by, ask, talk... and the above should give you a good idea of what you should be capable of before doing so.
Closing up
To close this up, let me do that in a very open form (hihi, close .. open ...): if you have any questions regarding our infrastructure, just drop your question here. I promise I will do my best to answer. So yeah .. the floor is yours!
Tnx for reading