At the moment, the server hosting openttd.org and most of its related webservers is offline.
A brief history of events:
0700 - Server goes down
1130 - I wake up (hey, it was a busy night, don't judge me!), and get told about the outage
1150 - I inform LeaseWeb, our provider, that our server is down, and that a remote reboot doesn't help
1155 - I verbally confirm my identity, and the issue is sent to the engineers
1600 - I call in to check on progress. No engineer has been available, and the ticket has not been handled
1800 - With the weekend coming, I send them another email. No reply so far
2000 - We are starting to become a bit desperate. The #openttdcoop guys have offered to host a temporary placeholder for us, which you can now enjoy. The masterserver is currently being hosted by Rubidium himself. The contentserver will remain offline, as we have few alternatives for that. Let's hope someone picks up our ticket soon, and pushes the 'on' button. Let's hope it really is that simple, and no real hardware damage happened
0100 - Sent them another mail. I would at least have expected a reply with an estimate of when someone will be attending to the ticket. Maybe the reply got lost. SMTP is not a 100% delivery protocol after all
0430 - It's alive. ALIVE. LeaseWeb emails me to let me know someone went down to the machine and hit the power button. Now wtf went wrong?
0930 - Rubidium wakes up. Boots all services that didn't autoboot for one reason or another
1130 - I wake up (hey! Sue me). Investigating wtf happened ...
1145 - LeaseWeb logs in to our server to investigate with us
1215 - LeaseWeb concludes the same as we did: nothing seems wrong. Strange
Things you might want to know:
2 weeks ago, we had our first outage in 2 years. This is quite impressive: 99.99% uptime for 2 whole years. I am pretty proud of that. As it goes with most outages, you reboot your server, and you continue.
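To put that 99.99% in perspective, here is a rough sketch of the arithmetic. The function name and the 730-day approximation of 2 years are assumptions made for illustration; this is not how the uptime was actually measured.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int) -> float:
    """Return how many minutes of downtime fit within an uptime percentage.

    Assumption for illustration: uptime is counted over whole days of
    24 hours each, with no maintenance windows excluded.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 2 years approximated as 730 days (an assumption, ignoring leap days).
    budget = allowed_downtime_minutes(99.99, 730)
    print(f"{budget:.1f} minutes of downtime allowed")
```

Under those assumptions, 99.99% over 2 years leaves room for roughly 105 minutes of downtime in total.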
A few days later, bam. Server down again. This time, a remote reboot didn't help. It turned out that our server was shut down completely, and a remote reboot doesn't help with that. Nothing in the logs suggests any of this, and we are clueless as to what happened. But, as it goes with computers, it happens. We checked the hardware as far as we can (HDDs, memory, ...). Nothing pops up. So .. we hope it is a one-time event, and continue.
Today, bam. Server down again. What the f*** .. this sounds more serious. Normally a NIC still replies to pings when a kernel panic happens. Nothing. Hmm. We should start to worry, I guess. Therefore I also asked LeaseWeb, our provider, to look into this. Maybe there are hardware failures? I don't know.
After 7 hours, no engineer has looked at our server yet. Most likely they are really busy. Sadly enough, this means that most web services are not working. With the weekend coming, we can only hope they look at it in the next few hours. I will at least try to keep you up-to-date. For now, you will have to make do with a lesser openttd.org. Hope you survive.
"Wait!", you say, "why don't you have mirrors?". Although we have our binaries mirrored, mirroring a highly dynamic website is not easy. Also, with 2 years of 99.99% uptime, there has never been a reason to. Things like the masterserver and contentserver are also really hard to mirror, so they never have been. Of course, this is a wake-up call for us to start looking at options here.
"Wait!", you say, "but are the OpenTTD binaries and source code safe?". Yes. All binaries are mirrored. All SVN data is also mirrored (to secret places


I hope you can bear with us. At least I wanted to let you guys know that we know of the problems, but that it is at the moment out of our control. I hope I can give you a more positive update very soon. Bear with us.
[Update 20:10] Temporary website is up: http://fallback.openttd.org/
[Update 11:30] All services are restored to normal. Total downtime: 22 hours.
[Update 12:30] LeaseWeb concludes the same as we did: nothing seems wrong with the hardware, and it was not a clean shutdown. How the machine then ever got powered off completely .. well, we are closing the investigation for now, in the hope this doesn't happen AGAIN.
[Update 12:40] LeaseWeb stresses that they are very sorry it took this long to fix the problem. It has been incredibly busy there. Happens, as with everything. Happens. At least it is fixed now.
