OpenTTD.org downtime

OpenTTD is a fully open-sourced reimplementation of TTD, written in C++, boasting improved gameplay and many new features.

TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

OpenTTD.org downtime

Post by TrueBrain »

(all times are in CET)

At the moment, the server hosting openttd.org and most of its related web services is offline.

A brief history of events:

0700 - Server goes down
1130 - I wake up (hey, it was a busy night, don't judge me!), and get told about the outage
1150 - I inform LeaseWeb, our provider, that our server is down, and that a remote reboot doesn't help
1155 - I confirm my identity verbally, and the issue is sent to the engineers
1600 - I call in to check on progress. No engineer has been available, and the ticket has not been handled
1800 - With the weekend coming, I send them another email. No reply so far
2000 - We are starting to become a bit desperate. The #openttdcoop guys have offered to host a temporary placeholder for us, which you can now enjoy. The masterserver is currently being hosted by Rubidium himself. The contentserver will remain offline, as we have few alternatives for that. Let's hope someone picks up our ticket soon and pushes the 'on' button. Let's hope it really is that simple, and that no real hardware damage happened
0100 - Sent them another mail. I would at least have expected a reply with an estimate of when someone will attend to the ticket. Maybe the reply got lost. SMTP is not a 100% delivery protocol after all
0430 - It's alive. ALIVE. LeaseWeb emails me to let me know someone went down to the machine and hit the power button. Now wtf went wrong?
0930 - Rubidium wakes up. Boots all services that didn't autoboot for one reason or the other
1130 - I wake up (hey! Sue me). Investigating wtf happened ...
1145 - LeaseWeb logs in to our server to investigate with us
1215 - LeaseWeb concludes the same as we did: nothing seems wrong. Strange

Things you might want to know:

2 weeks ago, we had our first outage in 2 years. This is quite impressive: 99.99% uptime for 2 whole years. I am pretty proud of that. As it goes with most outages, you reboot your server, and you continue.
A few days later, bam. Server down again. This time, a remote reboot didn't help. It turned out that our server had been shut down, and rebooting doesn't help with that. Nothing in the logs suggests any of this, and we are clueless as to what happened. But, as it goes with computers, it happens. We checked the hardware as far as we could (HDDs, memory, ...). Nothing popped up. So .. we hoped it was a one-time event, and continued.
Today, bam. Server down again. What the f*** .. this sounds more serious. Normally a NIC still replies to pings when a kernel panic happens. Nothing. Hmm. We should start to worry, I guess. Therefore I also asked LeaseWeb, our provider, to look into this. Maybe there are hardware failures? I don't know.
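
For a sense of scale, the 99.99% figure can be turned into hours. The sketch below is only illustrative arithmetic: it assumes a 365-day year, uses the 22 hours of total downtime reported for this outage, and the function names are made up for the example.

```python
# Illustrative availability arithmetic; a 365-day year is an assumption.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(availability_pct: float, years: float) -> float:
    """Hours of downtime allowed at a given availability over a period."""
    return (1 - availability_pct / 100) * years * HOURS_PER_YEAR


def availability_pct(downtime_hours: float, years: float) -> float:
    """Availability (in percent) given total downtime over a period."""
    total_hours = years * HOURS_PER_YEAR
    return 100 * (1 - downtime_hours / total_hours)


if __name__ == "__main__":
    # 99.99% over 2 years leaves roughly 1.75 hours of downtime.
    print(round(downtime_budget_hours(99.99, 2), 3))
    # A single 22-hour outage in a 2-year window is roughly 99.87%.
    print(round(availability_pct(22, 2), 2))
```

In other words, one outage of this size uses up the downtime allowance of "four nines" more than ten times over, which is why a single event like this matters so much for the statistic.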

After 7 hours, no engineer has looked at our server yet. Most likely they are really busy. Sadly enough, this means that most web services are not working. With the weekend coming, we can only hope they look at it in the next few hours. I will at least try to keep you up to date. For now, you will have to make do with a lesser openttd.org. Hope you survive.

"Wait!", you say, "why don't you have mirrors?". Although we have our binaries mirrored, mirroring a highly dynamic website is not easy. Also, with 2 years of 99.99% uptime, there has never been a reason to. Things like the masterserver and contentserver are also really hard to mirror, so they never have been. Of course this is a wake-up call for us to start looking at options here.

"Wait!", you say, "but are the OpenTTD binaries and source code safe?". Yes. All binaries are mirrored. All SVN data is also mirrored (to secret places :D), to more than one place. So OpenTTD itself as a project is safe. Always. We made damn sure of that when we lost our SVN 5 years ago ;)

I hope you can bear with us. At least I wanted to let you guys know that we know of the problems, but that it is at the moment out of our control. I hope I can give you a more positive update very soon. Bear with us.

[Update 20:10] Temporary website is in the air: http://fallback.openttd.org/
[Update 11:30] All services are restored back to normal. Total downtime: 22 hours.
[Update 12:30] LeaseWeb concludes the same as we did: nothing seems wrong with the hardware, and it was not a clean shutdown. How the machine then ever got powered off completely .. well, we close the investigation for now, in the hope this doesn't happen AGAIN.
[Update 12:40] LeaseWeb outlines they are very sorry it took this long to fix the problem. It has been incredibly busy there. Happens, as with everything. Happens. At least it is fixed now :D
The only thing necessary for the triumph of evil is for good men to do nothing.
Arie-
Director
Posts: 593
Joined: 20 Jan 2009 16:07

Re: OpenTTD.org downtime

Post by Arie- »

Anyone who wants to download something OpenTTD related, you'll have to browse the directories yourself, but still it works: ftp://ftp.snt.utwente.nl/pub/games/openttd/binaries/
Kogut
Tycoon
Posts: 2493
Joined: 26 Aug 2009 06:33
Location: Poland

Re: OpenTTD.org downtime

Post by Kogut »

TrueBrain wrote:"Wait!", you say, "but are the OpenTTD binaries and source code safe?". Yes. All binaries are mirrored. All SVN data is also mirrored (to secret places :D), to more than one place. So OpenTTD itself as a project is safe. Always. We made damn sure of that when we lost our SVN 5 years ago ;)
What about the bugtracker?
Correct me If I am wrong - PM me if my English is bad
AIAI - AI for OpenTTD
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org downtime

Post by TrueBrain »

Kogut wrote:
TrueBrain wrote:"Wait!", you say, "but are the OpenTTD binaries and source code safe?". Yes. All binaries are mirrored. All SVN data is also mirrored (to secret places :D), to more than one place. So OpenTTD itself as a project is safe. Always. We made damn sure of that when we lost our SVN 5 years ago ;)
What about the bugtracker?
Bugtracker, wiki and the others do not have backups. They are not primary to the project's survival. I guess the wiki can be considered primary, and I guess I will start backups for it soonish. No clue why I never did that before tbh .. :)

But no worries, the server will be back online :) And as all data is on a RAID mirror, a lot has to go wrong for it to be lost :) My remark was only meant for those who have been here long enough to still remember the last 'more than 6 hour unannounced outage', which resulted in days without server, and no SVN in the end :p This will not happen :)
Alberth
OpenTTD Developer
Posts: 4766
Joined: 09 Sep 2007 05:03
Location: home

Re: OpenTTD.org downtime

Post by Alberth »

TrueBrain wrote:which resulted in days without server, and no SVN in the end :p This will not happen :)
Even if all your backups fail, there are a lot of people that have a pretty much complete hg or git clone.
orudge
Administrator
Posts: 25223
Joined: 26 Jan 2001 20:18
Skype: orudge
Location: Banchory, UK
Contact:

Re: OpenTTD.org downtime

Post by orudge »

Arie- wrote:Anyone who wants to download something OpenTTD related, you'll have to browse the directories yourself, but still it works: ftp://ftp.snt.utwente.nl/pub/games/openttd/binaries/
There are also mirrors available at:

http://us.binaries.openttd.org/binaries/
http://gb.binaries.openttd.org/binaries/
http://hu.binaries.openttd.org/binaries/

I think those are all, but I may have forgotten others!
SHADOW-XIII
Tycoon
Posts: 14275
Joined: 09 Jan 2003 08:37

Re: OpenTTD.org downtime

Post by SHADOW-XIII »

TrueBrain wrote:0700 - Server goes down
1130 - I wake up (hey, it was a busy night, don't judge me!), and get told about the outage
Have you considered using Nagios? I have Nagios sending me an email/text* from another server when my server has not been accessible for some time (20 min, I think),
and I configured my phone to keep ringing like crazy if it gets the message (even at night) :) ... although the hours can be adjusted freely; in my case the website has to be up 24h/day

It wouldn't have helped this time, but generally speaking, you could have Nagios monitor not just the server as a whole but single processes as well, in case just Apache/SQL dies

*using premium service, mail to text
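
The idea of checking from a second machine and only alerting after a sustained outage can be sketched in a few lines. This is not Nagios and not its configuration format, just a minimal Python illustration of the same logic; the class and function names are hypothetical, and the 20-minute threshold is taken from the post above.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class OutageTracker:
    """Decides when a run of failed checks has lasted long enough to alert."""

    def __init__(self, alert_after_seconds: float = 20 * 60):
        self.alert_after_seconds = alert_after_seconds
        self.down_since = None  # time of the first failed check in a row

    def record(self, reachable: bool, now: float) -> bool:
        """Record one check result; return True if an alert is due."""
        if reachable:
            self.down_since = None
            return False
        if self.down_since is None:
            self.down_since = now
        return now - self.down_since >= self.alert_after_seconds
```

A monitoring loop on the other server would call `tracker.record(is_reachable(host, 80), time.time())` every minute or so and send the mail or text when it returns True. Checking a specific port per service (web, SQL) rather than the host as a whole is the rough equivalent of monitoring single processes.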
what are you looking at? it's a signature!
Lord Aro
Tycoon
Posts: 2369
Joined: 25 Jun 2009 16:42
Location: Location, Location
Contact:

Re: OpenTTD.org downtime

Post by Lord Aro »

So basically, we (or you ;) ) are in fairly deep s***, but not quite as deep as 5 years ago. Correct?

Also, when did the downtime actually occur? The first post states 7:00 CET, but I noticed it before 7:00 GMT and reported it on IRC at around 7:10, as SmatZ will testify ;)
AroAI - A really feeble attempt at an AI

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration. --Edsger Dijkstra
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org downtime

Post by TrueBrain »

SHADOW-XIII wrote:
TrueBrain wrote:0700 - Server goes down
1130 - I wake up (hey, it was a busy night, don't judge me!), and get told about the outage
Have you considered using Nagios? I have Nagios sending me an email/text* from another server when my server has not been accessible for some time (20 min, I think),
and I configured my phone to keep ringing like crazy if it gets the message (even at night) :) ... although the hours can be adjusted freely; in my case the website has to be up 24h/day

It wouldn't have helped this time, but generally speaking, you could have Nagios monitor not just the server as a whole but single processes as well, in case just Apache/SQL dies

*using premium service, mail to text
I do this for free, in my free time, because I like it. Hell no, I am not going to let it interrupt my sleep :D Sorry ....

I already have those interruptions in my profession ;) (I work for an ISP :p Not LeaseWeb, mind you ;))
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org downtime

Post by TrueBrain »

Lord Aro wrote:So basically, we (or you ;) ) are in fairly deep s***, but not quite as deep as 5 years ago. Correct?

Also, when did the downtime actually occur? The first post states 7:00 CET, but I noticed it before 7:00 GMT and reported it on IRC at around 7:10, as SmatZ will testify ;)
Nah, we are not in deep s*** in any way. We just have to wait for an engineer to pick up our ticket, walk to the machine, and figure out what is wrong. I was just trying to be funny, which clearly failed ;)

7:00 CET is 6:00 GMT, so I guess that is possible. I don't do exact time. I don't care :p It was early morning .. do we really need to nitpick about the exact time it happened, and under which conditions? I can tell you it was snowing outside here at the time it happened :p Ghehe :) No, but seriously dude, nobody cares if it was 0600, 0700, 0800, or any time in between :) It went down. That matters ;)
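
For anyone who wants to double-check the conversion: CET is UTC+1 and GMT is UTC+0, so CET times are one hour ahead. A minimal sketch using fixed offsets follows; the calendar date in it is only a placeholder (an assumption, the thread does not state it) and the helper name is made up.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets: CET is UTC+1 (winter time), GMT is UTC+0.
CET = timezone(timedelta(hours=1), "CET")
GMT = timezone.utc


def cet_to_gmt(hour: int, minute: int = 0) -> str:
    """Convert a CET wall-clock time to GMT, returned as HH:MM."""
    # The date is a placeholder; only the time of day matters here.
    local = datetime(2010, 12, 3, hour, minute, tzinfo=CET)
    return local.astimezone(GMT).strftime("%H:%M")


if __name__ == "__main__":
    print(cet_to_gmt(7))  # 07:00 CET expressed in GMT
```

So a report on IRC at around 7:10 GMT (8:10 CET) is perfectly consistent with the server going down at about 7:00 CET.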
Lord Aro
Tycoon
Posts: 2369
Joined: 25 Jun 2009 16:42
Location: Location, Location
Contact:

Re: OpenTTD.org downtime

Post by Lord Aro »

Ok, that makes sense. I was thinking that 7:00 CET was 8:00 GMT
But, as you say, that doesn't matter...
Now all that needs to be done is to get an engineer playing OTTD...fixed in no time! :D (yes, I do realise that isn't how it works)


N.B. I was also trying to be funny, but that obviously didn't get through either ;)
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org downtime

Post by TrueBrain »

Humor on the internet is a tricky thing ;)

And yeah ... we now need someone who works at Leaseweb who says: "WHAT? 12 hours and no reply? That can't happen to OpenTTD!" And fixes it :D
kamnet
Moderator
Posts: 8705
Joined: 28 Sep 2009 17:15
Location: Eastern KY
Contact:

Re: OpenTTD.org downtime

Post by kamnet »

12 hours and no reply? Sheesh, I thought my hosting provider was bad. :-/
XeryusTC
Tycoon
Posts: 15415
Joined: 02 May 2005 11:05
Skype: XeryusTC
Location: localhost

Re: OpenTTD.org downtime

Post by XeryusTC »

Well, you can't really complain about it that much because AFAIUI hosting is basically free. Even so, it is quite nasty that they let such a thing go on for so long without a proper response. I guess that there are bigger things going on at LeaseWeb than just the OTTD server ;)
Don't panic - My YouTube channel - Follow me on twitter (@XeryusTC) - Play Tribes: Ascend - Tired of Dropbox? Try SpiderOak (use this link and we both get 1GB extra space)
orudge
Administrator
Posts: 25223
Joined: 26 Jan 2001 20:18
Skype: orudge
Location: Banchory, UK
Contact:

Re: OpenTTD.org downtime

Post by orudge »

XeryusTC wrote:Well, you can't really complain about it that much because AFAIUI hosting is basically free.
It's far from free, hence why OpenTTD has had fundraising drives in the past. While LeaseWeb do sponsor the hosting, we still pay a good chunk for it. And, even if they did nothing else, one would hope they would be able to respond to a simple server restart request within a reasonable timeframe.
kamnet
Moderator
Posts: 8705
Joined: 28 Sep 2009 17:15
Location: Eastern KY
Contact:

Re: OpenTTD.org downtime

Post by kamnet »

XeryusTC wrote:Well, you can't really complain about it that much because AFAIUI hosting is basically free. Even so, it is quite nasty that they let such a thing go on for so long without a proper response. I guess that there are bigger things going on at LeaseWeb than just the OTTD server ;)
I disagree, you can most certainly complain when you're not getting responses. Even a simple "we're up to our necks in thicknet and fighting off battle bots" would be a nice acknowledgment.
ChillCore
Tycoon
Posts: 2868
Joined: 04 Oct 2008 23:05
Location: Lost in spaces

Re: OpenTTD.org downtime

Post by ChillCore »

TrueBrain wrote: ... 99.99% uptime for 2 whole years ...
While yesterday's events were a little annoying, that is a pretty stable service.


Anyway, the server is back. YAY. :)

Edit: spelling
Last edited by ChillCore on 04 Dec 2010 10:41, edited 1 time in total.
-- .- -.-- / - .... . / ..-. --- .-. -.-. . / -... . / .-- .. - .... / -.-- --- ..- .-.-.-
--- .... / -.-- . .- .... --..-- / .- -. -.. / .--. .-. .- .. ... . / - .... . / .-.. --- .-. -.. / ..-. --- .-. / .... . / --. .- ...- . / ..- ... / -.-. .... --- --- -.-. .... --- --- ... .-.-.- / ---... .--.

Playing with my patchpack? Ask questions on usage and report bugs in the correct thread first, please.
All included patches have been modified and are no longer 100% original.
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org downtime

Post by TrueBrain »

All services are restored as of around 0930. Investigation is still under way.
Lord Aro
Tycoon
Posts: 2369
Joined: 25 Jun 2009 16:42
Location: Location, Location
Contact:

Re: OpenTTD.org downtime

Post by Lord Aro »

W00p w00p! :mrgreen:
Now we wait patiently to see what the problem was...
Could it be something as simple as some engineer hitting the power switch when doing something else nearby?
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org downtime

Post by TrueBrain »

Final Update:

Neither we nor LeaseWeb have a clue what happened. As far as can be told without rebooting the machine, everything is as it should be. If it happens again, they will shut down the machine completely and take a better look at it. For now we are just happy it is back online again ;)

Software diagnostics only take you so far. Let's hope it was a cosmic ray that hit our server.

Either way, I would like to thank LeaseWeb for their time to look at the problem too.