OpenTTD.org Server Maintenance

OpenTTD is a fully open-sourced reimplementation of TTD, written in C++, boasting improved gameplay and many new features.

TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

OpenTTD.org Server Maintenance

Post by TrueBrain »

This is a long post to tell you that openttd.org services will be offline for short windows of time over the next few days. In this post I will explain a bit about why, and what is going on.




As of yesterday, OpenTTD is the proud owner of a new server. To throw some random stats at you:

Old Server:
- 16GB RAM
- 4 (virtual) cores

New Server:
- 32GB RAM
- 8 (virtual) cores

Now you say: so what? Well .. for a while now our old server has been filled to the top. We couldn't really extend it anymore in any sane way or form, and the CPU was hurting. So .. a few months back I sent planetmaker out to find us something newer and better. And he came back with something .. almost twice as good as the old one \o/

The old server was kindly sponsored by OVH.de .. and so is the new one :D They are so awesome! The new server is from SoYouStart.com. First off, the management interface of SoYouStart is just awesome: installing a server is easy, and you can watch the progress. Much better than most others I have seen in my life. Small downside? None of us can find where to change the password :D

So, yesterday I got the login details for the new server. Sadly, they haven't gotten around to adding our extra IPs yet. This is a bit annoying, as we use XenServer, and XenServer uses the only public IP you get with the server as its management interface IP. That IP cannot be used inside a VM. So .. at the moment I have no public IPs to assign on the new server. I am sure they will fix this next week, but as I had a few days off this weekend, I kinda wanted to get started.

Spent the whole day figuring out how to do that .. and finally found a nice way. The new server now "signs on" to the old server, as they are connected via a (one-way) L2 switch. On the old server I then configure which IPs to route to the new server .. and this works flawlessly :D Hehehe; epic s***.

The way the OpenTTD web services are set up allows for this easily. You might not know it, but OpenTTD runs around 15 small virtual machines, each with its own task. One machine runs MediaWiki, another FlySpray, the next the master server. Next to that, we have a VM for MySQL, one for LDAP, .. And finally, we have a proxy VM for the web. All your HTTP calls go through this single VM. It in turn decides which VM really knows how to handle the call, and forwards it there, all transparently and without you knowing.

In front of everything is a single gateway machine. It protects us from all the evil people in the world. It has a strong firewall and, most important: it is the only VM that has the public IPs assigned. The rest receive the signals via LVS, so they think they are directly connected to the internet .. but they are not.
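To make the "proxy decides which VM handles the call" step a bit more concrete, here is a minimal sketch of a reverse proxy choosing a backend by HTTP Host header. The post does not say how the proxy VM is configured, so the hostnames (`wiki.example.org`, `bugs.example.org`) and VM names (`vm-mediawiki`, `vm-flyspray`, `vm-frontpage`) are all hypothetical.

```python
# Hypothetical routing table for a Host-header based reverse proxy.
# The hostnames and VM names below are illustrative, not OpenTTD's real config.
BACKENDS = {
    "wiki.example.org": "vm-mediawiki",
    "bugs.example.org": "vm-flyspray",
}
DEFAULT_BACKEND = "vm-frontpage"


def pick_backend(host_header: str) -> str:
    """Return the name of the VM that should handle a request for this Host header."""
    # Host headers are case-insensitive and may carry a port ("name:80").
    # IPv6 literals such as "[::1]:80" are not handled in this sketch.
    host = host_header.split(":", 1)[0].strip().lower()
    return BACKENDS.get(host, DEFAULT_BACKEND)


print(pick_backend("Wiki.Example.org:80"))  # vm-mediawiki
```

Unknown hosts fall through to a default backend here; a real proxy might instead answer with an error.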

So what does all this have to do with a new server? Everything. All internal VMs have internal IPs. So .. I can pick up one VM now and move it to the new server. You won't notice, and you will be none the wiser. The only catch is that the signal bounces via the old server .. so there will be a bit more latency. This allowed me to start the migration anyway, without public IPs on the new server. Win :D
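Because every VM is only reachable on an internal IP, "moving" a VM boils down to changing where the old server delivers packets for that IP. A minimal sketch of that decision, assuming a simple per-IP table; the addresses are hypothetical RFC 1918 examples, and in practice this lives in the host's routing setup rather than in Python.

```python
from ipaddress import ip_address

# Hypothetical internal (RFC 1918) addresses of VMs already moved to the new box.
MIGRATED_TO_NEW_SERVER = {ip_address("10.0.0.21"), ip_address("10.0.0.22")}


def next_hop(dst: str) -> str:
    """Where the old server sends a packet addressed to internal IP `dst`."""
    if ip_address(dst) in MIGRATED_TO_NEW_SERVER:
        return "new-server"  # over the L2 link; costs an extra hop of latency
    return "local"  # the VM still runs on the old server
```

Migrating one more VM then only means adding its internal IP to the set; nothing outside the old server has to change.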



All that said and done .. I made all of this work for IPv4. I did not bother with IPv6. As a result, IPv6 functionality will slowly be lost during this migration. When I move the last VM, the Gateway VM I talked about earlier, all IPv6 functionality will recover. Until that time .. I am really sorry. I might get bored and fix it .. but .. not now :D
Seems I got bored enough, and I fixed up IPv6 too .. I hope :D



In the next few hours and days, I will slowly move over all the VMs we have. This is done by first making a snapshot and moving that; this takes 90% of the time. Next, I shut down every service inside a VM, then rsync the remaining changes. Finally, I boot up the new VM, route the signals to it, and all functionality is restored. The downtime should be minimal (minutes, not hours).
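A toy model of why the downtime stays small with this two-phase approach, assuming an rsync-like delta copy. Files are modelled as an in-memory dict (a big simplification of a VM disk); the paths are hypothetical. The bulk copy runs while the VM is live, and only what changed since then has to be copied while services are down.

```python
from __future__ import annotations


def sync(src: dict[str, bytes], dst: dict[str, bytes]) -> int:
    """Make dst identical to src; return how many bytes had to be copied.

    Like rsync, only files that are new or differ are copied, and files that
    no longer exist in src are removed from dst.
    """
    changed = {path: data for path, data in src.items() if dst.get(path) != data}
    for path in set(dst) - set(src):
        del dst[path]
    dst.update(changed)
    return sum(len(data) for data in changed.values())


# Phase 1: bulk copy while the VM keeps running (the slow part).
vm = {"/var/lib/db": b"x" * 1000, "/etc/app.conf": b"a"}
replica: dict[str, bytes] = {}
bulk = sync(vm, replica)

# The VM is still live, so a write lands after the bulk copy.
vm["/etc/app.conf"] = b"b"

# Phase 2: services stopped; only the delta is copied during the downtime.
delta = sync(vm, replica)
print(bulk, delta)  # 1001 1
```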

I am not going to plan ahead when I migrate which service; you will simply notice (or not). I will update this thread from time to time with where things stand.

There is one VM that has a 600GB disk allocated to it. That will be my biggest problem. Not sure yet how that will go .. but I will first move most of the others, then I will see how I can move it .. its initial move will take roughly 11 hours. But hopefully the downtime will be in the minutes. This will be the biggest loss of service (SVN, dev-space, mail, frontpage, ottd_content, ..), so we will see how it turns out.
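Taking the two numbers in the post at face value (600GB, roughly 11 hours), a back-of-the-envelope check of the average transfer rate this implies. The post does not state the link speed, so this assumes decimal units and that every byte of the disk is sent.

```python
from __future__ import annotations


def implied_rate(size_gb: float, hours: float) -> tuple[float, float]:
    """Average rate needed to copy size_gb in `hours`, as (MB/s, Mbit/s).

    Uses decimal units (1 GB = 10**9 bytes) and assumes every byte is sent.
    """
    bytes_per_second = size_gb * 1e9 / (hours * 3600)
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


mb_s, mbit_s = implied_rate(600, 11)
print(f"{mb_s:.1f} MB/s, {mbit_s:.0f} Mbit/s")  # 15.2 MB/s, 121 Mbit/s
```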



After moving all the VMs, I will have to migrate the public IPs too. I have a smart idea for this, where both the old and the new IPs will serve all services for a window of time, but we will see if this is possible (at all). I will update on that as I know more.


If you have any questions (or suggestions), feel free to drop a reply here.
The only thing necessary for the triumph of evil is for good men to do nothing.
Transportman
Tycoon
Posts: 2792
Joined: 22 Feb 2011 18:34

Re: OpenTTD.org Server Maintenance

Post by Transportman »

TrueBrain wrote:There is one VM that has a 600GB disk allocated to it. That will be my biggest problem. Not sure yet how that will go .. but I will first move most of the others, then I will see how I can move it .. its initial move will take roughly 11 hours. But hopefully the downtime will be in the minutes. This will be the biggest loss of service (SVN, dev-space, mail, frontpage, ottd_content, ..), so we will see how it turns out.
Is that only allocated space, or is that 600 GB also really used? It sounds like a lot.
TrueBrain wrote:It protects us from all the evil people in the world.
Those poor evil people, they just want to play OpenTTD :p
Coder of the Dutch Trackset | Development support for the Dutch Trainset | Coder of the 2cc TrainsInNML
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org Server Maintenance

Post by TrueBrain »

Transportman wrote:
TrueBrain wrote:There is one VM that has a 600GB disk allocated to it. That will be my biggest problem. Not sure yet how that will go .. but I will first move most of the others, then I will see how I can move it .. its initial move will take roughly 11 hours. But hopefully the downtime will be in the minutes. This will be the biggest loss of service (SVN, dev-space, mail, frontpage, ottd_content, ..), so we will see how it turns out.
Is that only allocated space, or is that 600 GB also really used? It sounds like a lot.
(..)
In reality this is a full backup of the old old server, which I never managed to migrate away from completely. And no, it is not fully in use .. only around 250GB is allocated. How the migration goes will depend heavily on how the disk has been used since it was created (only bytes that have actually been written to are sent during migration).
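Why an allocated-but-unused disk is cheaper to move: a generic sketch of sparse copying, where blocks that contain only zeros are skipped and left as holes in the destination. This is an approximation, since it treats "all zeros" as "never written"; the post does not say how XenServer decides what to send, and real tools may ask the filesystem for holes instead (for example via `SEEK_DATA`/`SEEK_HOLE` on Linux). The 64 KiB block size is an arbitrary choice.

```python
import os

BLOCK = 64 * 1024  # block size used to look for holes (an arbitrary choice)


def sparse_copy(src_path: str, dst_path: str, block: int = BLOCK) -> int:
    """Copy src to dst, leaving all-zero blocks as holes; return bytes written."""
    zero = bytes(block)
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while chunk := src.read(block):
            if chunk == zero[: len(chunk)]:
                # Skip ahead instead of writing zeros; the filesystem keeps a hole.
                dst.seek(len(chunk), os.SEEK_CUR)
            else:
                dst.write(chunk)
                written += len(chunk)
        # If the file ends in a hole, extend it to its full logical size.
        dst.truncate()
    return written
```

The returned count is what would actually have to cross the wire, which is why 250GB of real data matters more than the 600GB size of the disk.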
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org Server Maintenance

Post by TrueBrain »

Most of the VMs are moved, with little to no downtime (~30s per service). This includes MySQL, LDAP, Web Proxy, SSH Proxy, CF, and most of the http related servers; all moved to their new home. The only remaining thing is the 600GB machine .. which will be a lot more tricky to move :D

After that, it is a matter of waiting for the new public IPs to be assigned, and migrating everything. For now all signals bounce via the old server, which gives funny bandwidth graphs :D
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org Server Maintenance

Post by TrueBrain »

And .. the last VM is also moved to the new server.

All signals are still routed via the old server; hopefully that will be fixed this week too :)
Phreeze
Director
Posts: 514
Joined: 12 Feb 2010 14:30
Location: Luxembourg

Re: OpenTTD.org Server Maintenance

Post by Phreeze »

Do you have to replace the OVH banner now? :)
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org Server Maintenance

Post by TrueBrain »

Yup; with another OVH banner :D

The latest design one of my fellow devs made looks really slick; I quite like it. You will see soon enough :)
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org Server Maintenance

Post by TrueBrain »

Hihi, I can sometimes overdo stuff .. today was no exception.

We got the public IPs for the new box, which of course is awesome. But how are we going to do the migration? Just update DNS and hope for the best? I am sure that would result in tons of forum posts and bug reports. I have little to no interest in that .. so I would rather find a solution that makes it "just work".

As described earlier, our network setup is rather complex. A bit too complex maybe, but that is not the topic at hand :D


When a packet comes in for, say, http://www.openttd.org port 80, it enters the Gateway. There, IPVS picks it up and sends it to Proxy-Web, where it arrives ... identical to how it arrived on the Gateway. If you know little about networks, you might think: who cares? Well, normally when you forward a packet within an internal network (Masquerade), a few things change, most noticeably the destination address. The normal network stack then handles the packet as if it was sent to your internal server.

Now, a few services run in Masquerade mode, but most run in this direct routing (DR) mode.

In both cases, the service replies to the source address, the packet arrives at the default gateway, and the gateway sends it back to you. All is well in paradise.
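The difference between the two forwarding modes can be sketched as what happens to the packet headers. In LVS terms, direct routing only rewrites the destination MAC address, which is why the real server must have the public (virtual) IP configured locally, usually on an interface that does not answer ARP for it; Masquerade (NAT) rewrites the destination IP. The addresses below come from documentation ranges and the MAC fields are placeholder names, all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str  # link-layer (Ethernet) destination


def forward_masquerade(pkt: Packet, vm_ip: str, vm_mac: str) -> Packet:
    """NAT/Masquerade: the destination IP is rewritten to the VM's internal IP."""
    return replace(pkt, dst_ip=vm_ip, dst_mac=vm_mac)


def forward_direct_routing(pkt: Packet, vm_mac: str) -> Packet:
    """Direct Routing: only the Ethernet destination changes; the IP header is untouched."""
    return replace(pkt, dst_mac=vm_mac)


incoming = Packet(src_ip="192.0.2.7", dst_ip="203.0.113.10", dst_mac="gateway")
print(forward_direct_routing(incoming, "proxy-web").dst_ip)  # 203.0.113.10
print(forward_masquerade(incoming, "10.0.0.5", "proxy-web").dst_ip)  # 10.0.0.5
```

With Masquerade, the gateway also has to rewrite the source of the reply back to the public IP; with DR the VM already answers from the public IP itself.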


Now of course the question is: why do those two methods matter, and why do I care? The first, where nothing changes, allows me to have one VM that listens on multiple public IPs and knows which one a packet came in on. I can say: if you come in on IP A, you get this; if you come in on IP B, you get that. And if you know a bit about HTTP, you guessed it: that is required for certificates. Older implementations of HTTPS (without SNI) require one certificate per IP. With the first routing method this is easily possible.
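Without SNI, the TLS server has to pick a certificate before it knows the requested hostname, so the local address the client connected to is its only hint. A minimal sketch of that lookup, with hypothetical public IPs (documentation range) and certificate file names:

```python
# Hypothetical public IPs (documentation range) and certificate file names.
CERT_FOR_IP = {
    "203.0.113.10": "www-cert.pem",
    "203.0.113.11": "other-site-cert.pem",
}


def certificate_for(local_ip: str) -> str:
    """Pick a certificate from the address the client connected to.

    Without SNI, this local address is the only hint the server has.
    """
    try:
        return CERT_FOR_IP[local_ip]
    except KeyError:
        raise LookupError(f"no certificate configured for {local_ip}") from None
```

On a real socket the local address would come from something like `conn.getsockname()[0]`. Under Direct Routing that is the public IP, so the lookup works; under Masquerade the VM only sees its internal IP, which this table cannot map back to a public one.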

In the second method, we have no way to know which IP you came looking for, short of creating multiple internal IPs. And this gets messy real quick. As the internal IPs have no direct relation to the external ones, you depend on your gateway to make a sane translation. And while working on an internal VM, you have to keep looking up which public IP matches. This is so annoying, and an administrative nightmare. With Direct Routing, the internal VMs just look like they are directly connected to the internet, and they behave like it too.

So, I hate Masquerade, and I love Direct Routing. And as such, 90% of our network runs on DR, not on Masq.



So, where does this leave us with the new IPs? Well ...

The Old gateway picks up a packet and sends it to Proxy-Web, which can tell which public IP it came in on. Proxy-Web handles the packet and sends the reply back to the Old gateway.
The New gateway picks up a packet for a new public IP and sends it to Proxy-Web, which again can tell which public IP it came in on; the new IP is aliased to the old one, so it sends back exactly the same in both cases. Here comes the weird part, but the part that makes it work: Proxy-Web handles the packet and sends the reply back to the OLD gateway (its default gateway). The Old gateway sees that the reply comes from a public IP on the new server, and passes it on to the New gateway. That one puts it back on the Internet, and .. it all "just" works.
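The trick in the return path can be modelled as one decision on the Old gateway, keyed on the source address of each reply. A minimal sketch with hypothetical address sets (old IPs in 203.0.113.0/24, new IPs in 198.51.100.0/24). On a real Linux gateway this would more likely be done with source-based policy routing than in code; the post does not say how it was configured, so that part is an assumption.

```python
from ipaddress import ip_address

# Hypothetical address sets (documentation ranges); the real IPs are not in the post.
OLD_PUBLIC_IPS = {ip_address("203.0.113.10")}
NEW_PUBLIC_IPS = {ip_address("198.51.100.10")}


def old_gateway_route_reply(reply_src: str) -> str:
    """All replies reach the Old gateway first; decide where each one leaves."""
    src = ip_address(reply_src)
    if src in NEW_PUBLIC_IPS:
        return "new-gateway"  # client used a new IP: hand over, so the answer leaves from there
    if src in OLD_PUBLIC_IPS:
        return "internet"  # client used an old IP: send the answer out directly
    raise ValueError(f"{reply_src} is not one of our public IPs")
```

This only works because, with Direct Routing, the reply's source address is still the public IP the client originally used.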

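This same method fails with Masq, mainly because you have absolutely no way to identify which gateway inserted the packet into the network: the destination address was rewritten to the VM's internal IP, so the reply carries that internal IP as its source, whichever public IP the client used. The only way around it is adding new internal IPs, and that ... is just sucky.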

As mentioned, a few services were still running on Masq. I just made network changes to move them to DR too, and it seems to work.


Next up, I launched all services on the new public IPs too. Now I can "just" make the DNS updates, wait for the whole world to pick them up, and shut down the old server. Easy enough. You will never notice the IPs have changed: no downtime, no silly pages or whatever.


Awesome :D

I like fiddling with networks ;)

Now on to doing the same for IPv6 ;)
adf88
Chief Executive
Posts: 644
Joined: 14 Jan 2008 15:51
Location: PL

Re: OpenTTD.org Server Maintenance

Post by adf88 »

I'm getting timeouts while trying to upload a task on FlySpray. I tried three times in the last few minutes, with no success. My IP is the same one this post was sent from.
Attachment: flyspray-upload-timeout.png (160.29 KiB)
:] don't worry, be happy and checkout my patches
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org Server Maintenance

Post by TrueBrain »

Would you mind trying again? Should work now .. I hope :D
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org Server Maintenance

Post by TrueBrain »

IPv6 was also successfully done; all our services are now reachable by old and new IPs.


With that, I altered our DNS to start the migration to the new IPs. Within 5 minutes, 50% of the traffic was already going via the new IPs. A lot faster than I personally expected ... normally DNS updates take FOREVER :P


Now it is a waiting game to see if/when all traffic via the old IPs dies out. Meanwhile I can keep monitoring whether systems still use the old IPs, for example for things I may have forgotten to change.
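Watching the old IPs die out amounts to tracking what share of requests still arrives on them. A minimal sketch, assuming you can get the destination IP of each request (for example from proxy logs or firewall counters; the post does not say which source is used). The old addresses are hypothetical documentation-range examples; parsing with `ipaddress` makes differently written IPv6 forms compare equal.

```python
from __future__ import annotations

from ipaddress import ip_address

# Hypothetical old public addresses (documentation ranges).
OLD_IPS = {ip_address("203.0.113.10"), ip_address("2001:db8::10")}


def old_ip_share(dst_ips: list[str]) -> float:
    """Fraction of observed requests that still arrived on one of the old IPs."""
    if not dst_ips:
        return 0.0
    return sum(ip_address(ip) in OLD_IPS for ip in dst_ips) / len(dst_ips)


sample = ["203.0.113.10", "198.51.100.10", "2001:db8::10", "2001:db8:1::10"]
print(old_ip_share(sample))  # 0.5
```

Once this stays at zero for a while, cutting the old IPs should affect nobody; anything left over points at a cached or hardcoded address.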
adf88
Chief Executive
Posts: 644
Joined: 14 Jan 2008 15:51
Location: PL

Re: OpenTTD.org Server Maintenance

Post by adf88 »

TrueBrain wrote:Would you mind trying again? Should work now .. I hope :D
It's fine now.

I'm a beginner in this matter, but I also do like to struggle with this whole network stuff :)

Have fun :P
TrueBrain
OpenTTD Developer
Posts: 1370
Joined: 31 May 2004 09:21

Re: OpenTTD.org Server Maintenance

Post by TrueBrain »

A moment ago I cut the old IP, so traffic is no longer forwarded from the old to the new IP. This should be of no consequence to anyone whatsoever.

With this, the whole migration is complete. I will give it a few more days to monitor whether any problems arise. Possibly some system had an IP hardcoded, you never know ...