nVIDIA CUDA, can we make the server do all calculations?

Gussoh · Post by **Gussoh** » 01 Apr 2008 15:07

Hi!

We are a group of devoted OpenTTD players studying at Chalmers University. We are currently doing a project called "Optimizing games with CUDA". CUDA is a framework for running arbitrary, massively parallel code on newer nVIDIA graphics cards.

We are looking for a game to optimize and thought of OpenTTD, since we sometimes have problems with OpenTTD games that have more than a few thousand vehicles. The slowest computers cannot keep up, so the company whose players have the fastest computers wins.

We can probably optimize the OpenTTD code using CUDA, but I guess it won't do any good, since all calculations are done on all clients (if I am not mistaken?) and the slowest computers won't have an nVIDIA 8000-series graphics card anyway.

Is there a way of letting the server do all calculations?

And if there is not, do you perhaps have any other suggestions for games we might optimize? We are looking for open-source games written in C or C++ that are CPU-demanding and have active developers. It's especially interesting if the game is not very graphics-intensive.

Frostregen · Post by **Frostregen** » 01 Apr 2008 15:36

Just a quick thought:
One reason to let clients and servers calculate each step is to reduce network traffic.

If the server calculates everything alone, it has to send position updates, status values etc. for every vehicle, station, tree, industry etc. to each client, every tick.
You could end up using all the saved client CPU power just for sending/receiving all this stuff over the network.
(If you don't run into bandwidth problems first...)

But even high-powered PCs will grind to a halt when the map is too large and/or there are too many vehicles on it.
So it could still be interesting to speed up OpenTTD with CUDA.

Post by **Rubidium** » 01 Apr 2008 17:18

The amount of game state change is *way* too much to send over the network, and only sending the part the 'user' is looking at would make it lag so much that playing becomes pretty hard; every time the person opens a window it needs to be populated from the server, and so on.

For a big fat internet connection that is not a problem, but for someone with 32-64 kilobytes of upload it becomes impossible to be the server because the bandwidth to update the clients would be too small.
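
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. The function names, the bytes per update and the tick rate are illustrative assumptions, not measured OpenTTD figures; only the 32-64 kilobyte upload range comes from the post above.

```cpp
#include <cassert>
#include <cstdint>

// Upload needed per client, in bytes per second, if the server alone ran the
// simulation and pushed one state update per object per tick.
// All inputs are hypothetical parameters.
std::uint64_t BytesPerSecondPerClient(std::uint64_t objects,
                                      std::uint64_t bytes_per_update,
                                      std::uint64_t ticks_per_second)
{
    return objects * bytes_per_update * ticks_per_second;
}

// Number of clients that a given upload budget (bytes per second) can serve.
std::uint64_t MaxClients(std::uint64_t per_client, std::uint64_t budget)
{
    assert(per_client > 0); // guard against dividing by zero
    return budget / per_client;
}
```

With an assumed 5000 objects, 8 bytes per update and 30 ticks per second, `BytesPerSecondPerClient(5000, 8, 30)` gives 1,200,000 bytes per second for a single client, and `MaxClients(1200000, 64 * 1024)` is 0: under these assumptions a 64 kilobyte per second upload cannot serve even one client.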

Gussoh · Post by **Gussoh** » 01 Apr 2008 22:17

Sounds reasonable.

We have been thinking some more about whether we should try to do it anyway, just to show that we can have a game with something like 15000 vehicles, all using advanced pathfinding. Are there any other limits on how many vehicles there can be in a game?

Post by **Rubidium** » 01 Apr 2008 22:48

What would be amazing would be a 2048x2048 all water map, with two islands in the northern and southern corner and then having 15000 ships running between them using YAPF as pathfinder, that runs smoothly.

~65000 is the limit on the number of vehicles. This includes the wagons, smoke, electric sparks and some others.

egladil · Post by **egladil** » 01 Apr 2008 23:10

May I ask what course this project is part of since I'm a Chalmers student myself?

NikiB · Post by **NikiB** » 02 Apr 2008 10:30

Interesting. If you need people for testing builds, I've got an 8600GT (WinXP, 2.33 GHz Core 2 Duo; I can install and use Linux too if needed).

skidd13 · Post by **skidd13** » 02 Apr 2008 11:02

Well, AFAIK the main CPU load is caused by the pathfinder. So if you modify the YAPF code so that you can choose at configure time whether to use CUDA or not, I'm pretty sure that one of the devs will care about it.

Gussoh · Post by **Gussoh** » 02 Apr 2008 13:39

egladil wrote:May I ask what course this project is part of since I'm a Chalmers student myself?

We are doing the bachelor thesis.

Zephyris · Post by **Zephyris** » 02 Apr 2008 15:15

This sounds very interesting, I also have an 8600GT which just sits idle when I play OpenTTD at the moment! Would the code changes lay the groundwork for any kind of parallel processing of pathfinders? Groundwork for multi-threading for multi-core processors would be a very nice side effect...

Bilbo · Post by **Bilbo** » 02 Apr 2008 15:35

I have an 8800GT + Q6600 quad core. OpenTTD uses only 25% of CPU power (50% while saving a game). So OpenTTD would surely benefit from parallelization, as nowadays most new computers are multicore, and when you plug in things like CUDA or a Cell chip (CBE) you have tens or hundreds of threads that can run in parallel. (Imagine running something like an 8192x8192 map with 60000 vehicles completely fluently; right now my computer sometimes chokes with only approx. 3000 vehicles on such a large map.) :) Currently one of the problems is that things (like pathfinding) depend on Random(), so they depend on the order in which they are called, and thus they are not really parallelizable.

For things where random does not need to be THAT random (like pathfinding), Random() could be replaced by some "static" random function that takes the x, y, z and time coordinates, XORs them together and uses the result as a seed. The random function would become deterministic, but would still change with progression in time or space (so trains having to choose between two equal paths would still split about 50% to the left and 50% to the right), so it should be good enough and it would allow parallelization.
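
A minimal sketch of such a "static" random function might look like the following. The names `Mix64`, `StaticRandom` and `ChooseBranch` and the field widths are hypothetical, and the mixing step is the SplitMix64 finalizer swapped in for illustration; the post itself only says to XOR the coordinates together and use them as a seed.

```cpp
#include <cassert>
#include <cstdint>

// Scrambles a 64-bit value so that nearby inputs give unrelated outputs
// (SplitMix64 finalizer, used here as an assumed mixing step).
std::uint64_t Mix64(std::uint64_t v)
{
    v += 0x9e3779b97f4a7c15ULL;
    v ^= v >> 30; v *= 0xbf58476d1ce4e5b9ULL;
    v ^= v >> 27; v *= 0x94d049bb133111ebULL;
    v ^= v >> 31;
    return v;
}

// Deterministic "random" value for a position at a given game tick. It depends
// only on its arguments, never on call order, so every client and every
// thread computes the same result.
std::uint32_t StaticRandom(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                           std::uint64_t tick)
{
    // Pack the coordinates into disjoint bit ranges first; a plain x ^ y ^ z
    // would make positions such as (1,2) and (2,1) collide.
    assert(x < (1u << 24) && y < (1u << 24) && z < (1u << 16));
    std::uint64_t pos = (std::uint64_t{x} << 40) | (std::uint64_t{y} << 16) | z;
    return static_cast<std::uint32_t>(Mix64(Mix64(pos) ^ tick) >> 32);
}

// Picks one of `choices` equally good branches at a junction.
std::uint32_t ChooseBranch(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                           std::uint64_t tick, std::uint32_t choices)
{
    assert(choices > 0);
    return StaticRandom(x, y, z, tick) % choices;
}
```

Because the result is a pure function of position and time, two trains reaching the same junction on different ticks can still get different branches, while two computers evaluating the same junction on the same tick always agree.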

Post by **Rubidium** » 02 Apr 2008 16:08

Imagine the following: two trains waiting for the same junction.

On computer A: two threads run the pathfinder and train B's pathfinder was faster, causing B to go first and A has to wait.

On computer B: two threads run the pathfinder and train A's pathfinder was faster, causing A to go first and B has to wait.

One fine example of the thing we call desync. The logic of the pathfinders is so enormously influenced by the paths that other trains take that when one computer runs a thread a little quicker than another you have an instant desync. How to stop this? Make all trains do their pathfinding and wait till everyone is done. Then move the trains and *recheck* whether they may actually move, because another train might have entered the signal block. This will cause trains to take worse routes, especially the ones with a higher vehicle ID.
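
The "pathfind in parallel, wait, then move and recheck" scheme can be sketched roughly as below. This is a heavily simplified, hypothetical model (the `Train` struct, `TickTrains` and the one-train-per-block rule are assumptions for illustration), not OpenTTD's actual train or signal code.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical model: each train wants to enter one signal block, and a
// block holds at most one train.
struct Train {
    int wanted_block = 0; // input: block this train's route leads into
    int plan = -1;        // phase 1 result: block it intends to enter, or -1
    bool moved = false;   // phase 2 result
};

void TickTrains(std::vector<Train> &trains, std::vector<bool> &occupied,
                unsigned n_threads)
{
    assert(n_threads > 0);
    // Phase 1: run every "pathfinder" in parallel against a frozen, read-only
    // snapshot of block occupancy. Each thread writes only to its own trains.
    const std::vector<bool> snapshot = occupied;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&trains, &snapshot, t, n_threads] {
            for (std::size_t i = t; i < trains.size(); i += n_threads) {
                Train &tr = trains[i];
                tr.plan = snapshot[tr.wanted_block] ? -1 : tr.wanted_block;
            }
        });
    }
    for (std::thread &w : workers) w.join(); // wait till everyone is done

    // Phase 2: commit serially in vehicle ID order (here: index order) and
    // recheck, since an earlier train may have taken the block meanwhile.
    for (Train &tr : trains) {
        tr.moved = tr.plan >= 0 && !occupied[tr.plan];
        if (tr.moved) occupied[tr.plan] = true;
    }
}
```

The outcome no longer depends on which thread finishes first, but the lower vehicle ID always wins a contested block and the loser's pathfinding work is wasted, which is the cost described above.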

So there is basically NO place where you can efficiently implement multithreading without completely rewriting the game from scratch, unless you want to sacrifice multiplayer or you want to make the pathfinders less effective and the speed on single-core machines considerably slower (because many 'may I enter' checks need to be redone as described above).

There have even been attempts to split the drawing and the rest of the game logic, but that gave a few percent (as in 2 or 3) advantage on multicore machines and a considerable slowdown (more than 10 percent) on single core machines.

SmatZ · Post by **SmatZ** » 02 Apr 2008 17:55

There was an attempt to make OTTD multithreaded:

Vehicles can be divided into 4 groups:
aircraft, ships, (trains + RVs), others

They do not interfere with each other. So all PF and vehicle movement for each group could be done in a separate thread. Each vehicle class could have its own randomiser.
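
As a rough illustration of one thread per group with a private randomiser, consider the sketch below. `GroupState`, `RunGroup` and `RunTick` are hypothetical names, `std::mt19937` stands in for the game's own randomiser, and the checksum is only a placeholder for real vehicle state; the shared places that need synchronising are ignored entirely.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>
#include <thread>

// The four groups named above; trains and RVs share one group.
enum VehicleGroup { AIRCRAFT, SHIPS, TRAINS_AND_RVS, OTHERS, GROUP_COUNT };

struct GroupState {
    std::mt19937 rng;           // this group's private randomiser
    std::uint32_t checksum = 0; // placeholder for the group's game state
};

// Hypothetical per-group tick: draws only from the group's own randomiser.
void RunGroup(GroupState &group, int vehicles)
{
    for (int i = 0; i < vehicles; ++i) {
        group.checksum ^= static_cast<std::uint32_t>(group.rng());
    }
}

// Runs one tick with one thread per group and returns each group's state.
std::array<std::uint32_t, GROUP_COUNT> RunTick(std::uint32_t seed, int vehicles)
{
    assert(vehicles >= 0);
    std::array<GroupState, GROUP_COUNT> groups;
    std::array<std::thread, GROUP_COUNT> threads;
    for (int g = 0; g < GROUP_COUNT; ++g) {
        groups[g].rng.seed(seed + static_cast<std::uint32_t>(g));
        threads[g] = std::thread(RunGroup, std::ref(groups[g]), vehicles);
    }
    for (std::thread &t : threads) t.join();

    std::array<std::uint32_t, GROUP_COUNT> result;
    for (int g = 0; g < GROUP_COUNT; ++g) result[g] = groups[g].checksum;
    return result;
}
```

Since no group ever draws from another group's randomiser, `RunTick(42, 1000)` returns the same values on every run regardless of how the OS schedules the four threads. With a single shared randomiser the draw order, and therefore the result, would depend on scheduling.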

Also, rendering can be done in a separate thread - but only the part where sprites are drawn. One could use a separate process, so memory is copied when it is modified by the other process... That would make whole rendering faster, but starting a new process each tick isn't that nice.

But there are places that have to be synchronised - at least GetSprite(), VehiclePosHash, ~Vehicle(), Vehicle(), sound functions, depot lists, ...

Also, on big maps, RunTileLoop() takes a bug part of CPU time... but it interferes with all vehicles (floods can affect each vehicle including ships (PF)).

The results: (depends on tested game of course)
10% speed increase on dual-core processor
10% speed decrease on single-core processor
+ all the problems with multithreading...

doktorhonig · Post by **doktorhonig** » 03 Apr 2008 11:14

SmatZ wrote:bug part

We don't want that part anyway.

Bilbo · Post by **Bilbo** » 03 Apr 2008 18:45

Rubidium wrote:Imagine the following: two trains waiting for the same junction.
...

Well, last time I tinkered with valgrind and openttd, about 50% of CPU power was eaten just by "pushing wagons around" (examining position of wagons, examining slope underneath them, calculating acceleration, calculating how many pixels will the train move forward and actually pushing the train around). That one could be done at least partially in parallel (at least examining the map and storing some cached values in vehicle struct)

Also, I don't see how two trains can compete for a single junction in the pathfinder. The train enters the junction at one moment and then it "only" has to decide which of the 2 (3, 4...) ways to take. No train can possibly interfere with that decision because, due to the way signals work, at least part of the track is free in each direction. If you mean the competition when several trains are waiting at red to go into one block, there is no need to actually run a pathfinder: all the trains wish to go forward, and they will decide where to actually go at the nearest junction after the signal. Yes, these "waitings on red" would have to be solved serially first.

Although PBS/YAPP may change this a bit ...

Not everything is parallelizable, but I think that with relatively few changes the game could be parallelized to use the cores more effectively.

Bilbo · Post by **Bilbo** » 03 Apr 2008 18:47

SmatZ wrote: Also, on big maps, RunTileLoop() takes a bug part of CPU time... but it interferes with all vehicles (floods can affect each vehicle including ships (PF)).

Well, the tile loop can be run in multiple threads (each running the tile loop on part of the map); once all threads finish, the vehicle movement code starts.

The slowdown in singlecore processors could be solved like:

if (singlecore) {
    run_code(1..x)
} else {
    for i=1 to N {
        run_thread(run_code((i-1)*x/N+1 .. i*x/N))
    }
    wait_for_threads
}

then there probably shouldn't be the 10% slowdown ...
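
The pseudocode above could be sketched in C++ roughly as follows. `RunTiles` and `RunTileLoop` are hypothetical stand-ins, and the sketch assumes each tile can be processed independently of its neighbours, which may well not hold for the real tile loop.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-tile work on tiles [begin, end).
void RunTiles(std::vector<int> &tiles, std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i) ++tiles[i];
}

// Splits the map into n_threads contiguous slices. With one thread the work
// runs inline, so a single-core machine pays no thread start-up cost.
void RunTileLoop(std::vector<int> &tiles, unsigned n_threads)
{
    assert(n_threads > 0);
    const std::size_t x = tiles.size();
    if (n_threads == 1) {
        RunTiles(tiles, 0, x);
        return;
    }
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n_threads; ++i) {
        // Slice i covers [i*x/N, (i+1)*x/N); together the slices cover
        // [0, x) exactly once, even when N does not divide x.
        workers.emplace_back(RunTiles, std::ref(tiles), i * x / n_threads,
                             (i + 1) * x / n_threads);
    }
    for (std::thread &w : workers) w.join(); // wait_for_threads
}
```

A caller could pick the thread count from `std::thread::hardware_concurrency()`, falling back to 1 when it returns 0 (which the standard allows when the value cannot be determined).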

Post by **Rubidium** » 03 Apr 2008 18:52

A train entering a junction (as in the signal block of the junction) influences the state of all pathfinders that will be run after it. It might make other trains take a different route than they would have if the state had not been influenced. When doing this in parallel, the influences on the pathfinders can come at different times (earlier or later depending on the scheduling of the OS), which just causes desyncs. And possibly the desync only shows up a few game weeks or months later, when the vehicle enters a station or so, which makes them very hard to track.

Bilbo wrote:then there probably shouldn't be the 10% slowdown ...

No, just a tenfold increase in the maintenance burden of the code, because vast blocks of code need to be rewritten to actually support that. Not to mention the changes to the random subsystem needed to actually support that.

Bilbo · Post by **Bilbo** » 05 Apr 2008 17:47

I think the pathfinding system and train movement could be improved to the point where they can run more in parallel, but yes, that would probably mean a lot of work to support it (changing parts of the pathfinder and the random function ...).

For example, you can quite easily use one thread per player's railroad system, as trains of different players should not be able to influence each other (collisions with RVs could be solved by running all the RV code first). This would help multiplayer games and games with AI. Supporting some multithreading within one player would be more difficult, though.

athanasios · Post by **athanasios** » 05 Apr 2008 22:42

... and kill some nice patches that are currently frozen.

ThePenguin · Post by **ThePenguin** » 06 Apr 2008 23:42

I have an 8800GT and a quad core and it would be really nice to see openttd multithreaded/rendered on the gpu.
