
I have attached 3 things to this post:
1. Patch with some low-level optimizations, especially for linux/gcc configuration.
2. Patch that introduces a cache with station locations to speed up expensive FindStationsAroundTiles function.
3. A tool that helps in performance testing and that can be employed for automated tests.
I had some problems while attaching patch 3, so i've uploaded it here:
students.mimuw.edu.pl/~kn248436/opos/tester.tgz
Below you will find some a bit more detailed descriptions of them.
I have made some performance tests to show that the optimizations work.
The test cases were three savegames and the command I launched was:
openttd -x -v null:ticks=10000 -s null -m null -g [SavegameName]
and then I measured cpu time using time command.
The results were (in seconds, averaged):
1024x1024 map with huge cities and lots of trains:
none: 113
mod1: 108
mod2: 100
both: 97
2048x2048 empty map with medium number of medium-sized cities:
none: 47
mod1: 46
mod2: 44
both: 43
2048x2048 empty map with lots of large cities:
none: 100
mod1: 94
mod2: 74
both: 73
My system configuration is:
Linux 2.6.33-ARCH #1 SMP PREEMPT Sun Apr 4 10:27:30 CEST 2010 x86_64 Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz GenuineIntel GNU/Linux
4GB RAM
All three savegames are included in the test tool (attachment 3). They are 1024x1024-huge.sav, 2048x2048-notPopulated.sav, 2048x2048-populated.sav respectively.
Here you can see the reports generated with attached test tool:
http://students.mimuw.edu.pl/~kn248436/ ... Report.pdf
http://students.mimuw.edu.pl/~kn248436/ ... Report.txt
Now a bit more detailed description of the patches.
1.
I have introduced some gcc builtin functions: __builtin_ffs, __builtin_popcount, __builtin_clz and used them to improve some functions in core/bitmath_func.hpp
I have also realized (using callgrind) that some of the functions called really often but being very short (like GetTileType) were not inlined. So I introduced a macro, OTTD_ALWAYS_INLINE that expands under gcc to
__attribute__((always_inline)) so as to raise the probability of inlining for some functions. I have also moved some functions to headers from .cpps.
Although new functions became inlined after my modifications the size of the binary has decreased on my system. Before modifications it has 5766276 bytes and after them - 5759997.
2.
I have realized that one of the most expensive functions in code being often called is FindStationsAroundTiles. It did a bit weird thing: it scanned the whole area of - for example - 10x10 tiles (or more), one by one, to get all stations there.
I have introduced some kind of a cache for stations positions so as to improve it.
At first I divide the whole map on parts sized by default (can be changed, but this seems to be the best choice for my computer) 2^6 x 2^6 tiles. For each part I store a list of TileAreas related to stations located there.
So when I need to scan an area for stations I take all map parts intersecting with this area, take all stations stored in list related to the part and check if they are located in requested area.
Despite this solution is not very sophisticated, it works very well, especially on maps without many stations (the structures with staiton areas are empty so the process is trivial), however on very crowded maps it also gives a significant performance gain without significant memory overhead. What's more, high setting of station catchment radius does not significantly affect the game speed.
The code that updates (when adding/removing/deleting station) the cache may look a bit primitive and like it could be done much faster - but the updating happens so rarely that I consider it good enough. Only lookup time seems for me to be performance-critical.
3.
I have created a tool that may be helpful when running performance tests and can be good for automated regression tests for example. See README inside for more details. I hope you will find it useful
