Page 3 of 5

Posted: 01 Jan 2007 10:32
by KUDr
halplus: don't worry. Optimizing overall performance issues will help you too. And B. N. SmatZ! is doing it well.

Posted: 10 Jan 2007 01:36
by SmatZ
Thank you, KUDr!

I had some time today, so I began studying how viewport is being drawn.

I am not sure if it was intended, but I get o lot of "out of sprite memory errors". These errors appear only when zoom is minimized (eg. zoomed out)

dbg: [sprite] Out of sprite memory

Problem is (svn 8007) at line 1276 of viewport.c. The size of area scales 4 times for every zoom out, not 2 times.

When the line is changed to
if (((bottom - top) * (right - left) << (2*vp->zoom)) > 180000) {

there are no errors anymore

(I feel this should be changed to get rid of those debug messages in release)

Posted: 15 Jan 2007 14:20
by SmatZ
Ok, so here comes updated version. Measured times are in vehicles.cpp - in the best scenario, I could get game 2.5times faster. The more vehicles, the better acceleration. Also, Full Animation and Full Detail should be turned off.

There were not many optimizations, just memory requierements were cut down.
I see, there were changes that Vehicle* is used instead of VehicleID. I reverted it back to VehicleID - if we can use more memory, it is everytime better to have more items in _vehicle_position_hash than having pointers there instead (pointers are 8 bytes on 64 bit architectures, VehicleID is just 2 bytes long).

The main difference is that two hash tables are used, and so, there is much much faster VehicleFromPos().

There is not much to say about it, as I have already done in previous posts.

Posted: 15 Jan 2007 14:30
by peter1138
B. N. SmatZ! wrote:I see, there were changes that Vehicle* is used instead of VehicleID. I reverted it back to VehicleID - if we can use more memory, it is everytime better to have more items in _vehicle_position_hash than having pointers there instead (pointers are 8 bytes on 64 bit architectures, VehicleID is just 2 bytes long).
In my tests, a Vehicle* was faster than a GetVehicle() pool lookup...

Posted: 15 Jan 2007 14:43
by SmatZ
peter1138 wrote: In my tests, a Vehicle* was faster than a GetVehicle() pool lookup...
Yes, this is for sure. How much was it faster?
Problem is the size of _vehicle_position_hash. When it is array of pointers, it can be two or four times as big as if it were array of VehicleID. If there could be used more memory, it is better to icrease HASH_BITS.

But, yes, I agree with you that GetVehicle is not a fast way to access Vehicles.

If there were not memory limit, we could just add Vehicle* to the Tile struct and stop using the _vehicle_position_hash.

edit:
I will try to combine Vehicle* / VehicleID and post results and diffs.

What are the memory limits I should stay in?

Posted: 15 Jan 2007 15:26
by SmatZ
So, I changed the code to include your changes.
It is really faster, 10.3s against 10.6s in the best case.

I also removed the table of benchmarks.

So, there is a diff!

edit:

What is your opinion about includit if (v->tile == tile) into VehicleFromPos, to lower number of indirect calls?
Is there any case where this would fail?
edit:
I tested that, seems to work, measured time 10.2s.

edit2:
Also, there are probably many useless checks in TrainController. When the vehicle is a wagon, there don't have to be updated most of values. Also, with the exception of reversing train, PreviousVehicle==NULL is the same as IsFrontEngine(). For example, testing if a wagon can enter given tile, while its locomotive could. Or, vehicle's sprite is changed everytime it moves - not only when it changes its direction.
I did some rewrite of that in 0.4.8, and it seemed to work.
What is your opinion?
edit:
In current svn, many of these things are already updated.

(btw - latest svn seems to be buggy. I know, they are supposed to be...)

Posted: 15 Jan 2007 20:53
by SmatZ
I post four diffs in this post. All are built against svn 8137.

Original testtime is 26.5 seconds. Full Anim = Full Detail = off.
Savegame is too big to post here... It is a 2048x2048 map with 1 300 trains and 50 000 vehicles in total.
All test were done with HASH_BITS=HASH_BITS_GUI=8. Increasing this value will bring better performance.

If you test this diffs, prepare some savegame with many vehicles, where processing train_tick takes significant time!

1) gui.diff
Divided hash pool to two parts. First is used by ViewportAddVehicles(), second by VehicleFromPos(). Has a impressive impact for games with many vehicles over original. Screen scrolling is faster, too. Size of hash pool is settable by HASH_BITS and HASH_BITS_GUI in vehicles.cpp. Testfile time 10.6 sec.

2) gui_b.diff
Modified version of gui.diff. Contains changes already done by peter. Disadvantage is a bit more memory required (two or four times more than in gui.diff). Testfile time 10.3 sec.

3) tile.diff
Another change, moved test if ( v-> tile == tile ) from proc into VehicleFromPos(). Some indirect calls are saved. Testfile time 10.2sec.

I hope you find these diffs useful. Let me know if so, or if not.

Posted: 15 Jan 2007 20:54
by SmatZ
4) wagon.diff
Updated TrainController() in train_cmd.cpp, so some functions are not called for wagons. This code is not working well, problem is with vehicles leaving depots, reversing trains and maybe in other cases. It is just a quick fix to demonstrate this update. Testfile time 9.7sec.

Posted: 18 Feb 2007 05:30
by Quark
B. N. SmatZ! wrote:4) wagon.diff
Updated TrainController() in train_cmd.cpp, so some functions are not called for wagons. This code is not working well, problem is with vehicles leaving depots, reversing trains and maybe in other cases. It is just a quick fix to demonstrate this update. Testfile time 9.7sec.
any updates?

Posted: 06 Mar 2007 00:25
by SmatZ
Actually, I haven't made anything more yet. I was waiting for some response.

Versions 1), 2), 3) seem to be working well, but I don't have enough knowledge how network games work, and if there would be some problem. (I didn't find any while playing with patched both server and clients - without patched server, desync would appear, as there is a little change when signal colors are changed)

I showed the way how hashing could (and I think it should) be remade, but devs didn't include it (maybe it collides with mempool remake, I don't have any information), so... there is probably nothing more I could do (too much time for 'nothing')

Posted: 02 Jun 2007 10:48
by madman2003
I tried to apply your patch to the latest source, but something is going wrong (no signal is ever set red), have you tried lately?

Posted: 06 Jun 2007 13:47
by SmatZ
So I rebuilt diffs against svn 10048. I did not any changs to the code, but it seems to work.
I didn't update wagon.diff as there are too many changes...

1) gui.diff
Divided hash pool to two parts. First is used by ViewportAddVehicles(), second by VehicleFromPos(). Has an impressive impact for games with many vehicles over original. Screen scrolling is faster, too. Size of hash pool is setable by HASH_BITS and HASH_BITS_GUI in vehicles.cpp. Testfile time 10.6 sec.

2) gui_b.diff
Modified version of gui.diff. Contains changes already done by peter. Disadvantage is a bit more memory required (two or four times more than in gui.diff). Testfile time 10.3 sec.

3) tile.diff
Another change, moved test if ( v-> tile == tile ) from proc into VehicleFromPos(). VehicleFromPos() should return vehicles really at given pos, as its name says... Some indirect calls are saved. Testfile time 10.2sec.

edit:
wagon.diff

There were changes saving number of called functions.
Some functions got called only when entered new tile, some only for engines, some when crossed signals to update signal block.
Also, sprite got updated only when direction of vehicle changed.
I don't remember if there were more changes...

I hope somebody was/will be interested in this code to implement it into trunk. (especially wagon.diff, that is not included now)

edit2:
In fact, the way the Vehicle hash_table is done, and because a very big block is scanned at a time, it is very uneffective. Maybe I could compare it to not using hash_table at all and searching all vehicles instead.

Too bad is that anyone of devs is interested :-(

edit3:
There is most likely a bug in train_cmd.cpp
bool ssign=false; should be moved before the for()
Problem should be with trains leaving tunnel not affecting semaphore. (but I didn't encounter that problem)

Posted: 06 Jun 2007 16:00
by SmatZ
I did the test of hash efficiency - using the original hash method, for every ViewportAddVehicles() are searched rouhly 500 vehicles, 10 of them are drawn.
For every VehicleFromPos() are searched roughly 300 vehicles before the one is found.

It is better than running without hash tables, but really far from perfect...

With my update, there were one or two vehicles searched before the correct was found in VehicleFromPos() - even with HASH_BITS == 8! Reason is the searched area is not hundreds of tiles large as in original version.

The problem with ViewportAddVehicles() still remains, maybe I could look at it... probably the size of sprite cache is too small.

I am running game with ~ 30 000 vehicles, so this is where come these numbers from.

Posted: 07 Jun 2007 07:27
by TheJosh
B. N. SmatZ! wrote: but I don't have enough knowledge how network games work, and if there would be some problem.
As long as code is in a DoCommandP, you wont get desyncs.

As for testing network code, just run the program twice and play yourself. Thats how I test patches. sometimes i hack the source so i can use cheats in multiplayer as well, because often my testing requires a lot of money

Posted: 07 Jun 2007 07:40
by peter1138
TheJosh wrote:As long as code is in a DoCommandP, you wont get desyncs.
Sorry, but... "lol"...

Posted: 07 Jun 2007 08:41
by Rubidium
TheJosh wrote:As long as code is in a DoCommandP, you wont get desyncs.
Who, but you, did *ever* say that? It's not true. Both code inside as outside DoCommandP can cause desyncs *if* not programmed with our network protocol in mind.
B. N. SmatZ! wrote:Testfile time 10.6 sec.
Compared to what time without this hashing? And how is that determined?

What about doing some proper profiling, so you can actually say (and prove): "this function used X% and now uses only Y% under Z conditions"?

Posted: 07 Jun 2007 09:17
by peter1138
I'm interested in proper profiling and possible side effects of this. It does look pretty desync-safe (not that looks mean much :))

I have no doubt that it is faster, as it only checks one hash table entry instead of many-many-lots, but does this now mean that vehicles can only collide if they're on the same tile? Perhaps that is so anyway, of course.

Posted: 07 Jun 2007 09:40
by LordOfThePigs
Rubidium wrote:
B. N. SmatZ! wrote:Testfile time 10.6 sec.
Compared to what time without this hashing? And how is that determined?
B. N. SmatZ a few posts higher wrote: Original testtime is 26.5 seconds. Full Anim = Full Detail = off.
It's likely that the original time didn't change very much.

Posted: 07 Jun 2007 19:02
by SmatZ
Whoa, so many replies - actually I didn't expect anyone will read it again :)

As LordOfThePigs wrote, I didn't any changes, so the old times are still valid.
It is running (afair) 5 days of attached savegame
AI was disabled commenting AI_RunGameLoop() in openttd.cpp:
// AI_RunGameLoop();
Also, Full Detail and Animation was disabled.

edit: I couldn't attach savegame, you can download it at http://88.146.45.107/ttd


There are 3 major changes:
1) two hash arrays, one for VehicleFromPos and second for ViewportAddVehicles. The problem in original code is that all vehicles' coordinates' hashes are from on-screen position. So a very large area must be scanned to get vehicle from given coordinate because of hills. Hills change on-screen coordinates, while the tile remains the same.

Code: Select all

	const int xl = GB(pt.x - 174, 7, 6);
	const int xu = GB(pt.x + 104, 7, 6);
	const int yl = GB(pt.y - 294, 6, 6) << 6;
	const int yu = GB(pt.y +  56, 6, 6) << 6;
Area of roughly 10*20 = 200 tiles is searched every time VehicleFromPos is called (until correct vehicle is found, if any). With hash_size of 2*6 bits = 4096 hash entries, rouhly 1/20 of vehicles is searched.
With divided GUI and Pos hash tables, only 1 hash_etnry is searched, leading to 200 faster searches. Only one tile is searched.

2) Corresponding to change 1), a little change had to me made. In old code, semaphores are updated before hash table is updated. So when vehicle enters new tile and only the new tile would be searched, and the hash_table contains old coords of vehicle, the semaphores wouldn't go red. This is the reason why this code is not network compatible - semaphores are updated after vehicle is moved (at the end of vehicle_move in TrainController)

3) in tile.diff, VehicleFromPos() verifies if vehicle is at given tile, so it doesn't have to be verified in *Proc functions, given a little speedup


Also, hash_table_size is settable - I don't know original TTD hash size, but when now maps are up to 256 times larger, hash table size should be resized, too.

Everything is mentioned in previous posts...

Posted: 08 Jun 2007 17:43
by madman2003
I've done some short profiling, based on the third patch.

I definately like the improvement, makes my patch kind of useless, but improvements are good anyway.

Is this being considered for trunk inclusion?