Testing different AIs

Post by **planetmaker** » 24 Jul 2011 11:40

For quite some time I've been toying with thoughts on how to compare AIs. Maybe we could come up with a somewhat 'standardized' test scheme to at least get an idea of how to compare AIs.

Brumi wrote: I think if we would like to measure the stability of AIs, it cannot be done in some scientific way, rather in an empirical way, based on users' experiences. Some stability measurement would be useful in my opinion, but whose responsibility would it be to do that?

Indeed that is the crucial or fundamental question in this which needs an answer before the 'inactivity' of an AI can be judged.

There have sometimes been requests to include an AI with OpenTTD again. Now, that's difficult... we (as OpenTTD core authors) wouldn't have written it, but would possibly be held liable for its mistakes (who shall fix them then?) if any AI were included. But even without doing that, it certainly might make sense to establish some criteria by which the quality of an AI could be judged:

Stability: Each AI could be left running on a few pre-made scenarios; of course it should not crash, and it should actually do something there.
Competitiveness: Success in (possibly the same) scenarios on an economic, monetary scale. Additionally it'll need IMHO a game where several AIs compete with each other.
Possible scenarios (non-comprehensive) for this test series, which a state-of-the-art AI should at least be able to survive (I know it excludes all one-transport-mode AIs, however well they perform there):
- air only map
- road only map
- rail only map
- ship only map
- map with all vehicles available, newgrf industries, newgrf trainset with 'AI unfriendly' consist conditions, orig. accel
- map with all vehicles available but costs re-balanced via base costs, very steep TF costs, realistic accel
The order of these maps in such a competition could be randomized, with ~6 AIs assigned to each map run and judgement made after ~10 or 15 game years (in my experience that usually suffices).
"Beauty": The most subjective term, i.e. use of all transport types in a 'reasonable' manner, employing a somewhat good path finder for road and rail, and re-use of existing infrastructure (i.e. not spamming the map with rail and road when all infrastructure is mostly empty).

As a last note for now: any such map should IMHO be made so that it can be loaded with the current stable release, so testing gets easier and can in principle be done by anyone, even without a nightly version of OpenTTD

EDIT: three first simple test cases
* ship only
* train only
* air only
Load the game, open console, type in start_ai <name of AI> and have it run for 10 years. See also http://wiki.openttd.org/AI:Test_AIs
But maybe that table is better moved to the AI comparison page. Or linked there or whatever.

Zuu · Post by **Zuu** » 24 Jul 2011 12:00

I don't think we should be afraid of organizing a competition or a vote where different user AIs compete over which one is going to be included in the next stable. After all, that is how the title game selection works.

I do however see the issues for the dev team that can arise when shipping an AI that might contain a subtle bug that only shows after OpenTTD is released and the author disappears. In that case, perhaps a middle way is to show the last AI score in bananas? (and e.g. "unrated" for new AIs that haven't been tested)

Felix Atagong · Post by **Felix Atagong** » 24 Jul 2011 12:14

I completely agree and this is something I have once proposed as well (Bananas: cleanup, categorisation, all-in-one packs...).

I like, for instance, playing the game starting from +/- 1830, and some AIs don't perform and go bankrupt because:
- they want to use a vehicle that doesn't exist yet (aeroplane, train)
- they only use the normal loading bays instead of the drive-through loading bays for passengers and freight (and horse-carts only use drive-through).

The 'default' test should (IMO) be a test with a clean game with 'default' settings (but with the different climates) and no extra GRF. If an AI passes there, it has passed the test as being 'alive'.

Later on more subtle testing can be added with the most common newGRF, although, then once again, what is a 'stable' GRF and what is not...

FooBar · Post by **FooBar** » 24 Jul 2011 12:16

Maybe for the "competitiveness" a comparison can be made on the basis of operating profit (i.e. the amount of money an AI has after say 15 years). The AI with the biggest profit will get 10 points, the one with the lowest will get 0 points and the rest are scaled in between. Do this per scenario and then take the average.
AIs that just made a huge investment should be accounted for, so rather than the exact amount at January 1st, the maximum in the last year or so should be taken.
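A minimal sketch of the scaling proposed above, in Python. It assumes the input is each AI's peak money during the final year of a scenario, that every AI took part in every scenario, and that all AIs get 0 points when there is no spread to scale over; the function names and the numbers in the usage note are made up for illustration.

```python
def scale_scores(values, top=10.0):
    """Map the lowest value to 0 and the highest to `top`, linearly in between."""
    low, high = min(values.values()), max(values.values())
    if high == low:
        # Assumption: with no spread to scale over, everyone gets 0.
        return {name: 0.0 for name in values}
    return {name: top * (v - low) / (high - low) for name, v in values.items()}


def competitiveness(per_scenario):
    """Average each AI's scaled score over all scenarios.

    per_scenario is a list of dicts, one per scenario, mapping an AI name
    to its peak money in the last year of that scenario.
    """
    totals = {}
    for peak_money in per_scenario:
        for name, points in scale_scores(peak_money).items():
            totals[name] = totals.get(name, 0.0) + points
    return {name: total / len(per_scenario) for name, total in totals.items()}
```

With two invented scenarios, `competitiveness([{"A": 1000000, "B": 500000, "C": 0}, {"A": 0, "B": 200, "C": 200}])` gives A 5.0, B 7.5 and C 5.0.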

As for "stability", you could do something similar based on game year reached without crashing. An AI that doesn't crash will get 10 points. The AI that took the longest time before it crashed will get 5 points, the first to crash 0. Other 'crashers' are again scaled in between. An AI that didn't do anything will get 0 points here, even though it could be considered very stable.
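The stability rule could be sketched the same way. This is only an illustration in Python: the crash year per AI and the set of AIs that did nothing are assumed inputs, and the treatment of a single crasher (it counts as "first to crash" and gets 0) is an assumption, since the proposal does not cover that case.

```python
def stability_scores(crash_year, idle=frozenset()):
    """Score stability from 0 to 10.

    crash_year maps an AI name to the game year in which it crashed,
    or to None if it never crashed. `idle` names AIs that did nothing.
    """
    crashers = {n: y for n, y in crash_year.items()
                if y is not None and n not in idle}
    first = min(crashers.values(), default=0)
    last = max(crashers.values(), default=0)
    scores = {}
    for name, year in crash_year.items():
        if name in idle:
            scores[name] = 0.0   # did nothing at all
        elif year is None:
            scores[name] = 10.0  # never crashed
        elif last == first:
            scores[name] = 0.0   # assumption: a lone crasher is "first to crash"
        else:
            # Crashers are scaled between 0 (first crash) and 5 (last crash).
            scores[name] = 5.0 * (year - first) / (last - first)
    return scores
```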

For "beauty", a list of criteria should be drafted. If an AI meets a criterion it gets a point for that and if it doesn't it gets no point. Criteria could be "doesn't build unnecessary rail" (i.e. what the original AI used to do), "reuses roads as much as possible" (give AIs a scenario with all towns connected by road and see what it does with those), "builds with landscape and not against it", "doesn't use unnecessary space" (e.g. stations too big), as well as other things I can't think of right now. Possibly also 10 points to earn here. AIs that don't do anything will get 0 points here as well.

From this each AI can be given an overall rating. If an AI manages to obtain 30 points, it can be considered the "most perfect AI". AIs with at least 25 points can be considered very good. AIs with at least 20 points good. Between 10 and 19 an AI is moderate and below 10 it is bad.
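As a small sketch, the rating bands above map onto a lookup like the following (Python; the function name is hypothetical, and totals between 19 and 20 are treated as moderate, which the post does not spell out).

```python
def overall_rating(stability, competitiveness, beauty):
    """Sum the three 0-10 scores and map the total to a rating band."""
    total = stability + competitiveness + beauty
    if total >= 30:
        label = "most perfect"
    elif total >= 25:
        label = "very good"
    elif total >= 20:
        label = "good"
    elif total >= 10:
        label = "moderate"
    else:
        label = "bad"
    return total, label
```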

As long as the testing procedure is documented carefully, really anybody can do such a test.

Alberth · Post by **Alberth** » 24 Jul 2011 12:27

I like the idea of standardizing tests for AI.

Could some beauty points be given for using fewer vehicles? I don't like AIs that fill all roads with trucks to the point that everything just gets stuck.

(I was also thinking to make an 'effectiveness' test, by adding a scenario with a low vehicle limit. That however measures something different.)

Post by **planetmaker** » 24 Jul 2011 14:47

Well, let's maybe just start with some standards. The test games can always be improved. In my previous posting I added three savegames which support one vehicle type only, here's one which supports only planes.

Simple test: start. Use console with "start_ai <name of AI>" and see what happens, how things go in ~10 years. The scores for these basic tests could possibly be scaled a bit by economic success. But I'm missing that scale so far

I started a wiki page with the test descriptions here: http://wiki.openttd.org/AI:Test_AIs

Zuu · Post by **Zuu** » 24 Jul 2011 15:08

While it is true that most AI games often really make your computer want to cry after 10-15 years, that is not very long for testing the ability of AIs to maintain and replace vehicles. A workaround for this could be to create a specific NewGRF with rather short vehicle life times that could test this over a shorter time period.

Still, I think this is something that can be left for future improvements. Better to get off the ground with the basics first.

Post by **planetmaker** » 24 Jul 2011 15:33

Zuu wrote:While it is true that most AI games often really make your computer want to cry after 10-15 years, that is not very long for testing the ability of AIs to maintain and replace vehicles. A workaround for this could be to create a specific NewGRF with rather short vehicle life times that could test this over a shorter time period.

Very good point indeed. We should keep that in mind. OTOH it could easily be incorporated into those scenarios if they were left running for (much) longer, but that'd need (IMHO) some automated testing or very eager testers, as it'd take very long to test everything. And Rubidium reminded me of some nearly forgotten test scenarios from the beginnings of the NoAI time: http://www.tt-forums.net/viewtopic.php?p=703535#p703535

Good test scenarios which go beyond what we have, and possibly configured with NewGRFs which make things not easy, are definitely welcome

This thread IMHO would be a good place to post them

Just make sure that all NewGRFs are available from bananas.

Brumi · Post by **Brumi** » 24 Jul 2011 15:34

Zuu wrote:While it is true that most AI games often really make your computer want to cry after 10-15 years

That's not true if you set the construction speed of competitors to a lower setting. I did my toyland AI test with 14 AIs on the slow setting, and my computer could cope with it for 100 years.

I did the tests for AdmiralAI. Interestingly it wasn't profitable after 10 years in the train-only game, but later it became profitable. I gave it 5 points.

In my opinion, almost none of the AIs will crash in these games, because this type of test (running it alone for a relatively short period without saving and loading in-between) is what supposedly most AI authors do. We wouldn't leave it that way if it crashed in a simple test

Just as an example, the most reported AdmiralAI crash (it may be the only one) happens when electric railway is introduced, and the game has been saved and loaded before.

EDIT:
Now I did the tests for AIAI and AroAI as well. AIAI only built airports after the city airport became available.
I think the rv-only map is a bit too harsh on bus AIs. AroAI went bankrupt after 10 years in business.

Lord Aro · Post by **Lord Aro** » 24 Jul 2011 18:09

Brumi wrote:AroAI went bankrupt after 10 years in business.

Boo.

Guess i need to implement freight then

Brumi · Post by **Brumi** » 24 Jul 2011 18:37

I did some more tests, problems are coming

ChooChoo crashed in those three games where trains weren't enabled. In the train game, it had freight trains only by the year 1960, so it got 5 points despite the fact that it is a train networking AI.
Chopper couldn't do anything, as the first helicopter appears a bit later. So now it is 0 points for Chopper, but I'm sure it would be different if the game didn't start in 1950.
Convoy was unprofitable by the year 1960, it also managed to run profitably for a few years like AroAI, but eventually it was going into bankruptcy. The map is definitely harsh for bus AIs.
Denver & Rio Grande failed because 90 deg turns were disabled for trains.
DictatorAI did well at least

@Zuu: I assumed that you had done your test with version 25 of CluelessPlus, I added the version to the table. Please correct it if you did it with some development version.

Michiel · Post by **Michiel** » 24 Jul 2011 18:52

Good idea guys!

Brumi wrote:I did some more tests, problems are coming
ChooChoo crashed in those three games where trains weren't enabled. In the train game, it had freight trains only by the year 1960, so it got 5 points despite the fact that it is a train networking AI.

I guess I should check whether trains are enabled and at least die gracefully if they aren't

The train game is a bit of a bummer though, I would like for it to start doing "interesting" things sooner. No promises but I'll at least have a look at the test game.

Edit: ah, I see. It plops down its cargo lines within a year, but there's not enough towns to satisfy its seed crossing algorithm (it'll only build in spots where it estimates it will be able to connect at least 3 towns). Removing that constraint might result in lots of ugly point to point lines on another map, so as usual, it's a tradeoff.

Post by **planetmaker** » 24 Jul 2011 19:10

A loooong time ago there was an AI tournament run by TrueBrain, for which he wrote some nice small scripts that work jointly with a slightly modified OpenTTD (updated diff)

I have not yet gone through the whole scripts. Provided I understood how that works:
The results are the ranking based on many games played for one year with three participants; the scores are from single player games with two different maps, too

Base results (after 1 years):
1. AdmiralAI: 285 / 735
2. SimpleAI: 270 / 735
3. AIAI: 190 / 735
4. NoCAB: 223 / 735
5. ChooChoo: 220 / 735
6. PAXLink: 206 / 735
7. AroAI: 215 / 735
8. Convoy: 160 / 735
9. PathZilla: 135 / 735

Detailed scoring: http://pastebin.com/jmKGKTvj - ignore the two AIs with negative scores; they were wrongly configured

I shall give it a try with a longer run....

EDIT: Here we go, sorry only 5 AIs, or I wouldn't be able to use this machine tonight anymore. I'll try to initialize a longer run with many more on the DevZone:

|-- Tournament (group) running (5 AIs)
  |-- Match running (NoCAB, AdmiralAI, PAXLink, CluelessPlus, AIAI) with seed 1209793896
  |-- Match running (NoCAB, AdmiralAI, PAXLink, CluelessPlus, AIAI) with seed 2142556532
Base results (after 12 years):

  1. NoCAB: 393 / 735
    Details Round 1: 491 / 735
         0 /  100 -                         ship profit (       0 /        0)
       100 /  100 -                         road profit (  578924 /   578924)
       100 /  100 -                        train profit (  799971 /   799971)
       100 /  100 -                          air profit (  285283 /   285283)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
        25 /   50 -    units of delivered cargo by road (   52548 /   103984)
        50 /   50 -    units of delivered cargo by rail (   38225 /    38225)
        50 /   50 -     units of delivered cargo by air (   37704 /    37704)
       -50 /  -50 -                         loan amount (  350000 /   350000)
         7 /   10 -                         cash amount ( 1519352 /  1960329)
        25 /   25 -                       company value ( 4256792 /  4256792)
        50 /   50 -                         cargo types (      10 /       10)
        34 /   50 -                 average town rating (     341 /      496)
         0 / -100 -                            bankrupt (       0 /        1)
    Details Round 2: 295 / 735
         0 /  100 -                         ship profit (       0 /        0)
        16 /  100 -                         road profit (  154920 /   944683)
       100 /  100 -                        train profit (  192745 /   192745)
       100 /  100 -                          air profit (   79082 /    79082)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
         1 /   50 -    units of delivered cargo by road (    7116 /   189122)
        24 /   50 -    units of delivered cargo by rail (    8550 /    17358)
        50 /   50 -     units of delivered cargo by air (   16972 /    16972)
       -50 /  -50 -                         loan amount (  350000 /   350000)
         0 /   10 -                         cash amount (   91886 /  1076228)
         5 /   25 -                       company value (  663478 /  2913922)
        27 /   50 -                         cargo types (       5 /        9)
        22 /   50 -                 average town rating (     222 /      504)
         0 / -100 -                            bankrupt (       0 /        1)

  2. AIAI: 294 / 735
    Details Round 1: 254 / 735
         0 /  100 -                         ship profit (       0 /        0)
        65 /  100 -                         road profit (  379470 /   578924)
         0 /  100 -                        train profit (       0 /   799971)
        60 /  100 -                          air profit (  173691 /   285283)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
        50 /   50 -    units of delivered cargo by road (  103984 /   103984)
         0 /   50 -    units of delivered cargo by rail (       0 /    38225)
        16 /   50 -     units of delivered cargo by air (   12392 /    37704)
       -49 /  -50 -                         loan amount (  340000 /   350000)
        10 /   10 -                         cash amount ( 1960329 /  1960329)
        15 /   25 -                       company value ( 2584747 /  4256792)
        40 /   50 -                         cargo types (       8 /       10)
        47 /   50 -                 average town rating (     470 /      496)
         0 / -100 -                            bankrupt (       0 /        1)
    Details Round 2: 335 / 735
         0 /  100 -                         ship profit (       0 /        0)
       100 /  100 -                         road profit (  944683 /   944683)
         0 /  100 -                        train profit (       0 /   192745)
        50 /  100 -                          air profit (   40146 /    79082)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
        50 /   50 -    units of delivered cargo by road (  189122 /   189122)
         0 /   50 -    units of delivered cargo by rail (       0 /    17358)
        12 /   50 -     units of delivered cargo by air (    4151 /    16972)
         0 /  -50 -                         loan amount (       0 /   350000)
         5 /   10 -                         cash amount (  571604 /  1076228)
        25 /   25 -                       company value ( 2913922 /  2913922)
        50 /   50 -                         cargo types (       9 /        9)
        43 /   50 -                 average town rating (     434 /      504)
         0 / -100 -                            bankrupt (       0 /        1)

  3. AdmiralAI: 175 / 735
    Details Round 1: 89 / 735
         0 /  100 -                         ship profit (       0 /        0)
        14 /  100 -                         road profit (   86465 /   578924)
         5 /  100 -                        train profit (   45754 /   799971)
        18 /  100 -                          air profit (   52582 /   285283)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
        13 /   50 -    units of delivered cargo by road (   29000 /   103984)
        11 /   50 -    units of delivered cargo by rail (    8574 /    38225)
         1 /   50 -     units of delivered cargo by air (     924 /    37704)
       -50 /  -50 -                         loan amount (  350000 /   350000)
         0 /   10 -                         cash amount (   13828 /  1960329)
         2 /   25 -                       company value (  418802 /  4256792)
        35 /   50 -                         cargo types (       7 /       10)
        40 /   50 -                 average town rating (     401 /      496)
         0 / -100 -                            bankrupt (       0 /        1)
    Details Round 2: 262 / 735
         0 /  100 -                         ship profit (       0 /        0)
        53 /  100 -                         road profit (  501943 /   944683)
        34 /  100 -                        train profit (   65970 /   192745)
        17 /  100 -                          air profit (   13453 /    79082)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
        21 /   50 -    units of delivered cargo by road (   79776 /   189122)
        50 /   50 -    units of delivered cargo by rail (   17358 /    17358)
        17 /   50 -     units of delivered cargo by air (    5811 /    16972)
       -50 /  -50 -                         loan amount (  350000 /   350000)
         0 /   10 -                         cash amount (   38420 /  1076228)
        20 /   25 -                       company value ( 2442349 /  2913922)
        50 /   50 -                         cargo types (       9 /        9)
        50 /   50 -                 average town rating (     504 /      504)
         0 / -100 -                            bankrupt (       0 /        1)

  4. CluelessPlus: 111 / 735
    Details Round 1: 132 / 735
         0 /  100 -                         ship profit (       0 /        0)
        63 /  100 -                         road profit (  369014 /   578924)
         0 /  100 -                        train profit (       0 /   799971)
         0 /  100 -                          air profit (       0 /   285283)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
        35 /   50 -    units of delivered cargo by road (   73930 /   103984)
         0 /   50 -    units of delivered cargo by rail (       0 /    38225)
         0 /   50 -     units of delivered cargo by air (       0 /    37704)
       -50 /  -50 -                         loan amount (  350000 /   350000)
         5 /   10 -                         cash amount ( 1065702 /  1960329)
         9 /   25 -                       company value ( 1699807 /  4256792)
        20 /   50 -                         cargo types (       4 /       10)
        50 /   50 -                 average town rating (     496 /      496)
         0 / -100 -                            bankrupt (       0 /        1)
    Details Round 2: 90 / 735
         0 /  100 -                         ship profit (       0 /        0)
        28 /  100 -                         road profit (  270779 /   944683)
         0 /  100 -                        train profit (       0 /   192745)
         0 /  100 -                          air profit (       0 /    79082)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
        17 /   50 -    units of delivered cargo by road (   64475 /   189122)
         0 /   50 -    units of delivered cargo by rail (       0 /    17358)
         0 /   50 -     units of delivered cargo by air (       0 /    16972)
       -50 /  -50 -                         loan amount (  350000 /   350000)
        10 /   10 -                         cash amount ( 1076228 /  1076228)
        15 /   25 -                       company value ( 1761598 /  2913922)
        27 /   50 -                         cargo types (       5 /        9)
        43 /   50 -                 average town rating (     434 /      504)
         0 / -100 -                            bankrupt (       0 /        1)

  5. PAXLink: -100 / 735
    Details Round 1: -100 / 735
         0 /  100 -                         ship profit (       0 /        0)
         0 /  100 -                         road profit (       0 /   578924)
         0 /  100 -                        train profit (       0 /   799971)
         0 /  100 -                          air profit (       0 /   285283)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
         0 /   50 -    units of delivered cargo by road (       0 /   103984)
         0 /   50 -    units of delivered cargo by rail (       0 /    38225)
         0 /   50 -     units of delivered cargo by air (       0 /    37704)
         0 /  -50 -                         loan amount (       0 /   350000)
         0 /   10 -                         cash amount (       0 /  1960329)
         0 /   25 -                       company value (       0 /  4256792)
         0 /   50 -                         cargo types (       0 /       10)
         0 /   50 -                 average town rating (       0 /      496)
      -100 / -100 -                            bankrupt (       1 /        1)
    Details Round 2: -100 / 735
         0 /  100 -                         ship profit (       0 /        0)
         0 /  100 -                         road profit (       0 /   944683)
         0 /  100 -                        train profit (       0 /   192745)
         0 /  100 -                          air profit (       0 /    79082)
         0 /   50 -    units of delivered cargo by ship (       0 /        0)
         0 /   50 -    units of delivered cargo by road (       0 /   189122)
         0 /   50 -    units of delivered cargo by rail (       0 /    17358)
         0 /   50 -     units of delivered cargo by air (       0 /    16972)
         0 /  -50 -                         loan amount (       0 /   350000)
         0 /   10 -                         cash amount (       0 /  1076228)
         0 /   25 -                       company value (       0 /  2913922)
         0 /   50 -                         cargo types (       0 /        9)
         0 /   50 -                 average town rating (       0 /      504)
      -100 / -100 -                            bankrupt (       1 /        1)

Base results (after 12 years):

1. NoCAB: 393 / 735
2. AIAI: 294 / 735
3. AdmiralAI: 175 / 735
4. CluelessPlus: 111 / 735
5. PAXLink: -100 / 735
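The numbers in the log above look consistent with a simple rule: each category awards its weight times the AI's value divided by the best value in the match, rounded down, and the overall score is the rounded-down mean of the rounds. That rule is inferred from the printed figures only, not taken from TrueBrain's scripts, so treat the Python below as a sketch under that assumption. The data rows are copied from NoCAB's "Details Round 1".

```python
def category_points(weight, value, best):
    """Points for one category: weight * value / best, rounded down."""
    if best == 0:
        return 0  # nobody scored in this category
    # Floor division also rounds negative results down, e.g. -48.57 -> -49.
    return (weight * value) // best


# (weight, NoCAB's value, best value in the match) from "Details Round 1".
NOCAB_ROUND_1 = [
    (100, 0, 0),              # ship profit
    (100, 578924, 578924),    # road profit
    (100, 799971, 799971),    # train profit
    (100, 285283, 285283),    # air profit
    (50, 0, 0),               # cargo by ship
    (50, 52548, 103984),      # cargo by road
    (50, 38225, 38225),       # cargo by rail
    (50, 37704, 37704),       # cargo by air
    (-50, 350000, 350000),    # loan amount
    (10, 1519352, 1960329),   # cash amount
    (25, 4256792, 4256792),   # company value
    (50, 10, 10),             # cargo types
    (50, 341, 496),           # average town rating
    (-100, 0, 1),             # bankrupt
]

round_1 = sum(category_points(w, v, b) for w, v, b in NOCAB_ROUND_1)
overall = (round_1 + 295) // 2  # 295 is NoCAB's logged Round 2 total

print(round_1, overall)  # 491 393
```

The same rule reproduces AIAI's -49 for a loan of 340000 against 350000, and the maximum of 735 is the sum of the positive weights.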

Brumi · Post by **Brumi** » 24 Jul 2011 19:53

Great!

TrueBrain's test seems to be quite sophisticated, I hope you will be able to get it going...

Anyway, just to finish my work, I did some more tests, there are only 3 more AIs to go.
PAXLink and MailAI couldn't do anything, as they needed both of their vehicle types. I had to change some parameters of Terron to make it use aircraft in the aircraft-only game, because by default it is configured not to use small airports, and not to start with planes.

krinn · Post by **krinn** » 24 Jul 2011 21:49

That's a great idea. I can't really work on my AI all summer, so apart from bugfixes I might not do much on it, but this is really motivating me to add trains (not enough for ships yet, but I hate ships), and I see it's having the same effect on others (like Lord Aro adding freight)

Michiel wrote:I guess I should check whether trains are enabled and at least die gracefully if they aren't

As I already said to someone, please don't!
Why stop your AI when it just needs time to get going? And even if the event never comes, I much prefer an AI that does nothing over one that stops, just because the AI might have an opportunity to work later (why stop a train-only AI when no trains are available, in a scenario where trains might come out at date X?)

Please, just issue a message if you feel the need to (so the user knows why the AI is doing nothing), but still wait in the hope that a good event appears (for you, the introduction of trains) and don't stop your AI.
Even with a bad setting on, no AI should stop working; it should just loop until the offending setting is off. This won't bug the user, and maybe it's something the user wants (temporarily disabling a setting to do something or test something, and re-enabling it later).
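The "warn once, then keep waiting" behaviour described here could look roughly like this. It is a Python illustration only (real NoAI scripts are written in Squirrel), and `can_operate`, `notify` and the other parameter names are hypothetical stand-ins for whatever checks and logging an AI actually has.

```python
import time


def wait_until_usable(can_operate, notify, sleep=time.sleep,
                      poll_seconds=1.0, max_polls=None):
    """Poll until can_operate() is true instead of stopping the AI.

    Returns True once the AI can work, or False if max_polls is set and
    was reached first. The user is told only once why nothing happens.
    """
    warned = False
    polls = 0
    while not can_operate():
        if not warned:
            notify("Nothing to do yet; waiting for a usable vehicle type or setting.")
            warned = True
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_seconds)  # idle instead of exiting
        polls += 1
    return True
```

Passing `sleep` and `max_polls` in as parameters is only there to make the loop easy to try out without actually waiting.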

Well, that's my point of view, sure, but I really hate AIs that stop or, worse, don't run at all because of a setting I've unset, when they might in fact work once I agree to enable it.
LOL, this reminds me I already complained about ChooChoo refusing to start because noconstruction.road_stop_on_town_road is off

looks like you didn't catch why i was asking that

GeekToo · Post by **GeekToo** » 24 Jul 2011 22:24

Very nice idea, indeed.
A thought about the rating system: in my opinion a crash should be punished more severely: it now is a -5 pt penalty. In my view, crashing is the worst sign of low quality/lack of testing. So a fancy AI with plane/train/plane/bus/road veh and ships that crashes is, according to that view, worse than a road-vehicle-only AI that is rock stable. I think it should be "rewarded" with -20 pt.
Once the AI is stable, its smartness, being able to use different kinds of vehicles in every circumstance and making a profit, should make the difference.
Let's not only look at pure performance; the most important feature should be that it is fun to play against, so it does not flood roads with an incredible number of vehicles, pollute the map with abundant infrastructure building, or block roads with drive-thru stations with tens of vehicles in front of them. Once that is achieved, "nice" looking networks are rated over point-to-point connections etc., but not before the other points are satisfied.

Post by **planetmaker** » 25 Jul 2011 14:45

So, finally, after 985 minutes (16 hours 25 minutes) of relentless fast-forward, the first real test run of the AI tournament system, with the original settings as used before, has finished: a series of games lasting 50 game years for 17 AIs, a total of 24 games:

1. AIAI: 394 / 735
2. Trans: 296 / 735
3. SimpleAI: 328 / 735
4. Terron: 317 / 735
5. NoCAB: 69 / 735
6. DictatorAI: 316 / 735
7. OtviAI: 311 / 735
8. RoadRunner: 217 / 735
9. PathZilla: 97 / 735
10. CluelessPlus: 194 / 735
11. trAIns: 165 / 735
12. Rondje: 180 / 735
13. Convoy: 73 / 735
14. AroAI: 246 / 735
15. ChooChoo: 206 / 735
16. PAXLink: -100 / 735
17. FanAI: -100 / 735

One notable thing is that the scores need tweaking, as an AI can accumulate huge negative scores if it runs a certain transport type at a loss, like NoCAB does with its trains, which basically is the reason it comes off so badly (but it also runs an annual loss of 2 million in trains, balanced by 2.1 million profit with road vehicles).

The scores are the average of all games the AI took part in; the attached log shows the average for each AI and map type separately. The tournament was run on two maps with slightly different settings:

Map1: reduced breakdowns, no disasters, permissive local authorities, realistic train acceleration, plane speed at 1/4 on a 1024*256 temperate map started in 1950
Map2: normal breakdowns, disasters on, normal local authorities, original train acceleration, plane speed at 1/1 on a 256*256 arctic map started in 1990

If you want to re-create the maps, use the attached cfgs and random seeds as in the log file with OpenTTD r22687; No NewGRFs were used.

So... what do I take from this? Scoring IMHO still needs to be tweaked. And it will make sense to feed the tournament specially crafted scenarios.

And sorry, Yexo. I could have sworn I had included AdmiralAI, but it slipped my mind

FooBar · Post by **FooBar** » 25 Jul 2011 15:20

planetmaker wrote:One notable thing is that the scores need tweaking, as an AI can accumulate huge negative scores if it runs a certain transport type at a loss

Maybe limit negative points in case of a loss on a transport type to -100? I don't think an AI that claims to use a certain transport type but then only makes a loss on it should go unpunished, but a very large negative value isn't very fair either. As the maximum points to gain on those categories is 100, losing a maximum of 100 seems fair to me.

Also, I don't think an AI should be awarded points for "units of delivered cargo" if the profit of the corresponding transport type is negative. In the particular case of NoCAB: what's the point of transporting the most cargo by rail of all AIs if it can't make a profit out of it?
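Both suggestions can be sketched as small changes to a proportional scoring rule. The Python below assumes points are awarded as weight times value divided by the best value, rounded down, which is an assumption about the tournament scripts rather than their actual code; the function names are hypothetical.

```python
def profit_points(profit, best_profit, weight=100):
    """Proportional profit points, but never lower than -weight."""
    if best_profit <= 0:
        return 0  # assumption: no points if nobody made a profit
    return max(-weight, (weight * profit) // best_profit)


def cargo_points(units, best_units, profit, weight=50):
    """Cargo points count only when the transport type is not run at a loss."""
    if profit < 0 or best_units <= 0:
        return 0
    return (weight * units) // best_units
```

For example, with a made-up best train profit of 500000, an AI losing 2 million on trains would get `profit_points(-2000000, 500000)`, which is -100 instead of -400, and `cargo_points(38225, 38225, -2000000)` would be 0 even though it delivered the most cargo by rail.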

But all in all a system like this is one of the best ways to compare AIs I think.

Zuu · Post by **Zuu** » 25 Jul 2011 15:45

Why isn't the result list sorted by score? Or have I missed something?

Brumi · Post by **Brumi** » 25 Jul 2011 15:55

Indeed, I think this is the best way to measure their performance. Currently I'm wondering how these results should appear on the 'Comparison of AIs' page; maybe we could calculate some mean score for the AIs? All these stats aren't really useful for new players, so I think the stats should be summarised to some extent.

To test stability further, can the script be modified so that it saves the game at some point and then reloads it? I think it would be useful to test those AIs this way that claim to be save/load compatible.

Zuu wrote:Why isn't the result list sorted by score? Or have I missed something?

planetmaker wrote:The results are the ranking based on many games played for one year with three participants; the scores are from single player games with two different maps, too

Transport Tycoon Forums
