Aegir:
There are unlimited possibilities. There could be a more bluish night in winter, or a certain reddening during sunrise. It would also be very easy to use more than four stages or a monthly palette. This could even go into the config files.
Also, the arctic trees obey summer and winter. The 12 temperate trees do not. Since at least five of them are nearly identical to the arctic ones, there could be more seasons or more winter trees. It would be fun to try this with the Japanese trees.
I thought about those, but was too lazy. And digging through UI code is never fun, and even less so with OTTD.
(Since I am head developer of Simutrans at the moment, I rarely play OTTD because it is no challenge: cities grow too easily, braindead passengers, use a coal line and make more money than you can spend, ... you certainly know.)
About the 32Bit/16Bit matter, or which bandwidth matters:
32Bit needs 4x the bandwidth: once to copy into the back buffer, and a second time to copy to the screen. And even though AGP has in principle a huge memory bandwidth, writing into an AGP card by CPU is sloooow. It is about 4-10 times slower than a main memory cache miss.
Perhaps you should think about what memory bandwidth is needed. 32*64*8=16k, some transparent pixel compression => 8k per image. The game has 10000 of them => 80 MB, ok.
1024*768*32/8=3MB. Desirable are 25 frames per second + double refresh (as written above) => 25*3*2=150 MB/s. This does not sound too bad for a computer board whose theoretical limit is around 800MB/s or higher. But unfortunately about half of these accesses are cache misses: whenever an image is drawn into the back buffer. And the maximum bandwidth for unbuffered main memory access is around 50MB/s and has not increased much in recent years.
(Usually 16 bytes are loaded at once, but this depends very much on memory architecture and organisation. There is a program from the German c't, with which you can also check the memory access speed beyond the level 2 cache.)
Anyhow, the above calculation does not take into account the use of a list of regions which do not need an update, and so on. So with clever programming, the actual amount might well be lower.
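To make the numbers above easy to replay for other color depths, here is a small sketch of the same estimate. It assumes full-screen redraws, two copies per frame (back buffer, then screen), and counts 1 MB as 2^20 bytes; the function names are made up for illustration.

```python
# Back-of-the-envelope frame bandwidth, using the figures from the post.
WIDTH, HEIGHT = 1024, 768
FPS = 25
COPIES_PER_FRAME = 2  # once into the back buffer, once to the screen


def frame_bytes(bits_per_pixel: int) -> int:
    """Size of one full frame in bytes."""
    return WIDTH * HEIGHT * bits_per_pixel // 8


def bandwidth_mb_per_s(bits_per_pixel: int) -> float:
    """Data moved per second for full-screen redraws, in MB (2**20 bytes)."""
    return frame_bytes(bits_per_pixel) * COPIES_PER_FRAME * FPS / 2**20


if __name__ == "__main__":
    for bpp in (8, 16, 32):
        print(f"{bpp:2d} bit: {frame_bytes(bpp) / 2**20:.2f} MB/frame, "
              f"{bandwidth_mb_per_s(bpp):.1f} MB/s")
```

This only counts raw pixel traffic; it says nothing about how much of it hits the cache.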
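A rough sketch of what such a region list could look like: the screen is split into fixed-size tiles, drawing marks the touched tiles as dirty, and only those are copied to the screen. The 16 pixel tile size and the `DirtyGrid` class are assumptions for illustration, not how any particular game does it.

```python
TILE = 16  # hypothetical dirty-tile edge length in pixels


class DirtyGrid:
    """Tracks which screen tiles changed since the last refresh."""

    def __init__(self, width, height, tile=TILE):
        self.width, self.height, self.tile = width, height, tile
        self.dirty = set()  # (col, row) of tiles that must be recopied

    def mark(self, x, y, w, h):
        """Mark every tile touched by the rectangle (x, y, w, h)."""
        # Clip the rectangle to the screen first.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, self.width), min(y + h, self.height)
        if x0 >= x1 or y0 >= y1:
            return  # completely off-screen or empty
        for row in range(y0 // self.tile, (y1 - 1) // self.tile + 1):
            for col in range(x0 // self.tile, (x1 - 1) // self.tile + 1):
                self.dirty.add((col, row))

    def bytes_to_copy(self, bytes_per_pixel):
        """Upper bound on bytes copied this frame (every tile counted as full)."""
        return len(self.dirty) * self.tile * self.tile * bytes_per_pixel

    def clear(self):
        """Call after the dirty tiles have been copied to the screen."""
        self.dirty.clear()
```

For example, one 64x32 image drawn at (10, 10) touches 5x3 tiles, so at 16Bit only 7680 bytes need to go to the screen instead of the full 1.5MB frame.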
In Simutrans, which uses such a strategy, I once put the back buffer on the video card as well. The advantage of faster switching between buffers was completely eaten up by the bandwidth increase. It ran about 4x slower than in main memory. Copying 16Bit data to a 32Bit screen takes some time (1ms/frame) but is done in hardware on a decent card. Apparently it took less time than it took to copy the larger amount of memory.
And 16Bit still makes the use of a palette possible. This way a darkening or company color replacement of images could be calculated very fast.
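One way this could work, as a sketch only: images store palette indices, the palette holds 16Bit 5-6-5 pixels, and darkening or company colors are done by building a modified palette once, so drawing stays a plain table lookup. The 5-6-5 layout, the halving factor and all function names are assumptions for illustration.

```python
def rgb565(r, g, b):
    """Pack 8-bit RGB into one 16-bit 5-6-5 pixel."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def darken565(pixel, num, den):
    """Scale each channel of a 5-6-5 pixel by num/den (num <= den)."""
    r = (pixel >> 11) & 0x1F
    g = (pixel >> 5) & 0x3F
    b = pixel & 0x1F
    return ((r * num // den) << 11) | ((g * num // den) << 5) | (b * num // den)


def build_night_palette(palette, num=1, den=2):
    """Darken the whole palette once, e.g. when the time of day changes."""
    return [darken565(p, num, den) for p in palette]


def remap_company(palette, company_slots, company_colors):
    """Return a copy of the palette with the company slots replaced."""
    out = list(palette)
    for slot, color in zip(company_slots, company_colors):
        out[slot] = color
    return out


def draw(indices, palette):
    """Per pixel, drawing is a single table lookup."""
    return [palette[i] for i in indices]
```

The per-pixel cost does not change with the effect; only the small palette table is recomputed.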
The downside is, of course, that some color transitions will look less smooth. But comparing this with the current OTTD graphics ...