Latest version: Git 043131c
Changelog for g043131c
The original description of the patch and "The Gory Details (TM)" can be found here.
What's this patch about, in short? Take a look:
Known 32bpp-anim-aa blitter versions:
- 2012/08/15 - Git 043131c
- 2012/08/03 #1 - Git 1ce8aac8
- 2012/08/03 #0 - Git 8fbe2f58
- 2012/08/01 - Git 6a63343
- Versions from 6a63343 and up to 043131c.
You can configure the blitter through the openttd.cfg file.
To enable the blitter, put the following line into the [misc] section of the openttd.cfg file. Make sure there is only one line starting with "blitter = " in this section.
blitter = "32bpp-anim-aa"
You can change three parameters to tweak the blitter to your needs:
Anti-aliasing level for all sprite pixels except for palette-animated ones.
This is the main knob to tweak. A higher level means higher quality but worse performance. It can be set by adding or changing the following line in the [misc] section of the openttd.cfg file:
blitter-32bpp-aa-level = 4
Instead of 4 you can put any of 1, 2, 4, 8, 16 or 32. Putting any other number has the same effect as putting the nearest lower number from that sequence.
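To make the "nearest lower number" rule concrete, here is a minimal Python sketch of it. It is only an illustration of the rule as described in this post, not code from the patch. The function name `effective_aa_level` is made up, and the handling of values below 1 (treated as 1) is my assumption, since the post does not cover that case.

```python
# Levels accepted by blitter-32bpp-aa-level, as listed in the post.
ALLOWED_AA_LEVELS = (1, 2, 4, 8, 16, 32)


def effective_aa_level(configured: int) -> int:
    """Return the AA level a configured value would map to (hypothetical helper).

    Any value not in the list falls back to the nearest lower allowed level.
    Values below 1 are mapped to 1; that part is an assumption.
    """
    candidates = [level for level in ALLOWED_AA_LEVELS if level <= configured]
    return max(candidates) if candidates else ALLOWED_AA_LEVELS[0]


if __name__ == "__main__":
    for value in (4, 5, 12, 31, 100):
        print(value, "->", effective_aa_level(value))
```

So under this reading a value of 12 behaves like 8, and anything above 32 behaves like 32.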
Anti-aliasing level for palette-animated pixels.
This is the secondary knob to tweak. A higher level means higher quality but worse performance. It can be set by adding or changing the following line in the [misc] section of the openttd.cfg file:
blitter-32bpp-aa-anim-slots = 16
Instead of 16 you can put any number between 1 and (blitter-32bpp-aa-level * blitter-32bpp-aa-level). Putting any number higher than the squared value of blitter-32bpp-aa-level has the same effect as putting the squared value itself.
The performance loss from this setting can be even higher than from setting blitter-32bpp-aa-level too high. It depends on the screen/window resolution you play at, on the number of palette-animated pixels visible at a given moment (a lot of palette-animated water coupled with a high blitter-32bpp-aa-anim-slots value means extremely low performance) and on whether you have a multi-core CPU and enable multithreaded palette animation with the setting described next.
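The upper bound on the anim slots can be sketched the same way. Again, this is just an illustration of the rule as stated in this post, with a made-up function name (`effective_anim_slots`). Treating values below 1 as 1 is my assumption; the post only gives 1 as the lower end of the range.

```python
def effective_anim_slots(configured_slots: int, aa_level: int) -> int:
    """Return the anim slot count a configured value would map to (hypothetical helper).

    aa_level is the effective blitter-32bpp-aa-level. The slot count is capped
    at aa_level squared. Clamping values below 1 up to 1 is an assumption.
    """
    max_slots = aa_level * aa_level
    return max(1, min(configured_slots, max_slots))


if __name__ == "__main__":
    # With AA level 4 the cap is 16, so asking for 64 slots gives 16.
    print(effective_anim_slots(64, 4))
    # With AA level 2 the cap is 4, so the default of 16 gives 4.
    print(effective_anim_slots(16, 2))
```

In other words, raising blitter-32bpp-aa-anim-slots above the squared AA level buys nothing; you have to raise the AA level first.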
Number of threads to use for updating palette-animated pixels.
It can be set by adding or changing the following line in the [misc] section of the openttd.cfg file:
blitter-32bpp-aa-anim-threads = 8
Setting it to "-1" or "1" disables threaded palette animation. Setting it to "0" instructs the blitter to try to determine the number of cores your CPU has and to use the number of threads that suits it best (two for dual-core CPU, 8xCPU cores for multicore CPU, no threads for single core CPU). Setting it to any other positive integer up to 127 instructs the blitter to use that many threads.
If the blitter was instructed to auto-detect the best number of threads and failed to do so for some reason (it may fail on some platforms in rare cases), it falls back to a "safe" default of 2 threads. If threading is not available on the target platform, or if thread creation fails for some reason (for example because more threads were requested than your platform can handle), the blitter tries to fall back to non-threaded mode.
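The rules for this setting are easier to follow as a decision table, so here is a rough Python sketch of them. It is not the patch code, and several things in it are assumptions: the function name `resolve_anim_threads` is made up, "8xCPU cores" is read as eight threads per detected core, a return value of 0 stands for non-threaded mode, and values outside the documented range raise an error because the post does not say what happens to them.

```python
import os
from typing import Optional

# "Safe" default used when core auto-detection fails, as stated in the post.
SAFE_DEFAULT_THREADS = 2


def resolve_anim_threads(configured: int, detected_cores: Optional[int]) -> int:
    """Return the number of palette-animation threads (hypothetical helper).

    A result of 0 means non-threaded mode. detected_cores is None when the
    core count could not be determined.
    """
    if configured in (-1, 1):
        return 0  # threading explicitly disabled
    if configured == 0:
        if detected_cores is None:
            return SAFE_DEFAULT_THREADS  # auto-detection failed
        if detected_cores <= 1:
            return 0  # single core: no threads
        if detected_cores == 2:
            return 2  # dual core: two threads
        # Assumption: "8xCPU cores" means 8 threads per core. No cap is
        # applied here; the post does not say whether one exists.
        return 8 * detected_cores
    if 2 <= configured <= 127:
        return configured  # explicit thread count
    # Behaviour for other values is not described in the post.
    raise ValueError(f"unsupported thread setting: {configured}")


if __name__ == "__main__":
    # os.cpu_count() returns None when the count cannot be determined.
    print(resolve_anim_threads(0, os.cpu_count()))
```

The fallback to non-threaded mode when thread creation fails is not modelled here, since that happens at run time rather than when the setting is read.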
- Performance isn't stunning even with modest AA levels like 2x or 4x when used with a bare 8bpp GFX baseset. The easiest way to "fix" it is to install some GRFs that supply unmasked 32bpp sprites for tiles. A good idea would be to use the "Ben Robbins Fields Ground with lines" and "Ben Robbins Ground with lines" GRFs or the older "32bpp megapack" compiled into a NewGRF. Using zBase would also do, but you should expect visual glitches (as of zBase r123) due to a small but nasty bug in this quickly emerging baseset.
- The blitter could be incompatible with some 8bpp GRFs that use so-called recolour sprites for palette ranges other than those used by the original DOS/WIN or OpenGFX basesets. I haven't seen any GRF hit this bug during "in-house" play testing of the blitter, but that doesn't mean no such "incompatible" GRF exists.
- Rendering produced by the blitter isn't as good as a real SSAA approach would produce, due to some tricks used to eliminate glitches that would otherwise be guaranteed to happen. This can be mitigated by hand-crafting all sprites with yet another "special trick", as was done for the radio tower and the oil refinery flame torch tower in the NewGRF attached to the second post in this forum thread.
- In g043131c:
- Threaded and even non-threaded palette animation caused excess lag in mouse cursor updates. Palette-animation threading as implemented in the older releases of the patch was wrong and extremely inefficient. But that wasn't the worst thing about palette animation in the 32bpp-anim-aa blitter. The main flaw was that palette animation wasn't optimized to be fast enough even for the "no antialiasing" case. Both problems have been addressed, and with version g043131c it should be possible to play comfortably with 4 anim slots without any performance-related problems on any dual-core CPU from the last 5 years (excluding netbook-targeted CPUs like Intel Atom and AMD Cxx/Exxx/Zxx). With a fast multicore CPU you can use a higher number of anim slots (16 anim slots has been tested to work sufficiently well on an 8 "core" AMD FX 8120 CPU) to gain an additional increase in render quality.
- Do not use GCC-specific array allocation on the stack, which prevented the blitter from being compiled successfully by MSVC. Note that I do not use MSVC and do not test the blitter with it, so it is unknown whether the blitter compiles with MSVC with these fixes in place. Reports are welcome.
- A lot of other fixes here and there to make this blitter release "the best blitter release ever (TM)".
- In g1ce8aac8:
- Threading-related deadlocks with pthreads (affected platforms are: linux, freebsd).
- Division by zero (and eventual OTTD crash) when applying "transparency" effect to the palette animation buffer with the source pixel being fully transparent.
Affected: unknown number of GRFs, was spotted when trying to play with zBase as a baseset without any other active GRFs and trying to enable "transparency" for trees.
Note: the current fix is actually a workaround for another possible bug in Encode() that is still waiting to be pinpointed and, if it really is a bug, fixed.
- Fixed a typo in one comparison to make it work as initially intended. The typo caused the comparison to always evaluate to "true", possibly making the blitter a bit slower than it is now with the typo fixed. Don't expect a "magic hyperboost" though; the difference should be really minor.
- In g8fbe2f58:
- Patch updated to be compatible with both vanilla 1.2.1 sources and current trunk (as of r24450)
- Add a mode that would allow for faster AA performance when blitting sprites in BM_NORMAL and BM_TRANSPARENT mode at the expense of memory usage. It would greatly help in the case where the user has a "bare" 8bpp baseset without any additional 32bpp GRFs supplying "basement" sprites (I would bet that most currently played OpenTTD installations fall into this category).
- Profile and optimize the blitter even more. A lot has already been done on this front. I've spent about three weeks finding bottlenecks and then optimizing (sometimes by means of a total rewrite) the palette animation and threading code, but there's still more to be done. ATM I'm working on extracting a subset of the OTTD codebase that would allow me to create an "isolated environment" where I could reliably benchmark Draw() and Encode() performance. I believe there's a huge field for optimization there.
- Try to implement a multithreaded Draw() routine and check whether it helps to gain some more speed for "8xAA + 64 anim AA slots" and higher cases. I suspect that the benefits would be less than the time spent on ITC and context switches, but who knows? No test, no gain.
- When and if a multithreaded Draw() implementation is tried: give compiler- and platform-specific lockless ITC a chance, i.e. try to use atomic increments/decrements + spinlocks for synchronizing threads instead of relying on the OS and threading lib for locks and signaling. I already tried it on Win32/64 and on linux/pthreads when I was reimplementing threading for palette animation, and it proved to be beneficial over ITC through OS/pthreads services for cases where the synchronizing period is less than the thread execution slice (~10ms on Win32/64, varies greatly on linux depending on the system load and kernel process scheduler). The downside is CPU hogging due to the spinlocks and the possibility of a deadlock if thread affinity is changed in an unexpected way by a third party (taskset, Windows Task Manager, etc.).
- I'm thinking about hacking in yet another anti-aliased blitter implementation which would use real SSAA instead of the SBAA approach. I don't know if I will ever really try to implement it though.
Testing is needed.
Suggestions are welcome.
Bug reports are expected to be filed as replies to this thread. PMs would serve as well for this purpose.
Thanks for spending your time reading this and trying this blitter.