zBase (32bpp base set by Zephyris)

Yexo · Post by **Yexo** » 21 Aug 2012 16:48

LeXa2 wrote:Upd. Yet another question to the devteam (I know it's off-topic here, but as we're discussing zBase-related off-topic anyway...): is the grfcodec issue tracker located here - http://dev.openttdcoop.org/projects/grf ... query_id=5 - the correct place to post grfcodec bug reports? I posted a report there days ago and it doesn't seem that there's been any reaction (like a comment or the issue being assigned to one of the devs, etc.) up to now.

Feeling better now I set it to "confirmed"? I'll take a better look later. But yes, that's the correct place

Zephyris · Post by **Zephyris** » 22 Aug 2012 08:23

I don't want to sound rude, nor do I find it uninteresting, but I'm afraid it is becoming tl;dr: it is hard to make a good decision without re-reading many lengthy postings all over again, and without a concise summary of your suggested changes, the current implementation and the implications of your changes, especially, for example, with regard to speed impacts on both an empty game and maybe coop game 200 or 201.

If the suggested changes to recolouring will take a long time to get fully tested for performance impact, that starts to risk impacting people making graphics... I vote for a divisor of 128...

Fanda666 · Post by **Fanda666** » 23 Aug 2012 11:36

Hello :) Is this a bug or is this normal?

Zephyris · Post by **Zephyris** » 23 Aug 2012 13:32

That is a bug... Thanks for spotting it, I will have a think about how I might be able to solve it.

Bad_Brett · Post by **Bad_Brett** » 23 Aug 2012 19:43

Nice project! Man, you work fast (or maybe I'm slow?). Anyway, great work!

Lord Aro · Post by **Lord Aro** » 23 Aug 2012 21:23

Bad_Brett wrote:Man, you work fast (or maybe I'm slow?).

Meet Zephyris, the fastest drawer in the West!

Bad_Brett · Post by **Bad_Brett** » 23 Aug 2012 21:32

Lord Aro wrote:Meet Zephyris, the fastest drawer in the West!

I think I'll include that comment somewhere as an easter egg.

Zephyris · Post by **Zephyris** » 23 Aug 2012 21:44

Love it! Unfortunately it won't be so fast for the next couple of weeks :s See attached...

Bad_Brett · Post by **Bad_Brett** » 23 Aug 2012 21:57

Zephyris wrote:Love it! Unfortunately it won't be so fast for the next couple of weeks :s See attached...

Ouch! What happened?

Arie- · Post by **Arie-** » 24 Aug 2012 08:56

I searched this thread and found making good trees apparently is a difficult job. I still have one suggestion though, make the trees a little bit bigger. Why? Because when comparing 8bpp and zBase I find the zBase doesn't give the feeling of a dense forest when lots of trees are at one spot, 8bpp does give this experience. I'm sorry I have to make this post in two, but I've got 4 screens to illustrate this.

Arie- · Post by **Arie-** » 24 Aug 2012 08:57

And post two.

Zephyris · Post by **Zephyris** » 24 Aug 2012 10:48

Arie- wrote:I find the zBase doesn't give the feeling of a dense forest when lots of trees are at one spot, 8bpp does give this experience.

I see what you mean, this should be easy to improve

Bad_Brett wrote:Ouch! What happened?

Ironically enough it was a tree; cutting a big branch down and my finger got crunched.

SkeedR · Post by **SkeedR** » 24 Aug 2012 13:32

Just want to let you know that I'm loving your work Zephyris. I hope your hand is back in action soon too

LeXa2 · Post by **LeXa2** » 24 Aug 2012 18:29

Yexo wrote:
LeXa2 wrote:Upd. Yet another question to the devteam (I know it's off-topic here, but as we're discussing zBase-related off-topic anyway...): is the grfcodec issue tracker located here - http://dev.openttdcoop.org/projects/grf ... query_id=5 - the correct place to post grfcodec bug reports? I posted a report there days ago and it doesn't seem that there's been any reaction (like a comment or the issue being assigned to one of the devs, etc.) up to now.
Feeling better now I set it to "confirmed"? I'll take a better look later. But yes, that's the correct place

Sure it feels better now, thanks. Knowing that a report has been sent to the proper place is always good.

Well, back to the CC recolouring mess. As always, it took twice as long to implement/test/benchmark/optimize as I originally expected. All that IRL/laziness/tons of math transformations done by hand to derive optimized formulas for monotonic cubic spline interpolation/etc.

Here are the results.

1. Linear interpolated implementation:

Code:

	static const int DEFAULT_BRIGHTNESS = 128;
	static inline uint32 AdjustBrightness(uint32 colour, uint8 brightness)
	{
		/* Shortcut for normal brightness */
		if (brightness == DEFAULT_BRIGHTNESS) return colour;
		if (brightness == 255) return 0xFFFFFFFF;

		Colour tmp = { colour };

		if (brightness < DEFAULT_BRIGHTNESS) {
			int nom = 256 * brightness / DEFAULT_BRIGHTNESS;
			tmp.r = (nom * tmp.r) >> 8;
			tmp.g = (nom * tmp.g) >> 8;
			tmp.b = (nom * tmp.b) >> 8;
		} else {
			int nom = 256 * (brightness - DEFAULT_BRIGHTNESS) / (255 - DEFAULT_BRIGHTNESS);
			tmp.r += ((255 - tmp.r) * nom) >> 8;
			tmp.g += ((255 - tmp.g) * nom) >> 8;
			tmp.b += ((255 - tmp.b) * nom) >> 8;
		}

		tmp.a = 255;
		return tmp.data;
	}

Speed: faster than the AdjustBrightness() implementation in current trunk. I benchmarked on my PC, equipped with an AMD FX 8210 CPU, for "the worst case imaginable": a full screen at 1650x1080 resolution covered with 32bpp masked pixels belonging to the palette animation range, with brightness not equal to DEFAULT_BRIGHTNESS, which effectively forces the engine to redraw the entire screen ~33 times per second with the 32bpp_anim blitter. I know this case is totally unrealistic, but it is without any doubt "the worst case". With the AdjustBrightness implementation from current trunk, PaletteAnimate() throughput is ~70-90MPix/s (varying from run to run due to the randomness of the input dataset used for benchmarking), allowing for 40-55 FPS max. The linear-interpolated version performs faster at ~85-110MPix/s, allowing for 50-75 FPS max.

Visuals:

2012-08-24-LinearCC-Compare.png: (1.57 MiB) Downloaded 3 times
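To try the linear scheme outside the OpenTTD tree, the per-channel arithmetic can be pulled out into a standalone function. This is only a minimal sketch: `AdjustChannelLinear` and `kDefaultBrightness` are hypothetical names, and it works on a single 8-bit channel rather than on OpenTTD's `Colour` type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone helper (not part of OpenTTD): applies the linear
// scheme from the post to one 8-bit colour channel.
constexpr int kDefaultBrightness = 128;

inline std::uint8_t AdjustChannelLinear(std::uint8_t channel, std::uint8_t brightness)
{
	if (brightness == kDefaultBrightness) return channel; // normal brightness: unchanged
	if (brightness == 255) return 255;                     // maximum brightness: white

	if (brightness < kDefaultBrightness) {
		// Darken: scale the channel towards 0 with an 8.8 fixed-point factor.
		const int nom = 256 * brightness / kDefaultBrightness;
		return static_cast<std::uint8_t>((nom * channel) >> 8);
	}

	// Brighten: move the channel towards 255 by the same kind of factor.
	const int nom = 256 * (brightness - kDefaultBrightness) / (255 - kDefaultBrightness);
	return static_cast<std::uint8_t>(channel + (((255 - channel) * nom) >> 8));
}
```

With these definitions a channel value of 100 maps to 50 at brightness 64 and to 178 at brightness 192, while brightness 128 leaves it at 100.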

2. Cubic interpolated implementation:

Code:

	static const int DEFAULT_BRIGHTNESS = 128;

	static inline uint8 CubicInterpolate(uint8 brightness, uint8 y)
	{
		int32 l = y;
#define x DEFAULT_BRIGHTNESS
		if (brightness <= x) {
			if ((y >= (x * 255 / (5 * 255 - 4 * x))) && (y <= (5 * x * 255 / (255 + 4 * x)))) {
				l = 1 + ((((((brightness - x) * brightness) / x) * (255 * (x - y)) + (2 * x * (255 - x) * y)) / x) * brightness) / (2 * x * x);
			} else {
				if (y <= x) {
					l = 1 + (brightness * ((y * (2 * ((brightness * (brightness - x)) / x) + x)) / x)) / x;
				} else {
					l = 1 + (brightness * ((((3 * 255 * x - y * (2 * x + 255)) * (brightness * (brightness - x)) / x) + y * x * x) / x)) / (x * x);
				}
			}
		} else {
			int32 p = brightness - x;
			if ((y >= (x * 255 / (5 * 255 - 4 * x))) && (y <= (5 * x * 255 / (255 + 4 * x)))) {
				l = y + (p * ((255 * (y - x) * ((p * (p - 2 * (255 - x))) / (255 - x)) + (255 - x) * ((255 - 2 * x) * y + 255 * x)) / (255 - x))) / (2 * (255 - x) * (255 - x));
			} else {
				if (y > x) {
					l = y + (p * (((255 - y) * (2 * (((p - 2 * (255 - x)) * p) / (255 - x)) + 3 * (255 - x))) / (255 - x))) / (255 - x);
				} else {
					l = y + (p * (((y * (3 * 255 - 2 * x) - 255 * x) * ((p * (p - 2 * (255 - x))) / (255 - x)) + y * 3 * (255 - x) * (255 - x)) / (255 - x))) / ((255 - x) * (255 - x));
				}
			}
		}
#undef	x
		return (uint8)l;
	}

	static inline uint32 AdjustBrightness(uint32 colour, uint8 brightness)
	{
		/* Shortcut for normal brightness */
		if (brightness == DEFAULT_BRIGHTNESS) return colour;
		if (brightness == 255) return 0xFFFFFFFF;
		if (brightness == 0) return 0;

		Colour tmp = { colour };
		tmp.r = Blitter_32bppBase::CubicInterpolate(brightness, tmp.r);
		tmp.g = Blitter_32bppBase::CubicInterpolate(brightness, tmp.g);
		tmp.b = Blitter_32bppBase::CubicInterpolate(brightness, tmp.b);
		tmp.a = 255;
		return tmp.data;
	}

Speed: 22-35MPix/s, allowing for 12-20 FPS in the same test case described earlier, i.e. it is 2x slower than the implementation from current trunk and ~3x slower than the linear-interpolated implementation. It being slower than the simpler approaches isn't really a surprise, and TBH performing at ~15 FPS in an unrealistic "worst case that won't ever happen" means it is generally suitable for normal use: one would never get the entire screen filled with masked 32bpp pixels with non-default brightness in a real gameplay session. Whether it is OK to use it instead of the simpler linear-interpolated approach is an open question. IMHO it's important to have better visuals as long as speed is in the OK range, but that's just IMHO.

Visuals:

2012-08-24-Linear-vs-Cubic-CC-Compare.png: (590.53 KiB) Downloaded 3 times
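As a rough cross-check of the speed figures, the frame-rate ceiling follows from dividing pixel throughput by the number of pixels per frame at the 1650x1080 resolution mentioned earlier. The helper below is a hypothetical back-of-envelope sketch, not part of the benchmark itself, and it assumes that "MPix" means 10^6 pixels.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: upper bound on full-screen redraws per second for a
// given pixel throughput, expressed in pixels per second.
inline double MaxFullScreenFps(double pixels_per_second, std::uint32_t width, std::uint32_t height)
{
	assert(width > 0 && height > 0);
	const double pixels_per_frame = static_cast<double>(width) * height;
	return pixels_per_second / pixels_per_frame;
}
```

For 22 and 35 MPix/s at 1650x1080 this gives roughly 12.3 and 19.6 redraws per second, in line with the quoted 12-20 FPS range.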

3. RGB->HSL->Interpolate->RGB approach

This one is here just for completeness' sake; it is what I was thinking about when posting the original proposal. It turned out that scaling the L value in the HSL model does not produce the expected visual results, and due to the non-linear nature of the HSL<->RGB conversion it's hard to implement with sufficient precision in pure integer math. Problems with this approach are clearly visible in the bottom part of the above image: one can easily spot banding and colour tone artefacts, which make this approach unreasonable to use for CC purposes.

The code below is a non-optimized PoC I used to test whether this approach works. It performs slowly: ~1.5-2x slower than RGB cubic-interpolated recolouring. I see no point in spending time optimizing it, as the visual results won't be pleasant, making that work useless.

Code:

	static const int DEFAULT_BRIGHTNESS = 128;

	static inline uint8 CubicInterpolate(uint8 brightness, uint8 y)
	{
		int32 l = y;
#define x DEFAULT_BRIGHTNESS
		if (brightness <= x) {
			if ((y >= (x * 255 / (5 * 255 - 4 * x))) && (y <= (5 * x * 255 / (255 + 4 * x)))) {
				l = 1 + ((((((brightness - x) * brightness) / x) * (255 * (x - y)) + (2 * x * (255 - x) * y)) / x) * brightness) / (2 * x * x);
			} else {
				if (y <= x) {
					l = 1 + (brightness * ((y * (2 * ((brightness * (brightness - x)) / x) + x)) / x)) / x;
				} else {
					l = 1 + (brightness * ((((3 * 255 * x - y * (2 * x + 255)) * (brightness * (brightness - x)) / x) + y * x * x) / x)) / (x * x);
				}
			}
		} else {
			int32 p = brightness - x;
			if ((y >= (x * 255 / (5 * 255 - 4 * x))) && (y <= (5 * x * 255 / (255 + 4 * x)))) {
				l = y + (p * ((255 * (y - x) * ((p * (p - 2 * (255 - x))) / (255 - x)) + (255 - x) * ((255 - 2 * x) * y + 255 * x)) / (255 - x))) / (2 * (255 - x) * (255 - x));
			} else {
				if (y > x) {
					l = y + (p * (((255 - y) * (2 * (((p - 2 * (255 - x)) * p) / (255 - x)) + 3 * (255 - x))) / (255 - x))) / (255 - x);
				} else {
					l = y + (p * (((y * (3 * 255 - 2 * x) - 255 * x) * ((p * (p - 2 * (255 - x))) / (255 - x)) + y * 3 * (255 - x) * (255 - x)) / (255 - x))) / ((255 - x) * (255 - x));
				}
			}
		}
#undef	x
		return (uint8)l;
	}

	static inline uint32 AdjustBrightness(uint32 colour, uint8 brightness)
	{
		/* Shortcut for normal brightness */
		if (brightness == DEFAULT_BRIGHTNESS) return colour;
		if (brightness == 255) return 0xFFFFFFFF;
		if (brightness == 0) return 0;

		Colour tmp = { colour };

		int32 cmax = max(max(tmp.r, tmp.g), tmp.b);
		int32 cmin = min(min(tmp.r, tmp.g), tmp.b);
		int32 cr = cmax - cmin;
		int32 l = (cmax + cmin);
		int32 h = (cr == 0) ? 0 : ((cmax == tmp.r) ? ((720000 + (120000 * (tmp.g - tmp.b) / cr)) % 720000) : (
				(cmax == tmp.g) ? (240000 + 120000 * (tmp.b - tmp.r) / cr) : (480000 + 120000 * (tmp.r - tmp.g) / cr)));
		int32 s = (cr == 0) ? 0 : 1280000 * cr / (256 - abs(l - 255)); /* Scaled up by 5000 */

		l = Blitter_32bppBase::CubicInterpolate(brightness, l >> 1);

		cr = s * (129 - abs(l - 128)) / 64; /* Scaled up by 10000 */
		cmin = 10000 * l - cr / 2 + 5000; /* Must be (L - 0.5f*CR). We scale up to do divide as late as possible and add 1*Scale/2 to force rounding up. */
		cmax = cr * ((120000 - abs((h % 240000) - 120000)) / 1200) / 100; /* H scale: 120000. CR scale: 10000. Result scale: 10000 * (120000 / 1200) / 100 = 10000 */
		h = (h / 120000) % 6; /* 0 <= h < 6 */
		tmp.r = min(255, max(0, (( ((h & 6) == 2) ? 0 : ((h == 0) || (h == 5) ? cr : cmax)) + cmin) / 10000));
		tmp.g = min(255, max(0, (( (h > 3)        ? 0 : ((h == 1) || (h == 2) ? cr : cmax)) + cmin) / 10000));
		tmp.b = min(255, max(0, (( (h < 2)        ? 0 : ((h == 3) || (h == 4) ? cr : cmax)) + cmin) / 10000));
		tmp.a = 0xFF;

		return tmp.data;
	}

Conclusion
I think that replacing the current AdjustBrightness() implementation with the proposed scheme would make a lot of sense. If the devteam finds speed a big concern, then using the linear-interpolated approach would speed things up a bit. OTOH AdjustBrightness() in its slower form (i.e. with brightness != DEFAULT_BRIGHTNESS) is rarely called (it's not typical for the entire screen to be filled with 32bpp masked pixels), so it is perfectly possible to sacrifice speed for better visuals. Alternatively it could be made a user-configurable option in openttd.ini, as the linear-interpolated approach is GRF-wise compatible with the cubic-interpolated approach (i.e. a GRF that looks OK with linear-interpolated recolouring would also look good with the cubic-interpolated approach and vice versa).

P.S. Attached is the source code for the "speedtest" benchmark I use to test the speed of PaletteAnimate(). It's an unholy mess of code, but it serves my needs and isn't really that complicated. The Makefile is suited to Cygwin or MSYS but is easily adjustable for Linux or *BSD. Anyone wishing to test the speed of the various AdjustBrightness() implementations should play with the comments on lines 892-894 of speedtest.cpp. Having only one line uncommented at a time is a good idea. Results are reported as APS, which stands for "palette-Animations-Per-Second": the number of times per second the engine would be able to perform PaletteAnimate() for the entire screen.

2012-08-24-OTTD-CCrecolour-speedtest.7z: (11.47 KiB) Downloaded 161 times
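For readers without the attachment, the general shape of such a measurement can be sketched as follows. This is not the attached speedtest.cpp: `MeasurePassesPerSecond` and `ToMPixPerSecond` are hypothetical names, and the harness simply times repeated passes of an arbitrary per-pixel function over a buffer instead of running PaletteAnimate().

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical harness: applies `adjust` to every pixel of the buffer
// `passes` times and returns the number of full passes per second.
template <typename AdjustFn>
double MeasurePassesPerSecond(AdjustFn adjust, const std::vector<std::uint32_t> &pixels,
		std::uint8_t brightness, int passes)
{
	assert(passes > 0);
	volatile std::uint32_t sink = 0; // keeps the compiler from discarding the work
	const auto start = std::chrono::steady_clock::now();
	for (int pass = 0; pass < passes; ++pass) {
		std::uint32_t acc = 0;
		for (std::uint32_t colour : pixels) acc ^= adjust(colour, brightness);
		sink = acc;
	}
	const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
	return passes / elapsed.count();
}

// Converts passes per second into throughput in MPix/s (10^6 pixels per second).
inline double ToMPixPerSecond(double passes_per_second, std::size_t pixels_per_pass)
{
	return passes_per_second * static_cast<double>(pixels_per_pass) / 1e6;
}
```

A caller would fill the buffer with width x height pixels, pass each AdjustBrightness() variant in turn as `adjust`, and compare the resulting rates. Measured numbers depend heavily on compiler flags and input data, so they are only meaningful relative to each other on the same machine.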

Arie- · Post by **Arie-** » 25 Aug 2012 06:57

Zephyris wrote:
I find the zBase doesn't give the feeling of a dense forest when lots of trees are at one spot, 8bpp does give this experience.
I see what you mean, this should be easy to improve

I've taken another look at my screenshots and it might not be the trees causing this. I think it might be caused by the high contrast between the ground colour and colour of the trees. And get well soon!

ArmEagle · Post by **ArmEagle** » 25 Aug 2012 09:09

Arie- wrote:
Zephyris wrote:
I find the zBase doesn't give the feeling of a dense forest when lots of trees are at one spot, 8bpp does give this experience.
I see what you mean, this should be easy to improve
I've taken another look at my screenshots and it might not be the trees causing this. I think it might be caused by the high contrast between the ground colour and colour of the trees. And get well soon!

Also, I'm not sure which trees these are, but do they have the right offset? It could be that some trees are too close to each other, making gaps appear in other places.

Bad_Brett · Post by **Bad_Brett** » 25 Aug 2012 18:06

Zephyris wrote: Ironically enough it was a tree; cutting a big branch down and my finger got crunched.

...And yet some demand bigger trees. The nerve of these people!

Seriously though, hope you get better soon. It's really inspiring that you work on your own replacement set. It gives me new ideas and it's a lot easier and makes it more fun to keep up the pace. By the way, have you decided what vehicles/industries to include? Will it just be a replacement set or will you include more features?

Alberth · Post by **Alberth** » 25 Aug 2012 19:19

Bad_Brett wrote:Will it just be a replacement set or will you include more features?

zbase IS a baseset, there is no room for doing anything else than a replacement.

bokkie · Post by **bokkie** » 25 Aug 2012 19:28

But OpenGFX+ or eGRVTS 2.0 are probably not that much more work when this is finished, are they?

As others have said; get well! To support you, I've printed all the e-mail I got today even though I can read from the screen just fine. Stupid trees...

Zephyris · Post by **Zephyris** » 26 Aug 2012 09:36

bokkie wrote:eGRVTS 2.0 are probably not that much more work when this is finished

Some things, bits of opengfx+, would be quite easy. Some things, eGRVTS 2, would be much harder, eGRVTS has many times more sprites than the base set!
